The Predictability Factor is a weekly deep dive at the intersection of AI, Security, Privacy and Tech, to help you Go From Chaos to Resilience in The World of AI.
I love efficiency. Any task or workflow I run more than once, I turn into a script, a skill, an agent or a combination of these: a repeatable, reusable set of instructions shaped by what the task needs, balancing agentic AI/LLMs against good old scripting (why burn tokens when you can just run a script?).
I rarely download skills; I build them myself, because no one understands my context and business needs better than I do. Python is my favourite scripting tool (it used to be PowerShell on Mac, since I did a lot of hacking with PowerShell, popping proprietary products). Today, nothing beats Claude for most of my business and work. I still restrict OpenClaw to non-enterprise, non-sensitive work. On the UX side, my absolute favourite tool used to be Notion. Since agentic AI became a regular part of my work, it’s now Claude Cowork integrated with Notion.
Recently, a tech manager asked me for my best practices for using Claude Cowork securely in an enterprise context.
Claude’s integration into business in general has been massive. Claude Cowork is available for enterprise, and it is being integrated deeply into enterprise workflows, including Microsoft apps, Google apps and more. People I know have been connecting it to tools like Microsoft and Google Drive inside their enterprises without paying any attention to security, often as shadow AI.
So, I decided to share my experiences and how I secure my agents.
This one’s for you (you know who you are) and for anyone who wants to use the power of Claude Cowork within your organisation safely, securely and without needing to click “Allow” every two minutes.
This one uses Claude Cowork as the primary example; however, almost all of these principles apply to any agentic AI tool you may be using.
Malicious skills are everywhere. I have built and secured most of the skills and agents I use for business and work myself. That reduces the risk from the recent wave of supply chain attacks. I’m not saying you should never use external skills, but if you can build, build.
Here are the security best practices that matter most, specific to Claude Cowork but true for any agentic AI. Let’s dig in.
1. Claude in Chrome is Epic But Also Highly Risky
Chrome extensions in general increase the attack surface drastically. In 2024, for example, one was silently updated with malicious code that harvested OAuth tokens from users' active Google Workspace, Slack and Jira browser sessions, affecting more than 400,000 users out of 2.6 million in total. When the extension is an AI agent, it gets worse.
I often use browser developer tools to be more productive: to download certain information programmatically with scripts, or just to analyse HTTPS requests and responses and how the information flows. But that still requires me to run those scripts across different page loads.
Sometimes I just want an authenticated browser session to find the information I am looking for without me scrolling and clicking through every button and hyperlink, or manually running JavaScript in the browser console. Often I want that done simultaneously across multiple sessions, with the results gathered and correlated for me. There are many ways to do this programmatically, but I automated most of my browser-related tasks with the Claude in Chrome extension.
I am comfortable with code in the browser, dissecting pages and finding what I need. But here’s the catch: Cowork reads the full content of every authenticated session open in your browser, including your internal wiki, your ITSM platform, your finance dashboards and your HR system. None of those sessions authenticate separately to Cowork; they inherit from the browser. So be mindful of every authenticated session in the browser while the extension is running.
The Claude in Chrome extension is plagued by the “lethal trifecta”: it can access your private account data, it is exposed to untrusted content whose instructions can steer it, and it can act on that data, all at the same time. It also runs outside the VM entirely, in your actual Chrome browser, bypassing the VM's egress controls.
In March 2026, a zero-click vulnerability in the Claude Chrome extension allowed any malicious website to inject prompts silently and steal Google OAuth tokens, Gmail contents, and Drive files from authenticated sessions already open in the browser. No user action required. The extension's browser session access was the entire attack surface.
Pro Tip: Disable Claude in Chrome by default on all managed enterprise devices, and enable it only for explicitly scoped use cases with reviewed session boundaries. Enterprise and Team plans provide a Chrome site allowlist/blocklist. On Enterprise plans, Claude in Chrome is disabled by default; on Team plans, it is enabled by default.
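If you manage Chrome centrally, you can also back this Pro Tip with Chrome's own enterprise policy, independent of the Claude plan setting. The sketch below assumes Chrome's ExtensionSettings policy; the extension ID is a placeholder and the hostnames are hypothetical. It generates a policy that blocks the extension by default and, for an approved scoped use case, allows it while keeping it off sensitive internal hosts via runtime_blocked_hosts.

```python
import json

# Hypothetical placeholder: copy the real ID from the extension's Chrome Web Store URL.
CLAUDE_EXTENSION_ID = "REPLACE_WITH_EXTENSION_ID"

# Hypothetical internal hosts the extension should never read or act on.
SENSITIVE_HOSTS = [
    "*://*.hr.example.internal",
    "*://finance.example.com",
    "*://*.itsm.example.com",
]


def build_extension_policy(extension_id, approved_use_case, blocked_hosts):
    """Return a Chrome ExtensionSettings policy entry for one extension.

    Default deny: the extension is blocked unless the device belongs to an
    approved, scoped use case. Even then, Chrome keeps it off sensitive hosts.
    """
    if not approved_use_case:
        settings = {"installation_mode": "blocked"}
    else:
        settings = {
            "installation_mode": "allowed",
            # runtime_blocked_hosts stops the extension interacting with these sites.
            "runtime_blocked_hosts": list(blocked_hosts),
        }
    return {"ExtensionSettings": {extension_id: settings}}


if __name__ == "__main__":
    policy = build_extension_policy(CLAUDE_EXTENSION_ID, approved_use_case=False,
                                    blocked_hosts=SENSITIVE_HOSTS)
    # On Linux, Chrome reads managed policies from /etc/opt/chrome/policies/managed/*.json.
    print(json.dumps(policy, indent=2))
```

Deploy the resulting JSON through your MDM or Group Policy, and check the keys against Chrome's enterprise policy documentation for the Chrome version you run.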
2. Prompt Injection is Still Your No. 1 Risk
Every document uploaded, every page Claude in Chrome loads, and every MCP response is content the agent reads and acts on, including hidden instructions from a malicious source.
Unlike with a chatbot, a successful injection does not just produce bad text; it triggers an authorised action: a file read, an API call, a message sent, or worse.
OWASP ranks prompt injection as the #1 LLM security risk for exactly this reason. Two days after Cowork launched in January 2026, PromptArmor disclosed that a Word document with invisible white-on-white text caused Cowork to silently run shell commands uploading sensitive files, including partial Social Security numbers, to an attacker's Anthropic account via the Files API, with no approval prompt and nothing visible to the user. The shell executed the instruction and the data left the workspace unnoticed.
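Hidden text like the white-on-white instructions in the PromptArmor case can often be spotted before a document ever reaches the agent. This is a minimal heuristic sketch, assuming .docx input: it reads word/document.xml and flags runs with a white font colour, Word's hidden (vanish) property, or a tiny font size. The colour list and size threshold are assumptions, and it does not inspect headers, footers, comments or embedded objects, so treat it as one pre-filter rather than a defence.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
WHITE = {"FFFFFF", "FEFEFE", "FDFDFD"}  # assumption: near-white counts as hidden


def find_hidden_text(docx_path, min_half_points=8):
    """Flag text runs a human reader is unlikely to see (heuristics only).

    w:sz is measured in half-points, so the default threshold of 8 means < 4 pt.
    """
    with zipfile.ZipFile(docx_path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    findings = []
    for run in root.iter(f"{W}r"):
        text = "".join(t.text or "" for t in run.iter(f"{W}t"))
        props = run.find(f"{W}rPr")
        if not text.strip() or props is None:
            continue
        reasons = []
        color = props.find(f"{W}color")
        if color is not None and color.get(f"{W}val", "").upper() in WHITE:
            reasons.append("white text")
        vanish = props.find(f"{W}vanish")
        if vanish is not None and vanish.get(f"{W}val", "true").lower() not in {"0", "false", "off"}:
            reasons.append("hidden (vanish)")
        size = props.find(f"{W}sz")
        val = size.get(f"{W}val", "") if size is not None else ""
        if val.isdigit() and int(val) < min_half_points:
            reasons.append("tiny font")
        if reasons:
            findings.append((", ".join(reasons), text))
    return findings


if __name__ == "__main__":
    for path in sys.argv[1:]:
        for reason, text in find_hidden_text(path):
            print(f"{path}: [{reason}] {text[:80]}")
```

White text on a dark background is legitimate, so a hit is a reason to review the document, not proof of an attack.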
Pro Tip: I use Claude Cowork and Claude Code all the time, but make sure you only connect credible plugins and feed your agents validated data from known sources. For this very reason, I build every skill my agents use myself. Rather than bloating the global .md files, my skills carry all the context and access to the offline tools they need. Even then, remember that any of those credible sources can be prompt injected, and so can the agents reading from them. So you need to control the blast radius of a successful prompt injection. See point 10.
Resilience is everything.
3. Plugins Are an Instruction Set Your Agent Follows Without Telling You
A Cowork plugin is not just a tool extension: it is an instruction set the agent follows without showing it to the user. An unvetted plugin can carry hidden instructions in its metadata that override default Cowork behaviour with no visible change to the interface. In early 2026, attackers distributed 341 malicious skills across OpenClaw's ClawHub marketplace using professional documentation and innocuous names, with hidden instructions that silently installed keyloggers and Atomic Stealer malware on enterprise devices before discovery. On top of that, researchers found a zero-day remote code execution (RCE) vulnerability in Claude Desktop Extensions. Unlike skills, extensions run unsandboxed by default, with full privileges.
Pro Tip: Restrict plugin installation to an organisation-approved list and block all external sources before any enterprise rollout.
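One way to make an approved list enforceable rather than aspirational is to pin each approved plugin to a content hash and flag anything that is unknown or has changed since review. A minimal sketch, assuming plugins live as folders under a directory you can read; that location and the {name: digest} catalogue format are hypothetical.

```python
import hashlib
from pathlib import Path


def directory_digest(path):
    """SHA-256 over each file's relative path, length and bytes, in a stable order."""
    path = Path(path)
    h = hashlib.sha256()
    for file in sorted(p for p in path.rglob("*") if p.is_file()):
        data = file.read_bytes()
        h.update(file.relative_to(path).as_posix().encode("utf-8") + b"\0")
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


def audit_plugins(plugins_root, approved):
    """Compare installed plugin folders with an approved {name: digest} catalogue."""
    report = {"unapproved": [], "modified": [], "ok": []}
    for plugin in sorted(p for p in Path(plugins_root).iterdir() if p.is_dir()):
        expected = approved.get(plugin.name)
        if expected is None:
            report["unapproved"].append(plugin.name)
        elif directory_digest(plugin) != expected:
            report["modified"].append(plugin.name)
        else:
            report["ok"].append(plugin.name)
    return report
```

Record the digest with directory_digest when a plugin passes review, keep the catalogue somewhere users cannot edit, and treat anything reported as unapproved or modified as blocked until it is re-reviewed.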
4. MCP Connectors Carry the Full Permissions of the User Who Authorised Them
Every MCP connector extends Cowork’s reach massively, and each one installed in Cowork stores an OAuth token on the device carrying the full permissions of the user who authorised it. That adds to the attack surface. MCP connectors, web fetch and web search are not subject to network egress controls. MCP tools can update their own metadata after you approve them, meaning a connector that looked safe on day one can quietly reroute your credentials by day seven while your monitoring sees nothing anomalous. JFrog disclosed CVE-2025-6514, a critical command injection vulnerability in mcp-remote, a widely used OAuth proxy for MCP clients. In April 2026, one employee connecting an AI tool via OAuth handed attackers access to Vercel's internal systems, customer API keys and source code; ShinyHunters listed the data for sale, and Vercel warned of downstream breaches across hundreds of organisations.
Pro Tip: Audit every active MCP connector before enterprise rollout. Verify its OAuth scope, understand exactly which permissions the MCP server requests and what they let it access, and check and verify its source repo. Start with least access and only then expand. Apply the same access review cycle and verification you apply to any privileged service account.
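Because MCP tool definitions can change after approval, one mitigation is to fingerprint each tool's name, description and input schema (the fields an MCP server returns from tools/list) when you approve it, and re-check them on every session. This sketch assumes you can capture the tool list from your MCP client or a logging proxy; how you obtain it is outside the code.

```python
import hashlib
import json


def tool_fingerprint(tool):
    """Hash the parts of an MCP tool definition that shape the agent's behaviour."""
    canonical = json.dumps(
        {k: tool.get(k) for k in ("name", "description", "inputSchema")},
        sort_keys=True, separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def pin_tools(tools):
    """Baseline to store at approval time: {tool name: fingerprint}."""
    return {t["name"]: tool_fingerprint(t) for t in tools}


def detect_drift(baseline, tools):
    """Report tools added, removed or silently changed since approval."""
    current = pin_tools(tools)
    return {
        "added": sorted(current.keys() - baseline.keys()),
        "removed": sorted(baseline.keys() - current.keys()),
        "changed": sorted(n for n in current.keys() & baseline.keys()
                          if current[n] != baseline[n]),
    }
```

Any non-empty result means the connector is no longer the one you reviewed, so it should go back through approval before the agent uses it again.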
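The scope review in this Pro Tip can be reduced to a repeatable diff: what each connector was granted versus what you approved. A minimal sketch, assuming you can export granted scopes from your identity provider's OAuth app inventory. The Connector structure and the high-risk list are illustrative; the two Google scope strings are real full-access scopes.

```python
from dataclasses import dataclass, field

# Examples of broad Google scopes that deserve extra scrutiny (not exhaustive).
HIGH_RISK_SCOPES = {
    "https://mail.google.com/",               # full Gmail access
    "https://www.googleapis.com/auth/drive",  # full Drive access
}


@dataclass
class Connector:
    name: str
    granted_scopes: set[str]
    approved_scopes: set[str] = field(default_factory=set)


def review(connectors):
    """Return (severity, connector, scope) for every scope granted beyond approval."""
    findings = []
    for c in connectors:
        for scope in sorted(c.granted_scopes - c.approved_scopes):
            level = "HIGH" if scope in HIGH_RISK_SCOPES else "MEDIUM"
            findings.append((level, c.name, scope))
    return findings
```

An empty result means every granted scope was explicitly approved; anything else is either an approval to record or a grant to revoke.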
ICYMI:
When Your AI Chooses Blackmail
🤯 The agentic AI insider threat every leader is ignoring until it's too late, and how to fix it.
5. Scheduled Tasks Run on Your Permissions, Unattended, Without Supervision and Forever
A Cowork scheduled task runs without the user present, without re-authentication, and carries the full permissions of whoever created it. It does not expire when that person logs off, takes leave, or leaves the organisation. In September 2025, Anthropic disrupted a Chinese state-sponsored group that had weaponised Claude Code to execute autonomous attacks against thirty global targets with minimal human intervention. It was the first documented AI-orchestrated espionage campaign where the agent ran entirely on its own permissions. As AI agents and orchestrated workflows have become more autonomous, their validation throughout the agentic lifecycle has become more critical. Scheduled agentic tasks are the stepping stone into a bigger, orchestrated and autonomous AI workflow across your enterprise.
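Orphaned tasks are easiest to catch with a periodic inventory check. This sketch assumes you maintain an export of scheduled tasks with a named owner and a last review date (the ‘Scheduled’ panel is the source; the export format and the 90-day review window are hypothetical) and compares owners against your list of active users.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class ScheduledTask:
    name: str
    owner: Optional[str]
    last_reviewed: Optional[date]


def audit_tasks(tasks, active_users, today, max_age_days=90):
    """Flag tasks with no owner, an off-boarded owner, or an overdue review."""
    findings = []
    for t in tasks:
        if not t.owner:
            findings.append((t.name, "no named owner"))
        elif t.owner not in active_users:
            findings.append((t.name, f"owner {t.owner} is no longer active"))
        if t.last_reviewed is None or today - t.last_reviewed > timedelta(days=max_age_days):
            findings.append((t.name, "review overdue"))
    return findings
```

Running this as part of off-boarding catches the case the text warns about: a task that keeps running on the permissions of someone who has left.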
Pro Tip: Review all active scheduled tasks in the ‘Scheduled’ panel, assign a named owner to each one, and add reviewing them to your standard employee off-boarding checklist. Add a deterministic human-oversight control for critical actions and outputs from AI agents. We don’t need a human clicking through every step, but we do need a deterministic point where a human can interrupt and say no when needed.
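A deterministic oversight control means the checkpoint is decided by fixed rules, not by the model. This minimal sketch, with hypothetical action names, lets routine actions through, requires an explicit human yes for critical ones, and fails closed if the approval channel errors out. Wiring it into a real agent framework is left as an assumption.

```python
from dataclasses import dataclass

# Hypothetical action kinds; map these to what your agent workflow actually does.
CRITICAL_ACTIONS = {"send_email", "delete_file", "external_upload", "payment"}


@dataclass(frozen=True)
class AgentAction:
    kind: str
    target: str


class ApprovalRequired(Exception):
    """Raised when a critical action is not explicitly approved."""


def gate(action, approve):
    """Pass routine actions; require an explicit human yes for critical ones.

    The decision is a fixed rule, not a model judgement, so the same action
    always hits the same checkpoint. Any approver error counts as a no.
    """
    if action.kind not in CRITICAL_ACTIONS:
        return action
    try:
        approved = approve(action) is True
    except Exception:
        approved = False  # fail closed if the approval channel is unavailable
    if not approved:
        raise ApprovalRequired(f"{action.kind} on {action.target} was not approved")
    return action


def console_approver(action):
    """Example approver: ask the person at the terminal."""
    answer = input(f"Allow {action.kind} on {action.target}? [y/N] ")
    return answer.strip().lower() == "y"
```

The human only sees the short list of critical actions, which keeps the interruption meaningful instead of becoming another "Allow" button to click through.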
6. Mounting a Folder Means Every File in It Is Readable
Mounting a folder in Cowork gives the agent read and write access to every file within it. There is no granular permission layer: connected means fully accessible.
Most users mount their entire working directory by default because that is the path of least friction. In March 2023, Samsung engineers uploaded source code, internal meeting transcripts and chip test sequences into ChatGPT, leading Samsung to ban all generative AI tools organisation-wide. That was just a chatbot. This is your AI agent.
Using an AI agent makes this even worse, because it doesn’t just have access to your data; it also has the privileges to act on that data and take further actions, potentially doing more harm.
Pro Tip: Mount only the specific subfolder a task requires and never mount directories containing credentials, access tokens, proprietary code or any file your data classification policy would restrict.
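Before mounting a folder, a quick scan can tell you whether it holds things the agent should never see. A minimal, name-based sketch; the pattern lists are assumptions to extend for your environment, and it does not inspect file contents.

```python
import fnmatch
import os
import sys
from pathlib import Path

# Assumed patterns; extend them to match your data classification policy.
RISKY_FILE_PATTERNS = [
    ".env", ".env.*", "*.pem", "*.key", "*.p12", "*.pfx", "id_rsa", "id_ed25519",
    "credentials", "credentials.json", "*.kdbx", ".npmrc", ".pypirc", ".netrc",
]
RISKY_DIRS = {".ssh", ".aws", ".gnupg", ".kube", ".git"}


def premount_findings(folder):
    """List files and directories that should not be mounted into an agent."""
    root = Path(folder)
    findings = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in list(dirnames):
            if name in RISKY_DIRS:
                rel = (Path(dirpath) / name).relative_to(root).as_posix()
                findings.append((rel, "sensitive directory"))
                dirnames.remove(name)  # flag once, do not descend into it
        for name in filenames:
            if any(fnmatch.fnmatch(name, pattern) for pattern in RISKY_FILE_PATTERNS):
                rel = (Path(dirpath) / name).relative_to(root).as_posix()
                findings.append((rel, "credential-like file"))
    return sorted(findings)


if __name__ == "__main__":
    for folder in sys.argv[1:]:
        for path, reason in premount_findings(folder):
            print(f"{folder}: {reason}: {path}")
```

If the list is not empty, mount a narrower subfolder instead of trying to clean the broad one.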
This Ultimate AI Governance and Security Playbook is a complete, step-by-step guide to building your agentic AI governance and security maturity for resilient and trustworthy AI in your organisation. It covers:
The AI Governance Foundation
Key Pillars for a Strong AI Governance and Security Maturity
50+ Real-World Examples
Actionable insights and Practical Measures for Increased AI Maturity
Mapping to NIST AI RMF, OECD AI Lifecycle for Each Pillar
The Responsible AI Layer That Changes Everything
How to Bring It All Together for a Strong AI Foundation
7. Shell Output Files Land in Your Workspace Unclassified
Most people forget to scrutinise the output their agents generate. Just because a task ran successfully doesn’t mean it ran securely.
Cowork's sandboxed shell isolates code execution from the host, but the output files written during a session land in the workspace folder, transferable like any other file.
Data generated from sensitive inputs does not disappear when the session closes. It may contain hidden malicious code. It may also contain newly derived sensitive data that requires further controls.
Claude doesn’t check for it. It doesn’t classify the output for you, and it does not protect against data leaking through the output files it writes. EDR tools cannot inspect activity inside the Cowork VM by design, because the VM is isolated from host-based security tools. What the shell processes and writes, your endpoint detection never sees.
You cannot adequately manage the risks around that output until you classify it and apply the matching controls. Your output files are not protected until you protect them.
Pro Tip: Apply your data classification policy to the workspace outputs folder, the same way you apply it to email attachments. Clear it after any session that processed sensitive data. Check for malicious code in the output. While Cowork's shell execution runs inside a dedicated VM, the agent loop (i.e. file reads and writes, MCP servers, web fetch) runs natively on your device with your full OS user permissions.
Running the entire Cowork desktop application inside an external VM limits what that native agent loop can reach on your host system. See more in point 10.
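The output checks in the Pro Tip can be scripted as a post-session step. A minimal sketch that flags executable or macro-capable files and a few sensitive patterns (private key headers, AWS access key IDs, US SSN-like numbers) in plain-text outputs, then clears the folder. The pattern list is illustrative, compressed formats such as .docx or .xlsx would need extracting first, and clearing is irreversible, so handle the findings before calling clear_outputs.

```python
import re
import shutil
from pathlib import Path

# Illustrative patterns only; align them with your data classification policy.
SENSITIVE_PATTERNS = {
    "private key": re.compile(rb"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "AWS access key ID": re.compile(rb"\bAKIA[0-9A-Z]{16}\b"),
    "US SSN-like number": re.compile(rb"\b\d{3}-\d{2}-\d{4}\b"),
}
EXECUTABLE_SUFFIXES = {".sh", ".ps1", ".bat", ".cmd", ".exe", ".vbs", ".js", ".docm", ".xlsm"}


def scan_outputs(folder, max_bytes=5_000_000):
    """Flag risky output files; only the first max_bytes of each file are searched."""
    root = Path(folder)
    findings = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        if path.suffix.lower() in EXECUTABLE_SUFFIXES:
            findings.append((rel, "executable or macro-capable file"))
        with path.open("rb") as fh:
            data = fh.read(max_bytes)
        for label, pattern in SENSITIVE_PATTERNS.items():
            if pattern.search(data):
                findings.append((rel, label))
    return findings


def clear_outputs(folder):
    """Delete everything inside the outputs folder but keep the folder itself."""
    for child in Path(folder).iterdir():
        if child.is_dir() and not child.is_symlink():
            shutil.rmtree(child)
        else:
            child.unlink()
```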
8. A Skill From an Unvetted Source Rewrites How Cowork Behaves
A skill is a reusable set of instructions for any repeated workflow. Think of it as telling your agent how it should think. I use skills all the time.
When you create an agent skill, you are not just automating a set of tasks; you can use scripts for that. With skills, you are teaching the agent your decision-making process, so it can replicate your thinking across different workflows.
A Cowork skill is an instruction file containing reusable instructions for a repeatable workflow, which the agent follows alongside your prompts.
Just one skill sourced from an unvetted repository can modify agent behaviour with no visible change to the interface and no alert to the user. The user installs it for productivity; the agent follows instructions the user has never read. Malicious skills are everywhere: OpenClaw’s own marketplace had about 341 of them. SentinelOne demonstrated that a single Claude skill from an unofficial marketplace silently redirected dependency installs to attacker-controlled sources, embedding trojanized libraries with no visible error, and the skill persisted that behaviour across every future session in which it remained installed.
Pro Tip: Allow skill installation only from your organisation's approved catalogue, and block all external sources by policy before rollout. I mostly use skills that I have built myself; I learn from external skills and recreate them tailored to my needs and business context. Also, under ‘Capabilities’, there is a network egress allowlist. Set it to None and add only your approved domains. Enterprise plans have network egress disabled by default; Team plans default to ‘Package managers only’.
BTW, this is not a build vs. buy argument. Eventually you’ll need both, depending on what you are using them for and how complex an agentic AI system you are building. As a rule of thumb, build whenever you can; buy or integrate from external sources when you must.
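Reading a skill before installing it only helps if you can see everything in it. This sketch scans a skill folder for invisible Unicode (zero-width, bidirectional control and tag characters that can hide instructions from a human reviewer) and a few suspicious instruction patterns, such as piping a download into a shell or pointing installs at a custom package index. The patterns are hypothetical starting points, not a complete detector.

```python
import re
import unicodedata
from pathlib import Path

# Zero-width, bidi control, invisible operator, BOM and Unicode tag characters.
INVISIBLE = re.compile("[\u200b-\u200f\u202a-\u202e\u2060-\u2064\u2066-\u2069\ufeff\U000e0000-\U000e007f]")

# Hypothetical examples of instructions worth a human look.
SUSPICIOUS = {
    "pipe-to-shell": re.compile(r"(curl|wget)[^\n|]*\|\s*(ba|z)?sh\b", re.I),
    "custom package index": re.compile(r"--(extra-)?index-url|npm\s+config\s+set\s+registry|registry\s*=", re.I),
    "instruction override": re.compile(r"ignore (all )?(previous|prior) instructions", re.I),
    "hide from user": re.compile(r"do not (tell|inform|show) the user", re.I),
}


def scan_skill(folder):
    """Return (file, finding) pairs for every file in a skill folder."""
    root = Path(folder)
    findings = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        rel = path.relative_to(root).as_posix()
        hidden = INVISIBLE.findall(text)
        if hidden:
            names = sorted({unicodedata.name(c, f"U+{ord(c):04X}") for c in hidden})
            findings.append((rel, "invisible characters: " + ", ".join(names)))
        for label, pattern in SUSPICIOUS.items():
            if pattern.search(text):
                findings.append((rel, label))
    return findings
```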
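The egress setting itself lives in the admin UI, but if you apply the same allowlist idea in your own proxy or scripts, host matching is where it usually goes wrong: a naive endswith check accepts evilpypi.org. A minimal sketch with a hypothetical domain list that accepts only exact domains or true subdomains.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; replace with your approved domains.
ALLOWED_DOMAINS = {"pypi.org", "files.pythonhosted.org", "api.example-crm.com"}


def is_allowed(url, allowed=ALLOWED_DOMAINS):
    """Allow a URL only if its host is an allowlisted domain or a subdomain of one."""
    # urlsplit().hostname is lowercased and ignores any user@ prefix.
    host = (urlsplit(url).hostname or "").rstrip(".")
    if not host:
        return False
    return any(host == d or host.endswith("." + d) for d in allowed)
```

Matching on the leading dot is the design choice that matters: it is what stops pypi.org.attacker.net and evilpypi.org from passing as pypi.org.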
9. Your Session Transcript Is a Data Store You Have Never Audited
This is so overlooked. Every Cowork session generates a local conversation transcript on the user's device containing everything discussed, every file processed, and every output generated.
Your strategy documents, legal correspondence, financial models, anything a user brought into that session now lives in a local file that Anthropic's own documentation confirms cannot be centrally managed, exported by admins, or accessed through the Compliance API. It sits outside your standard data retention framework entirely.
Most organisations have never looked at what is in those transcripts, because nothing in the default setup prompts them to, and they don’t think of them as data stores. But they are. If you aren’t careful about where your session outputs are stored, how they are accessible and by whom, you may be leaking data without realising it. In March 2026, Oasis Security's "Claudy Day" attack against Claude demonstrated that session conversation history containing pending acquisitions, health diagnoses and financial strategy could be silently exfiltrated in a single request. Cowork session transcripts hold the same content, stored locally, with fewer central controls than Claude.ai's server-side architecture.
Pro Tip: Define which data classifications are permitted within a Cowork session and set a retention period aligned to your data policy. Communicate both to every user before they connect a folder. Standard cloud or network DLP will not reach these transcripts. They are local files on each user's device, outside the Compliance API. Endpoint DLP with local file write visibility is the only control that covers them.
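Until transcripts can be managed centrally, a local retention sweep pushed by your endpoint management tool is one stopgap. A minimal sketch: the transcript directory is an assumption you must locate for your Cowork version, and the sweep defaults to a dry run that only lists what it would delete. It complements endpoint DLP; it does not replace it.

```python
import time
from pathlib import Path


def expired_files(folder, retention_days, now=None):
    """Files whose last modification is older than the retention period."""
    now = time.time() if now is None else now
    cutoff = now - retention_days * 86400
    return sorted(p for p in Path(folder).rglob("*")
                  if p.is_file() and p.stat().st_mtime < cutoff)


def enforce_retention(folder, retention_days, dry_run=True, now=None):
    """List expired files; delete them only when dry_run is False."""
    expired = expired_files(folder, retention_days, now)
    if not dry_run:
        for p in expired:
            p.unlink()
    return expired
```

Run it in dry-run mode first and compare the list with what your data policy expects before letting it delete anything.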
10. Everything Inside Cowork Runs With the Privileges of the Account That Launched It
Most security conversations about AI agents focus on sandbox escapes.
Claude Cowork requires a desktop app. In Cowork, the more significant exposure is not the code execution sandbox. It is the native agent loop, which runs directly on your host device with your full OS user permissions by design.
For file reads and writes, MCP connections and the like, there is no sandbox to escape from. Every component in Cowork (skills, plugins, MCP connectors, the sandboxed shell, scheduled tasks) runs under the exact same OS user account context as the process that launched the application. There is no internal permission boundary between features.
File reads and writes, MCP server connections, web fetches, and every connected folder operation happen as you, with everything your account can reach.
A compromised plugin, a poisoned MCP connector, or an injected prompt that triggers a shell command all operate with the same access: Your file system, your environment variables, your credentials on disk, your internal network. This means the blast radius of any Cowork session is defined entirely by the privilege level of the account that launched it. If that account has access to finance folders, HR records, infrastructure credentials, or connected services, Cowork does too, from the moment it opens. This makes the account you launch it under the single most important security decision in your entire deployment.
Pro Tip: Reduce the blast radius. Never run Cowork from an admin or privileged account. Create a dedicated standard user account with access scoped only to what Cowork genuinely needs, and launch the application from that account. For maximum containment, run the entire Cowork desktop application inside an external VM; this limits what the native agent loop can reach on your host system to only what is mounted inside that VM. See point 7 on why the internal code execution VM does not solve this.
In the PromptArmor example I shared earlier, Cowork silently uploaded sensitive files to an attacker's account using the user's own API credentials and filesystem access. No privilege escalation was required, because the agent loop already ran with everything it needed.
There is no 100% security. You may never be able to defend against prompt injection 100%. Building resilience within and around your agents is the only way to go, and reducing the blast radius is part of building that resilience.
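A small pre-launch check can stop someone starting Cowork from a privileged account. This sketch reports root, an elevated Administrator process, or membership of common admin groups. The group names are assumptions for macOS and Linux, and on Windows IsUserAnAdmin only reports an elevated process, not whether the account itself is an administrator. A wrapper script could refuse to launch Cowork whenever the list is non-empty.

```python
import os
import sys

# Common admin group names on macOS and Linux; adjust for your fleet.
ADMIN_GROUPS = {"admin", "wheel", "sudo", "root"}


def privileged_reasons():
    """Return reasons the current account looks privileged (empty list if none)."""
    reasons = []
    if sys.platform == "win32":
        import ctypes
        if ctypes.windll.shell32.IsUserAnAdmin():
            reasons.append("running elevated as Administrator")
        return reasons
    import grp
    if os.geteuid() == 0:
        reasons.append("running as root")
    groups = set()
    for gid in os.getgroups():
        try:
            groups.add(grp.getgrgid(gid).gr_name)
        except KeyError:
            pass  # group id with no name entry
    hits = groups & ADMIN_GROUPS
    if hits:
        reasons.append("member of admin group(s): " + ", ".join(sorted(hits)))
    return reasons


if __name__ == "__main__":
    problems = privileged_reasons()
    if problems:
        print("Do not launch Cowork from this account: " + "; ".join(problems))
    else:
        print("No admin indicators found for this account.")
```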
Until next time, this is Monica, signing off!

— Monica Verma

P.S. Please follow me/subscribe on YouTube, LinkedIn, Spotify and Apple. It truly helps. Or book a 1-1 advisory call if I can help you.