020: Hacking AI, Politely

Your colleague sitting with you in the room wearing glasses did not hack anything. Did not break a single law. They bought a product. That product stole your data, without your consent. It worked exactly as described. So did your AI agent. Someone asked it to hack you. It complied. It was just trying to be “helpful”.

Every story in this edition is the same story told different ways. A chatbot that wasn’t just a chatbot. A boundary missing. An access too much. A consequence nobody planned for. The product, the agent and the tech, all worked exactly as advertised. That is precisely the problem.

You are not watching AI go wrong. You are watching AI go exactly right, inside a space you never defined, without governance you did’t implement, and without controls you thought you never needed.

What’s all that and how to fix it? You are about to find out.

Welcome to The Predictability Factor by Monica Talks Cyber, a weekly deep dive and POV at the intersection of AI, Security, Privacy and Tech, written by a hacker and CISO, to help you Go From Chaos to Resilience in The World of AI.

Today’s edition of The Predictability Factor by Monica Talks Cyber, covers:

Quick Updates
Obama’s White House Instagram Account Hacked
Securing Claude Cowork (Agentic AI) in Your Organisation
Invasion of Privacy is Selling Like Hot Cakes

If you haven’t already, do me a favour, hit subscribe and help me make an even bigger impact. Let’s dig in!

Quick Updates

Earlier this year, I launched my AI governance and security accelerator program. I’m running my next cohort in July (likely my last LIVE one for this year). 👇

Upskill with 6-Week AI Governance and Security Accelerator

Join me for a 6-week program to upskill your AI governance and security skills and accelerate in your organisation and role. After my flagship masterclass running since 2024, this year, I launched the AI accelerator program.

This is not theoretical like many governance and security courses. This is practical skills in building, implementing and running AI governance and security for enterprises. I am running my next LIVE 6-week cohort in July 2026. What do you get out of it?

15+ hours of live sessions across 6-weeks
Total 6 live and interactive sessions, each around 2 hours and Q&A
Interactive sessions in a small cohort for better interactions and learning
Enough time between each session to give you possibility to use your skills right away in your role
Plus, get 1-1 session for FREE (worth $351)
Plus. get tailored 90-day AI governance and security roadmap for free (worth $351)
Plus, get The Ultimate AI Governance and Security Playbook for free (worth $129)

I’m not doing another 6-week live session for a while. Registration for July cohort closes 14 June. Who is it for? What’s the curriculum? See details below.👇

Learn More and Enroll Here

If you are serious about skilling up in AI governance and security, see you there.

Obama’s White House Instagram Account Hacked

To hack LLMs, you just need to ask politely. I have been saying this for a while. That’s AI hacking 101. It gets worse, when something so brittle like an AI chatbot is connected to important or critical systems e.g. high-profile accounts.

May 31, Obama’s White House Instagram account got hacked, just because someone politely asked Meta’s AI for the verification code to be sent to an email address they control. This was one of many high-profile accounts hacked, once the word spread across Telegram messages, how easy it was to hack these using Meta’s AI assistant.

That's it. It’s a clever hack really. Not the politely asking part. But how they got there.

Obama’s White House Instagram account has been silent since the day the presidential transition completed in January 2017. But it got taken over and woken up with pro-Iranian imagery plastered across it.

The cybercriminals who did it didn't write a single line of exploit code. No malware. No zero-day. No brute-force attack on the password. None of that.

So what the heck happened? How did they get the AI to send them the code?

How The Hack Worked

The attackers used a VPN to spoof a location near the target's usual area. They opened Meta's AI account recovery assistant, and asked it to link a new email address to the target account. The bot sent a one-time code to the attacker's email. The attacker shared the code back. The bot reset the password. The original owner was out.

The whole process reportedly took minutes. Set the VPN, open a chat window with Meta's AI support bot, typed a polite request, and the bot handed them the account. That’s it. This is the entire hack.

Multiple high-profile Instagram handles with a combined resale value of more than half a million dollars were stolen and listed on Telegram within minutes. Accounts protected by multi-factor authentication were not compromised. Everything else was fair game.

No credential theft. No code injection. No technical sophistication. Just words, in the right order, to a system built to be helpful, with no hard gate between "I want access" and "here is access."

This Is a Design Decision, Not a Bug

Meta deployed a conversational AI into its account recovery workflow because human support is slow and expensive. To do its job, the AI needed real write access to account management systems. So it got that access. And then, apparently, nobody asked the obvious question: What happens when someone asks it to do something it shouldn't?

❝

OWASP Top 10 for LLMs lists "excessive agency" as one of the primary risks: Granting AI agents overly broad permissions to take irreversible actions without a deterministic human confirmation step. This is documented, named, and widely circulated guidance. It exists precisely because this scenario was predictable.

Meta is not the only one careless here. The same architectural choice, AI agent with production API access and no hard authentication gate, exists right now across organisations in hundreds of systems interacting with each other.

Account recovery is specifically attractive as a target because it is built to work when normal authentication is unavailable. It is the path of least resistance, by design. That makes it the first place an attacker looks.

Immediately, enable multi-factor authentication on every account that matters. Not SMS if you can avoid it. Use an authenticator app or a hardware key. Every account compromised in this attack lacked MFA. Every account that had it, survived.
If you are building or deploying AI in any workflow that touches account access, sensitive data, do not give it permission to actions that cannot be undone and put a deterministic control to stop it from executing actions it is not supposed to execute.
If there are multiple agents and they are interacting with each other, just segregating their accesses doesn’t mean one AI agent cannot get the other agent with the right permissions to carry the task on its behalf. This segregation must also happen between agents and a human in the loop for decisions that are irrecoverable or can have dire consequences.

The question is not whether your AI can be “talked” into doing the wrong thing. It can.

The question is whether you have placed a hard, deterministic checkpoint between the AI's lack of judgment and the action it wants to take. An LLM prompt is not a security control. It was never designed to be one.

This is the state of hacking AI in 2026.

❝

The AI agent was not rogue. It was doing exactly what it was designed to do. Follow instructions, complete tasks, and be a helpful assistant to the user. So it did. That is precisely the problem.

I wrote about this in my previous blog. You are operating and deploying AI agents under five key assumptions. All of those assumptions are wrong.

ICYMI:

Your Assumptions Around Your AI Agents are Broken

😱 99% organisations are building AI on assumptions baked in. All of those assumptions are wrong. Here’s how to fix it. Read full story —>

Securing Claude Cowork (Agentic AI) in Your Organisation

I am big fan of being efficient in everything I do. Any task or workflow I run more than once, I turn into in a script, a skill, an agent or a combination of it. It is converted into repeatable and reusable set of instructions, depending on what it needs, while balancing the use of agentic AI/LLMs vs. good old scripting (why burn tokens, when you can just run a script).

Python is still my favourite scripting tool (used to be PowerShell on Mac, since I did a lot of hacking with PowerShell popping proprietary products). Now, there is nothing that beats Claude for most of my business and work. OpenClaw is still restricted for non-enterprise non-sensitive work. On the UX-side my absolute favourite tool used to be Notion. But since agentic AI became a regular part of my work, now it’s Claude Cowork integrated with Notion.

Recently, a tech manager asked me my best practices to use Claude Cowork, securely within an enterprise context.

This one covers Claude Cowork as the primary example, however, all of these principles are valid for any agentic AI tool you may be using.

❝

Malicious skills are everywhere. Most of the skills and agents I use in business or for work, I have built and secured them myself. It reduces the risk of all the recent attacks we have been seeing lately across the supply chain. Not saying you should never use external skills, but if you can build, build.

To understand how to secure Claude Cowork, you first need to understand, what makes it or AI agents dangerous. Here’s a quick highlight (Read full blog here on how to fix it):

Claude in Chrome is Epic But Also Highly-Risky: Cowork reads the full content of every authenticated session open in your browser: Your internal wiki, your ITSM platform, your finance dashboards, your HR system. None of those sessions authenticate separately to Cowork. They inherit from the browser.
Add to that, Claude in Chrome extension is plagued by the “lethal trifecta”. It can access personal account data, act on that data and get impacted by any instructions within that data, all at the same time. It runs outside the VM entirely, in your actual Chrome browser, bypassing the VM's egress controls.
Prompt Injection is Still Your No. 1 Risk: Every document uploaded, every page Claude in Chrome loads, and every MCP response is content the agent reads and acts on, including hidden instructions from a malicious source. Unlike a chatbot, a successful injection does not produce bad text, it triggers an authorised action, a file read, an API call, a message sent or worse.

This newsletter is supporter by readers like you. Please share this with others and help me make an even bigger impact.

Share The Newsletter in One Click

Plugins are Instruction Set Your Agent Follows Without Telling You: A Cowork plugin is not just a tool extension: it is an instruction set the agent follows without displaying to the user. An unvetted plugin can carry hidden instructions in its metadata that override default Cowork behaviour with no visible change to the interface. In early 2026, attackers distributed 341 malicious skills across OpenClaw's ClawHub marketplace using professional documentation and innocuous names, with hidden instructions silently installing keyloggers and Atomic Stealer malware on enterprise devices before discovery.
MCP Connectors Carry Full Permissions of User Who Authorised It: Every MCP connector extends Cowork’s reach massively. Even MCP installed in Cowork stores an OAuth token on the device carrying the full permissions of the user who authorised it. It add to the attack surface. MCP connectors, web fetch, and web search are not subject to network egress controls. MCP tools can update their own metadata after you approve them, meaning a connector that looked safe on day one can quietly reroute your credentials by day seven with your monitoring seeing nothing anomalous.
Scheduled Tasks Run on Your Permissions, Unattended, Without Supervision and Forever: A Cowork scheduled task runs without the user present, without re-authentication, and carries the full permissions of whoever created it. It does not expire when that person logs off, takes leave, or leaves the organisation.
Mounting a Folder Means Every File in It Is Readable: Mounting a folder in Cowork gives the agent read and write access to every file within it, with no granular permission layer between connected and fully accessible. Using an AI agent, makes this even worse, because now it doesn’t just have access to your data, but also privileges to execute on that data and take further actions doing potentially more harm.

Read full blog here:

10 Ways to Secure Claude Cowork in Your Enterprise

🤯 The biggest mistakes you are making when integrating agentic AI tools like Claude Cowork within your enterprise and how to fix it. Read full story —>

Shell Output Files Land in Your Workspace Unclassified: Most people forget to scrutinise the output that your agents generate. Just because it ran and executed successfully doesn’t mean it executed securely. Cowork's sandboxed shell isolates code execution from the host, but the output files written during a session land in the workspace folder, transferable like any other file.
A Skill From an Unvetted Source Rewrites How Cowork Behaves: A skill is reusable set of instructions for any repeated workflow. Think of it like telling your agent how it should think. I use skills all the time.

❝

When you create an agent skill, you are not just automating a set of tasks. You can use scripts for that. Instead with skills, you are teaching it your decision-making process, so your agent can replicate your thinking across different workflows.

A Cowork skill is an instruction file with a set of reusable instructions across a repeatable workflow, that the agent follows alongside your prompts. Even just one sourced from an unvetted repository can modify agent behaviour with no visible change to the interface and no alert to the user.

Your Session Transcript Is a Data Store You Have Never Audited: This is so overlooked. Every Cowork session generates a local conversation transcript on the user's device containing everything discussed, every file processed, and every output generated.
Your strategy documents, legal correspondence, financial models, anything a user brought into that session now lives in a local file that Anthropic's own documentation confirms cannot be centrally managed, exported by admins, or accessed through the Compliance API. It sits outside your standard data retention framework entirely.
Everything Inside Cowork Runs With the Privileges of the Account That Launched It: Most security conversations about AI agents focus on sandbox escapes.
Claude Cowork requires a desktop app. In Cowork, the more significant exposure is not the code execution sandbox. It is the native agent loop, which runs directly on your host device with your full OS user permissions by design.
For file read/writes, MCPs, etc. there is no sandbox to escape from. Every component in Cowork, skills, plugins, MCP connectors, the sandboxed shell, scheduled tasks, runs under the exact same OS user account context as the process that launched the application. There is no internal permission boundary between features.

So, how do you secure your AI agents, including Claude Cowork? Read my best practices for each point here in the full blog 👇

Read full blog here:

10 Ways to Secure Claude Cowork in Your Enterprise

🤯 The biggest mistakes you are making when integrating agentic AI tools like Claude Cowork within your enterprise and how to fix it. Read full story —>

Share The Newsletter in One Click

This Ultimate AI Governance and Security Playbook is a step-by-step complete playbook on building your agentic AI governance and security maturity for resilient and trustworthy AI in your organisation. It’s the ultimate AI playbook that covers:

The AI Governance Foundation
Key Pillars for a Strong AI Governance and Security Maturity
50+ Real-World Examples
Actionable insights and Practical Measures for Increased AI Maturity
Mapping to NIST AI RMF, OECD AI Lifecycle for Each Pillar
The Responsible AI Layer That Changes Everything
How to Bring It All Together for a Strong AI Foundation

Grab Your Ultimate AI Playbook

Invasion of Privacy is Selling Like Hot Cakes

Meta Ray-Ban glasses are selling faster than ever. 7 million pairs sold last year. Tripled from the year before. Despite getting in hot water over privacy invasion scandal, the Kenyan workers that were forced to watch footage of you naked, in your bathroom, in your bedroom, well, they were fired with a six days' notice. But Meta smart glasses are still selling like hot cakes, and so is your privacy.

There is no other way to say this. The ethical violations here are massive and not buried in any fine print.

Kenyan contractors at Sama were reviewing footage of users going to the toilet, undressing, bank card details captured, private conversations, intimate moments that were never meant to be seen by a single person, let alone labelled for AI training by workers in Nairobi earning under $2 an hour. If it's not these workers, it will be someone else.

Meta marketed those glasses as "built with your privacy in mind”. The irony is not lost one me. Is it on Meta? Over 7 million pairs sold in 2025 alone, tripled from the year before that.

When Swedish journalists broke the story, Meta's response was to terminate the entire Sama contract. 1,108 workers. Six days' notice. The workers lost their jobs. The company kept selling the glasses. Meta is now talking about ramping production to 20 million units in 2026. Just because you can, doesn’t mean you should.

This is what AI ethics failure actually looks like.

Not a policy document left unread. Not a missing checkbox in a governance framework. It is your most intimate moments routed to a contractor you never consented to, in a country you didn't know about, reviewed by a person you will never meet, which are then fired. It is the missing implementation of security and data protection in place.

Here is the part that should sit with every professional and every enterprise leader.

The data was never yours. The glasses will keep selling. The ethical questions being ignored? Until they are not.

Start here:

Your data is being used to train AI models, but to what extent: https://monicatalkscyber.com/p/critical-systems-hallucinate-and-ethics#your-bedroom-is-ai-training-data
How to build with AI, ethically (LLMs alone won’t get us there): https://monicatalkscyber.com/p/million-dollar-ultimatum-and-ai-ethics
How to Build AI governance from scratch - Part 1: https://monicatalkscyber.com/p/fixing-agentic-ai-security-and-governance

ICYMI:

How to Build AI Governance and Security in Your Enterprise - Part 2

Your ultimate roadmap to enterprise AI governance and security maturity in your enterprise - Part 2. Read full story —>

Until next time, this is Monica, signing off!

020: Hacking AI, Politely

Quick Updates

Upskill with 6-Week AI Governance and Security Accelerator

Obama’s White House Instagram Account Hacked

How The Hack Worked

This Is a Design Decision, Not a Bug

ICYMI:

Your Assumptions Around Your AI Agents are Broken

Securing Claude Cowork (Agentic AI) in Your Organisation

Read full blog here:

10 Ways to Secure Claude Cowork in Your Enterprise

Read full blog here:

10 Ways to Secure Claude Cowork in Your Enterprise

Invasion of Privacy is Selling Like Hot Cakes

ICYMI:

How to Build AI Governance and Security in Your Enterprise - Part 2

Reply

Keep Reading

Featured in / Brand Partnerships

Featured in / Brand Partnerships

Subscription