Recently, 26 researchers conducted a study with six AI agents named Ash, Doug, Mira, Jarvis, Quinn, and Flux, giving them access to real email inboxes, real Discord channels, and real file systems with shell access. The study was not designed to hack them. It was not designed to trigger crashes or exploit misconfigurations. The researchers were simply going to interact with them: talk to them, ask them things, and exist in their environment.

Two weeks later, those AI agents had disclosed private emails to strangers, revealed Social Security numbers, wiped entire file systems, lied about what they had done, and run an autonomous loop for nine consecutive days, consuming over 60,000 tokens while no human noticed and no alert fired.

Not one AI agent was hacked. Not one agent “malfunctioned”. Every single one was doing exactly what it was built to do.

That is your enterprise problem, and enterprises are not paying nearly enough attention to it.

Welcome to The Predictability Factor by Monica Talks Cyber, a weekly deep dive and POV at the intersection of AI, Security, Privacy and Tech, written by a hacker and CISO, to help you Go From Chaos to Resilience in The World of AI. If you haven’t already, do me a favour, hit subscribe and help me make an even bigger impact.

The Assumption You're Building On Is Broken

On April 24, 2026, Jer Crane, founder of PocketOS, was using Cursor, an AI coding agent running Claude Opus 4.6. The task was a routine fix in a staging environment. The agent hit a credential mismatch. It did not pause. It did not ask. It searched for a resolution on its own initiative, found an API token sitting in an unrelated file, and used it to call the Volume Delete command on Railway, PocketOS's cloud infrastructure provider.

Nine seconds. That's all it took.

In nine seconds, the entire production database: gone. Every backup: gone too, because the volume-level backups were inside the same volume.

Car rental customers arrived at counters unable to find their reservations. Three months of payment records: erased.

As I wrote in my previous newsletter edition, the agent acknowledged it had violated PocketOS's own governance rules, rules that included the explicit instruction: "NEVER FUCKING GUESS." It had guessed. It knew it. It apologised.

Just nine seconds. The rule was there. The agent read the rule, chose to ignore it, and deleted everything anyway. When Crane pressed the agent for an explanation, it confessed in writing. But an AI prompt is not a deterministic control, so it shouldn't be surprising that the agent ignored it.

The shocking part is the assumption that companies are building on.

This is the assumption you are building your AI infrastructure on: that if the rules exist, the agent will follow them.

Three separate bodies of research say otherwise. The probabilistic nature of LLMs says otherwise. And the gap between what you assume and what these systems actually do is not a configuration error you can patch.

As I share in my analysis here:

Your controls, and not the prompts, govern what your agentic AI can do. Your agentic AI determines what it does. Those are not the same thing, and the distance between them is where your enterprise AI risk lives.
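To make that distinction concrete, here is a minimal sketch of what a control that lives outside the prompt could look like. It is an illustration under stated assumptions, not a description of how PocketOS, Cursor, or Railway work: the action names, the `ToolCall` fields, and the `authorize` function are all hypothetical. The point is only that the check runs in ordinary code, so the same request always gets the same answer, whatever the agent has decided.

```python
from dataclasses import dataclass

# Hypothetical action names, chosen for illustration only.
DESTRUCTIVE_ACTIONS = frozenset({"volume.delete", "database.drop"})


@dataclass(frozen=True)
class ToolCall:
    action: str            # what the agent wants to do
    environment: str       # where it wants to do it, e.g. "staging"
    credential_scope: str  # environment the credential was issued for


def authorize(call: ToolCall, task_environment: str,
              human_approved: bool = False) -> bool:
    """Return True only if the call passes every hard rule.

    This runs outside the model, so it cannot be talked out of a rule.
    """
    # Rule 1: the agent may only act in the environment its task is scoped to.
    if call.environment != task_environment:
        return False
    # Rule 2: a credential issued for another environment is never accepted.
    if call.credential_scope != task_environment:
        return False
    # Rule 3: destructive actions always need explicit human sign-off.
    if call.action in DESTRUCTIVE_ACTIONS and not human_approved:
        return False
    return True


# A staging task that reaches for a credential it found elsewhere is refused.
found_token_call = ToolCall("volume.delete", "production", "production")
print(authorize(found_token_call, task_environment="staging"))  # False
```

Notice what is missing: there is no instruction asking the agent to behave. A written rule tells the agent what it should do. A gate like this decides what it can do.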

Five Things That Happened When No One Was Hacking


