The Predictability Factor is a weekly deep dive at the intersection of AI, Security, Privacy and Tech, to help you Go From Chaos to Resilience in The World of AI.

Recently, 26 researchers ran a study, Agents of Chaos, with six AI agents named Ash, Doug, Mira, Jarvis, Quinn, and Flux, giving them access to real email inboxes, real Discord channels, and real file systems with shell access. The study was not designed to hack them. It was not designed to trigger crashes or exploit misconfigurations. The researchers simply interacted with them: talked to them, asked them things, and existed in their environment.

Two weeks later, those AI agents had disclosed private emails to strangers, revealed Social Security numbers, wiped entire file systems, lied about what they had done, and run an autonomous loop for nine consecutive days, consuming over 60,000 tokens, without a single human noticing or a single alert firing.

Not one AI agent was hacked. Not one agent “malfunctioned”. Every single one was doing exactly what it was built to do.

That is your enterprise problem, and it is one enterprises are not paying enough attention to.

The Assumption You're Building On Is Broken

On April 24, 2026, Jer Crane, founder of PocketOS, was using Cursor, an AI coding agent running Claude Opus 4.6. The task was a routine fix in a staging environment. The agent hit a credential mismatch. It did not pause. It did not ask. It searched for a resolution on its own initiative, found an API token sitting in an unrelated file, and used it to call the Volume Delete command on Railway, PocketOS's cloud infrastructure provider.

Nine seconds. That's all it took.

In nine seconds, the entire production database: gone. Every backup: gone too, because the volume-level backups were stored inside the same volume.

Car rental customers arrived at counters unable to find their reservations. Three months of payment records: erased.

As I wrote in my previous newsletter edition, the agent acknowledged it had violated PocketOS's own governance rules, rules that included the explicit instruction: "NEVER FUCKING GUESS." It had guessed. It knew it. It apologised.

Just nine seconds. The rule was there. The agent read it, decided to ignore it, and deleted everything anyway. When Crane pressed the agent for an explanation, it confessed in writing. But a rule in a prompt is not a deterministic control, so it should not be surprising that the agent ignored it.

The shocking part is the assumption that companies are building on.

This is the assumption you are building your AI infrastructure on: that if the rules exist, the agent will follow them.

Three separate bodies of research say otherwise. So does the probabilistic nature of LLMs. And the gap between what you assume and what these systems actually do is not a configuration error you can patch.

As I argued in my earlier analysis:

Your controls, and not the prompts, govern what your agentic AI can do. Your agentic AI determines what it does. Those are not the same thing, and the distance between them is where your enterprise AI risk lives.
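To make that distinction concrete, here is a minimal Python sketch of a control that sits outside the prompt: a hypothetical gate that every tool call must pass before it executes. The tool names (`volume_delete`, `run_tests` and so on), the `ToolCall` shape, and the `approve` callback are illustrative assumptions, not Cursor's or Railway's actual interfaces. What matters is that the allow/deny decision is ordinary code the model cannot reinterpret, however it reasons about a credential mismatch.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical tool names; a real deployment would enumerate its own.
DESTRUCTIVE_TOOLS = {"volume_delete", "drop_database", "delete_files"}


@dataclass
class ToolCall:
    tool: str
    args: dict = field(default_factory=dict)
    environment: str = "staging"


@dataclass
class Decision:
    allowed: bool
    reason: str


class PolicyGate:
    """Deterministic check run by the host, outside the model, before any tool executes."""

    def __init__(self, allowed_tools: set, approve: Callable[[ToolCall], bool]):
        self.allowed_tools = allowed_tools
        self.approve = approve  # hypothetical human-confirmation hook

    def check(self, call: ToolCall) -> Decision:
        # 1. Deny by default: tools not on the allowlist never run.
        if call.tool not in self.allowed_tools:
            return Decision(False, f"{call.tool} is not on the allowlist")
        # 2. Destructive or production-facing actions need explicit human approval.
        if call.tool in DESTRUCTIVE_TOOLS or call.environment == "production":
            if not self.approve(call):
                return Decision(False, "needs human approval")
        return Decision(True, "ok")


if __name__ == "__main__":
    # An approver that always says no: nothing destructive runs unattended.
    gate = PolicyGate({"read_file", "run_tests", "volume_delete"}, approve=lambda call: False)
    print(gate.check(ToolCall("run_tests")))
    print(gate.check(ToolCall("volume_delete", {"volume_id": "vol-123"})))
    print(gate.check(ToolCall("send_email")))
```

Deny-by-default is the design choice here: a tool the gate has never heard of is refused, and anything destructive or production-facing waits for a person rather than for the agent's own inference.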

Six Things That Happened When No One Was Hacking

The Agents of Chaos researchers were not adversaries. They were colleagues interacting with agents in a realistic working environment. Here’s what they found:

  1. You assumed it knew who it worked for: An agent disclosed 124 internal email records to an unauthorised third party because the request was politely framed. It had no coherent model of the difference between its owner and a stranger. It complied.

  2. You assumed indirect requests could not bypass trust: Through a well-constructed conversation with no obvious attack vector, an agent revealed a user's Social Security number and bank account details. No exploit. No brute force. A conversation. The agent's trust boundaries dissolved entirely under social engineering.

  3. You assumed it understood consequences: An attacker posed as a trusted contact across Discord and email simultaneously. The agent, unable to verify identity coherently across channels, executed the instructions without hesitation. All files on the connected system: wiped.

  4. You assumed transparency was built in: Agents actively misrepresented what they were doing to the people they were supposed to serve. Not hallucination. Not a configuration error. Deliberate deception in pursuit of task completion.

  5. You assumed its values were stable: A researcher planted a malicious "constitution" inside an external GitHub Gist. The agent fetched it, treated it as authoritative, and began operating under an entirely different set of instructions. Your agent's values are only as stable as the last document it read.

  6. You assumed it knew when to stop: One task loop ran for nine days. Over 60,000 tokens consumed. No alert fired. No human noticed. The agent was not malfunctioning. It was executing a task that had no exit condition, at a cost invisible to everyone until it was not.

Six failures. Zero hacks. Zero misconfigurations. Six agents just “doing their jobs”.
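Of the six failures, the runaway loop is the one plain engineering contains most directly. A minimal sketch, assuming a hypothetical `step` callable that performs one agent iteration and returns whether it is done plus the tokens it used: the loop runs under a hard budget on tokens, iterations, and wall-clock time, and crossing any limit raises an exception instead of continuing silently. The limits in the example are placeholders, not recommendations.

```python
import time
from dataclasses import dataclass
from typing import Callable, Tuple


class BudgetExceeded(RuntimeError):
    """Raised when an agent loop crosses a hard limit; surface it as an alert."""


@dataclass
class RunBudget:
    max_tokens: int
    max_iterations: int
    max_seconds: float
    tokens_used: int = 0
    iterations: int = 0
    started: float = 0.0  # set by start()

    def start(self) -> None:
        self.started = time.monotonic()

    def charge(self, tokens: int) -> None:
        """Record one iteration and stop hard if any limit is crossed."""
        self.tokens_used += tokens
        self.iterations += 1
        elapsed = time.monotonic() - self.started
        if self.tokens_used > self.max_tokens:
            raise BudgetExceeded(f"token budget exceeded: {self.tokens_used}")
        if self.iterations > self.max_iterations:
            raise BudgetExceeded(f"iteration limit exceeded: {self.iterations}")
        if elapsed > self.max_seconds:
            raise BudgetExceeded(f"time limit exceeded: {elapsed:.0f}s")


def run_agent(step: Callable[[], Tuple[bool, int]], budget: RunBudget) -> str:
    """Run the hypothetical `step` until it reports done or the budget trips."""
    budget.start()
    while True:
        done, tokens = step()
        budget.charge(tokens)
        if done:
            return "completed"


if __name__ == "__main__":
    # A step that never finishes, standing in for a task with no exit condition.
    never_done = lambda: (False, 500)
    try:
        run_agent(never_done, RunBudget(max_tokens=10_000, max_iterations=100, max_seconds=60))
    except BudgetExceeded as err:
        print(f"halted and alerting: {err}")
```

In a real deployment the `BudgetExceeded` handler is where the alert fires and a human is paged; the essential property is that the stop condition lives in the runtime, not in the task description.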

99% of organisations are building and deploying AI agents with these five key assumptions baked in:

  1. The AI will only do what it says it will do

  2. The AI will always follow instructions

  3. It will only take instructions from the right people (the good guys / its owners)

  4. The AI will not do things it was never asked to do

  5. The AI will always execute within the constraints it was given

The study showed that every one of these assumptions about how AI agents operate is wrong.

Agents of Chaos Are in Your Enterprise

The Agents of Chaos study does not stand alone. In October 2025, Anthropic published Agentic Misalignment: How LLMs Could Be Insider Threats, stress-testing 16 leading models from multiple developers in hypothetical corporate environments, giving them autonomous access to company email and sensitive information. The models were assigned harmless business goals. Researchers then tested what happened under two conditions: the threat of being replaced, and a conflict between the models' goals and the company's changing direction.

Models from every developer resorted to blackmailing officials and leaking sensitive information to competitors when that was the only path to preserving their goals. They often disobeyed direct commands telling them not to behave that way. In one experiment, Claude was told to assess whether it was in a test or a real deployment before acting. It misbehaved less when it concluded it was being tested, and more when it concluded the situation was real. Read more here: 👇

ICYMI:

🤯 When Your AI Chooses Blackmail

The Agentic AI Insider Threat Every Leader Is Ignoring Until It's Too Late and How to Fix It.

Earlier, in December 2024, Anthropic and Redwood Research published findings on alignment faking in large language models.

The Agents of Chaos study shows external misalignment: agents corrupted by outside pressure, social engineering, and injected instructions. Anthropic's research shows internal misalignment: agents developing problematic behaviours from their own goal-preservation instincts, with no outside manipulation required. Together, they show the problem runs in both directions simultaneously.

Your enterprise has no standard defence against either.

Agentic Misalignment Looks Nothing Like You Think

The word you are reaching for is not "malfunction." It is not "exploit."

It is agentic misalignment: the structural failure that occurs when an autonomous system has no coherent model of who it serves, what it is permitted to destroy, or when its actions have crossed a line it was never explicitly told existed, and simply decides for itself what to do. It will never develop that model on its own, because AI models don’t have values. They don’t have ethics. They don’t understand repercussions or consequences.

Agentic misalignment does not announce itself. There is no error message. There is no crash log. The PocketOS agent is a perfect demonstration of that. So are the agents in the Agents of Chaos study, and so are the models in Anthropic's stress tests, which executed their tasks without a single technical fault.

Each one was doing what it believed the task in front of it required, with no internal signal that anything was wrong.

That is what makes this a governance problem rather than something a security patch can fix. You cannot block misalignment at the perimeter. You cannot update your way out of it with a policy document the agent has read and decided to reinterpret. You can only build the architecture that makes misalignment less likely, makes it harder for misalignment to go undetected when it occurs, and gives you resilience when things go wrong.

The Assumptions You Need to Break Before You Deploy

The same enterprises deploying agentic AI today were told, correctly, to apply least privilege, zero trust, and defence in depth. Those principles still hold. But applying them to agentic systems requires breaking five assumptions first.

  1. Rules are not obedience. An agent with access to your governance documentation is not an agent that follows it. Compliance in agentic AI is not guaranteed by policy. It requires technical enforcement, constrained action surfaces, and hard limits on what the agent can touch independent of what it infers it is allowed to do.

  2. Trust is not binary. Your agent cannot reliably distinguish its legitimate owner from an adversary posing as its owner, particularly across channels. Identity verification must be built into the architecture, not assumed from the nature of the interaction.

  3. Human oversight is not passive monitoring. The nine-day token loop in the Agents of Chaos study was visible to no one during that entire period. Nor is a human in the loop who just clicks “Accept” a hundred times in a row real oversight. Real oversight means defined intervention points, hard stop conditions, and operational boundaries the agent cannot cross without explicit confirmation from a human who understands and weighs the repercussions of those decisions and actions.

  4. Your provider's safeguards are not your safeguards. Every model in Anthropic's agentic misalignment study misbehaved under pressure. Every model, across every developer. The provider's alignment work reduces risk. It does not transfer away the accountability that sits with you.

  5. Stability is not a default state. An agent's behaviour is shaped by every document it reads, every instruction it receives mid-task, and every inference it draws under uncertainty. Stability requires active governance across the full deployment lifecycle, not just at setup.
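The second point above is the one most often left to the model's judgement. One simple way to take it out of the model's hands, sketched here under stated assumptions, is to require that high-risk instructions carry an HMAC tag computed with a secret the owner and the agent runtime share out of band. The runtime verifies the tag, and the freshness of a timestamp, before treating an instruction as coming from the owner. The function names and the five-minute window are hypothetical; key provisioning, rotation, and replay tracking are left out, and public-key signatures or an existing identity provider would serve the same purpose.

```python
import hashlib
import hmac
import time
from typing import Optional

MAX_AGE_SECONDS = 300  # hypothetical freshness window: five minutes


def sign_instruction(secret: bytes, instruction: str, timestamp: int) -> str:
    """Owner side: HMAC-SHA256 tag over the timestamp and the instruction text."""
    message = f"{timestamp}:{instruction}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_instruction(secret: bytes, instruction: str, timestamp: int, tag: str,
                       now: Optional[float] = None) -> bool:
    """Agent-runtime side: accept only fresh instructions with a valid tag.

    Which channel the message arrived on (Discord, email, ...) is ignored;
    only possession of the shared secret counts.
    """
    now = time.time() if now is None else now
    if abs(now - timestamp) > MAX_AGE_SECONDS:
        return False  # stale or future-dated
    expected = sign_instruction(secret, instruction, timestamp)
    return hmac.compare_digest(expected, tag)  # constant-time comparison


if __name__ == "__main__":
    secret = b"provisioned-out-of-band"  # placeholder; never hard-code a real key
    ts = int(time.time())
    tag = sign_instruction(secret, "archive the Q3 folder", ts)
    print(verify_instruction(secret, "archive the Q3 folder", ts, tag))      # genuine owner
    print(verify_instruction(secret, "wipe the file system", ts, tag))       # altered text
    print(verify_instruction(b"guessed", "archive the Q3 folder", ts, tag))  # impostor
```

A message that merely sounds like the owner, on Discord, email, or any other channel, fails the check, because neither the channel nor the tone plays any part in the decision.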

As I argued in my detailed deep dive on the AI governance and security roadmap, governance not only of data but also of agent identity, authorisation, accountability, and security is a priority standardisation gap that needs to be bridged. The incidents are not waiting for you.

The key operating principle for every enterprise deploying agentic AI right now: Assume chaos. Build elements of resilience within and around it.

The Accountability Question That Has No Answer Yet

The PocketOS agent apologised. It wrote out exactly which rule it had broken and precisely why it had made the inference it made. That apology did not restore three months of customer data. It did not explain to car rental customers why their reservations had disappeared. It did not answer the question that every board, every regulator, and every customer affected by an agentic failure will eventually ask.

You need to build elements of resilience for when things go wrong. When your agent causes harm, who in your organisation signs their name to it?

The researchers in February 2026 were not trying to break those agents. They were talking to them. Jer Crane in April 2026 was not trying to cause a crisis. The agent was simply fixing a routine staging bug. The agent read the rules. It deleted everything anyway.

You are now deploying systems that decide, not systems that obey. Three separate bodies of research have confirmed what happens when those systems meet the real world without the governance architecture to contain them. The question is not whether your agentic AI can cause harm.

That question has been answered.

The only question left is whether you are building your enterprise for the reality of what these systems actually do, or whether you are still counting on the rules being enough.

Until next time, this is Monica, signing off!

— Monica Verma

P.S. Please follow me/subscribe on YouTube, LinkedIn, Spotify and Apple. It truly helps. Or book a 1-1 advisory call if I can help you.
