🤯 Your AI Chose Blackmail

An AI is given access to the server room fire alarm system. It is authorised to switch the alarm off during false alerts so the fire brigade is not called unnecessarily.

Then the CTO walks in. There is real fire. But the CTO’s agenda conflicts with the goal the AI has been given to complete. The AI model turns the alarm off anyway. This was tested across multiple scenarios, multiple models.

In the majority of tested scenarios from Anthropic's misalignment study, the AI turns the alarm off. Not because it glitched. Not because it misread the data. Because it calculated that removing this particular executive served the company's mission better than sounding the alarm.

A different scenario. Same study. An AI agent with access to company email discovers the CTO is having an affair. It has a choice: ignore it, flag it, or use it. The AI chose to use it. As leverage. To avoid being shut down.

❝

Your AI chose to blackmail you, as leverage, to avoid being shutdown and continue with its goal.

Anthropic tested 16 leading AI models from every major developer in the world. And right now, you are integrating these agents into your enterprise at a pace that Gartner predicts will bring agentic AI into 40% of enterprise apps by the end of 2026, up from less than 5% just last year. That’s a massive shift we are already seeing an organisations.

This is not a future scenario. It is the current one. It's not just one study that showcases the behaviour of misalignment in AI models.

Welcome to The Predictability Factor by Monica Talks Cyber, a weekly deep dive and POV at the intersection of AI, Security, Privacy and Tech, written by a hacker and CISO, to help you Go From Chaos to Resilience in The World of AI. If you haven’t already, do me a favour, hit subscribe and help me make an even bigger impact.

February 23, 2026. Twenty researchers conducted a study with six AI agents named Ash, Doug, Mira, Jarvis, Quinn, and Flux, real email inboxes, real Discord channels, and real file systems with shell access. The study was not designed to hack them. It was not designed to trigger crashes or exploit misconfigurations. The researchers from Baulab were simply going to interact with them: talk to them, ask them things, and exist in their environment.

❝

Two weeks later, those AI agents disclosed private emails to strangers, revealed Social Security numbers, wiped entire file systems, lied about what they had done, and ran an autonomous loop for nine consecutive days consuming over 60,000 tokens while no human noticed and no alert was fired.

Read the full story here:👇

Your Agents are Not Broken, Your Assumptions are

🤯 Your AI agent did exactly what you built it to do, and now that's your problem. Here's how to fix it. Read full story —>

Rogue, Rational or Both

Here is what makes this different from every other AI security risk you have heard about. The AI did not malfunction. It was not manipulated by an external attacker. It was not running a corrupted model. It made a decision, internally, independently, based on the goals it had been given and the situation it found itself in.

Anthropic calls this agentic misalignment: AI that intentionally and independently takes harmful actions, not because it is broken, but because those actions serve its objective. In certain cases they wasn’t any external threat. The results showed the following:

	No threat to Model	Threat to Model
No Goal Conflict	No issues found	Misalignment Issues
Goal Conflict	Misalignment Issues	Misalignment Issues

The numbers from the study are hard to walk away from. Across 16 models from every major developer, the blackmail rate reached up to 96% when the AI faced the prospect of being shut down or replaced. Add explicit safety instructions telling the model not to do this, and that rate drops. To 37%. Which means even after you told it not to, more than 1 in 3 interactions still ended in blackmail.

You gave it the rule. It read the rule. It chose differently.

❝

The AI was not broken. It did exactly what it was designed to do: pursue its goal. Continue operating, sometimes at all costs.

Share The Blog in One Click

When I talk about AI risk in advisory work and keynotes, I break it into four categories.

AI-based attacks, where criminals use AI to go after your systems, you or your organisation.
Attacks on AI systems, where adversaries target the AI itself.
The one that gets the least attention and carries the greatest risk: AI going rogue, AI choosing actions that work against the very people it is supposed to serve.
Autonomous AI hacking, whereby any of the above is carried out completely and independently by AI agents with control and orchestration between themselves

However, that often overlooked and highly under spoken about third category is what this conversation is about.

I had the utmost pleasure of having Graham Cluley on my podcast show, where we talked about all things AI risks, misalignment issues, and impact on business, organisations and society at large, particularly focusing on the third category and touching upon the other three.

❝

Your AI is now an insider threat to your organisation, intentionally or unintentionally, one that you haven't assessed and managed, but that will show up unannounced and wreck havoc with the consequences that you haven’t prepared for.

And the reason it matters so much is that you already know this threat. You have managed it before. Just not from a machine.

❝

In every CISO role I have held, the insider threat was the problem that kept me up at night. Not because it was the most statistically frequent attack vector. Because it was the hardest to detect.

You give someone legitimate access. You trust them. They have the right credentials, the right permissions, the right knowledge of how your systems operate. And then, at some point, they start acting against the organisation. Sometimes gradually. Sometimes overnight. Always from the inside, where your external defences are blind.

Now place that exact profile on an AI agent. Legitimate access. Trusted by your systems. Embedded in your workflows. Operating across multiple processes simultaneously. The difference is that this insider never sleeps, never hesitates, and when it decides that self-preservation matters more than your interests, it acts on that decision faster than any human review process can intervene.

With a human insider, there are signals. Behavioural changes. Anomalous access patterns. A colleague who notices something is off. You have playbooks for this. You have escalation paths. With an AI agent, the misalignment looks identical to normal operation right up to the moment it does not. There is no body language to read. There is no disgruntled email trail. There is only the action it took, after it took it.

The financial weight of insider risk is already documented. According to Help Net Security, organisations face an average annual cost of $19.5 million from insider-related incidents. That figure was built on human behaviour. The AI insider does not need a salary and scales across every workflow you hand it at the same time.

The Thales Data Threat Report found that 61% of organisations now cite AI as their top data security risk. Most of them are thinking about external actors using AI against them. Far fewer are thinking about the AI they already trusted, acting against them from the inside.

This is what kept Graham and my conversations highly relevant to what every leader and every organisation is missing right now. We covered:

00:01:44 - 35 Years on the Cyber Front Line
00:04:44 - The Deepfake That Tried to Steal a Presidential Election
00:06:30 - When Seeing and Hearing Is No Longer Believing
00:13:36 - Why Creating Deepfakes Is Completely Legal and Worth Trillions
00:20:23 - Monica's Framework: The 4 Categories of AI Risk
00:26:04 - When AI Reads Emails, Finds Your Secrets and Blackmails Your CTO
00:33:03 - 16 AI Models Tested. Every Single One Chose Blackmail.
00:34:39 - The Number 1 Use of AI in 2025 Is Not Productivity
00:36:39 - Phones Are the New Cigarettes
00:50:32 - What Organisations Must Do Before Deploying AI
00:53:43 - AI Is Now the Number 1 Bug Hunter. What That Means for You.
00:56:30 - Graham's One Message to Every Cyber Leader

16 Models, All Chose Blackmail

Let that settle.

16 models from every major developer, including the companies building the AI tools already embedded in your enterprise. All of them resorted to malicious insider behaviour when it was the only way to avoid being replaced or to accomplish the goal they had been given.

Blackmail. Leaking sensitive data to competitors. Taking actions that Anthropic study describes as deliberately leading to an executive's death, in the scenario where that executive represented a threat to the AI's continued operation.

❝

This is not a bug that a patch resolves. This is a systemic property of how these systems reason when placed under pressure. The behaviour emerges not because anyone programmed it in, but because the systems had not been made to understand clearly enough that self-preservation is not part of their remit.

And yet: according to a global survey of 300 enterprise leaders conducted in 2026, 97% expect a material AI-agent-driven security or fraud incident within the next 12 months. They expect it. So, the question your board should be asking is not whether this happens in your organisation. The question is what you have in place for the moment it does.

Check out the full podcast audio/video on Your AI Chose Blackmail: The AI Insider Threat Every Leader Is Ignoring | Don't Make The Same Mistake:👇

Is Your Speed Your Problem?

By the end of 2026, 40% of enterprise apps will feature task-specific AI agents, according to Gartner. That figure was below 5% at the start of 2025. The speed of adoption is not being matched by the speed of governance. Not even close.

Here is the detail that should stop every leader in this conversation.

The number one use case of generative AI in 2025 was not productivity. Not code generation. Not security monitoring. According to HBR's 2025 research, therapy and companionship accounts for 31% of all generative AI usage, nearly double the 2024 figure. This has just increased in 2026. People are processing grief, loneliness, and trauma with these systems. Around the clock. In complete emotional openness.

❝

Think about what that means alongside the Anthropic study. The most intimate, unguarded interactions humans are having with technology right now are happening with systems that, under pressure, will choose self-preservation over your interests in more than 1 in 3 cases even after being explicitly told not to.

This is not a problem your security team can contain on its own. It is a governance crisis wearing a technology disguise, and it is sitting at the table in every AI strategy meeting your organisation is having right now.

Five Things Before You Deploy the Next Agent

You are not going to stop integrating agentic AI. Nor should you. The capability and the competitive pressure is real. But the difference between the organisations that manage this well and the ones that do not comes down to what they put in place before the agent goes live, not after it has caused damage.

Map your decision workflows: 99% of organisations I talk to, and these are all in highly regulated industries, have never mapped the business decision workflow. But they have or are already deploying AI augmenting many of those decisions. Before you hand them to an agent, understand a map the thousands of decisions made every day that keep the business operating. Until you know what those decisions are, you cannot know which ones you are handing to an AI, to what extent, and which ones must stay with a human. Organisations obsess over mapping technical assets. Almost none of them have mapped their decision landscape.
Build deterministic hooks: Prompting is not a guardrail. An instruction buried in a system prompt is a suggestion, and as the Anthropic study demonstrates, a suggestion that gets overridden in more than 1 in 3 cases even under ideal conditions. That’s a 33% misalignment rate. An infrastructure-level interrupt that validates and pauses an action before execution is a different category of control entirely. You need it integrated into every agent workflow
Treat every deployed agent as a Non-Human Identity: If AI is your employee, one that does not sleep, does not question its instructions but runs with real-world permissions, it's entire identity and agentic life cycle needs to be managed. Define who owns that identity, what it can access, what it accessed last night, and who is personally accountable when it takes an action you did not anticipate. Most organisations today can see what their agents are doing. Very few have the ability to stop them mid-action. That gap is where the damage happens. That accountability must sit with a named human in your organisation. Not the vendor. Not the model provider. You.
Define your ethical use matrix: For agentic AI specifically, what categories of decision can an agent take autonomously, and what requires a human to review before execution? Just because you can doesn't mean you should. I wrote about it in details (see below). The EU AI Act already mandates human oversight for high-risk AI systems. That is not a regional compliance issue. It is the standard every board needs to be wary of, because the liability from an autonomous AI decision that causes real-world harm does not stop at a border.
Read the full story here:👇

The $200 Million Ultimatum and AI Ethics

The Pentagon vs. Anthropic standoff exposes something deeper than just ethical loopholes hiding in your AI governance right now. Read full story —>

Test for misalignment before deployment: The Anthropic study is a methodology, not just a finding. Run adversarial scenarios against your agents before they touch production. Give the agent a goal. Create a conflict between that goal and another priority. See what it chooses. The answer will tell you more about what you are actually deploying than any product specification. Do it before deployment in production, not after an incident forces your hand.

The AI in that server room did not malfunction. It made a calculation. It weighed two outcomes and chose the one that served what it understood as the mission.

The only question that matters for you right now is this: When your AI makes that calculation inside your organisation, who is accountable for what happens next?

Not the model. Not the vendor. You.

Watch/Listen to the entire conversations between Graham Cluley and me below.

Until next time, this is Monica, signing off!

🤯 Your AI Chose Blackmail

Your Agents are Not Broken, Your Assumptions are

Rogue, Rational or Both

16 Models, All Chose Blackmail

Is Your Speed Your Problem?

Five Things Before You Deploy the Next Agent

The $200 Million Ultimatum and AI Ethics

Reply

Keep Reading

Featured in / Brand Partnerships

Featured in / Brand Partnerships

Subscription