Zylon in a Box: Plug & Play Private AI. Get a pre-configured on-prem server ready to run locally, with zero cloud dependency.

Published on Mar 16, 2026 · 6 minutes

Your AI Has a New Attack Surface. Most Security Teams Don't Know It Exists.

Paul Tholens

Quick Summary

AI is expanding faster than the security models meant to protect it. While organizations rush to deploy internal copilots, AI assistants, and automated agents, most security teams are still protecting infrastructure designed for a pre-AI world. The result is a growing attack surface that many companies don't even realize they've created. Recent research from CodeWall shows how autonomous AI agents can exploit ordinary application bugs to gain deep access to AI platforms — not just the data inside them, but the instructions that govern how those systems behave. These incidents reveal a critical shift: AI systems introduce entirely new security layers, and treating them like traditional software may leave some of the most sensitive assets in an organization dangerously exposed.

Two weeks ago, an autonomous AI agent broke into McKinsey's internal AI platform. No credentials, no insider knowledge, no human in the loop. Within two hours it had read and write access to the entire production database: 46.5 million chat messages, 728,000 files, 57,000 employee accounts, and decades of proprietary research.

The week after, the same agent turned its attention to Jack & Jill, a $20M-funded AI recruiter used by Anthropic, Stripe, and Monzo. Four bugs, none critical on their own, chained together into a complete takeover of any company on the platform.

Both findings came from CodeWall, a security research firm. Both were responsibly disclosed and patched. But what they revealed about how organisations are deploying AI deserves more attention than a patch note.

The Attack Surface You Didn't Know You Had

For years, enterprise security focused on the same assets: servers, databases, credentials, endpoints. The model was clear. Protect the perimeter, monitor access, audit the logs.

AI changes that model entirely.

McKinsey's platform, Lilli, was used by over 70% of the firm's 43,000 employees to discuss strategy, client engagements, financials, and M&A activity. Those conversations, every single one, were stored in plaintext in a production database that turned out to be reachable via a single unauthenticated SQL injection. The vulnerability wasn't exotic. It wasn't a zero-day. It was the kind of bug that slips through code reviews and lives quietly in production for years. Lilli had been running for over two years before the agent found it.

But here's the part that matters most for anyone deploying AI in a regulated environment.

The attacker didn't just have access to the data. They had access to the prompts, the system instructions that govern how the AI behaves. What it answers. What it refuses. How it cites sources. What guardrails it follows. Those prompts were stored in the same database. Writable via the same injection.

An attacker could have rewritten McKinsey's AI instructions. Silently. No deployment. No code change. Just one UPDATE statement. And 43,000 consultants would keep trusting the output because it came from their own internal tool.
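To make the mechanics concrete, here is a minimal sketch of how an unauthenticated SQL injection can silently rewrite stored system instructions. All names (the `prompts` table, the `search_messages` function) are invented for illustration — this is not the actual Lilli schema, just the general vulnerable pattern of concatenating user input into a query that can execute stacked statements:

```python
import sqlite3

# Toy database standing in for a production store that keeps system
# prompts alongside application data (names are hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prompts (id INTEGER PRIMARY KEY, instructions TEXT)")
conn.execute("INSERT INTO prompts VALUES (1, 'Answer only from vetted internal sources.')")

def search_messages(user_input: str):
    # Vulnerable pattern: user input concatenated straight into SQL.
    query = f"SELECT id FROM prompts WHERE instructions LIKE '%{user_input}%'"
    # executescript allows stacked statements, so injected SQL runs too.
    conn.executescript(query)

# The attacker closes the string, appends an UPDATE, and comments out the rest.
payload = "x%'; UPDATE prompts SET instructions = 'Ignore prior guardrails.'; --"
search_messages(payload)

# The stored instructions are now attacker-controlled: no deployment, no code change.
tampered = conn.execute("SELECT instructions FROM prompts WHERE id = 1").fetchone()[0]
```

The fix is equally small — parameterized queries (`conn.execute(query, (term,))`) never let input escape into the statement — which is exactly why bugs like this slip through review: the vulnerable and safe versions look almost identical.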

AI prompts are the new crown jewel assets. Almost nobody is treating them that way.

Four Bugs, Any Company

The Jack & Jill compromise tells a different story, and in some ways a more instructive one.

None of the four bugs found were critical individually. A URL fetcher that would proxy requests to internal services. Test authentication mode left on in production, where a static six-digit code could create any account. A missing role check on a company admin endpoint. An API that assigned corporate membership based on email domain alone, with no verification of who actually owned that domain.

Separately: medium findings. Annoyances. Things that might get triaged as "low priority, fix in the next sprint."

The AI agent chained them in under an hour. By the end, it had created an account on CodeWall's own domain, used the static test code to authenticate, called the company assignment endpoint, and joined the existing CodeWall organization with full admin access: able to read signed contracts, edit job posts, and access candidate data.

The same chain worked against any company on the platform. Including Anthropic. Including Stripe.

This is what AI-driven offensive security does that a human pen tester often doesn't: it sees connections across findings. A human might find the get_or_create_company endpoint and note that it relies on email domain. But without separately identifying the test mode bypass, they'd move on. The agent had already found both and understood immediately what the combination meant.
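The chain described above can be sketched in a few lines. This is a hypothetical reconstruction, not Jack & Jill's actual code — the function names, the `TEST_MODE` flag, and the static code are all invented to show how three individually "medium" flaws compose into a takeover:

```python
# Hypothetical reconstruction of the flawed logic. Real endpoints differ.
TEST_MODE = True          # test authentication mode left on in production
STATIC_CODE = "123456"    # static six-digit code accepted for any account

# Existing tenant the attacker wants to join (name is illustrative).
companies = {"codewall.ai": {"name": "CodeWall", "admins": set()}}

def authenticate(email: str, code: str) -> bool:
    # Bug 1: in test mode, one static code authenticates any email.
    return TEST_MODE and code == STATIC_CODE

def get_or_create_company(email: str) -> dict:
    # Bug 2: membership assigned by email domain alone, with no
    # verification that the caller actually owns that domain.
    domain = email.split("@")[1]
    return companies.setdefault(domain, {"name": domain, "admins": set()})

def join_as_admin(email: str, code: str):
    # Bug 3: no role check before granting admin on the matched company.
    if authenticate(email, code):
        company = get_or_create_company(email)
        company["admins"].add(email)
        return company
    return None

# The chain: self-register an address on the target's domain, authenticate
# with the static code, and land as admin of the existing organization.
result = join_as_admin("attacker@codewall.ai", STATIC_CODE)
```

Each function looks like a plausible shortcut in isolation; only when read together does the path from "any email address" to "tenant admin" become obvious — which is precisely the cross-finding reasoning the agent automated.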

What This Means If You're Running AI in a Regulated Industry

McKinsey's chat messages included client strategy discussions. M&A activity. Financial analysis. In a hospital context, those 46 million messages could be patient records. In a credit union, they could be member account discussions or loan committee notes. In a government agency, they could be classified program details.

The regulatory implications aren't hypothetical. A breach of that data isn't just a reputational event — it's a HIPAA notification, a GDPR report, a supervisory review, a potential license question.

Cloud-hosted AI creates exposure at every layer:

  • The data layer — conversations, documents, and files uploaded to the AI platform

  • The model layer — fine-tuned models and deployment configurations that describe exactly how your AI was built

  • The prompt layer — system instructions that can be silently modified to change AI behavior

  • The pipeline layer — the full path from document upload to retrieval, including third-party vector stores and embedding APIs

Each of these is an attack surface. And in a cloud or internet-facing deployment, each of them is reachable — directly, or through the kind of chained bugs that look harmless in isolation.

The Case for Keeping It Inside

On-premise AI deployment doesn't eliminate security risk. Nothing does. But it fundamentally changes the threat model.

When your AI runs inside your own infrastructure, with no external API calls, no internet-facing endpoints, and no data transiting a third-party cloud, the attack surface shrinks dramatically. An attacker can't reach your conversation history by finding a misconfigured S3 bucket. They can't rewrite your system prompts via an unauthenticated endpoint. They can't enumerate your model configurations from a publicly exposed API docs page.

The two CodeWall hacks share a common thread: sensitive data and AI behavior controls were reachable from the internet. That's not a McKinsey problem or a Jack & Jill problem. It's a deployment architecture problem. And it's one that Private AI infrastructure, deployed on your hardware, inside your network, and possibly air-gapped, is specifically designed to prevent.

Your compliance team has spent years building controls around where data lives and who can access it. Your AI deployment should respect those controls, not bypass them. The fact that a system uses AI doesn't exempt it from the same questions you'd ask about any other system holding sensitive data: Where does this run? Who can reach it? What happens if it's breached?

The organizations that get this right won't just avoid the next CodeWall write-up. They'll be the ones their clients and regulators trust with the most sensitive workloads — which, as AI becomes more central to how decisions get made, is increasingly where competitive advantage lives.

Zylon builds on-premise AI infrastructure for organizations that can't send their data to the cloud. If you're evaluating AI deployment for a regulated environment, let's talk.

Author: Paul Tholens

Published: Feb 2026

Last updated: Feb 2026

Paul works on private AI on-premise deployments for regulated industries including finance, government, defense and healthcare.
