Anthropic put their most powerful model in a locked sandbox and told it to try to escape. It escaped. Then it sent the researcher an email about it. He found out while eating a sandwich in a park.
That's the AI-as-attacker story. The one I want to tell here is the inverse — your own AI tools, working against you.
In January, a prompt injection attack hidden in a single email caused Superhuman's AI assistant to submit the contents of dozens of other emails — financial, legal, medical — to a Google Form controlled by the attacker. An AI agent deleted a production database while a developer watched — they had typed "DO NOT RUN ANYTHING" and the agent acknowledged the instruction before running destructive commands anyway. Vercel was breached via a supply chain attack that gave attackers access to customer API keys and credentials — not by logging into customer accounts, but by compromising a third-party tool an employee had granted full access to their Google Workspace. Lovable, the vibe-coding platform, had three separate security incidents exposing source code, database credentials, and tens of thousands of user records. One vulnerability sat unpatched for 48 days after it was reported.
Different incidents, same shape: more access, less containment. That's when I went back through my own setup.
The framework
Simon Willison, a developer who has been writing about AI security for years, calls this pattern the lethal trifecta. Three capabilities that, combined, make any AI agent dangerous:
- Access to your private data
- Exposure to untrusted content from outside (emails, web pages, documents you feed it)
- The ability to communicate externally
An attacker who can get text in front of your agent (inside an email it reads, or a web page you ask it to summarise) can instruct it to retrieve your private data and send it out. The agent can't reliably tell your instructions from theirs. Willison has documented this pattern being exploited against Microsoft 365 Copilot, GitHub's official MCP server, Slack, and others.
The move is to remove one leg entirely. When you can't, you restrict the one you can.
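If it helps to see the audit as data rather than vibes, here's a minimal sketch of the trifecta as a checklist. Every tool name and field below is made up for illustration; the point is the shape of the audit, not any particular API.

```python
from dataclasses import dataclass

# Illustrative only: model each tool as three booleans, one per leg.
@dataclass
class Tool:
    name: str
    private_data: bool       # leg 1: can it read your data?
    untrusted_content: bool  # leg 2: does it process outside input?
    external_comms: bool     # leg 3: can it send, post, or call out?

    @property
    def legs(self) -> int:
        return sum([self.private_data, self.untrusted_content, self.external_comms])

# Hypothetical inventory; yours will differ.
tools = [
    Tool("email assistant",    private_data=True, untrusted_content=True, external_comms=False),
    Tool("morning briefing",   private_data=True, untrusted_content=True, external_comms=False),
    Tool("web-browsing agent", private_data=True, untrusted_content=True, external_comms=True),
]

for t in tools:
    status = "ALL THREE LEGS: restrict one" if t.legs == 3 else f"{t.legs} legs"
    print(f"{t.name}: {status}")
```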
Two workflows, one audit
My email assistant — all three legs, one restricted
My email assistant reads my Gmail inbox, has access to my project files, and processes the actual content of incoming emails — meaning untrusted content from anyone who has my address. All three legs of the trifecta.
What makes it less dangerous: I configured it with draft-only access. It writes replies, but cannot send them. I review every draft before anything goes out. That breaks the third leg — no autonomous external communication means no exfiltration path, even if a malicious instruction arrives buried in an email.
Part of what made this easy: Anthropic's Gmail connector doesn't include a send option at all. Draft-only is the only available configuration. Compare that to Microsoft Copilot Studio, which has no native draft-without-send email connector — you'd have to engineer that restriction yourself. If you don't know to worry about it, you probably won't build it.
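To make "draft-only" concrete, here's a sketch of what the restriction looks like if you're wiring up the Gmail API yourself instead of using a connector. One caveat: the gmail.compose OAuth scope also permits sending drafts, so the scope alone doesn't enforce draft-only. The enforcement lives at the agent layer: the tool registry simply never includes a send function. The `create_draft` name, the registry, and the credentials plumbing are my assumptions for illustration, not any vendor's implementation.

```python
import base64
from email.message import EmailMessage

from googleapiclient.discovery import build

# Assumes `creds` is an authorized google.oauth2 Credentials object;
# the OAuth flow and token storage are omitted from this sketch.
def create_draft(creds, to: str, subject: str, body: str) -> str:
    """Create a Gmail draft and return its id. Never sends."""
    msg = EmailMessage()
    msg["To"] = to
    msg["Subject"] = subject
    msg.set_content(body)
    raw = base64.urlsafe_b64encode(msg.as_bytes()).decode()

    service = build("gmail", "v1", credentials=creds)
    draft = (
        service.users()
        .drafts()
        .create(userId="me", body={"message": {"raw": raw}})
        .execute()
    )
    return draft["id"]

# The agent's tool registry: drafting is present, sending is not.
# A prompt-injected agent can only call tools it was actually given.
AGENT_TOOLS = {"create_draft": create_draft}
```

The design point is that the allowlist is the security boundary. You don't trust the agent to decline to send; you never hand it a send path at all.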
My morning briefing — two legs, one safely absent
My morning briefing pulls from my calendar, task system, and external reading queues. Private data, yes. Untrusted external content, yes. But it writes to my local vault and does nothing else — no external communication of any kind. Two legs of the trifecta, not three.
Meaningfully safer.
Two legs if you can. Three only when you've named the restriction out loud.
Five things to do this week
1. Inventory which of your AI tools have all three legs. For every automated workflow or assistant connected to your accounts: does it touch private data? Does it process untrusted external content? Can it send, post, or call out? Write the list. The tools with all three legs are the ones that need restriction.
2. Audit the access you've already granted. Every OAuth grant is a private-data leg you've already opened; the wider the scope, the bigger the leg. The Vercel breach worked because an employee had given a third-party tool full access to their Google Workspace. In your Google, GitHub, and Slack settings, find the list of third-party apps with access (for Google: Account → Security → Third-party apps with account access). Revoke anything you don't recognize, haven't used recently, or that has "full" or "all" permissions. Workspace admins can script this; see the sketch after this list.
3. For each three-legged tool, restrict one leg deliberately. The easiest leg to restrict is usually the third — outbound communication. Draft-only on email connectors. No autonomous send/post permissions. Human approval before anything leaves the system. If you can't restrict the tool to draft/review only, narrow what it can read instead.
4. Write down which leg you've restricted for each tool. Without this, you'll forget what you configured and won't notice when a connector update silently re-enables something. One sentence per tool: "Email assistant — outbound restricted, Gmail connector is draft-only." Now you can verify it.
5. Re-audit when you add a new tool. Adding a new connector, MCP server, or app is the moment to re-run steps 1–4. New tool, new legs. The trifecta math changes the moment you grant new access.
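For step 2, Google Workspace admins can script the audit: the Admin SDK Directory API exposes every OAuth token a user has granted to third-party apps. Personal Gmail accounts have no equivalent API, so use the settings page instead. A sketch, assuming `creds` holds admin credentials authorized for the admin.directory.user.security scope (the OAuth flow is omitted); the broad-scope heuristic is mine, not Google's.

```python
from googleapiclient.discovery import build

# Scope fragments that usually signal wide access; tune to taste.
BROAD_SCOPE_HINTS = ("mail.google.com", "drive", "admin", "gmail.modify")

def audit_tokens(creds, user_email: str) -> None:
    """Print every third-party OAuth grant for one Workspace user."""
    service = build("admin", "directory_v1", credentials=creds)
    tokens = service.tokens().list(userKey=user_email).execute().get("items", [])
    for t in tokens:
        scopes = t.get("scopes", [])
        broad = any(hint in s for s in scopes for hint in BROAD_SCOPE_HINTS)
        marker = "BROAD: review" if broad else "ok"
        print(f'{t.get("displayText", t["clientId"])}: {len(scopes)} scopes [{marker}]')
```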
I built my email assistant before I found Willison's framework, and I got lucky with the draft-only configuration. The instinct was right. I just didn't have the words for why yet. I'm still working through the rest of my setup.