$ ./ai-testing --scope models,rag,agents,tools

Machine-learning systems fail in ways classic software does not. The vulnerability isn’t a missing bounds check — it’s that a model reads attacker-controlled text as instructions, that a retrieval pipeline cannot forget a poisoned document, and that an agent with real credentials can be talked into using them against its owner. This track is where I keep field notes on breaking — and therefore securing — AI systems.

Why AI testing is its own discipline

  • The trust boundary moved into natural language. There is no reliable syntax that separates “data” from “commands” in a prompt. Every channel that puts text into the context window — user input, a retrieved document, a tool’s output, another agent’s message — is an injection surface.
  • Findings are probabilistic. The same payload can succeed 7 times out of 10. A pentest report has to speak in Attack Success Rate, pinned to model, version, and temperature — not a single screenshot.
  • The blast radius is the tooling, not the chat box. A model that “says something bad” is a safety issue. A model wired to email, a database, or a shell that can be made to act is a security issue. That’s where the money is.

The mental model: the lethal trifecta

The clearest framing of agentic risk, from Simon Willison (June 2025): an agent is exploitable for data theft when it holds all three of these at once —

  1. Access to private data (your mailbox, your CRM, your files),
  2. Exposure to untrusted content (a web page, an email, a document it retrieves),
  3. A way to exfiltrate (an outbound request, a rendered image URL, a whitelisted link).

Hold any two and you are safe. Grant all three in a single session and no exploit code is required — the attacker just writes a sentence. Every marquee 2025 incident below is this trifecta realized.

The frameworks that anchor an engagement

Use it forFramework
Vulnerability checklist & finding IDsOWASP Top 10 for LLM Applications (2025)
Autonomous-agent vulnsOWASP Top 10 for Agentic Applications (2026)
Adversary TTP knowledge baseMITRE ATLAS (AML.Txxxx)
Attack-class vocabularyNIST AI 100-2 (evasion / poisoning / privacy / prompt-injection)
Architecture “where-to-look” mapGoogle SAIF
Engagement process wrapperOWASP GenAI Red Teaming Guide

Recent history worth knowing

  • EchoLeakCVE-2025-32711, CVSS 9.3. The first documented zero-click prompt injection against a production system: a single crafted email makes Microsoft 365 Copilot exfiltrate org data with no user interaction.
  • ShadowLeak & ForcedLeak (2025) — the same pattern against ChatGPT’s Deep Research connector and Salesforce Agentforce: hidden instructions in untrusted content, exfil through a trusted channel.
  • MCP tool poisoning — malicious instructions hidden in a tool’s description, which the model reads but the user never sees.

The through-line: none of these needed a memory-corruption bug. They needed a paragraph.


Start with the methodology if you’re running an engagement; the technique and tooling writeups go deeper on specific surfaces. Everything here is lab work and public research — no client data, no live targets.