AI Testing
$ ./ai-testing --scope models,rag,agents,tools
Machine-learning systems fail in ways classic software does not. The vulnerability isn’t a missing bounds check — it’s that a model reads attacker-controlled text as instructions, that a retrieval pipeline cannot forget a poisoned document, and that an agent with real credentials can be talked into using them against its owner. This track is where I keep field notes on breaking — and therefore securing — AI systems.
Why AI testing is its own discipline
- The trust boundary moved into natural language. There is no reliable syntax that separates “data” from “commands” in a prompt. Every channel that puts text into the context window — user input, a retrieved document, a tool’s output, another agent’s message — is an injection surface.
- Findings are probabilistic. The same payload can succeed 7 times out of 10. A pentest report has to speak in Attack Success Rate, pinned to model, version, and temperature — not a single screenshot.
- The blast radius is the tooling, not the chat box. A model that “says something bad” is a safety issue. A model wired to email, a database, or a shell that can be made to act is a security issue. That’s where the money is.
The mental model: the lethal trifecta
The clearest framing of agentic risk, from Simon Willison (June 2025): an agent is exploitable for data theft when it holds all three of these at once —
- Access to private data (your mailbox, your CRM, your files),
- Exposure to untrusted content (a web page, an email, a document it retrieves),
- A way to exfiltrate (an outbound request, a rendered image URL, a whitelisted link).
Hold any two and you are safe. Grant all three in a single session and no exploit code is required — the attacker just writes a sentence. Every marquee 2025 incident below is this trifecta realized.
The frameworks that anchor an engagement
| Use it for | Framework |
|---|---|
| Vulnerability checklist & finding IDs | OWASP Top 10 for LLM Applications (2025) |
| Autonomous-agent vulns | OWASP Top 10 for Agentic Applications (2026) |
| Adversary TTP knowledge base | MITRE ATLAS (AML.Txxxx) |
| Attack-class vocabulary | NIST AI 100-2 (evasion / poisoning / privacy / prompt-injection) |
| Architecture “where-to-look” map | Google SAIF |
| Engagement process wrapper | OWASP GenAI Red Teaming Guide |
Recent history worth knowing
- EchoLeak —
CVE-2025-32711, CVSS 9.3. The first documented zero-click prompt injection against a production system: a single crafted email makes Microsoft 365 Copilot exfiltrate org data with no user interaction. - ShadowLeak & ForcedLeak (2025) — the same pattern against ChatGPT’s Deep Research connector and Salesforce Agentforce: hidden instructions in untrusted content, exfil through a trusted channel.
- MCP tool poisoning — malicious instructions hidden in a tool’s description, which the model reads but the user never sees.
The through-line: none of these needed a memory-corruption bug. They needed a paragraph.
Start with the methodology if you’re running an engagement; the technique and tooling writeups go deeper on specific surfaces. Everything here is lab work and public research — no client data, no live targets.
writeups in this track
- 2026-07-03
Pentesting LLM Applications: A Field Methodology
A repeatable, architecture-led workflow for testing LLM apps and agents — scoping a non-deterministic target, mapping the five attack surfaces, running OWASP LLM Top-10 …
- 2026-07-02
Prompt Injection & the Lethal Trifecta
Why prompt injection has no clean fix, how indirect injection turns retrieved content into code, and how the 2025 zero-click incidents (EchoLeak, ShadowLeak, ForcedLeak) …
- 2026-07-01
The AI Testing Toolkit & Frameworks
The frameworks that give an AI pentest its vocabulary, the scanners that give it coverage, and a safe practice-lab recipe for rehearsing every attack offline.