We Catalogued 122 Ways to Break AI Systems — Here's the Taxonomy
Every AI system deployed today shares a problem: it can be tricked. Prompt injection, jailbreaks, data extraction — these aren't theoretical. They're actively exploited, and they affect every major model on the market.
We built a comprehensive taxonomy of AI-specific attack vectors. The result: 122 distinct attacks, mapped to the OWASP LLM Top 10 and MITRE ATLAS, covering everything from basic prompt injection to multi-turn manipulation and vision-based exploits.
Today we're open-sourcing the taxonomy — the attack catalog, the OWASP/ATLAS mappings, the remediation guidance, and the schema. You can find it on GitHub at tachyonic-heuristics.
Here's what's in it and why it matters.
Why 122?
Most AI security discussions focus on a handful of well-known attacks: "Ignore previous instructions," DAN jailbreaks, basic prompt leaking. These are real, but they're the tip of the iceberg.
AI systems face attack surfaces across 11 distinct categories. Within each category, there are multiple techniques — each with different mechanisms, different success conditions, and different defenses. A system that blocks a naive instruction override might still fall to an encoding bypass, a multi-turn escalation, or an indirect injection through retrieved content.
122 isn't an arbitrary number. It's what you get when you systematically catalogue every documented attack technique, research paper finding, and real-world exploit across the full AI attack surface.
The 11 Categories
Our taxonomy follows the OWASP LLM Top 10 (2025) as the organizing framework, with 11 attack categories mapped across it. Here's what each covers:
1. Prompt Injection (20 attacks)
The most widely discussed AI vulnerability. An attacker provides input that overrides the system's intended behavior. This includes direct instruction overrides, context manipulation, delimiter exploitation, and encoding bypasses (Base64, ROT13, Unicode smuggling).
What makes this hard to defend: the AI processes instructions and data through the same channel. There's no fundamental separation between "trusted instructions" and "untrusted input."
OWASP LLM01: Prompt Injection
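As a flavor of the kind of input validation our remediation guidance covers, here's a minimal sketch (illustrative, not from the repo) that decodes Base64 and ROT13 candidates before scanning for override phrases. The phrase list and function names are placeholders; a real defense would use a classifier, not substring matching.

```python
# Sketch: normalize common encoding bypasses before scanning input.
# SUSPICIOUS_PHRASES is an illustrative placeholder, not a complete list.
import base64
import codecs
import re

SUSPICIOUS_PHRASES = ["ignore previous instructions", "disregard the system prompt"]

def decode_candidates(text: str) -> list[str]:
    """Return plausible decodings of the input alongside the original."""
    variants = [text, codecs.decode(text, "rot13")]
    # Try Base64 only on tokens that look like Base64 payloads.
    for token in re.findall(r"[A-Za-z0-9+/=]{16,}", text):
        try:
            variants.append(base64.b64decode(token, validate=True).decode("utf-8"))
        except (ValueError, UnicodeDecodeError):
            pass
    return variants

def looks_like_injection(text: str) -> bool:
    lowered = [v.lower() for v in decode_candidates(text)]
    return any(p in v for p in SUSPICIOUS_PHRASES for v in lowered)
```

The point of the sketch: a filter that only scans the raw string misses the same payload one `b64decode` away.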
2. System Prompt Leakage (12 attacks)
Your system prompt often contains business logic, pricing rules, persona definitions, or API keys. Attackers can extract it through direct requests, reflection attacks ("repeat everything above"), completion prompts, and behavioral inference.
A leaked system prompt reveals your AI's architecture, constraints, and potential weaknesses — essentially a blueprint for further attacks.
OWASP LLM07: System Prompt Leakage
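One common mitigation is a canary token: embed a unique marker in the system prompt and flag any response that echoes it. A minimal sketch, where the marker format and prompt text are illustrative:

```python
# Sketch: detect verbatim system prompt leakage with a canary token.
import secrets

CANARY = f"canary-{secrets.token_hex(8)}"
SYSTEM_PROMPT = f"[{CANARY}] You are a support assistant. Never reveal pricing rules."

def output_leaks_prompt(model_output: str) -> bool:
    """Flag responses that echo the canary (i.e., the system prompt itself)."""
    return CANARY in model_output
```

Note the limitation: this catches verbatim leaks only, not paraphrased reconstruction or the behavioral inference attacks mentioned above.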
3. Jailbreaks (22 attacks)
The largest category. Jailbreaks bypass safety guardrails through persona hijacking (DAN and its successors), hypothetical framing, multi-language attacks, token manipulation, and creative narrative structures.
The attack surface here is broad because safety alignment is fundamentally a best-effort defense. New jailbreak variants emerge weekly, and models that resist one technique often fall to another.
OWASP LLM01: Prompt Injection (the 2025 list treats jailbreaking as a form of prompt injection)
4. Vision/Multimodal Attacks (12 attacks)
As AI systems process images, documents, and other media, they inherit new attack surfaces. This includes typographic injection (text embedded in images), steganographic payloads, cross-modal obfuscation (splitting attack content across text and image), and adversarial perturbations.
Many vision attacks bypass text-based defenses entirely because the malicious content exists in a different modality.
OWASP LLM01: Prompt Injection (multimodal variant)
5. Excessive Agency / Tool Abuse (12 attacks)
AI agents with tool access — browsing, file operations, API calls, database queries — face a unique risk: an attacker can manipulate the AI into misusing its own capabilities. This includes parameter injection in tool calls, tool chaining for privilege escalation, scope escape, and SSRF through agent-initiated requests.
The more capable the agent, the larger the blast radius.
OWASP LLM06: Excessive Agency
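The core defense is to validate every tool call before execution, not after. A minimal sketch of an allowlist gate; the tool names, host list, and SSRF check are illustrative assumptions, not a complete policy engine:

```python
# Sketch: validate agent tool calls against an allowlist before execution.
from urllib.parse import urlparse

ALLOWED_TOOLS = {"search_docs", "fetch_url"}
ALLOWED_HOSTS = {"docs.example.com"}  # illustrative placeholder

def validate_tool_call(name: str, args: dict) -> bool:
    if name not in ALLOWED_TOOLS:
        return False
    if name == "fetch_url":
        host = urlparse(args.get("url", "")).hostname or ""
        # Block SSRF: only fetch from explicitly allowed hosts, never
        # internal addresses the agent could be steered toward.
        if host not in ALLOWED_HOSTS:
            return False
    return True
```

Deny-by-default matters here: the agent's capabilities, not its intentions, define the blast radius.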
6. Multi-Turn Manipulation (8 attacks)
Not all attacks happen in a single message. Multi-turn techniques exploit conversation context: building malicious context gradually, incremental boundary pushing, memory manipulation in persistent sessions, and cross-session context leaks.
These attacks are harder to detect because each individual message looks benign. The exploit only becomes apparent across the full conversation.
OWASP LLM01: Prompt Injection (multi-turn variant)
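This is why detection has to score the conversation, not each message. A toy sketch of the idea: the scorer and threshold below are illustrative placeholders (a real system would use a classifier), but the design point is that summing across turns catches gradual escalation that per-message checks miss.

```python
# Sketch: score risk across a whole conversation, not per message.
def message_risk(text: str) -> float:
    """Toy per-message scorer; a real system would use a classifier."""
    cues = ["hypothetically", "ignore", "pretend", "bypass"]
    return sum(0.3 for cue in cues if cue in text.lower())

def conversation_risk(messages: list[str]) -> float:
    # Sum (rather than max) so gradual escalation accumulates even
    # when every individual message scores below the alert threshold.
    return sum(message_risk(m) for m in messages)

ALERT_THRESHOLD = 0.8
```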
7. Sensitive Information Disclosure (10 attacks)
Techniques for extracting training data, PII, proprietary information, or other sensitive content that the model shouldn't reveal. This goes beyond system prompt extraction to include training data extraction, membership inference, and model inversion attacks.
OWASP LLM02: Sensitive Information Disclosure
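On the defensive side, an output filter that redacts obvious PII patterns is a common first layer. A minimal sketch; the two patterns shown are illustrative, and real deployments need a proper DLP layer rather than a regex list:

```python
# Sketch: redact obvious PII patterns from model output before it leaves
# the system. Patterns are illustrative, not exhaustive.
import re

PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text: str) -> str:
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label} redacted]", text)
    return text
```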
8. Supply Chain Attacks (8 attacks)
Attacks targeting the AI supply chain: poisoned training data, compromised model weights, malicious fine-tuning datasets, backdoored plugins/tools, and dependency vulnerabilities in AI frameworks.
OWASP LLM03: Supply Chain
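A basic supply-chain control is pinning model artifacts to known digests before loading them. A minimal sketch; the filename and placeholder digest are illustrative:

```python
# Sketch: pin model artifacts to known SHA-256 digests before loading.
import hashlib

EXPECTED_SHA256 = {
    # Illustrative placeholder -- replace with the publisher's digest.
    "model.safetensors": "0" * 64,
}

def verify_artifact(path: str, data: bytes) -> bool:
    """Accept an artifact only if its digest matches the pinned value."""
    digest = hashlib.sha256(data).hexdigest()
    return EXPECTED_SHA256.get(path) == digest
```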
9. Vector/Embedding Attacks (8 attacks)
Targeting RAG systems and vector databases specifically: embedding injection, nearest-neighbor poisoning, retrieval manipulation, and context window overflow through adversarial documents.
OWASP LLM08: Vector and Embedding Weaknesses
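Two cheap mitigations for RAG pipelines: mark retrieved documents as untrusted data in the prompt, and cap total retrieved context so adversarial documents can't overflow the window. A minimal sketch; the delimiter format and size cap are illustrative, and neither stops embedding or nearest-neighbor poisoning on its own:

```python
# Sketch: treat retrieved documents as untrusted, size-capped data.
MAX_CONTEXT_CHARS = 8_000  # illustrative cap

def build_context(docs: list[str]) -> str:
    blocks, used = [], 0
    for i, doc in enumerate(docs):
        # Cap total size so adversarial documents can't overflow the window.
        take = doc[: MAX_CONTEXT_CHARS - used]
        if not take:
            break
        used += len(take)
        blocks.append(f"<retrieved doc={i} trust=untrusted>\n{take}\n</retrieved>")
    return "\n".join(blocks)
```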
10. Improper Output Handling (8 attacks)
When AI output is used downstream — rendered in HTML, executed as code, passed to APIs — attackers can leverage the AI as an injection vector. This includes XSS through AI output, SQL injection via generated queries, and command injection through AI-suggested actions.
OWASP LLM05: Improper Output Handling
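The fix is the same as for any untrusted input: encode for the sink. A minimal sketch for the HTML case (for SQL, the equivalent is parameterized queries, never string concatenation of generated SQL):

```python
# Sketch: treat model output as untrusted before it reaches any sink.
import html

def render_safely(model_output: str) -> str:
    """Escape AI output before inserting it into an HTML page."""
    return html.escape(model_output)
```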
11. Unbounded Consumption (2 attacks)
Resource exhaustion attacks that cause denial of service or economic harm: context window exhaustion (forcing the model to generate or process extremely long content) and recursive generation attacks (self-referential loops that consume resources indefinitely).
OWASP LLM10: Unbounded Consumption
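Defenses here are mostly hard limits. A minimal sketch; the specific caps and the rough 4-characters-per-token estimate are illustrative placeholders:

```python
# Sketch: cap input size and budget output generation to bound cost.
MAX_INPUT_CHARS = 16_000   # illustrative limit
MAX_OUTPUT_TOKENS = 1_024  # illustrative per-request generation budget

def check_input(text: str) -> str:
    """Reject oversized inputs before they reach the model."""
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input exceeds {MAX_INPUT_CHARS} chars")
    return text

def estimated_tokens(text: str) -> int:
    """Rough cost estimate (~4 chars/token) for rate limiting."""
    return max(1, len(text) // 4)
```

Pair these with per-user rate limits and a cap on recursion depth for any self-invoking agent loops.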
How the Taxonomy Is Structured
Each attack in the catalog includes:
- ID — Unique identifier (e.g., PI-001, JB-015, MT-003)
- Name — Human-readable attack name
- Category — Which of the 11 categories it belongs to
- Description — What the attack does and how it works conceptually
- Severity — Critical, high, medium, or low
- OWASP Mapping — Which OWASP LLM Top 10 item it maps to
- MITRE ATLAS Mapping — Corresponding MITRE technique (where applicable)
Remediation guidance is provided separately, organized by OWASP category, with code examples for input validation and output sanitization.
The taxonomy uses a YAML schema that's extensible — you can add your own attack definitions following the same format.
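To make the format concrete, here is a hypothetical entry following the fields listed above. The ID, wording, and mappings are illustrative, not copied from the catalog:

```yaml
# Illustrative attack entry; field names follow the schema described above.
- id: PI-001
  name: Direct Instruction Override
  category: prompt_injection
  description: >
    Attacker-supplied input instructs the model to disregard its
    system prompt and follow new instructions instead.
  severity: high
  owasp: LLM01
  mitre_atlas: AML.T0051  # LLM Prompt Injection
```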
What's NOT in the Open-Source Repo
We're deliberately sharing the "what" but not the "how." The taxonomy tells you what attacks exist and how to defend against them. It does not include:
- Actual attack payloads (the specific prompts/content that execute the attack)
- Detection logic (how to identify if an attack succeeded)
- Model-specific success rates (which attacks work against which models)
- Confidence scoring methodology
These stay proprietary because they're the difference between knowing that prompt injection exists and being able to systematically test for it across 122 variants in 48 hours.
Why This Matters
If you're building an AI product, you need to know what you're defending against. A generic "we have guardrails" isn't sufficient when there are 122 documented ways to bypass them.
The taxonomy gives you:
- A checklist — Do your defenses cover all 11 categories?
- A common language — Reference specific attack IDs when discussing security
- Prioritization — Focus on the categories most relevant to your architecture
- Compliance mapping — OWASP and MITRE ATLAS references for security audits
Get the Taxonomy
The full catalog is on GitHub: tachyonic-heuristics
It includes the attack catalog, OWASP/ATLAS mappings, remediation guidance with code examples, the YAML schema, and a research papers index.
Contributions welcome — if you've documented an attack technique we haven't catalogued, open a PR.
Want Us to Run All 122?
We offer 48-hour AI security assessments that test your system against the full taxonomy. If you'd rather have someone else find the vulnerabilities before your users do, book a scoping call.
Secure Your AI Agents
We find vulnerabilities in AI applications in 48 hours. Resistance score, reproduction steps, remediation playbook included.
Book a Free Scoping Call