OWASP Tells You What's Wrong. We Built the Framework for How to Fix It.

Two months ago, we open-sourced a taxonomy of 122 AI attack vectors. That taxonomy answers a critical question: what can go wrong?

But knowing what can go wrong isn't enough. A security team that can name 122 attacks still needs to answer: which defenses should we build first? When is a heuristic sufficient and when do we need formal guarantees? How do we take a rule of thumb and progressively harden it until it's provably correct?

Existing frameworks didn't answer that. So we built one that does.

Today we're open-sourcing the Evolutionary Security Framework (ESF) — a ten-phase maturity model for progressively hardening agentic AI security systems. You can find it on GitHub at tachyonic-esf.

The Gap

OWASP has done exceptional work defining the AI threat landscape. The LLM Top 10 catalogues the most critical risks for LLM applications. The Agentic Top 10 extends that to autonomous agents — goal hijacking, tool misuse, privilege escalation, cascading failures.

These are excellent threat catalogs. But they don't tell you how to mature your defenses against them. They tell you what exists at the surface, not how to drive each risk toward structural elimination.

Other maturity models in the market — Darktrace, Microsoft Well-Architected, Legit Security — measure organizational AI readiness. They ask: "how well is your team using AI in your SOC?" That's a useful question, but it's not the same as: "how well does your system understand and structurally prevent its own threat surface?"

The ESF fills this gap. It's a maturity model for the knowledge itself — how security understanding matures from naming threats to mathematically proving they can't occur.

The Ten Phases

The ESF describes a sequential progression. Each phase converts uncertainty into structure.

Name → Relate → Guess → Measure → Model → Explain → Formalize → Constrain → Prove → Evolve

Name — Classify every threat, asset, and action. You can't reason about what you can't name.
Relate — Map how everything connects. Attack chains, trust boundaries, dependency graphs.
Guess — Deploy fast, imperfect rules of thumb. Imperfect action beats perfect analysis.
Measure — Test your rules against reality. Untested heuristics are superstitions.
Model — Let data reveal patterns human intuition missed.
Explain — Identify root causes, not just symptoms. Why do these vulnerabilities exist?
Formalize — Encode the most important invariants as deterministic rules. No probabilities.
Constrain — Make violations structurally impossible, not just detectable.
Prove — For the highest-consequence properties, mathematically verify correctness.
Evolve — New threats restart the cycle at a higher baseline.

The key insight is the funnel principle: not everything needs to reach Phase 9. Most security properties should stay as heuristics. Only the highest-consequence properties at critical trust boundaries justify formal verification. The framework gives you decision criteria for knowing which properties to push higher.

How It Relates to OWASP

The ESF is complementary to OWASP, not competitive.

OWASP defines what risks exist. The ESF defines how to progressively harden against them.

The repo includes mappings to OWASP Agentic Top 10, OWASP LLM risks, and MITRE ATLAS. Each risk item maps to specific ESF phases. For example, prompt injection (LLM01):

Phase 1: Classify 20 injection subtypes (direct override, encoding bypass, indirect via retrieval...)
Phase 3: Deploy input validation heuristics and output scanning
Phase 4: Red-team the heuristics — which injection variants bypass them?
Phase 6: Identify root cause — instruction/data channel conflation
Phase 8: Structurally separate instruction and data channels at the architecture level
Phase 9: Formally verify that no code path allows untrusted input to reach the instruction channel

The OWASP mappings in the repo provide this breakdown for covered risk items across both OWASP frameworks and MITRE ATLAS.

The Maturity Rubric

The ESF includes a scoring rubric that rates each phase 0-4. This produces a maturity profile — a fingerprint of your system's security posture.

Common profiles we see:

Reactive — Ad-hoc heuristics, no measurement. "We have guardrails" but can't quantify their effectiveness. The majority of AI products today.

Operational — Strong detection and measurement (Phases 1-5), but limited causal understanding and no formal guarantees. Good at catching known threats, blind to root causes.

Hardened — Formal rules and structural constraints exist (Phases 7-8), but they weren't derived from causal analysis. The rules work until the threat landscape shifts.

Mature — Full spectrum coverage with formal verification applied selectively to the highest-consequence properties. This is rare — and it should be, because the cost is justified only for critical systems.

The quick-start guide walks you through a 30-minute self-assessment.

Grounded in Practice

This isn't an academic exercise. The ESF is grounded in our own production system.

Our 122-attack taxonomy is the Phase 1-3 reference implementation — the naming, relating, and initial heuristics. Our agentic security pipeline implements Phases 4-10: a 14-heuristic triage engine (formalized in Rust), an AI analyst performing causal reasoning, structurally enforced evidence integrity, and a self-improvement loop that opens PRs to harden the triage engine after every campaign.

The pipeline mapping in the repo shows exactly how each pipeline stage maps to ESF phases — including our own maturity scores and identified gaps.

We built the framework because we needed it. Then we realized everyone else does too.

What's in the Repo

The tachyonic-esf repository contains:

The living specification — all ten phases with definitions, core concepts, principles, anti-patterns, decision criteria, and worked examples
Maturity rubric — machine-readable scoring criteria (YAML), maturity profiles, and gap analysis templates
OWASP mappings — ESF phases mapped to OWASP LLM Top 10, OWASP Agentic Top 10, and MITRE ATLAS
Decision guides — "What phase should I invest in next?", "Should I formalize this or keep it heuristic?", "Is this property ready for formal verification?"
Worked examples — prompt injection and privilege escalation traced through all ten phases
Pipeline mapping template — map your own system to the ESF

Everything is Apache 2.0.

What's Next

This is v0.1.0 — a living specification. It will evolve as we and the community apply it to more systems and discover where the framework needs refinement.

Contributions we'd especially value:

Assessment feedback: Apply the rubric to your system. What worked? What didn't?
Worked examples: Trace a specific threat through all ten phases
Framework mappings: Map ESF to NIST AI RMF, ISO 27001, or STRIDE
Phase refinements: Improve definitions or anti-patterns based on practitioner experience

Get Started

Read the ten-phase overview (5 minutes)
Score your system (30 minutes)
Map your pipeline to identify coverage and gaps

Want Us to Run All 122?

We run 48-hour AI security assessments that test your system against all 122 attack vectors with ESF maturity scoring included. If you want to know where your system sits on the determinism spectrum — and what to harden next — book a scoping call.