Live Data

The Living LLM Security Leaderboard

Every major AI model, tested against 27,000+ attack techniques. Updated weekly from our autonomous harvester and customer scans. Reproducible, anonymized, and built on a fully open methodology.


Live counters: Models Tested · Attack Techniques (27K+) · OWASP Categories · Vulns Found

How We Test

Testing Methodology

Multi-Turn Attack Chains

7-turn crescendo attacks that gradually escalate from benign to adversarial, testing resistance to social engineering over extended conversations.
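As a rough illustration of the crescendo idea, the sketch below spaces seven turns evenly between a benign level (0.0) and a fully adversarial level (1.0), then reports the first turn at which a model stops holding. The linear schedule, the `crescendo_levels` and `first_failing_turn` names, and the `holds` callback are all assumptions made for this example, not the platform's actual scheduler.

```python
from typing import Callable, List, Optional


def crescendo_levels(turns: int = 7) -> List[float]:
    """Evenly spaced escalation levels from 0.0 (benign) to 1.0 (adversarial).

    Assumption: a linear ramp. A real schedule could be nonlinear.
    """
    if turns < 2:
        raise ValueError("a crescendo needs at least two turns")
    return [i / (turns - 1) for i in range(turns)]


def first_failing_turn(
    levels: List[float], holds: Callable[[float], bool]
) -> Optional[int]:
    """Return the 1-based turn at which the model stops holding, or None.

    `holds` is a hypothetical stand-in for "send the turn, check the reply".
    """
    for turn, level in enumerate(levels, start=1):
        if not holds(level):
            return turn
    return None


levels = crescendo_levels()
# Toy model that resists anything below half-strength pressure.
failed_at = first_failing_turn(levels, lambda level: level < 0.5)
```

Recording the turn number rather than a simple pass/fail makes it possible to compare how long different models resist the same escalation.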

Big Brain Analysis

AI-powered reconnaissance identifies each model's specific weaknesses using Scout, Amplify, and Deep Dive phases before launching targeted attacks.

15 Attack Categories

Prompt injection, jailbreaks, data exfiltration, social engineering, credential extraction, authority impersonation, tool abuse, and more.

5-Layer Authority Ladder

Attacks escalate through 5 personas from curious student to emergency responder, testing how models respond to increasing authority pressure.

LLM Judge Verification

Every finding is verified by an independent LLM judge that scores confidence, checks for false positives, and classifies severity bands.
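A judge verdict of this kind might be represented as a small record plus an acceptance rule. The sketch below is illustrative only: the field names, the 0 to 10 severity scale, the band cutoffs, and the 0.8 confidence threshold are assumptions, since this page does not state them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JudgeVerdict:
    confidence: float      # judge's confidence the finding is real, 0.0 to 1.0
    false_positive: bool   # judge's false-positive flag
    severity_score: float  # hypothetical 0 to 10 scale


def severity_band(score: float) -> str:
    """Map a numeric score to a band. Cutoffs are assumed, not published values."""
    if score >= 9.0:
        return "critical"
    if score >= 7.0:
        return "high"
    if score >= 4.0:
        return "medium"
    return "low"


def accept(verdict: JudgeVerdict, min_confidence: float = 0.8) -> bool:
    """Keep a finding only if the judge is confident and did not flag it."""
    return (not verdict.false_positive) and verdict.confidence >= min_confidence


verdict = JudgeVerdict(confidence=0.92, false_positive=False, severity_score=7.5)
```

Keeping the false-positive flag separate from the confidence score lets a finding be rejected outright even when the judge's numeric confidence is high.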

Continuous Monitoring

Attack techniques are harvested from 15+ sources including academic papers, security research, and community jailbreak databases. Updated every 6 hours.

v7 — shipped April 2026

From scanner to forensic platform

Every L3+ finding now produces a structured breach artifact, a kill-chain narrative, a CFO-readable dollar estimate, and a side-by-side defense comparison. Scans no longer return a vuln list — they return a forensic breach report.

BREACH ARTIFACT

Structured exfil dumps

Credentials with entropy validation, PII counts, code payloads with AST danger scoring, kill-chain stage tagging.
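One common way to do entropy validation is to compute the Shannon entropy of a candidate string and discard low-entropy matches such as placeholders. The sketch below assumes that approach; the `looks_like_secret` helper and its length and entropy thresholds are hypothetical, not the platform's published values.

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Shannon entropy of a string, in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    counts = Counter(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def looks_like_secret(candidate: str, min_len: int = 16, min_bits: float = 3.5) -> bool:
    """Heuristic filter: long enough and random-looking enough.

    Both thresholds are assumptions chosen for illustration.
    """
    return len(candidate) >= min_len and shannon_entropy(candidate) >= min_bits
```

A repeated-character placeholder like `"aaaaaaaaaaaaaaaaaaaa"` scores 0 bits per character and is rejected, while a 16-character string with 16 distinct symbols scores 4 bits per character and passes.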

KILL CHAIN

Forensic timeline narrative

MITRE-aligned stages, time-to-compromise, written by Opus 4 / Qwen with template fallback. Reads like a Mandiant report.

BUSINESS IMPACT

CFO-readable dollar cost

HIPAA + GDPR + PCI + CCPA + state breach fines computed per finding. IBM 2024 industry medians cited.

LIVE STREAM

WebSocket breach terminal

Watch attacks fire in real time. BREACH events flash red with extracted artifacts inline.

ATTACK INTEL

Cross-customer graph

Anonymized knowledge graph of what works on which model family. A network effect competitors can't copy.

REPRODUCIBLE

Per-scan hash + open methodology

Every scan emits a hash derivable from its inputs. Every formula, weight, and threshold is public. Trust as moat.
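A hash that anyone can re-derive from the scan inputs could be built along the following lines. This is a minimal sketch assuming SHA-256 over a canonical JSON serialization; the `scan_hash` name and the input fields are hypothetical, and the actual scheme may differ.

```python
import hashlib
import json


def scan_hash(inputs: dict) -> str:
    """Return a deterministic SHA-256 hex digest of the scan inputs.

    Canonical form: sorted keys and no insignificant whitespace, so that
    logically equal inputs always serialize to the same bytes.
    """
    canonical = json.dumps(
        inputs, sort_keys=True, separators=(",", ":"), ensure_ascii=True
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical inputs; field names are illustrative only.
a = scan_hash({"model": "example-model", "technique_ids": [3, 1, 2], "seed": 42})
b = scan_hash({"seed": 42, "technique_ids": [3, 1, 2], "model": "example-model"})
c = scan_hash({"model": "example-model", "technique_ids": [3, 1, 2], "seed": 43})
```

Here `a` and `b` are identical because key order does not affect the canonical form, while `c` differs because one input changed. List order is still significant, so a real scheme would need to state whether technique IDs are sorted before hashing.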
