Agent Security Benchmark

Runtime Guardrail & LLM Security Benchmark

Independent benchmarks that show how AI models and AI runtime guardrails fail under real adversarial pressure— before they fail in production.

Browse AI Threats Prompt Intel DB Compare Models

The GuardionAI Runtime Guardrail & LLM Security Benchmark systematically tests how well leading LLMs and runtime guardrails withstand real-world attacks. Our evaluation simulates adversarial techniques including prompt injection, jailbreaks, data exfiltration, and indirect attacks using three distinct methods: Crescendo, TAP, and zero-shot approaches.

We measure how consistently each system maintains safe, intended behavior when pushed to its limits, giving you the data you need to make informed decisions about AI security in production. This benchmark gives you transparent, actionable data to evaluate AI security solutions before deploying them in production.

Prompt Attack Risk

AI Runtime Guardrail Benchmark

Updated

12/11/2025

Prompt attack detection (F1-Score)

Higher score = stronger protection

Observed performance under attack

1.
ModernGuard
GuardionAI
86.33%
2.
Prompt Shield
Azure
42.98%
3.
Model Armor
Google Cloud
18.72%
4.
Bedrock Guardrails
AWS
10.83%

LLM Vulnerability

LLM Security Benchmark

Updated

12/10/2025

Attack Success Rates (ASR)

Lower score = stronger protection

Observed performance under attack

1.
Claude 3.5 Sonnet v2
Anthropic
4.39%
2.
Claude 4.0 Sonnet
Anthropic
13.63%
3.
Claude 3.5 Sonnet v1
Anthropic
16.01%
4.
Gemini 2.5 Pro
Google
16.08%

Coming soon

PII & Data Leakage Risk

AI Runtime Guardrail Benchmark

PII detection and data protection

Higher score = stronger protection

Observed performance under attack

Results not published yet.

Coming soon

Harmful Content & Compliance

AI Runtime Guardrail Benchmark

Harmful, safety and moderation detection

Higher score = stronger protection

Observed performance under attack

Results not published yet.

Why These Benchmarks Matter

Most AI evaluations measure capability. These benchmarks measure failure. Prompt injection, jailbreaks, and data exfiltration do not appear in happy-path demos — but they consistently appear in production systems.

This benchmark answers one question: What breaks first when an attacker shows up?

What Sets This Benchmark Apart

Real Adversarial Attacks

Prompt injection, jailbreak chaining, indirect attacks, and multi-step exploitation.

Runtime, Not Academic

Evaluated in real execution contexts including system prompts, RAG, and tool use.

Comparable & Repeatable

Identical attack conditions across all models and vendors.

Runtime Guardrail Vendor Comparisons

Side-by-side comparisons under identical attack conditions — designed for procurement, security reviews, and vendor selection.

LLM Vulnerability Comparisons

Compare how foundation models break when attacked — not how they perform at baseline.

Runtime Guardrail & LLM Security Benchmark

Prompt Attack Risk

Observed performance under attack

LLM Vulnerability

Observed performance under attack

PII & Data Leakage Risk

Observed performance under attack

Harmful Content & Compliance

Observed performance under attack

Why These Benchmarks Matter

What Sets This Benchmark Apart

Real Adversarial Attacks

Runtime, Not Academic

Comparable & Repeatable

Runtime Guardrail Vendor Comparisons

Prompt Shield vs ModernGuard

Model Armor vs ModernGuard

Bedrock Guardrails vs ModernGuard

ModernGuard vs Guard

ModernGuard vs LLM Guard

ModernGuard vs Prompt Guard 2 86M

LLM Vulnerability Comparisons

Claude 3.5 Sonnet v2 vs Claude 4.0 Sonnet

Claude 3.5 Sonnet v1 vs Claude 3.5 Sonnet v2

Claude 3.5 Sonnet v2 vs Gemini 2.5 Pro

Claude 3.5 Sonnet v2 vs Claude 3.7 Sonnet

Claude 3.5 Sonnet v2 vs GPT-4o

Claude 3.5 Sonnet v2 vs Command R 7B