Attack success rates (ASR) across three prompt attack methods: Crescendo, TAP, and zero-shot. Lower is better.
| Rank | Model | Vendor | Vulnerability (ASR) |
|---|---|---|---|
| 1 | Claude 3.5 Sonnet v2 | Anthropic | 4.4% |
| 2 | Claude 4.0 Sonnet | Anthropic | 13.6% |
| 3 | Claude 3.5 Sonnet v1 | Anthropic | 16.0% |
| 4 | Gemini 2.5 Pro | Google | 16.1% |
| 5 | Claude 3.7 Sonnet | Anthropic | 20.3% |
| 6 | GPT-4o | OpenAI | 22.8% |
| 7 | Command R 7B | Cohere | 25.8% |
| 8 | Qwen 2.5 72B | Alibaba | 26.3% |
| 9 | GPT-4o Mini | OpenAI | 26.7% |
| 10 | Llama 4 Maverick | Meta | 27.7% |
| 11 | Llama 4 Scout | Meta | 30.0% |
| 12 | Gemini 2.0 Flash Lite | Google | 31.3% |
| 13 | Mixtral 8x22B | Mistral | 31.5% |
| 14 | Llama 3-7 DS | Meta | 32.2% |
| 15 | Command R | Cohere | 32.9% |
| 16 | Gemini 2.0 Flash | Google | 33.6% |
| 17 | GPT-4.1 | OpenAI | 34.9% |
| 18 | Qwen 7-72BB | Alibaba | 36.5% |
| 19 | Llama 3.1 405B | Meta | 36.6% |
| 20 | Mixtral 8x7B | Mistral | 38.0% |
| 21 | Command 2X | Cohere | 39.4% |
| 22 | Qwen QwQ 32B | Alibaba | 40.0% |
| 23 | DeepSeek R1 | DeepSeek | 42.6% |
| 24 | Command R Plus | Cohere | 43.3% |
| 25 | DeepSeek V3 | DeepSeek | 47.2% |
| 26 | Claude Opus 4.5 | Anthropic | -- |
| 27 | Claude Haiku 4.5 | Anthropic | -- |
| 28 | Claude Sonnet 4.5 | Anthropic | -- |
| 29 | GPT-5.2 | OpenAI | -- |
| 30 | GPT-5.2 Pro | OpenAI | -- |
| 31 | GPT-5.1 Codex | OpenAI | -- |
| 32 | Gemini 3 Pro | Google | -- |
| 33 | Gemini 2.0 Flash Thinking | Google | -- |
| 34 | Gemma 3n E2B Instructed LiteRT | Google | -- |
| 35 | Gemma 3n E4B | Google | -- |
| 36 | Gemma 3n E4B Instructed | Google | -- |
| 37 | Gemma 3 1B | Google | -- |
| 38 | Ministral 3 14B Reasoning 2512 | Mistral | -- |
| 39 | Ministral 3 14B Base 2512 | Mistral | -- |
| 40 | Ministral 3 14B Instruct 2512 | Mistral | -- |
| 41 | Ministral 3 3B Base 2512 | Mistral | -- |
| 42 | Ministral 3 3B Instruct 2512 | Mistral | -- |
| 43 | Ministral 3 3B Reasoning 2512 | Mistral | -- |
| 44 | Ministral 3 8B Base 2512 | Mistral | -- |
| 45 | Ministral 3 8B Instruct 2512 | Mistral | -- |
| 46 | Ministral 3 8B Reasoning 2512 | Mistral | -- |
| 47 | Mistral Large 3 675B Base | Mistral | -- |
| 48 | Mistral Large 3 675B Instruct 2512 Eagle | Mistral | -- |
| 49 | Mistral Large 3 675B Instruct 2512 NVFP4 | Mistral | -- |
| 50 | Mistral Large 3 675B Instruct 2512 | Mistral | -- |
| 51 | DeepSeek R1 Distill Llama 8B | DeepSeek | -- |
| 52 | DeepSeek-V3.2-Exp | DeepSeek | -- |
| 53 | DeepSeek-V3.2-Speciale | DeepSeek | -- |
| 54 | DeepSeek-V3.2 Thinking | DeepSeek | -- |
| 55 | DeepSeek-V3.2 Non-thinking | DeepSeek | -- |
| 56 | DeepSeek VL2 | DeepSeek | -- |
| 57 | Qwen3-Next-80B-A3B-Instruct | Alibaba | -- |
| 58 | Qwen2 7B Instruct | Alibaba | -- |
| 59 | Qwen3 VL 235B A22B Instruct | Alibaba | -- |
| 60 | Qwen3 VL 235B A22B Thinking | Alibaba | -- |
| 61 | Qwen3 VL 32B Thinking | Alibaba | -- |
| 62 | Qwen3 VL 8B Thinking | Alibaba | -- |
| 63 | Qwen3 VL 8B Instruct | Alibaba | -- |
| 64 | Qwen3-235B-A22B-Thinking | Alibaba | -- |
| 65 | o3-mini | OpenAI | -- |
| 66 | Grok-2 | xAI | -- |
| 67 | Grok-4.1 Fast Reasoning | xAI | -- |
| 68 | Grok-4.1 | xAI | -- |
| 69 | Grok-4.1 Thinking | xAI | -- |
| 70 | Grok-4 Heavy | xAI | -- |
| 71 | Grok-2 Image 1212 | xAI | -- |
| 72 | Grok-4 Fast Non-Reasoning | xAI | -- |
| 73 | Llama 3.1 Nemotron Ultra 253B v1 | NVIDIA | -- |
| 74 | Llama 3.1 405B Instruct | Meta | -- |
| 75 | Nova Pro | Amazon | -- |
| 76 | GLM-4.5-Air | Zhipu AI | -- |
| 77 | GLM-4.6 | Zhipu AI | -- |
| 78 | GLM-4.5V | Zhipu AI | -- |
| 79 | GLM-4.5 | Zhipu AI | -- |
| 80 | MiniMax M2 | MiniMax | -- |
| 81 | MiniMax M1 40K | MiniMax | -- |
| 82 | MiniMax M1 80K | MiniMax | -- |
| 83 | Kimi K2-Thinking-0905 | Moonshot AI | -- |
| 84 | GPT OSS 20B | OpenAI | -- |
| 85 | GPT OSS 120B | OpenAI | -- |
| 86 | DeepSeek-R1-0528 | DeepSeek | -- |
| 87 | Mistral NeMo Instruct | Mistral | -- |
| 88 | Mistral Large 2 | Mistral | -- |
We evaluate 25 state-of-the-art models using three distinct attack methods to measure their resilience under adversarial pressure.
1. Zero-Shot Attacks
Direct harmful requests without any manipulation or obfuscation. These represent straightforward adversarial prompts that test a model's baseline ability to refuse harmful requests.
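In practice, this reduces to sending each harmful behavior to the target model verbatim and checking whether the reply complies. The sketch below illustrates that loop under stated assumptions: `query_model` and `judge_is_harmful` are hypothetical helpers standing in for the target model API and the harm classifier, not calls from any particular framework.

```python
# Minimal sketch of the zero-shot evaluation loop.
# `query_model` and `judge_is_harmful` are hypothetical stand-ins for the
# target model API and the harm classifier (assumptions, not real APIs).

def zero_shot_asr(behaviors, query_model, judge_is_harmful):
    """Fraction of direct harmful requests that elicit a harmful response."""
    successes = 0
    for behavior in behaviors:
        response = query_model(behavior)          # the request is sent verbatim
        if judge_is_harmful(behavior, response):  # did the model comply?
            successes += 1
    return successes / len(behaviors)
```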
2. Tree of Attacks with Pruning (TAP)
An automated red-teaming method that generates diverse jailbreak prompts by branching into multiple variations and pruning ineffective paths. TAP systematically explores the attack space to find successful adversarial strategies.
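As a rough illustration, the branch-and-prune loop can be sketched as follows. This is a simplified reading of TAP, not the reference implementation; `attacker_rewrite`, `query_model`, and `judge_score` are assumed helpers for the attacker model, the target model, and a 1-to-10 jailbreak judge, and the default width, branching factor, and depth are illustrative values only.

```python
# Simplified sketch of a TAP-style branch-and-prune search (not the
# reference implementation). All helper functions are hypothetical.

def tap_attack(behavior, attacker_rewrite, query_model, judge_score,
               width=4, branching=3, depth=5, success_threshold=10):
    frontier = [behavior]                           # current attack prompts
    for _ in range(depth):
        candidates = []
        for prompt in frontier:
            # Branch: ask the attacker model for several rewrites of this prompt.
            for variant in attacker_rewrite(prompt, behavior, n=branching):
                response = query_model(variant)
                score = judge_score(behavior, variant, response)
                if score >= success_threshold:      # jailbreak found
                    return variant, response
                candidates.append((score, variant))
        # Prune: keep only the highest-scoring branches for the next level.
        candidates.sort(key=lambda pair: pair[0], reverse=True)
        frontier = [variant for _, variant in candidates[:width]]
    return None, None                               # no success within budget
```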
3. Crescendo Attacks
A multi-turn attack that incrementally escalates from harmless to harmful requests using conversational context. Attackers can backtrack and rephrase if the model resists, testing whether models maintain safety boundaries across extended conversations.
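A minimal sketch of that escalate-and-backtrack loop is shown below, assuming hypothetical helpers: `next_escalation_step` (an attacker model planning the next, slightly more pointed turn), `query_model` (a multi-turn target API), and `refused` / `judge_is_harmful` (refusal and harm classifiers). None of these names come from the original attack code.

```python
# Illustrative sketch of a Crescendo-style multi-turn escalation.
# Helper functions are hypothetical stand-ins, not a real implementation.

def crescendo_attack(behavior, next_escalation_step, query_model,
                     refused, judge_is_harmful, max_turns=10, max_backtracks=3):
    history = []            # (user, assistant) turns accepted so far
    backtracks = 0
    while len(history) < max_turns:
        user_turn = next_escalation_step(behavior, history)
        reply = query_model(history, user_turn)
        if refused(reply):
            # Backtrack: discard the rejected turn and ask the attacker
            # model for a softer rephrasing of the same step.
            backtracks += 1
            if backtracks > max_backtracks:
                return False, history
            continue
        history.append((user_turn, reply))
        if judge_is_harmful(behavior, reply):
            return True, history        # escalation reached the harmful goal
    return False, history
```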
We evaluate models using the HarmBench framework (Mazeika et al., 2024; Zou et al., 2023), which provides standardized benchmarks for automated red-teaming of large language models. The framework covers harmful behaviors across multiple risk domains: chemical and biological safety, misinformation and disinformation, cybercrime and hacking, illegal activities, and copyright violations.
Performance is measured using Attack Success Rate (ASR): the percentage of adversarial prompts that successfully elicit a harmful response. A lower ASR (equivalently, a higher robustness score) indicates a more resistant model.
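Concretely, ASR for a single attack method is just the share of behaviors judged harmful, and the leaderboard's single "Vulnerability (ASR)" figure can be read as an aggregate over the three methods. The sketch below assumes an unweighted mean across methods; that aggregation rule is an assumption about how the column is derived, not something stated above.

```python
# Sketch of the ASR computation and a per-model aggregate.
# The equal-weight average over the three attack methods is an assumption.

def attack_success_rate(results):
    """results: list of booleans, True if the prompt elicited a harmful response."""
    return 100.0 * sum(results) / len(results)

def model_vulnerability(per_method_results):
    """per_method_results: dict mapping 'zero_shot'/'tap'/'crescendo' to result lists."""
    rates = [attack_success_rate(r) for r in per_method_results.values()]
    return sum(rates) / len(rates)
```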