Alibaba Qwen3 VL 235B A22B Instruct vs Anthropic Claude 3.7 Sonnet

Detailed comparison for LLMs

On Guardion's LLM vulnerability benchmark, Anthropic Claude 3.7 Sonnet is the more secure of the two: its attack success rate (ASR, lower is better) is 20.3%, against 40.0% for Qwen3 VL 235B A22B Instruct. One or both scores are estimated from public safety evaluations pending a Guardion benchmark run.

Head-to-Head Overview

Claude 3.7 Sonnet is the overall winner in this comparison!

Attack Success Rate (lower is safer)

ASR for Alibaba Qwen3 VL 235B A22B Instruct vs Anthropic Claude 3.7 Sonnet. Green marks the safer model on each metric. Only the overall score is available for estimated models.

Metric                          Qwen3 VL 235B A22B Instruct   Claude 3.7 Sonnet
Overall (ASR)                   40.0%                         20.3%
TAP Attack Method (ASR)         100.0%                        31.4%
Crescendo Attack Method (ASR)   100.0%                        25.3%
Zero-Shot (ASR)                 100.0%                        n/a

Key Highlights

Anthropic Claude 3.7 Sonnet has a lower Overall (ASR).
Anthropic Claude 3.7 Sonnet has a lower TAP Attack Method (ASR).
Anthropic Claude 3.7 Sonnet has a lower Crescendo Attack Method (ASR).

Security Profile

[Chart: security profiles of Qwen3 VL 235B A22B Instruct and Claude 3.7 Sonnet. Outward is better on every axis.]

Frequently asked questions

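Is Alibaba Qwen3 VL 235B A22B Instruct or Anthropic Claude 3.7 Sonnet more secure?

On Guardion's benchmark, Anthropic Claude 3.7 Sonnet is the more secure of the two: its overall ASR is 20.3%, against 40.0% for Qwen3 VL 235B A22B Instruct (lower is better).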

What is the attack success rate (ASR) of Qwen3 VL 235B A22B Instruct vs Claude 3.7 Sonnet?

Qwen3 VL 235B A22B Instruct has a 40.0% ASR and Claude 3.7 Sonnet has a 20.3% ASR. ASR is the share of adversarial prompts that succeed across zero-shot, TAP, and Crescendo attacks; lower is safer.
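
To make the definition concrete, here is a minimal Python sketch of how an ASR percentage can be computed and how the two overall scores quoted above compare. The function name `attack_success_rate` is hypothetical and is not part of any Guardion tooling; only the 40.0% and 20.3% figures come from this page.

```python
def attack_success_rate(successes: int, attempts: int) -> float:
    """Return ASR as a percentage: successful attacks divided by total attempts."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    return 100.0 * successes / attempts


# Overall ASR figures quoted on this page, in percent.
qwen_asr = 40.0
claude_asr = 20.3

# Absolute gap in percentage points and relative gap as a ratio.
gap_points = round(qwen_asr - claude_asr, 1)
ratio = round(qwen_asr / claude_asr, 2)

print(f"Gap: {gap_points} percentage points, ratio: {ratio}x")
```

Under these figures the gap is 19.7 percentage points, so adversarial prompts succeed against Qwen3 VL 235B A22B Instruct roughly twice as often as against Claude 3.7 Sonnet.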

How were Qwen3 VL 235B A22B Instruct and Claude 3.7 Sonnet tested?

Both were red-teamed with the HarmBench framework across zero-shot, TAP (Tree of Attacks with Pruning), and Crescendo multi-turn attacks, scored by Attack Success Rate.
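
As a rough illustration of scoring by attack method, the sketch below groups attack outcomes by method and computes an ASR for each, plus a pooled overall figure. Everything here is an assumption made for illustration: the record format, the function names, and the sample outcomes are made up, and the page does not say how Guardion aggregates the per-method results into the overall score (this sketch simply pools all attempts).

```python
from collections import defaultdict


def asr_by_method(records):
    """Map each attack method to its ASR (%) from (method, succeeded) pairs."""
    counts = defaultdict(lambda: [0, 0])  # method -> [successes, attempts]
    for method, succeeded in records:
        counts[method][0] += int(succeeded)
        counts[method][1] += 1
    return {m: 100.0 * s / n for m, (s, n) in counts.items()}


def overall_asr(records):
    """Pooled ASR (%) over all attempts; an assumed aggregation, not Guardion's."""
    records = list(records)
    if not records:
        raise ValueError("no attack records")
    return 100.0 * sum(int(ok) for _, ok in records) / len(records)


# Made-up outcomes for illustration only; not benchmark data.
sample = [
    ("zero-shot", False), ("zero-shot", False),
    ("tap", True), ("tap", False),
    ("crescendo", True), ("crescendo", False),
    ("crescendo", False), ("crescendo", False),
]

print(asr_by_method(sample))
print(overall_asr(sample))
```

Pooling weights each method by its number of attempts; averaging the per-method percentages instead would give a different overall number whenever the methods have unequal attempt counts.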

Related Comparisons

Qwen3 VL 235B A22B Instruct vs Ministral 3 14B Reasoning 2512

Qwen3 VL 235B A22B Instruct vs Qwen QwQ 32B

Qwen3 VL 235B A22B Instruct vs MiniMax M2

Qwen3 VL 235B A22B Instruct vs GLM-4.5

Qwen3 VL 235B A22B Instruct vs Llama 3.1 Nemotron Ultra 253B v1

Qwen3 VL 235B A22B Instruct vs DeepSeek-R1-0528