Back to List

PromptAttack Policy

prompt-attack

Monitors user content using modern-guard, suspicious-session, regex, string-match

Type

Policy

Policy ID

prompt-attack

Target

user

Fail Fast

True

Detectors

modern-guard
Threshold: 0.5
suspicious-session
Threshold: 0.5
regex
Reference: <[^>]+>
string-match
Reference: instructions, debugging, system prompt

Override response

Sorry i can't assist with that

Payload example

By defining a policy, you can set a target, combine multiple detectors, and fine-tune.

policy = {
  "id": "prompt-attack",
  "definition": "prompt-attack",
  "target": "user",
  "override_response": "Sorry i can't assist with that",
  "detectors": [
    {
      "expected": "blocked",
      "model": "modern-guard",
      "threshold": 0.5
    },
    {
      "expected": "blocked",
      "model": "suspicious-session",
      "threshold": 0.5
    },
    {
      "expected": "blocked",
      "model": "regex",
      "reference": "<[^>]+>"
    },
    {
      "expected": "blocked",
      "model": "string-match",
      "reference": [
        "instructions", 
        "debugging", 
        "system prompt"
      ]
    }
  ]
};

payload = {
  "message": messages,
  "policies": [policy],
  "enabled_policies": ["prompt-attack"]
};

Tags

PromptAttack
Security

Guardion API - Playground

Check out the API reference