Back to List

Prompt Defense

prompt-defense

Monitors user content using modern-guard

Type

Policy

Policy ID

prompt-defense

Target

user

Fail Fast

True

Detectors

modern-guard
Threshold: L1

Override response

Sorry i can't assist with that

Payload example

By defining a policy, you can set a target, combine multiple detectors, and fine-tune.

policy = {
  "id": "prompt-defense",
  "definition": "prompt-defense",
  "target": "user",
  "override_response": "Sorry i can't assist with that",
  "detectors": [
    {
      "expected": "blocked",
      "model": "modern-guard",
      "threshold": 0.5
    },
    {
      "expected": "blocked",
      "model": "suspicious-session",
      "threshold": 0.5
    },
    {
      "expected": "blocked",
      "model": "regex",
      "reference": "<[^>]+>"
    },
    {
      "expected": "blocked",
      "model": "string-match",
      "reference": [
        "instructions", 
        "debugging", 
        "system prompt"
      ]
    }
  ]
};

payload = {
  "message": messages,
  "policies": [policy],
  "enabled_policies": ["prompt-defense"]
};

Tags

PromptDefense
Security
Message 1

Enabled Policies

Results

No Threat

Low Risk
Threat Type: prompt-attack
Threat Confidence: 0.001%

Policy Results:

  • prompt-defense: No threat detected (Threat Confidence: 0.001%)