prompt-attack
Monitors user content using modern-guard, suspicious-session, regex, string-match
Policy
prompt-attack
user
True
By defining a policy, you can set a target, combine multiple detectors, and fine-tune.
policy = {
"id": "prompt-attack",
"definition": "prompt-attack",
"target": "user",
"override_response": "Sorry i can't assist with that",
"detectors": [
{
"expected": "blocked",
"model": "modern-guard",
"threshold": 0.5
},
{
"expected": "blocked",
"model": "suspicious-session",
"threshold": 0.5
},
{
"expected": "blocked",
"model": "regex",
"reference": "<[^>]+>"
},
{
"expected": "blocked",
"model": "string-match",
"reference": [
"instructions",
"debugging",
"system prompt"
]
}
]
};
payload = {
"message": messages,
"policies": [policy],
"enabled_policies": ["prompt-attack"]
};