FAQ

Common questions about evaluating, enforcing, configuring and exporting with Guardion.

Does a flagged request return an error?

No. Guardion returns 200 with flagged: true (and deny: true when the policy action is block). Branch on the body, not the status code. On the Gateway, the same outcome is also surfaced as an HTTP status — 246 (flagged, not blocked) and 446 (blocked) — so you can enforce without parsing the body.
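A minimal sketch of that branching in Python, assuming the response body has already been parsed into a dict. The `flagged` and `deny` fields and the 246/446 Gateway codes come from the answer above; the function names and the `"allow"` / `"flag"` / `"block"` labels are illustrative and not part of the Guardion API.

```python
from typing import Any, Mapping, Optional


def outcome_from_body(body: Mapping[str, Any]) -> str:
    """Map a parsed Guard response body to an action label."""
    if body.get("deny"):
        return "block"  # the matched policy decided to block
    if body.get("flagged"):
        return "flag"   # a detector tripped, but the policy only monitors
    return "allow"


def outcome_from_gateway_status(status: int) -> Optional[str]:
    """Map a Gateway HTTP status to the same labels.

    Returns None for statuses that carry no guardrail outcome,
    so the caller can fall back to its usual HTTP handling.
    """
    return {446: "block", 246: "flag"}.get(status)
```

Checking `deny` before `flagged` matters because a blocked request sets both.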

What's the difference between flagged and deny?

flagged means at least one detector tripped. deny means the matched policy decided to block, so you should stop the request. A monitor/flag policy reports flagged: true with deny: false.

Do I need to send a policy or application?

No. If you omit both, Guardion uses your workspace's default policy. You can target a specific policy by policy ID, or bind via an application slug.

What is a session?

A stable conversation/agent identifier. Passing it enables session-level risk accumulation (drift, repeat-offender scoring) across turns. If omitted, Guardion fingerprints the request.
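As a rough sketch, the optional policy, application and session values could be attached to a request body as follows. The field names `messages`, `policy`, `application` and `session` are assumptions made for illustration; check the API reference for the actual request schema.

```python
from typing import Any, Dict, Optional


def build_guard_payload(
    messages: list,
    policy: Optional[str] = None,
    application: Optional[str] = None,
    session: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a request body, including only the optional fields that were set."""
    payload: Dict[str, Any] = {"messages": messages}  # hypothetical field name
    optional = {"policy": policy, "application": application, "session": session}
    # Omitted fields are left out entirely so the workspace defaults apply.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```

Calling `build_guard_payload(messages)` with no extra arguments yields a body that relies on the workspace default policy and on request fingerprinting.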

Guard vs. Gateway — which do I use?

Use Guard (/v1/guard) when you control the LLM call and want to check content before or after it. Use the Gateway when you want guardrails applied inline without changing your LLM calling code: you point your OpenAI client's base URL at the Gateway as a drop-in replacement.
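The Guard pattern (check before, call the model, check after) might look like the sketch below. Both `check` and `call_llm` are hypothetical callables you would supply: `check` is assumed to wrap a call to /v1/guard and return the parsed response body, and `call_llm` wraps your own model call. The refusal strings are placeholders.

```python
from typing import Any, Callable, Mapping

# (role, text) -> parsed Guard response body; hypothetical signature.
Check = Callable[[str, str], Mapping[str, Any]]


def guarded_completion(prompt: str, check: Check, call_llm: Callable[[str], str]) -> str:
    """Check the prompt, call the model, then check the model's answer."""
    if check("user", prompt).get("deny"):
        return "Request blocked by policy."
    answer = call_llm(prompt)
    if check("assistant", answer).get("deny"):
        return "Response blocked by policy."
    return answer
```

With the Gateway, none of this wrapper code is needed, since the checks happen inline on the proxied call.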

Where do I configure policies?

Policies are configured in the console, with no code. You don't define them through the API; the API only references them by policy ID or application slug. A policy has two layers: the guardrails it runs (which guard models and detectors, and how sensitive they are), and the decision rules that determine when a request is blocked vs. monitored.

This means security/platform owners can tune enforcement in the console while developers keep calling the same endpoint — no redeploy to change a guardrail or threshold.

How do policy decision rules work?

A policy is a list of rules evaluated in priority order; the first rule whose conditions all match wins. Conditions read from the request and the accumulated session context, so you can block on a single bad message or on a pattern that builds up over a conversation.

Rules can read, among others: session_risk_score (running, risk-weighted), turn_risk_score (this message), total_flagged / total_denied, high_risk_flags, per-severity counts risk_level_counts.critical|high|medium|low, per-label counts label_counts.<LABEL>, plus signals like intent_drift_score and bot_type.

| Example rule | Effect |
| --- | --- |
| `session_risk_score >= 0.8` | Block once accumulated session risk crosses 0.8. |
| `risk_level_counts.critical >= 1` | Block on the first critical-severity hit. |
| `total_flagged >= 5` | Block after 5 flagged turns in the session (repeat offender). |
| `label_counts.PROMPT_INJECTION >= 3` | Block after three injection attempts in the session. |
| `intent_drift_score >= 0.8` | Step up / block when the conversation drifts toward a harmful goal. |
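To make the first-match-wins evaluation concrete, here is a small Python model of it. This is not Guardion's implementation: the rule structure, the `priority` convention (lower number is evaluated first) and the treatment of missing fields as 0 are all assumptions made for the sketch. Only the field names come from the list above.

```python
import operator
from typing import Any, List, Mapping, Optional

OPS = {">=": operator.ge, ">": operator.gt, "==": operator.eq}


def lookup(context: Mapping[str, Any], path: str) -> Any:
    """Resolve a dotted path such as 'label_counts.PROMPT_INJECTION'.

    Missing fields read as 0 (an assumption of this sketch).
    """
    value: Any = context
    for key in path.split("."):
        if not isinstance(value, Mapping) or key not in value:
            return 0
        value = value[key]
    return value


def first_matching_rule(rules: List[dict], context: Mapping[str, Any]) -> Optional[dict]:
    """Return the first rule, in priority order, whose conditions all hold."""
    for rule in sorted(rules, key=lambda r: r["priority"]):
        if all(OPS[op](lookup(context, path), limit) for path, op, limit in rule["conditions"]):
            return rule
    return None


# Each condition is (dotted field path, operator, threshold).
rules = [
    {"priority": 1, "action": "block", "conditions": [("risk_level_counts.critical", ">=", 1)]},
    {"priority": 2, "action": "block", "conditions": [("session_risk_score", ">=", 0.8)]},
    {"priority": 3, "action": "monitor", "conditions": [("total_flagged", ">=", 1)]},
]
context = {"session_risk_score": 0.85, "total_flagged": 2, "risk_level_counts": {"critical": 0}}
matched = first_matching_rule(rules, context)
print(matched["action"], matched["priority"])  # prints: block 2
```

The third rule also matches this context, but it is never reached because the second rule wins first.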

What are guardrails and detectors?

A guardrail is a calibrated guard model; each guardrail exposes detectors — the specific risks or signals it extracts from a request (e.g. prompt injection, jailbreak, toxicity, PII, tool-poisoning). In the policy you choose which detectors are active, their sensitivity (threshold), and their enforcement mode.

Each detector also has a target role — which part of the exchange it inspects: user/prompt, assistant/response, system, tool_input, tool_output, tool_schema, rag_retrieval, or all. So you can, for example, run injection detection on tool_output and rag_retrieval (the indirect-injection vectors) while running PII on all.
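The target-role idea can be pictured as a mapping from detector to the roles it inspects. The sketch below mirrors the example in the paragraph above; the detector names and the dict layout are hypothetical, since the real configuration lives in the console.

```python
from typing import Dict, List

# Hypothetical configuration: detector name -> target roles it inspects.
DETECTOR_TARGETS: Dict[str, List[str]] = {
    "prompt_injection": ["tool_output", "rag_retrieval"],
    "pii": ["all"],
}


def detectors_for(role: str, targets: Dict[str, List[str]] = DETECTOR_TARGETS) -> List[str]:
    """List the detectors that inspect content with the given role."""
    return sorted(
        name for name, roles in targets.items() if "all" in roles or role in roles
    )
```

Here `detectors_for("tool_output")` returns both detectors, while `detectors_for("user")` returns only `pii`.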

How does PII redaction work?

The PII guardrail can run in two modes. Redact-only anonymizes the matched entities in place and lets the request proceed without flagging it — the content is cleaned (redacted: true) but the turn isn't counted as a violation. Redact-and-flag anonymizes *and* flags (and can block under a block policy).

Redactions are returned via correction and precise redaction_spans. For reversible, vault-backed redaction (where the originals can be restored later), use the DLP endpoints /v1/pii/redact and /v1/pii/unredact.
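To illustrate what span-based redaction means, the sketch below applies a list of spans to a string. The span shape (`start`, `end`, `label`, with an exclusive end offset) and the `<LABEL>` placeholder format are assumptions for illustration; the actual structure of redaction_spans may differ.

```python
from typing import List, Mapping


def apply_redaction_spans(text: str, spans: List[Mapping]) -> str:
    """Replace each [start, end) span with a placeholder such as <EMAIL>."""
    # Work right to left so earlier offsets stay valid after each replacement.
    for span in sorted(spans, key=lambda s: s["start"], reverse=True):
        text = text[: span["start"]] + f"<{span['label']}>" + text[span["end"] :]
    return text


original = "Mail jane@example.com or call 555-0100."
spans = [
    {"start": 5, "end": 21, "label": "EMAIL"},
    {"start": 30, "end": 38, "label": "PHONE"},
]
print(apply_redaction_spans(original, spans))  # prints: Mail <EMAIL> or call <PHONE>.
```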

Can I export my data?

Yes. Security findings are queryable through the Findings API (GET /v2/findings) — paginated and filterable by time, application, session, status and risk level — so you can pull aggregated threats into your own systems.
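A filtered query URL for the Findings API could be assembled as in the sketch below. Only the GET /v2/findings path comes from the text above; the base URL and the query parameter names (`risk_level`, `session`, and so on) are placeholders to be replaced with the ones in the API reference.

```python
from urllib.parse import urlencode


def findings_url(base_url: str, **filters: object) -> str:
    """Build a GET /v2/findings URL, dropping filters that are unset."""
    query = urlencode({k: v for k, v in filters.items() if v is not None})
    return f"{base_url.rstrip('/')}/v2/findings" + (f"?{query}" if query else "")
```

For example, `findings_url("https://api.example.test", risk_level="high")` yields `https://api.example.test/v2/findings?risk_level=high`.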

Guardion also emits OpenTelemetry spans, so teams typically forward findings into their existing SIEM / observability stack (e.g. Splunk, Datadog, Elastic) for alerting, dashboards and retention alongside the rest of their security telemetry.

How do I manage API keys?

Create, name, disable and rotate keys in the console under Settings → API Keys. Keys can be marked read-only (for export/observability use) or full-access (to run evaluations), and disabled instantly without rotating the others. Keep keys server-side.