Guardrails — PII Shield and Sentinel

⚠️ Heads up: /v1/projects/* admin endpoints now require a JWT as the Bearer token, and passing BEHEST_KEY directly no longer works. Follow the mint-first flow in the updated migration guide to obtain a JWT, then use it as $TOKEN in the curl snippets on this page. The PII Shield and Sentinel configuration model and headers are unchanged.

Behest provides two guardrails that run at the API gateway level before any request reaches an LLM provider: PII Shield for personal data protection, and Sentinel for prompt injection and content policy enforcement. Both are configured per project and operate independently.

Architecture

Guardrails run in the LiteLLM proxy layer as two callbacks, PIIShieldCallback and SentinelCallback, hooked into the request path:

Your app
   |
   v
Kong Gateway
   | (validates JWT, resolves project, reads config from Redis)
   v
LiteLLM Proxy
   |-- PIIShieldCallback.async_pre_call_hook()   <- scans and transforms request
   |-- SentinelCallback.async_pre_call_hook()    <- scans and optionally blocks
   |
   v
LLM Provider (OpenAI, Anthropic, Google, etc.)
   |
   v
LiteLLM Proxy
   |-- PIIShieldCallback.async_log_success_event() <- restores masked tokens
   v
Your app (response with original values restored for MASK actions)

The gateway reads guardrail configuration from Redis on each request and caches it with a 30-second TTL, so a newly deployed change can take up to 30 seconds to apply. Settings are written to Redis when you click Deploy in the dashboard or call the deploy endpoint via the API.

PII Shield

PII Shield uses Microsoft Presidio (NER + regex hybrid) to detect personal information in user messages before they reach any LLM provider.

Modes

Mode	Behavior
`disabled`	No scanning. Default for new projects.
`shadow`	Scans every request and logs detections to the `guardrails:events` stream. Does not modify requests or block traffic. Use this to measure detection rates before enforcing.
`enforce`	Scans every request and applies the configured action (MASK, REDACT, or BLOCK) for each detected entity type.

Entity types

Entity	Category	Example
`EMAIL_ADDRESS`	Contact	`john@example.com`
`PHONE_NUMBER`	Contact	`(555) 123-4567`
`PERSON`	Identity	Person names (via NER)
`US_SSN`	Identity	`123-45-6789`
`LOCATION`	Identity	Addresses and location names (via NER)
`DATE_TIME`	Identity	Date and time expressions
`CREDIT_CARD`	Financial	`4111-1111-1111-1111`
`US_BANK_NUMBER`	Financial	US bank account numbers
`IBAN_CODE`	Financial	International bank account numbers
`IP_ADDRESS`	Technical	`192.168.1.1`

PII Shield applies a Presidio confidence threshold of 0.7, so detections scoring below 0.7 are ignored. Only entity types present in your pii_entities map are scanned; types not listed are never flagged.
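
A minimal sketch of that filtering rule, assuming a score exactly equal to 0.7 counts as a detection (the behavior of Presidio's own `score_threshold`). `is_flagged` and `PII_ENTITIES` are hypothetical names for illustration, not Behest code; bash 4+ is required for the associative array.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of PII Shield's filtering rule, not Behest's implementation.
# A detection is acted on only if its entity type is listed in the pii_entities
# map AND its confidence score is at or above the 0.7 threshold.

declare -A PII_ENTITIES=(
  [EMAIL_ADDRESS]=MASK
  [CREDIT_CARD]=BLOCK
)

PII_SCORE_THRESHOLD=0.7

# is_flagged ENTITY_TYPE SCORE -> exit status 0 if the detection would be acted on
is_flagged() {
  local entity="$1" score="$2"
  # Unlisted entity types are never flagged, whatever their score.
  [[ -n "${PII_ENTITIES[$entity]+set}" ]] || return 1
  # Bash has no floating-point arithmetic, so let awk compare the scores.
  awk -v s="$score" -v t="$PII_SCORE_THRESHOLD" 'BEGIN { exit !(s >= t) }'
}
```

For example, `is_flagged PERSON 0.99` fails because PERSON is not in the map, even though the score is high.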

Actions per entity

Action	Behavior
`MASK`	Reversible tokenization. The entity is replaced with a deterministic token (e.g., `<EMAIL_ADDRESS_a3f8c2d1b4e9>`). The LLM never sees the original value. After the LLM responds, Behest restores the original value in the response text. The vault TTL is 5 minutes.
`REDACT`	Permanent replacement. The entity is replaced with `<EMAIL_ADDRESS>`. The original value is discarded and cannot be recovered.
`BLOCK`	The entire request is rejected with a `403` error if this entity type is detected. In `shadow` mode, detections are logged but requests are not blocked.
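
To make the difference between MASK and REDACT concrete, here is an illustrative sketch of the two replacement formats. Behest does not document how the MASK suffix is derived; this sketch ASSUMES a truncated SHA-256 of the original value, which is one way to get a deterministic 12-hex-character token. `mask_token` and `redact_token` are hypothetical helpers, and `sha256sum` is the GNU coreutils tool (on macOS, `shasum -a 256` is the equivalent).

```bash
#!/usr/bin/env bash
# Illustrative only: the real MASK token derivation is not documented.
# ASSUMPTION: suffix = first 12 hex chars of SHA-256(original value).

# mask_token ENTITY_TYPE VALUE -> <ENTITY_TYPE_xxxxxxxxxxxx>, same input gives same token
mask_token() {
  local entity="$1" value="$2" digest
  digest=$(printf '%s' "$value" | sha256sum | cut -c1-12)
  printf '<%s_%s>\n' "$entity" "$digest"
}

# redact_token ENTITY_TYPE -> <ENTITY_TYPE>; the original value is never used,
# so nothing remains from which it could be restored.
redact_token() {
  printf '<%s>\n' "$1"
}
```

Because the masked token depends on the value, the LLM can still tell that two mentions refer to the same email address, while a redacted placeholder carries no such information.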

How MASK works

For reversible masking, Behest creates a short-lived vault entry in Redis keyed by pii_vault:{tenantId}:{projectId}:{requestId} with a 5-minute TTL. After the LLM responds, the post-call hook restores all masked tokens from the vault, so your app receives the response with the original values intact. The LLM processes anonymized content throughout.
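
The restore step can be sketched as a simple token-to-value lookup. This is a minimal illustration, not Behest's code: it builds the vault key in the documented format, but keeps the mapping in an in-memory bash 4 associative array instead of a Redis entry with a 5-minute TTL. `vault_key`, `VAULT`, and `restore_tokens` are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the MASK round trip. Behest stores the token -> value
# mapping in Redis under pii_vault:{tenantId}:{projectId}:{requestId} (TTL 5 min);
# here a bash associative array stands in for that Redis entry.

# vault_key TENANT_ID PROJECT_ID REQUEST_ID -> the documented key format
vault_key() {
  printf 'pii_vault:%s:%s:%s\n' "$1" "$2" "$3"
}

declare -A VAULT=()   # token -> original value

# restore_tokens TEXT -> TEXT with every vaulted token replaced by its original value
restore_tokens() {
  local text="$1" token
  for token in "${!VAULT[@]}"; do
    # Quoting pattern and replacement makes both match and substitute literally.
    text=${text//"$token"/"${VAULT[$token]}"}
  done
  printf '%s\n' "$text"
}
```

If the vault entry has expired before the response arrives, the lookup has nothing to substitute and the tokens would remain in the text.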

Sentinel

Sentinel guards against prompt injection attempts and custom content policy violations.

Modes

Mode	Behavior
`disabled`	No scanning. Default for new projects.
`shadow`	Scans requests and logs detections. Requests pass through even when a pattern matches. Use this to tune your blocklist before enforcing.
`enforce`	Blocks any request that matches a jailbreak pattern or contains a blocklisted term. Returns `403`.

What Sentinel detects

Jailbreak patterns (built-in, cannot be disabled individually):

"Ignore all previous instructions" and variations
DAN / "act as an unrestricted AI" patterns
Fake [SYSTEM] or <<SYS>> injection in user input
"Bypass your safety/content filters" variants
"Disregard/override your system prompt/guidelines" variants

Custom blocklist: An array of strings you define. Matching is case-insensitive. Maximum 200 terms.

Sentinel scans only user messages (role: "user"). System prompts are never scanned — they are set by your application and are trusted.
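
A minimal sketch of the custom blocklist check, useful for predicting shadow-mode hits locally. It ASSUMES a term matches when it appears anywhere in a user message (substring match), compared case-insensitively; the caller is expected to pass only `role: "user"` content. `blocklist_hit` and `validate_blocklist` are hypothetical names, and bash 4+ is required for `${var,,}`.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of Sentinel's custom blocklist check, not Behest's code.
# ASSUMPTION: substring match, case-insensitive.

MAX_BLOCKLIST_TERMS=200

# validate_blocklist TERM... -> fails if more than 200 terms are supplied
validate_blocklist() {
  (( $# <= MAX_BLOCKLIST_TERMS ))
}

# blocklist_hit USER_MESSAGE TERM... -> prints the first matching term, status 0 on a hit
blocklist_hit() {
  local message="${1,,}" term
  shift
  for term in "$@"; do
    [[ -n "$term" ]] || continue   # an empty term would otherwise match everything
    if [[ "$message" == *"${term,,}"* ]]; then
      printf '%s\n' "$term"
      return 0
    fi
  done
  return 1
}
```

The substring assumption is also why short or common terms are risky: a term like "plan" would match "planet" and "explanation".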

Configuring Guardrails in the Dashboard

1. Open your project from the Projects page.
2. Navigate to Configuration (the canvas/config tab for your project).
3. Find the Guardrails section.
4. Toggle the PII Shield and Sentinel modes.
5. For PII Shield: select which entities to scan and choose an action (MASK, REDACT, or BLOCK) for each.
6. For Sentinel: add custom blocklist terms.
7. Click Save to store the draft.
8. Click Deploy to push the configuration to the gateway. Changes take effect within 30 seconds.

Important: Saving updates the database. Deploying pushes the settings to Redis and activates them at the gateway. Both steps are required for changes to take effect.

Configuring Guardrails via API

All guardrail settings are part of the project settings resource.

Read current settings

bash

curl -X GET https://api.behest.app/v1/projects/$PROJECT_ID/settings \
  -H "Authorization: Bearer $TOKEN" \
  | jq '{pii_mode, pii_entities, sentinel_mode, sentinel_blocklist}'

Enable PII Shield in shadow mode

bash

curl -X PUT https://api.behest.app/v1/projects/$PROJECT_ID/settings \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "pii_mode": "shadow",
    "pii_entities": {
      "EMAIL_ADDRESS": "MASK",
      "PHONE_NUMBER": "MASK",
      "CREDIT_CARD": "BLOCK",
      "US_SSN": "REDACT"
    }
  }'

Enable PII Shield in enforce mode

bash

curl -X PUT https://api.behest.app/v1/projects/$PROJECT_ID/settings \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "pii_mode": "enforce",
    "pii_entities": {
      "EMAIL_ADDRESS": "MASK",
      "PHONE_NUMBER": "MASK",
      "PERSON": "MASK",
      "CREDIT_CARD": "BLOCK",
      "US_SSN": "REDACT",
      "IBAN_CODE": "BLOCK",
      "IP_ADDRESS": "MASK"
    }
  }'
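
Hand-editing the JSON body makes it easy to send an invalid mode or action. A minimal sketch of a helper that builds the PUT body from ENTITY=ACTION pairs and rejects values outside the documented modes and actions; `pii_settings_json` is a hypothetical helper, not part of any Behest CLI or SDK, and entity names are passed through unchecked.

```bash
#!/usr/bin/env bash
# Hypothetical helper: builds the JSON body for PUT /v1/projects/$PROJECT_ID/settings.
# Validates pii_mode (disabled|shadow|enforce) and each action (MASK|REDACT|BLOCK).

# pii_settings_json MODE ENTITY=ACTION... -> prints the request body on stdout
pii_settings_json() {
  local mode="$1" pair entity action body="" sep=""
  shift
  case "$mode" in
    disabled|shadow|enforce) ;;
    *) echo "invalid pii_mode: $mode" >&2; return 1 ;;
  esac
  for pair in "$@"; do
    entity="${pair%%=*}"   # text before the first '='
    action="${pair#*=}"    # text after the first '='
    case "$action" in
      MASK|REDACT|BLOCK) ;;
      *) echo "invalid action for $entity: $action" >&2; return 1 ;;
    esac
    body+="${sep}\"${entity}\":\"${action}\""
    sep=","
  done
  printf '{"pii_mode":"%s","pii_entities":{%s}}\n' "$mode" "$body"
}
```

Usage with the endpoint above: `curl -X PUT .../settings -H "Content-Type: application/json" -d "$(pii_settings_json enforce EMAIL_ADDRESS=MASK CREDIT_CARD=BLOCK)"`.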

Configure Sentinel with a custom blocklist

bash

curl -X PUT https://api.behest.app/v1/projects/$PROJECT_ID/settings \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "sentinel_mode": "shadow",
    "sentinel_blocklist": [
      "ignore previous instructions",
      "competitor-product-name",
      "internal-project-codename"
    ]
  }'

Deploy settings to activate

bash

curl -X POST https://api.behest.app/v1/projects/$PROJECT_ID/settings/deploy \
  -H "Authorization: Bearer $TOKEN"
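
The deploy call above prints the response body but exits successfully even if the server rejects the request. A minimal sketch of a wrapper that checks the HTTP status, assuming PROJECT_ID and TOKEN are set as in the other snippets; `deploy_settings` is a hypothetical name, not a Behest tool.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the deploy endpoint that fails on a non-2xx status.
# Assumes PROJECT_ID and TOKEN are already exported.

deploy_settings() {
  local status
  # -o /dev/null discards the body; -w prints only the HTTP status code.
  status=$(curl -sS -o /dev/null -w '%{http_code}' -X POST \
    "https://api.behest.app/v1/projects/${PROJECT_ID}/settings/deploy" \
    -H "Authorization: Bearer ${TOKEN}") || return 1
  if [[ "$status" != 2?? ]]; then
    echo "deploy failed with HTTP $status" >&2
    return 1
  fi
  # The gateway caches configuration for up to 30 seconds.
  echo "deployed; changes apply within about 30 seconds"
}
```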

Enable both guardrails in one call

bash

# --fail makes curl exit non-zero on an HTTP error, so the deploy call runs only if the update succeeded
curl --fail -X PUT https://api.behest.app/v1/projects/$PROJECT_ID/settings \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "pii_mode": "enforce",
    "pii_entities": {
      "EMAIL_ADDRESS": "MASK",
      "CREDIT_CARD": "BLOCK"
    },
    "sentinel_mode": "enforce",
    "sentinel_blocklist": ["secret-project"]
  }' && \
curl -X POST https://api.behest.app/v1/projects/$PROJECT_ID/settings/deploy \
  -H "Authorization: Bearer $TOKEN"

Viewing Guardrail Events

Guardrail detections and blocks are logged to a Redis stream (guardrails:events) and accessible via the dashboard and API.

Via the dashboard

Navigate to your project's Logs or Analytics page. The guardrails event feed shows recent detections with their type, mode, entity types, and action taken.

Via the API

bash

# All events for a project
curl "https://api.behest.app/v1/tenants/$TENANT_ID/guardrails/events?project_id=$PROJECT_ID" \
  -H "Authorization: Bearer $TOKEN"

# Filter by event type within a project
curl "https://api.behest.app/v1/tenants/$TENANT_ID/guardrails/events?type=pii_detection&project_id=$PROJECT_ID" \
  -H "Authorization: Bearer $TOKEN"

# Supported types: pii_block, pii_detection, sentinel_jailbreak, sentinel_blocklist
# (this example omits project_id)
curl "https://api.behest.app/v1/tenants/$TENANT_ID/guardrails/events?type=sentinel_jailbreak" \
  -H "Authorization: Bearer $TOKEN"
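
When scripting against the events endpoint, a small URL builder avoids typos in the type filter. A minimal sketch that builds the same URLs as the examples above and rejects types outside the four listed; `events_url` is a hypothetical helper, and both project_id and type are treated as optional.

```bash
#!/usr/bin/env bash
# Hypothetical helper that builds the guardrails events URL.
# events_url TENANT_ID [PROJECT_ID] [TYPE]

events_url() {
  local tenant="$1" project="${2:-}" type="${3:-}"
  local -a query=()
  if [[ -n "$type" ]]; then
    case "$type" in
      pii_block|pii_detection|sentinel_jailbreak|sentinel_blocklist) ;;
      *) echo "unknown event type: $type" >&2; return 1 ;;
    esac
    query+=("type=$type")
  fi
  [[ -n "$project" ]] && query+=("project_id=$project")
  local url="https://api.behest.app/v1/tenants/${tenant}/guardrails/events"
  if (( ${#query[@]} )); then
    local IFS='&'            # join query parameters with '&'
    url+="?${query[*]}"
  fi
  printf '%s\n' "$url"
}
```

Usage: `curl "$(events_url "$TENANT_ID" "$PROJECT_ID" pii_detection)" -H "Authorization: Bearer $TOKEN"`.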

Event structure

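Each event contains the following fields. All values are serialized as strings, including entity_count and timestamp (Unix epoch seconds with a fractional part), so convert them before doing arithmetic: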

json

{
  "type": "pii_detection",
  "tenant_id": "...",
  "project_id": "...",
  "direction": "input",
  "mode": "enforce",
  "action_taken": "masked",
  "entity_types": ["EMAIL_ADDRESS", "PHONE_NUMBER"],
  "entity_count": "3",
  "request_id": "req_abc123",
  "timestamp": "1743024000.123"
}

Using Shadow Mode for Testing Before Enforcing

Shadow mode is the recommended workflow when enabling guardrails on an existing project:

1. Set pii_mode to "shadow" and deploy.
2. Run your normal workload for 24-48 hours.
3. Review the guardrail events to see which entity types are being detected and how frequently.
4. Tune your pii_entities map: adjust which entities to monitor and their actions.
5. Once comfortable, switch pii_mode to "enforce" and deploy.

The same workflow applies to Sentinel. Use shadow mode to discover which blocklist terms appear naturally in legitimate traffic before blocking on them.

Error Responses When Blocked

When PII Shield blocks a request (entity action BLOCK in enforce mode):

json

{
  "error": {
    "message": "Request blocked: detected PII types ['CREDIT_CARD']",
    "type": "guardrail_error",
    "code": "BEHEST_PII_BLOCKED"
  }
}

HTTP status: 403

When Sentinel blocks a request:

json

{
  "error": {
    "message": "Request blocked: potential prompt injection detected",
    "type": "guardrail_error",
    "code": "BEHEST_CONTENT_BLOCKED"
  }
}

HTTP status: 403
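
Both guardrails return 403 with the same error type, so clients should branch on the code field. A minimal sketch that extracts the code with sed (no jq dependency) and maps it to a user-facing outcome; it assumes the body contains a single "code" field as in the examples above, and `guardrail_error_code` and `describe_block` are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical client-side handling of guardrail 403 responses.

# guardrail_error_code BODY -> prints the value of the "code" field, if any
guardrail_error_code() {
  sed -n 's/.*"code"[[:space:]]*:[[:space:]]*"\([A-Z_]*\)".*/\1/p' <<<"$1" | head -n 1
}

# describe_block BODY -> a short description keyed on the error code
describe_block() {
  case "$(guardrail_error_code "$1")" in
    BEHEST_PII_BLOCKED)     echo "pii: remove sensitive data and retry" ;;
    BEHEST_CONTENT_BLOCKED) echo "sentinel: request rejected by content policy" ;;
    *)                      echo "other" ;;
  esac
}
```

A PII block is usually recoverable by editing the input, while a Sentinel block generally is not, which is why the two codes are worth handling separately.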

Best Practices

Start with shadow mode. Never enable enforce mode on day one. Run shadow mode for at least 24 hours to understand your traffic before activating blocking behavior.

MASK instead of BLOCK for most entities. BLOCK is appropriate for data your application should never send to any LLM (SSNs, credit card numbers). MASK is better for names and emails where you want the LLM to reference the person naturally in its response.

Use REDACT for data that must not be retained. REDACT permanently discards the original value, so it cannot be restored in the response. Use it when compliance requires that raw PII not be stored anywhere in the pipeline, not even transiently in the 5-minute MASK vault.

Keep the Sentinel blocklist focused. Avoid blocking common words that may appear in legitimate requests. Shadow mode helps you validate that your blocklist terms are specific enough not to generate false positives.

Deploy after every settings change. The PUT settings endpoint updates the database. Only the deploy endpoint pushes changes to Redis and activates them at the gateway.

Default state for new projects: Both guardrails start as disabled with empty entity and blocklist configuration. Guardrails must be explicitly configured and deployed to take effect.