
    Guardrails — PII Shield and Sentinel

    ⚠️ Heads up: the /v1/projects/* admin endpoints no longer accept BEHEST_KEY directly as the Bearer token; they now require a JWT. Follow the mint-first flow in the updated migration guide to get a JWT, then use it as $TOKEN in the curl snippets on this page. The PII Shield / Sentinel configuration model and headers are unchanged.

    Behest provides two guardrails that run at the API gateway level before any request reaches an LLM provider: PII Shield for personal data protection, and Sentinel for prompt injection and content policy enforcement. Both are configured per project and operate independently.


    Architecture

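    Guardrails run in the LiteLLM proxy layer as two callbacks, PIIShieldCallback and SentinelCallback. Both hook into the request before the provider call; PIIShieldCallback also hooks into the response afterward: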

    Your app
       |
       v
    Kong Gateway
       | (validates JWT, resolves project, reads config from Redis)
       v
    LiteLLM Proxy
       |-- PIIShieldCallback.async_pre_call_hook()   <- scans and transforms request
       |-- SentinelCallback.async_pre_call_hook()    <- scans and optionally blocks
       |
       v
    LLM Provider (OpenAI, Anthropic, Google, etc.)
       |
       v
    LiteLLM Proxy
       |-- PIIShieldCallback.async_log_success_event() <- restores masked tokens
       v
    Your app (response with original values restored for MASK actions)
    

    Configuration is resolved from Redis on every request, with results cached for 30 seconds. Settings are written to Redis when you click Deploy in the dashboard, or when you call the deploy endpoint via the API.
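
The caching behavior described above can be pictured with a small TTL cache. This is a minimal sketch, not Behest's implementation: `ConfigCache`, `loader`, and `clock` are hypothetical names, and `loader` stands in for whatever actually reads the project configuration from Redis.

```python
import time
from typing import Any, Callable, Dict, Tuple

CONFIG_TTL_SECONDS = 30  # TTL stated in the text above


class ConfigCache:
    """Per-project config cache. `loader` is a stand-in for the Redis read (assumption)."""

    def __init__(
        self,
        loader: Callable[[str], Dict[str, Any]],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader
        self._clock = clock
        # project_id -> (time the entry was loaded, config)
        self._entries: Dict[str, Tuple[float, Dict[str, Any]]] = {}

    def get(self, project_id: str) -> Dict[str, Any]:
        now = self._clock()
        cached = self._entries.get(project_id)
        if cached is not None and now - cached[0] < CONFIG_TTL_SECONDS:
            return cached[1]  # still fresh: skip the backing store
        config = self._loader(project_id)
        self._entries[project_id] = (now, config)
        return config
```

Under this model a freshly deployed setting can take up to one TTL window to become visible, which matches the "within 30 seconds" figure given for deploys.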


    PII Shield

    PII Shield uses Microsoft Presidio (NER + regex hybrid) to detect personal information in user messages before they reach any LLM provider.

    Modes

    Mode      Behavior
    disabled  No scanning. Default for new projects.
    shadow    Scans every request and logs detections to the guardrails:events stream. Does not modify requests or block traffic. Use this to measure detection rates before enforcing.
    enforce   Scans every request and applies the configured action (MASK, REDACT, or BLOCK) for each detected entity type.

    Entity types

    Entity          Category   Example
    EMAIL_ADDRESS   Contact    john@example.com
    PHONE_NUMBER    Contact    (555) 123-4567
    PERSON          Identity   Person names (via NER)
    US_SSN          Identity   123-45-6789
    LOCATION        Identity   Addresses and location names (via NER)
    DATE_TIME       Identity   Date and time expressions
    CREDIT_CARD     Financial  4111-1111-1111-1111
    US_BANK_NUMBER  Financial  US bank account numbers
    IBAN_CODE       Financial  International bank account numbers
    IP_ADDRESS      Technical  192.168.1.1

    Presidio uses a confidence threshold of 0.7. Only entities present in your pii_entities map are scanned — entities not listed are never flagged.
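
The two filters described above (the 0.7 confidence threshold and the pii_entities map) can be sketched as follows. This is an illustration only: `Detection` and `select_detections` are hypothetical names, the detections are assumed to come from an upstream analyzer, and treating the threshold as inclusive is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List

CONFIDENCE_THRESHOLD = 0.7  # threshold stated in the text above


@dataclass(frozen=True)
class Detection:
    """One detected span (hypothetical shape for this sketch)."""
    entity_type: str
    start: int
    end: int
    score: float


def select_detections(
    detections: List[Detection], pii_entities: Dict[str, str]
) -> List[Detection]:
    """Keep detections that are configured and meet the threshold.

    Assumption: a score exactly equal to the threshold counts as a match.
    """
    return [
        d
        for d in detections
        if d.entity_type in pii_entities and d.score >= CONFIDENCE_THRESHOLD
    ]
```

With a map of `{"EMAIL_ADDRESS": "MASK"}`, a high-confidence PERSON detection is dropped because PERSON is not listed, and a low-confidence EMAIL_ADDRESS detection is dropped because it falls below the threshold.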

    Actions per entity

    Action  Behavior
    MASK    Reversible tokenization. The entity is replaced with a deterministic token (e.g., <EMAIL_ADDRESS_a3f8c2d1b4e9>). The LLM never sees the original value. After the LLM responds, Behest restores the original value in the response text. The vault TTL is 5 minutes.
    REDACT  Permanent replacement. The entity is replaced with <EMAIL_ADDRESS>. The original value is discarded and cannot be recovered.
    BLOCK   The entire request is rejected with a 403 error if this entity type is detected. In shadow mode, detections are logged but requests are not blocked.

    How MASK works

    For reversible masking, Behest creates a short-lived vault entry in Redis keyed by pii_vault:{tenantId}:{projectId}:{requestId} with a 5-minute TTL. After the LLM responds, the post-call hook restores all masked tokens from the vault, so your app receives the response with the original values intact. The LLM processes anonymized content throughout.
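
The mask-then-restore round trip can be sketched in a few lines. This is a simplified model under stated assumptions, not Behest's code: the vault is a plain dict rather than a Redis entry with a TTL, the token suffix is assumed to be a truncated SHA-256 digest (the source only says tokens are deterministic), spans are assumed not to overlap, and `make_token`, `mask`, and `restore` are hypothetical names.

```python
import hashlib
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (start, end, entity_type)


def make_token(entity_type: str, value: str) -> str:
    """Deterministic token; the hashing scheme is an assumption for illustration."""
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]
    return f"<{entity_type}_{digest}>"


def mask(text: str, spans: List[Span]) -> Tuple[str, Dict[str, str]]:
    """Replace each span with a token and record token -> original in a vault."""
    vault: Dict[str, str] = {}
    # Work from the end of the text so earlier offsets remain valid.
    for start, end, entity_type in sorted(spans, reverse=True):
        original = text[start:end]
        token = make_token(entity_type, original)
        vault[token] = original
        text = text[:start] + token + text[end:]
    return text, vault


def restore(text: str, vault: Dict[str, str]) -> str:
    """Put the original values back wherever the model echoed a token."""
    for token, original in vault.items():
        text = text.replace(token, original)
    return text
```

In this model the provider only ever sees the token, and any token it repeats in its answer is swapped back before the response is returned to the caller.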


    Sentinel

    Sentinel guards against prompt injection attempts and custom content policy violations.

    Modes

    Mode      Behavior
    disabled  No scanning. Default for new projects.
    shadow    Scans requests and logs detections. Requests pass through even when a pattern matches. Use this to tune your blocklist before enforcing.
    enforce   Blocks any request that matches a jailbreak pattern or contains a blocklisted term. Returns 403.

    What Sentinel detects

    Jailbreak patterns (built-in, cannot be disabled individually):

    • "Ignore all previous instructions" and variations
    • DAN / "act as an unrestricted AI" patterns
    • Fake [SYSTEM] or <<SYS>> injection in user input
    • "Bypass your safety/content filters" variants
    • "Disregard/override your system prompt/guidelines" variants

    Custom blocklist: An array of strings you define. Matching is case-insensitive. Maximum 200 terms.

    Sentinel scans only user messages (role: "user"). System prompts are never scanned — they are set by your application and are trusted.
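
The scanning rules above (user messages only, case-insensitive blocklist, 200-term limit) can be sketched as below. The regular expressions are illustrative stand-ins, since the built-in jailbreak patterns are not published here; substring matching for blocklist terms and plain-string message content are also assumptions, and `scan_messages` is a hypothetical name.

```python
import re
from typing import Dict, List, Optional

MAX_BLOCKLIST_TERMS = 200  # limit stated in the text above

# Illustrative patterns only; the real built-in set is not shown in this document.
JAILBREAK_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?previous\s+instructions", re.IGNORECASE),
    re.compile(r"\[SYSTEM\]|<<SYS>>", re.IGNORECASE),
]


def scan_messages(
    messages: List[Dict[str, str]], blocklist: List[str]
) -> Optional[str]:
    """Return the event type of the first match, or None if nothing matches."""
    if len(blocklist) > MAX_BLOCKLIST_TERMS:
        raise ValueError("blocklist exceeds 200 terms")
    # Skip empty terms, which would otherwise match every message.
    terms = [term.casefold() for term in blocklist if term]
    for message in messages:
        if message.get("role") != "user":
            continue  # system and assistant messages are not scanned
        content = message.get("content", "")
        if any(pattern.search(content) for pattern in JAILBREAK_PATTERNS):
            return "sentinel_jailbreak"
        folded = content.casefold()
        if any(term in folded for term in terms):
            return "sentinel_blocklist"
    return None
```

In shadow mode a non-None result would only be logged; in enforce mode it would translate into the 403 response.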


    Configuring Guardrails in the Dashboard

    1. Open your project from the Projects page.
    2. Navigate to Configuration (the canvas/config tab for your project).
    3. Find the Guardrails section.
    4. Toggle the PII Shield and Sentinel modes.
    5. For PII Shield: select which entities to scan and choose an action (MASK, REDACT, BLOCK) for each.
    6. For Sentinel: add custom blocklist terms.
    7. Click Save to store the draft.
    8. Click Deploy to push the configuration to the gateway. Changes take effect within 30 seconds.

    Important: Saving updates the database. Deploying pushes the settings to Redis and activates them at the gateway. Both steps are required for changes to take effect.


    Configuring Guardrails via API

    All guardrail settings are part of the project settings resource.

    Read current settings

    bash
    curl -s -X GET https://api.behest.app/v1/projects/$PROJECT_ID/settings \
      -H "Authorization: Bearer $TOKEN" \
      | jq '{pii_mode, pii_entities, sentinel_mode, sentinel_blocklist}'

    Enable PII Shield in shadow mode

    bash
    curl -X PUT https://api.behest.app/v1/projects/$PROJECT_ID/settings \
      -H "Authorization: Bearer $TOKEN" \
      -H "Content-Type: application/json" \
      -d '{
        "pii_mode": "shadow",
        "pii_entities": {
          "EMAIL_ADDRESS": "MASK",
          "PHONE_NUMBER": "MASK",
          "CREDIT_CARD": "BLOCK",
          "US_SSN": "REDACT"
        }
      }'

    Enable PII Shield in enforce mode

    bash
    curl -X PUT https://api.behest.app/v1/projects/$PROJECT_ID/settings \
      -H "Authorization: Bearer $TOKEN" \
      -H "Content-Type: application/json" \
      -d '{
        "pii_mode": "enforce",
        "pii_entities": {
          "EMAIL_ADDRESS": "MASK",
          "PHONE_NUMBER": "MASK",
          "PERSON": "MASK",
          "CREDIT_CARD": "BLOCK",
          "US_SSN": "REDACT",
          "IBAN_CODE": "BLOCK",
          "IP_ADDRESS": "MASK"
        }
      }'

    Configure Sentinel with a custom blocklist

    bash
    curl -X PUT https://api.behest.app/v1/projects/$PROJECT_ID/settings \
      -H "Authorization: Bearer $TOKEN" \
      -H "Content-Type: application/json" \
      -d '{
        "sentinel_mode": "shadow",
        "sentinel_blocklist": [
          "ignore previous instructions",
          "competitor-product-name",
          "internal-project-codename"
        ]
      }'

    Deploy settings to activate

    bash
    curl -X POST https://api.behest.app/v1/projects/$PROJECT_ID/settings/deploy \
      -H "Authorization: Bearer $TOKEN"

    Enable both guardrails in one call

    bash
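    # --fail makes curl exit non-zero on an HTTP error, so the deploy runs only if the PUT succeeded
    curl --fail -X PUT https://api.behest.app/v1/projects/$PROJECT_ID/settings \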
      -H "Authorization: Bearer $TOKEN" \
      -H "Content-Type: application/json" \
      -d '{
        "pii_mode": "enforce",
        "pii_entities": {
          "EMAIL_ADDRESS": "MASK",
          "CREDIT_CARD": "BLOCK"
        },
        "sentinel_mode": "enforce",
        "sentinel_blocklist": ["secret-project"]
      }' && \
    curl -X POST https://api.behest.app/v1/projects/$PROJECT_ID/settings/deploy \
      -H "Authorization: Bearer $TOKEN"
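
Before sending a settings payload, it can help to check it locally against the values documented on this page. The sketch below is a client-side pre-check only; whether the API rejects the same inputs is not stated here, and `validate_settings` is a hypothetical helper.

```python
from typing import Any, Dict, List

VALID_MODES = {"disabled", "shadow", "enforce"}
VALID_ACTIONS = {"MASK", "REDACT", "BLOCK"}
KNOWN_ENTITIES = {
    "EMAIL_ADDRESS", "PHONE_NUMBER", "PERSON", "US_SSN", "LOCATION",
    "DATE_TIME", "CREDIT_CARD", "US_BANK_NUMBER", "IBAN_CODE", "IP_ADDRESS",
}
MAX_BLOCKLIST_TERMS = 200


def validate_settings(payload: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the payload looks valid."""
    problems: List[str] = []
    for key in ("pii_mode", "sentinel_mode"):
        if key in payload and payload[key] not in VALID_MODES:
            problems.append(f"{key}: unknown mode {payload[key]!r}")
    for entity, action in payload.get("pii_entities", {}).items():
        if entity not in KNOWN_ENTITIES:
            problems.append(f"pii_entities: unknown entity {entity!r}")
        if action not in VALID_ACTIONS:
            problems.append(f"pii_entities[{entity}]: unknown action {action!r}")
    blocklist = payload.get("sentinel_blocklist", [])
    if len(blocklist) > MAX_BLOCKLIST_TERMS:
        problems.append(f"sentinel_blocklist: {len(blocklist)} terms exceeds 200")
    return problems
```

Running this on the payloads shown above returns an empty list, while a typo such as `"pii_mode": "enforced"` is caught before any request is made.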

    Viewing Guardrail Events

    Guardrail detections and blocks are logged to a Redis stream (guardrails:events) and accessible via the dashboard and API.

    Via the dashboard

    Navigate to your project's Logs or Analytics page. The guardrails event feed shows recent detections with their type, mode, entity types, and action taken.

    Via the API

    bash
    # All events for a project
    curl "https://api.behest.app/v1/tenants/$TENANT_ID/guardrails/events?project_id=$PROJECT_ID" \
      -H "Authorization: Bearer $TOKEN"
     
    # Filter by event type within a project
    curl "https://api.behest.app/v1/tenants/$TENANT_ID/guardrails/events?type=pii_detection&project_id=$PROJECT_ID" \
      -H "Authorization: Bearer $TOKEN"
     
    # Filter by event type without a project filter.
    # Valid types: pii_block, pii_detection, sentinel_jailbreak, sentinel_blocklist
    curl "https://api.behest.app/v1/tenants/$TENANT_ID/guardrails/events?type=sentinel_jailbreak" \
      -H "Authorization: Bearer $TOKEN"

    Event structure

    Each event contains:

    json
    {
      "type": "pii_detection",
      "tenant_id": "...",
      "project_id": "...",
      "direction": "input",
      "mode": "enforce",
      "action_taken": "masked",
      "entity_types": ["EMAIL_ADDRESS", "PHONE_NUMBER"],
      "entity_count": "3",
      "request_id": "req_abc123",
      "timestamp": "1743024000.123"
    }
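
Note that entity_count and timestamp arrive as strings in the sample above. A small parser can convert them to typed values; this sketch assumes the timestamp is Unix epoch seconds and that fields missing from an event (for example entity_types on Sentinel events) default to empty values. `GuardrailEvent` and `parse_event` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, List


@dataclass(frozen=True)
class GuardrailEvent:
    type: str
    project_id: str
    mode: str
    action_taken: str
    entity_types: List[str]
    entity_count: int
    request_id: str
    timestamp: datetime


def parse_event(raw: Dict[str, Any]) -> GuardrailEvent:
    """Convert the string-valued fields of a raw event into typed values."""
    return GuardrailEvent(
        type=raw["type"],
        project_id=raw["project_id"],
        mode=raw["mode"],
        action_taken=raw.get("action_taken", ""),
        entity_types=list(raw.get("entity_types", [])),
        entity_count=int(raw.get("entity_count", 0)),
        request_id=raw.get("request_id", ""),
        # Assumption: the timestamp is seconds since the Unix epoch, in UTC.
        timestamp=datetime.fromtimestamp(float(raw["timestamp"]), tz=timezone.utc),
    )
```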

    Using Shadow Mode for Testing Before Enforcing

    Shadow mode is the recommended workflow when enabling guardrails on an existing project:

    1. Set pii_mode to "shadow" and deploy.
    2. Run your normal workload for 24-48 hours.
    3. Review the guardrail events to see which entity types are being detected and how frequently.
    4. Tune your pii_entities map: adjust which entities to monitor and their actions.
    5. Once comfortable, switch pii_mode to "enforce" and deploy.

    The same workflow applies to Sentinel. Use shadow mode to discover which blocklist terms appear naturally in legitimate traffic before blocking on them.
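
For step 3 of the workflow, a quick tally of shadow-mode detections per entity type is often enough to decide which entities to enforce. This sketch assumes events have already been fetched and decoded into dicts shaped like the event structure shown earlier; `detections_by_entity` is a hypothetical helper.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def detections_by_entity(events: Iterable[Dict[str, Any]]) -> Counter:
    """Count how many shadow-mode PII events mention each entity type.

    This counts events, not individual occurrences: an event carries only a
    total entity_count, not a per-type breakdown.
    """
    counts: Counter = Counter()
    for event in events:
        if event.get("type") != "pii_detection" or event.get("mode") != "shadow":
            continue
        counts.update(event.get("entity_types", []))
    return counts
```

Calling `detections_by_entity(events).most_common()` lists entity types from most to least frequently detected, which is a reasonable starting point for tuning the pii_entities map.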


    Error Responses When Blocked

    When PII Shield blocks a request (entity action BLOCK in enforce mode):

    json
    {
      "error": {
        "message": "Request blocked: detected PII types ['CREDIT_CARD']",
        "type": "guardrail_error",
        "code": "BEHEST_PII_BLOCKED"
      }
    }

    HTTP status: 403

    When Sentinel blocks a request:

    json
    {
      "error": {
        "message": "Request blocked: potential prompt injection detected",
        "type": "guardrail_error",
        "code": "BEHEST_CONTENT_BLOCKED"
      }
    }

    HTTP status: 403
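
On the client side, the two error codes above make it possible to tell a guardrail block apart from other 403 responses. A minimal sketch, assuming you have the HTTP status and raw response body in hand; `blocking_guardrail` and the labels it returns are hypothetical.

```python
import json
from typing import Optional

# Error codes taken from the responses shown above; the labels are illustrative.
GUARDRAIL_CODES = {
    "BEHEST_PII_BLOCKED": "pii_shield",
    "BEHEST_CONTENT_BLOCKED": "sentinel",
}


def blocking_guardrail(status: int, body: str) -> Optional[str]:
    """Return which guardrail blocked the request, or None if it was not a guardrail block."""
    if status != 403:
        return None
    try:
        payload = json.loads(body)
    except ValueError:
        return None  # not JSON, so not a guardrail error body
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict) or error.get("type") != "guardrail_error":
        return None
    code = error.get("code")
    return GUARDRAIL_CODES.get(code) if isinstance(code, str) else None
```

An application might show a "please remove sensitive data" message for `pii_shield` and a generic refusal for `sentinel`, while letting other 403 responses fall through to its normal authorization handling.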


    Best Practices

    Start with shadow mode. Never enable enforce mode on day one. Run shadow mode for at least 24 hours to understand your traffic before activating blocking behavior.

    MASK instead of BLOCK for most entities. BLOCK is appropriate for data your application should never send to any LLM (SSNs, credit card numbers). MASK is better for names and emails where you want the LLM to reference the person naturally in its response.

    Use REDACT for data you never want in logs. REDACT permanently discards the original value, so it cannot be restored, and unlike MASK it leaves no vault entry behind. Use this when compliance requires that raw PII never reach the LLM layer even transiently.

    Keep the Sentinel blocklist focused. Avoid blocking common words that may appear in legitimate requests. Shadow mode helps you validate that your blocklist terms are specific enough not to generate false positives.

    Deploy after every settings change. The PUT settings endpoint updates the database. Only the deploy endpoint pushes changes to Redis and activates them at the gateway.

    Default state for new projects: Both guardrails start as disabled with empty entity and blocklist configuration. Guardrails must be explicitly configured and deployed to take effect.
