
    Bring Your Own Key (BYOK)

    What Is BYOK?

    BYOK (Bring Your Own Key) means you supply your own API keys for OpenAI, Anthropic, Google, Mistral, Cohere, or OpenRouter. When your application makes an inference request through Behest, your key is used to call the provider directly. Behest never acts as an intermediary on the token billing side — you pay the provider at their published rates.

    Why BYOK matters:

    • Zero LLM markup — Behest charges a flat subscription fee. Your LLM tokens cost exactly what the provider charges, nothing more.
    • Your rate limits — requests run against your provider account's rate limits, not shared pool limits.
    • Your billing relationship — usage appears in your provider account for cost attribution, chargebacks, and auditing.
    • Model flexibility — access any model in your provider account, not just a curated subset.
    • Data handling terms — your data is subject to your provider agreement, not Behest's.

    BYOK is available on Pro, Business+, and Enterprise plans.


    Architectural Overview

    Your App
      │
      │  POST /v1/chat/completions
      │  Authorization: Bearer <Behest JWT>
      │  Body: { model: "gpt-4o", messages: [...] }
      │
      ▼
    Kong Gateway
      │  Validates Behest JWT (RS256)
      │  Enforces RPM rate limits (Redis INCR)
      │  Checks kill switches
      │  Injects X-Tenant-Id, X-Project-Id headers
      │
      ▼
    LiteLLM (custom_auth.py — auth_from_headers)
      │
      │  1. Read config:{pid}:provider_model from Redis
      │     → "gpt-4o"
      │
      │  2. Infer provider from model ID
      │     → "openai" (from routing:model_to_provider Redis hash
      │        or fallback hardcoded mapping)
      │
      │  3. Read provider:{tid}:{providerType}:api_key_enc from Redis
      │     → AES-256-GCM ciphertext
      │
      │  4. decrypt_provider_key(ciphertext)
      │     → plaintext key (exists only in this narrow scope)
      │
      │  5. Inject { api_key: plaintext } into LiteLLM params
      │     del plaintext  ← immediately deleted
      │
      ▼
    OpenAI API (or Anthropic, Google, etc.)
      Authorization: Bearer <your-key>
    

    The decrypted key lives in memory for the duration of the auth_from_headers call only. It is passed to LiteLLM's pre-call hook via metadata["_byoak_inject"], which pops it before the metadata is persisted to usage logs. It is never written to a log file, database, span attribute, or response body.
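The five lookup-and-inject steps above can be sketched as a small function. This is a minimal illustration, not the real `custom_auth.py`: a plain dict stands in for Redis, `decrypt` is injected, and the `FALLBACK_PROVIDER` entries are hypothetical examples of the hardcoded mapping.

```python
# Hypothetical fallback mapping -- in production the authoritative routing
# lives in the routing:model_to_provider Redis hash.
FALLBACK_PROVIDER = {"gpt-4o": "openai"}

def resolve_byok_params(store, tid, pid, decrypt):
    """Steps 1-5 from the diagram: model -> provider -> decrypted key."""
    model = store.get(f"config:{pid}:provider_model")                 # step 1
    provider = store.get(f"routing:model_to_provider:{model}") \
        or FALLBACK_PROVIDER.get(model)                               # step 2
    ciphertext = store.get(f"provider:{tid}:{provider}:api_key_enc")  # step 3
    if ciphertext is None:
        return None  # caller falls back to the platform default
    plaintext = decrypt(ciphertext)                                   # step 4
    params = {"model": model, "api_key": plaintext}                   # step 5
    del plaintext  # keep the plaintext's lifetime narrow, as custom_auth.py does
    return params
```

The decrypted key exists only between steps 4 and 5; everything before it operates on ciphertext.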


    Security Model

    Encryption

    Keys are encrypted with AES-256-GCM using a 32-byte key loaded from the PROVIDER_ENCRYPTION_KEY environment variable. The service fails closed at startup if this variable is missing, empty, or not a valid 64-character hex string.
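The fail-closed startup check can be sketched as follows; the function name is illustrative, but the validation rule (exactly 64 hex characters, yielding a 32-byte key) is as described above.

```python
import os
import re

def load_encryption_key() -> bytes:
    """Fail closed at startup: a missing or malformed key aborts the service."""
    raw = os.environ.get("PROVIDER_ENCRYPTION_KEY", "")
    if not re.fullmatch(r"[0-9a-fA-F]{64}", raw):
        raise SystemExit("PROVIDER_ENCRYPTION_KEY must be 64 hex characters")
    return bytes.fromhex(raw)  # 32-byte AES-256 key
```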

    Ciphertext format:

    {iv_hex}:{ciphertext_hex}:{auth_tag_hex}
    
    • IV: 12 bytes (96 bits), randomly generated per key using the OS CSPRNG (crypto.randomBytes). IVs are never reused.
    • Auth tag: 16 bytes (128 bits). GCM's authentication tag detects any ciphertext tampering — decryption throws if the tag does not match.
    • Key length: 32 bytes (256-bit AES). Loaded from PROVIDER_ENCRYPTION_KEY (64 hex chars).
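A sketch of how the three-part ciphertext blob splits apart, assuming only the framing described above (the actual AES-256-GCM decryption, which raises on any auth-tag mismatch, is out of scope here):

```python
import binascii

def parse_ciphertext(blob: str):
    """Split and sanity-check a {iv_hex}:{ciphertext_hex}:{auth_tag_hex} blob."""
    iv_hex, ct_hex, tag_hex = blob.split(":")
    iv, ct, tag = (binascii.unhexlify(p) for p in (iv_hex, ct_hex, tag_hex))
    if len(iv) != 12:
        raise ValueError("IV must be 96 bits")
    if len(tag) != 16:
        raise ValueError("auth tag must be 128 bits")
    return iv, ct, tag
```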

    In production (GCP/GKE), PROVIDER_ENCRYPTION_KEY is stored in GCP Secret Manager and injected as a Kubernetes secret. It is never in source code or in the container image.

    Tenant Isolation

    Redis key format: provider:{tenantId}:{providerType}:api_key_enc

    The tenantId is a UUID validated by a strict regex (/^[0-9a-f]{8}-[0-9a-f]{4}-...$/) before any Redis lookup. A crafted non-UUID value is rejected with HTTP 400 before Redis is touched, preventing cross-tenant key injection.

    Kong validates the Behest JWT (RS256) before the request reaches LiteLLM, extracting a verified tid claim. Client-supplied X-Tenant-Id headers are stripped by Kong — only Kong-injected headers are trusted.
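The validate-before-interpolate pattern looks roughly like this. The document elides the full regex; the pattern below assumes the standard lowercase hex UUID form, and the function name is illustrative:

```python
import re

# Assumed full form of the elided pattern: a lowercase hex UUID.
UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
)

def provider_key_name(tenant_id: str, provider_type: str) -> str:
    """Validate the tenant ID before it is interpolated into a Redis key name."""
    if not UUID_RE.match(tenant_id):
        raise ValueError("invalid tenant id")  # surfaced to the client as HTTP 400
    return f"provider:{tenant_id}:{provider_type}:api_key_enc"
```

Because the rejection happens before any Redis command is built, a crafted value like `*` or `t1:openai` can never widen the lookup to another tenant's keys.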

    Key Visibility

    • The api_key_enc column is never returned in any API response (enforced by explicit field selection — api_key_enc is excluded from every SELECT)
    • Only key_last4 (last 4 characters) is returned to the UI for identification
    • The PUT request body field api_key is immediately deleted from the parsed body object after extraction to prevent it from appearing in request logs

    Production Secret Management

    In GKE production, the following secrets are managed via GCP Secret Manager and mounted as Kubernetes Secrets:

    • PROVIDER_ENCRYPTION_KEY — AES-256-GCM key for provider key encryption
    • Provider keys themselves never appear in environment variables — they are stored encrypted in PostgreSQL and cached encrypted in Redis

    Provider Key Lifecycle

    Adding a Key

    When you submit a key via PUT /v1/tenants/:tenantId/providers/:providerType:

    1. Format validation — key is checked against the provider's regex (e.g., OpenAI keys must match sk-(proj-|svcacct-)?[A-Za-z0-9_-]{20,})
    2. Live validation — a real API call to the provider (5s timeout) confirms the key is active
    3. Encryption — encryptProviderKey() generates a fresh IV, encrypts with AES-256-GCM
    4. Database upsert — stored in tenant_provider_keys with ON CONFLICT DO UPDATE (rotation = same operation as initial add)
    5. Redis write — provider:{tid}:{providerType}:api_key_enc set with a 24-hour TTL (refreshed hourly by redis-sync-worker)
    6. Redis membership — SADD provider:{tid}:enabled_providers {providerType}
    7. Model discovery — fire-and-forget background job calls the provider's model list endpoint and populates tenant_provider_models

    The key takes effect immediately (no deploy step required). Provider keys are tenant-level, not project-level.
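Steps 3 through 6 of the add flow can be sketched as below. Plain dicts stand in for PostgreSQL and Redis, so the 24-hour TTL and the real `SADD` are simplified, and `encrypt` is injected rather than being the real `encryptProviderKey()`:

```python
def add_provider_key(db, store, tid, provider, plaintext_key, encrypt):
    """Steps 3-6: encrypt, upsert, cache, and mark the provider enabled."""
    ciphertext = encrypt(plaintext_key)                           # 3. fresh-IV encrypt
    db[(tid, provider)] = {"api_key_enc": ciphertext,
                           "key_last4": plaintext_key[-4:]}       # 4. upsert (rotation == add)
    store[f"provider:{tid}:{provider}:api_key_enc"] = ciphertext  # 5. Redis cache (24h TTL)
    store.setdefault(f"provider:{tid}:enabled_providers",
                     set()).add(provider)                         # 6. SADD membership
```

Note that only `key_last4` is derived from the plaintext for later display; everything persisted is ciphertext.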

    Rotating a Key

    Send a new PUT with the new key. The upsert replaces the existing ciphertext and key_last4. Redis is updated atomically. The old key is gone from Behest's storage the moment the write completes.

    Revoking a Key

    DELETE /v1/tenants/:tenantId/providers/:providerType — removes the database row and atomically deletes both Redis keys (api_key_enc and api_base) from a pipeline. Takes effect immediately.
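A sketch of the revoke path, again with dicts standing in for the real stores (in production the two Redis deletes run in a single pipeline so they land atomically):

```python
def revoke_provider_key(db, store, tid, provider):
    """DELETE handler sketch: drop the database row, then both Redis keys."""
    db.pop((tid, provider), None)
    store.pop(f"provider:{tid}:{provider}:api_key_enc", None)  # pipelined together
    store.pop(f"provider:{tid}:{provider}:api_base", None)     # in production
```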

    Projects configured to use this provider's models silently fall back to gemini-2.5-flash (platform default) on the next request. Their provider_model database setting is preserved.


    What Happens When a Key Expires or Is Invalid

    At request time, if custom_auth.py loads the ciphertext from Redis and decryption fails (tampered ciphertext, wrong key), it returns HTTP 500 to the client without revealing any key material. The error is logged with the project ID and provider type.

    If the provider rejects the key (e.g., you revoked it at the provider's dashboard), LiteLLM receives a 401 from the provider API and returns an appropriate error to your application. The stale key remains in Behest's storage until you rotate or remove it.

    If Redis has no ciphertext (e.g., Redis restarted before the hourly redis-sync-worker refresh), custom_auth.py falls back to the platform default instead of returning an error. The redis-sync-worker restores all provider key ciphertexts from PostgreSQL within its next cycle (up to 1 hour + 0–10% jitter).

    Clear error for misconfigured projects: If a project is configured for a model but no key exists for that provider, custom_auth.py raises:

    { "detail": "Configure your openai API key in Provider Settings to use gpt-4o" }

    HTTP 400, never a silent fallback in this case.
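The two missing-ciphertext outcomes (silent fallback for a cold cache vs. HTTP 400 for a project with no key configured) can be sketched together. How the real service distinguishes the two cases is an assumption here, modeled as an injected `key_configured` predicate; `ConfigError` and `pick_model` are illustrative names:

```python
PLATFORM_DEFAULT = "gemini-2.5-flash"

class ConfigError(Exception):
    """Maps to HTTP 400 with the detail message shown above."""

def pick_model(store, tid, pid, provider_of, key_configured):
    """Resolve the model and ciphertext, or choose the right error path."""
    model = store.get(f"config:{pid}:provider_model")
    if model is None:
        return PLATFORM_DEFAULT, None          # nothing configured: platform default
    provider = provider_of(model)
    ct = store.get(f"provider:{tid}:{provider}:api_key_enc")
    if ct is not None:
        return model, ct                       # normal BYOK path
    if key_configured(tid, provider):
        return PLATFORM_DEFAULT, None          # cache cold; redis-sync-worker refills
    raise ConfigError(f"Configure your {provider} API key in "
                      f"Provider Settings to use {model}")
```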


    Testing Before Deploying

    The test-token flow lets you validate BYOK end-to-end before affecting production:

    POST /v1/projects/:projectId/settings/test-token
    Authorization: Bearer <service-JWT>

    This returns a short-lived JWT (5-minute TTL). Requests using this JWT include X-Behest-Draft-Mode: 1, which tells custom_auth.py to read draft:config:{pid}:provider_model (your draft model selection) rather than config:{pid}:provider_model (the deployed model).

    Draft keys are written with a 300-second TTL and never overwrite production keys. After 5 minutes they expire automatically. Use this to confirm:

    • Your provider key decrypts successfully
    • The correct model is selected
    • Responses look as expected before deploying
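The draft-versus-deployed selection reduces to a key-name prefix; a minimal sketch (the function name is illustrative):

```python
def config_key(pid: str, draft_mode: bool) -> str:
    """Requests carrying X-Behest-Draft-Mode: 1 read the draft:-prefixed
    config; all other requests read the deployed config."""
    prefix = "draft:" if draft_mode else ""
    return f"{prefix}config:{pid}:provider_model"
```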

    Supported Key Formats Per Provider

    Provider    Format                                   Example prefix    Length            Live validation endpoint
    OpenAI      sk-(proj-|svcacct-)?[A-Za-z0-9_-]{20,}   sk-proj-...       variable          GET api.openai.com/v1/models
    Anthropic   sk-ant-[A-Za-z0-9_-]{20,}                sk-ant-api03-...  variable          GET api.anthropic.com/v1/models
    Google      AIza[A-Za-z0-9_-]{35}                    AIzaSy...         exactly 39 chars  GET generativelanguage.googleapis.com/v1beta/models?key=...
    OpenRouter  sk-or-v1-[a-f0-9]{64}                    sk-or-v1-...      exactly 73 chars  GET openrouter.ai/api/v1/auth/key
    Mistral     any 10+ chars                            varies            variable          GET api.mistral.ai/v1/models
    Cohere      any 10+ chars                            varies            variable          GET api.cohere.com/v1/models

    Format validation is performed as a regex pre-check before the live API call. If the format doesn't match, Behest returns 400 INVALID_KEY_FORMAT without making a network request to the provider.
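The pre-check is a pure regex gate over the patterns in the table; a sketch (dictionary and function names are illustrative):

```python
import re

# Patterns from the table above; Mistral and Cohere accept any 10+ characters.
KEY_FORMATS = {
    "openai":     re.compile(r"sk-(proj-|svcacct-)?[A-Za-z0-9_-]{20,}"),
    "anthropic":  re.compile(r"sk-ant-[A-Za-z0-9_-]{20,}"),
    "google":     re.compile(r"AIza[A-Za-z0-9_-]{35}"),
    "openrouter": re.compile(r"sk-or-v1-[a-f0-9]{64}"),
    "mistral":    re.compile(r".{10,}"),
    "cohere":     re.compile(r".{10,}"),
}

def precheck(provider: str, key: str) -> bool:
    """Regex pre-check; False means 400 INVALID_KEY_FORMAT
    with no network call made to the provider."""
    pattern = KEY_FORMATS.get(provider)
    return bool(pattern and pattern.fullmatch(key))
```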
