
    Bring Your Own Key (BYOK)

    What Is BYOK?

    BYOK (Bring Your Own Key) means you supply your own API keys for OpenAI, Anthropic, Google, Mistral, Cohere, or OpenRouter. When your application makes an inference request through Behest, your key is used to call the provider directly. Behest never acts as an intermediary on the token billing side — you pay the provider at their published rates.

    Why BYOK matters:

    • Zero LLM markup — Behest charges a flat subscription fee. Your LLM tokens cost exactly what the provider charges, nothing more.
    • Your rate limits — requests run against your provider account's rate limits, not shared pool limits.
    • Your billing relationship — usage appears in your provider account for cost attribution, chargebacks, and auditing.
    • Model flexibility — access any model in your provider account, not just a curated subset.
    • Data handling terms — your data is subject to your provider agreement, not Behest's.

    BYOK is available on Pro, Business+, and Enterprise plans.


    Architectural Overview

    Your App
      │
      │  POST /v1/chat/completions
      │  Authorization: Bearer <Behest JWT>
      │  Body: { model: "gpt-4o", messages: [...] }
      │
      ▼
    Kong Gateway
      │  Validates Behest JWT (RS256)
      │  Enforces RPM rate limits (Redis INCR)
      │  Checks kill switches
      │  Injects X-Tenant-Id, X-Project-Id headers
      │
      ▼
    LiteLLM (custom_auth.py — auth_from_headers)
      │
      │  1. Read config:{pid}:provider_model from Redis
      │     → "gpt-4o"
      │
      │  2. Infer provider from model ID
      │     → "openai" (from routing:model_to_provider Redis hash
      │        or fallback hardcoded mapping)
      │
      │  3. Read provider:{tid}:{providerType}:api_key_enc from Redis
      │     → AES-256-GCM ciphertext
      │
      │  4. decrypt_provider_key(ciphertext)
      │     → plaintext key (exists only in this narrow scope)
      │
      │  5. Inject { api_key: plaintext } into LiteLLM params
      │     del plaintext  ← immediately deleted
      │
      ▼
    OpenAI API (or Anthropic, Google, etc.)
      Authorization: Bearer <your-key>
    

    The decrypted key lives in memory for the duration of the auth_from_headers call only. It is passed to LiteLLM's pre-call hook via metadata["_byoak_inject"], which pops it before the metadata is persisted to usage logs. It is never written to a log file, database, span attribute, or response body.
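The five lookup-and-inject steps above can be sketched as a small function. This is a minimal illustration, not the real `custom_auth.py`: a plain dict stands in for Redis, `decrypt` is injected, and the `FALLBACK_PROVIDER` entries are hypothetical examples of the hardcoded mapping.

```python
# Hypothetical fallback mapping -- in production the authoritative routing
# lives in the routing:model_to_provider Redis hash.
FALLBACK_PROVIDER = {"gpt-4o": "openai"}

def resolve_byok_params(store, tid, pid, decrypt):
    """Steps 1-5 from the diagram: model -> provider -> decrypted key."""
    model = store.get(f"config:{pid}:provider_model")                 # step 1
    provider = store.get(f"routing:model_to_provider:{model}") \
        or FALLBACK_PROVIDER.get(model)                               # step 2
    ciphertext = store.get(f"provider:{tid}:{provider}:api_key_enc")  # step 3
    if ciphertext is None:
        return None  # caller falls back to the platform default
    plaintext = decrypt(ciphertext)                                   # step 4
    params = {"model": model, "api_key": plaintext}                   # step 5
    del plaintext  # keep the plaintext's lifetime narrow, as custom_auth.py does
    return params
```

The decrypted key exists only between steps 4 and 5; everything before it operates on ciphertext.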


    Security Model

    Encryption

    Keys are encrypted with AES-256-GCM using a 32-byte key loaded from the PROVIDER_ENCRYPTION_KEY environment variable. The service fails closed at startup if this variable is missing, empty, or not a valid 64-character hex string.
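The fail-closed startup check can be sketched as follows; the function name is illustrative, but the validation rule (exactly 64 hex characters, yielding a 32-byte key) is as described above.

```python
import os
import re

def load_encryption_key() -> bytes:
    """Fail closed at startup: a missing or malformed key aborts the service."""
    raw = os.environ.get("PROVIDER_ENCRYPTION_KEY", "")
    if not re.fullmatch(r"[0-9a-fA-F]{64}", raw):
        raise SystemExit("PROVIDER_ENCRYPTION_KEY must be 64 hex characters")
    return bytes.fromhex(raw)  # 32-byte AES-256 key
```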

    Ciphertext format:

    {iv_hex}:{ciphertext_hex}:{auth_tag_hex}
    
    • IV: 12 bytes (96 bits), randomly generated per key using the OS CSPRNG (crypto.randomBytes). IVs are never reused.
    • Auth tag: 16 bytes (128 bits). GCM's authentication tag detects any ciphertext tampering — decryption throws if the tag does not match.
    • Key length: 32 bytes (256-bit AES). Loaded from PROVIDER_ENCRYPTION_KEY (64 hex chars).
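A sketch of how the three-part ciphertext blob splits apart, assuming only the framing described above (the actual AES-256-GCM decryption, which raises on any auth-tag mismatch, is out of scope here):

```python
import binascii

def parse_ciphertext(blob: str):
    """Split and sanity-check a {iv_hex}:{ciphertext_hex}:{auth_tag_hex} blob."""
    iv_hex, ct_hex, tag_hex = blob.split(":")
    iv, ct, tag = (binascii.unhexlify(p) for p in (iv_hex, ct_hex, tag_hex))
    if len(iv) != 12:
        raise ValueError("IV must be 96 bits")
    if len(tag) != 16:
        raise ValueError("auth tag must be 128 bits")
    return iv, ct, tag
```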

    In production (GCP/GKE), PROVIDER_ENCRYPTION_KEY is stored in GCP Secret Manager and injected as a Kubernetes secret. It is never in source code or in the container image.

    Tenant Isolation

    Redis key format: provider:{tenantId}:{providerType}:api_key_enc

    The tenantId is a UUID validated by a strict regex (/^[0-9a-f]{8}-[0-9a-f]{4}-...$/) before any Redis lookup. A crafted non-UUID value is rejected with HTTP 400 before Redis is touched, preventing cross-tenant key injection.

    Kong validates the Behest JWT (RS256) before the request reaches LiteLLM, extracting a verified tid claim. Client-supplied X-Tenant-Id headers are stripped by Kong — only Kong-injected headers are trusted.
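The validate-before-interpolate pattern looks roughly like this. The document elides the full regex; the pattern below assumes the standard lowercase hex UUID form, and the function name is illustrative:

```python
import re

# Assumed full form of the elided pattern: a lowercase hex UUID.
UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
)

def provider_key_name(tenant_id: str, provider_type: str) -> str:
    """Validate the tenant ID before it is interpolated into a Redis key name."""
    if not UUID_RE.match(tenant_id):
        raise ValueError("invalid tenant id")  # surfaced to the client as HTTP 400
    return f"provider:{tenant_id}:{provider_type}:api_key_enc"
```

Because the rejection happens before any Redis command is built, a crafted value like `*` or `t1:openai` can never widen the lookup to another tenant's keys.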

    Key Visibility

    • The api_key_enc column is never returned in any API response (enforced by explicit field selection — api_key_enc is excluded from every SELECT)
    • Only key_last4 (last 4 characters) is returned to the UI for identification
    • The PUT request body field api_key is immediately deleted from the parsed body object after extraction to prevent it from appearing in request logs

    Production Secret Management

    In GKE production, the following secrets are managed via GCP Secret Manager and mounted as Kubernetes Secrets:

    • PROVIDER_ENCRYPTION_KEY — AES-256-GCM key for provider key encryption
    • Provider keys themselves never appear in environment variables — they are stored encrypted in PostgreSQL and cached encrypted in Redis

    Provider Key Lifecycle

    Adding a Key

    When you submit a key via PUT /v1/tenants/:tenantId/providers/:providerType:

    1. Format validation — key is checked against the provider's regex (e.g., OpenAI keys must match sk-(proj-|svcacct-)?[A-Za-z0-9_-]{20,})
    2. Live validation — a real API call to the provider (5s timeout) confirms the key is active
    3. Encryption — encryptProviderKey() generates a fresh IV, encrypts with AES-256-GCM
    4. Database upsert — stored in tenant_provider_keys with ON CONFLICT DO UPDATE (rotation = same operation as initial add)
    5. Redis write — provider:{tid}:{providerType}:api_key_enc set with a 24-hour TTL (refreshed hourly by redis-sync-worker)
    6. Redis membership — SADD provider:{tid}:enabled_providers {providerType}
    7. Model discovery — fire-and-forget background job calls the provider's model list endpoint and populates tenant_provider_models

    The key takes effect immediately (no deploy step required). Provider keys are tenant-level, not project-level.
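Steps 3 through 6 of the add flow can be sketched as below. Plain dicts stand in for PostgreSQL and Redis, so the 24-hour TTL and the real `SADD` are simplified, and `encrypt` is injected rather than being the real `encryptProviderKey()`:

```python
def add_provider_key(db, store, tid, provider, plaintext_key, encrypt):
    """Steps 3-6: encrypt, upsert, cache, and mark the provider enabled."""
    ciphertext = encrypt(plaintext_key)                           # 3. fresh-IV encrypt
    db[(tid, provider)] = {"api_key_enc": ciphertext,
                           "key_last4": plaintext_key[-4:]}       # 4. upsert (rotation == add)
    store[f"provider:{tid}:{provider}:api_key_enc"] = ciphertext  # 5. Redis cache (24h TTL)
    store.setdefault(f"provider:{tid}:enabled_providers",
                     set()).add(provider)                         # 6. SADD membership
```

Note that only `key_last4` is derived from the plaintext for later display; everything persisted is ciphertext.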

    Rotating a Key

    Send a new PUT with the new key. The upsert replaces the existing ciphertext and key_last4. Redis is updated atomically. The old key is gone from Behest's storage the moment the write completes.

    Revoking a Key

    DELETE /v1/tenants/:tenantId/providers/:providerType — removes the database row and atomically deletes both Redis keys (api_key_enc and api_base) from a pipeline. Takes effect immediately.
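A sketch of the revoke path, again with dicts standing in for the real stores (in production the two Redis deletes run in a single pipeline so they land atomically):

```python
def revoke_provider_key(db, store, tid, provider):
    """DELETE handler sketch: drop the database row, then both Redis keys."""
    db.pop((tid, provider), None)
    store.pop(f"provider:{tid}:{provider}:api_key_enc", None)  # pipelined together
    store.pop(f"provider:{tid}:{provider}:api_base", None)     # in production
```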

    Projects configured to use this provider's models silently fall back to gemini-2.5-flash (platform default) on the next request. Their provider_model database setting is preserved.


    What Happens When a Key Expires or Is Invalid

    At request time, if custom_auth.py loads the ciphertext from Redis and decryption fails (tampered ciphertext, wrong key), it returns HTTP 500 to the client without revealing any key material. The error is logged with the project ID and provider type.

    If the provider rejects the key (e.g., you revoked it at the provider's dashboard), LiteLLM receives a 401 from the provider API and returns an appropriate error to your application. The stale key remains in Behest's storage until you rotate or remove it.

    If Redis has no ciphertext (e.g., Redis restarted before the hourly redis-sync-worker refresh), custom_auth.py falls back to the platform default instead of returning an error. The redis-sync-worker restores all provider key ciphertexts from PostgreSQL within its next cycle (up to 1 hour + 0–10% jitter).

    Clear error for misconfigured projects: If a project is configured for a model but no key exists for that provider, custom_auth.py raises:

    { "detail": "Configure your openai API key in Provider Settings to use gpt-4o" }

    HTTP 400, never a silent fallback in this case.
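The two missing-ciphertext outcomes (silent fallback for a cold cache vs. HTTP 400 for a project with no key configured) can be sketched together. How the real service distinguishes the two cases is an assumption here, modeled as an injected `key_configured` predicate; `ConfigError` and `pick_model` are illustrative names:

```python
PLATFORM_DEFAULT = "gemini-2.5-flash"

class ConfigError(Exception):
    """Maps to HTTP 400 with the detail message shown above."""

def pick_model(store, tid, pid, provider_of, key_configured):
    """Resolve the model and ciphertext, or choose the right error path."""
    model = store.get(f"config:{pid}:provider_model")
    if model is None:
        return PLATFORM_DEFAULT, None          # nothing configured: platform default
    provider = provider_of(model)
    ct = store.get(f"provider:{tid}:{provider}:api_key_enc")
    if ct is not None:
        return model, ct                       # normal BYOK path
    if key_configured(tid, provider):
        return PLATFORM_DEFAULT, None          # cache cold; redis-sync-worker refills
    raise ConfigError(f"Configure your {provider} API key in "
                      f"Provider Settings to use {model}")
```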


    Testing Before Deploying

    The test-token flow lets you validate BYOK end-to-end before affecting production:

    POST /v1/projects/:projectId/settings/test-token
    Authorization: Bearer <service-JWT>

    This returns a short-lived JWT (5-minute TTL). Requests using this JWT include X-Behest-Draft-Mode: 1, which tells custom_auth.py to read draft:config:{pid}:provider_model (your draft model selection) rather than config:{pid}:provider_model (the deployed model).

    Draft keys are written with a 300-second TTL and never overwrite production keys. After 5 minutes they expire automatically. Use this to confirm:

    • Your provider key decrypts successfully
    • The correct model is selected
    • Responses look as expected before deploying
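The draft-versus-deployed selection reduces to a key-name prefix; a minimal sketch (the function name is illustrative):

```python
def config_key(pid: str, draft_mode: bool) -> str:
    """Requests carrying X-Behest-Draft-Mode: 1 read the draft:-prefixed
    config; all other requests read the deployed config."""
    prefix = "draft:" if draft_mode else ""
    return f"{prefix}config:{pid}:provider_model"
```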

    Supported Key Formats Per Provider

    Provider    Format                                   Example prefix    Length            Live validation endpoint
    OpenAI      sk-(proj-|svcacct-)?[A-Za-z0-9_-]{20,}   sk-proj-...       variable          GET api.openai.com/v1/models
    Anthropic   sk-ant-[A-Za-z0-9_-]{20,}                sk-ant-api03-...  variable          GET api.anthropic.com/v1/models
    Google      AIza[A-Za-z0-9_-]{35}                    AIzaSy...         exactly 39 chars  GET generativelanguage.googleapis.com/v1beta/models?key=...
    OpenRouter  sk-or-v1-[a-f0-9]{64}                    sk-or-v1-...      exactly 73 chars  GET openrouter.ai/api/v1/auth/key
    Mistral     any 10+ chars                            varies            variable          GET api.mistral.ai/v1/models
    Cohere      any 10+ chars                            varies            variable          GET api.cohere.com/v1/models

    Format validation is performed as a regex pre-check before the live API call. If the format doesn't match, Behest returns 400 INVALID_KEY_FORMAT without making a network request to the provider.
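The pre-check is a pure regex gate over the patterns in the table; a sketch (dictionary and function names are illustrative):

```python
import re

# Patterns from the table above; Mistral and Cohere accept any 10+ characters.
KEY_FORMATS = {
    "openai":     re.compile(r"sk-(proj-|svcacct-)?[A-Za-z0-9_-]{20,}"),
    "anthropic":  re.compile(r"sk-ant-[A-Za-z0-9_-]{20,}"),
    "google":     re.compile(r"AIza[A-Za-z0-9_-]{35}"),
    "openrouter": re.compile(r"sk-or-v1-[a-f0-9]{64}"),
    "mistral":    re.compile(r".{10,}"),
    "cohere":     re.compile(r".{10,}"),
}

def precheck(provider: str, key: str) -> bool:
    """Regex pre-check; False means 400 INVALID_KEY_FORMAT
    with no network call made to the provider."""
    pattern = KEY_FORMATS.get(provider)
    return bool(pattern and pattern.fullmatch(key))
```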
