
    Core Concepts

    Understanding the core concepts below is enough to use Behest effectively. The rest of the docs fill in details.


    Tenants

    A tenant is your organization. It is the top-level container for all of your Behest resources.

    • Created once when you sign up
    • Holds projects, API keys, and BYOK provider credentials
    • Has a plan field (free, pro, business, enterprise) that gates certain features
    • plan: "free" cannot add BYOK provider keys and is limited to 3 active projects
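The plan gating above can be sketched as two small checks. This is an illustrative sketch, not Behest's actual implementation; the function names and limit constant are assumptions.

```python
# Illustrative sketch of plan gating: function names and layout are
# hypothetical, but the rules mirror the docs above.

FREE_PLAN_MAX_PROJECTS = 3

def can_add_byok_key(plan):
    """BYOK provider keys require a paid plan (pro or higher)."""
    return plan in {"pro", "business", "enterprise"}

def can_create_project(plan, active_projects):
    """Free tenants are capped at 3 active (non-suspended) projects."""
    if plan == "free":
        return active_projects < FREE_PLAN_MAX_PROJECTS
    return True
```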

    Projects

    A project is a deployment unit. Each of your AI-powered features (e.g., "support chatbot", "code assistant", "summarization pipeline") should be its own project.

    What a project gives you:

    • A unique slug (e.g., amber-fox-042) used for routing
    • Two FQDNs for sending inference requests
    • Its own settings: system prompt, CORS origins, rate limits, token budgets, guardrails
    • A set of API keys for minting JWTs

    Project lifecycle:

    • dns_status: PENDING → DNS record being provisioned (async, ~30s)
    • dns_status: READY → accepting inference requests
    • dns_status: FAILED → provisioning failed; retry via POST /v1/projects/:id/provisioning/retry
    • status: suspended → all API keys revoked, no new requests accepted

    Free plan limit: 3 active (non-suspended) projects.
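Putting the lifecycle rules together: a project serves inference only when DNS provisioning has finished and the project is not suspended. A minimal (hypothetical) helper:

```python
# Hypothetical helper mirroring the lifecycle rules above: a project
# accepts inference requests only when dns_status is READY and the
# project has not been suspended.

def accepts_requests(dns_status, status):
    return dns_status == "READY" and status != "suspended"
```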


    Slugs and FQDNs

    When you create a project, Behest generates a random human-readable slug in the form {adjective}-{noun}-{3 digits}, for example amber-fox-042 or silent-brook-117.

    The slug is used to derive two FQDNs:

    Environment    Pattern                          Example
    Development    {slug}.dev.internal.behest.ai    amber-fox-042.dev.internal.behest.ai
    Production     {slug}.behest.app                amber-fox-042.behest.app
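Deriving both hostnames from a slug is a pure string operation, sketched here:

```python
# Derive the two FQDNs from a project slug, per the patterns above.

def project_fqdns(slug):
    return {
        "development": f"{slug}.dev.internal.behest.ai",
        "production": f"{slug}.behest.app",
    }
```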

    How routing works: When a request arrives at Kong, the behest-tenant-auth plugin parses the slug from the Host header, then does a Redis lookup (slug:{slug} → {project_id, tenant_id}) to resolve the tenant and project. This means requests are routed by hostname, not by a path prefix or an API key claim.

    You can also route using the legacy pattern {tenantId}.{projectId}.api.behest.io, but slug-based routing is preferred.
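Slug-based routing can be sketched as follows. The in-memory dict stands in for the Redis lookup, and the route data is made up for illustration:

```python
# Sketch of hostname-based routing: pull the slug off the Host header,
# then resolve it via a slug -> {project_id, tenant_id} mapping.
# ROUTES is a stand-in for the Redis lookup; the values are examples.

ROUTES = {"amber-fox-042": {"project_id": "p_123", "tenant_id": "t_456"}}

def resolve(host):
    slug = host.split(".", 1)[0]   # "amber-fox-042.behest.app" -> "amber-fox-042"
    return ROUTES.get(slug)        # None if the slug is unknown
```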


    API Keys

    An API key is a long-lived secret scoped to a single project. Format: behest_sk_live_{32 hex chars}.

    API keys are never sent directly to an LLM. Their sole purpose is to be exchanged for short-lived JWTs at the /auth/v1/auth/mint endpoint.

    Security model:

    • Stored as an Argon2id hash plus a SHA-256 lookup index — the plaintext is never persisted
    • Shown exactly once at creation time
    • Can be rotated (old key revoked, new key issued atomically) or permanently revoked
    • Each key carries a role (user by default); the minted JWT inherits this role unless overridden at mint time

    Key management:

    • Multiple keys per project are supported
    • Keys are soft-deleted via revoked_at timestamp; once revoked they cannot be un-revoked, only rotated or replaced
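The key format and the SHA-256 lookup index can be sketched as below. The Argon2id hash would come from a library such as argon2-cffi; it is omitted here to keep the example dependency-free, and the function names are illustrative:

```python
import hashlib
import re

# Sketch of the storage scheme described above: validate the documented
# key format, then derive the SHA-256 lookup index. The Argon2id hash of
# the plaintext (the actual credential check) is intentionally omitted.

KEY_RE = re.compile(r"^behest_sk_live_[0-9a-f]{32}$")

def lookup_index(api_key):
    if not KEY_RE.match(api_key):
        raise ValueError("malformed API key")
    return hashlib.sha256(api_key.encode()).hexdigest()
```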

    JWTs

    A JWT (JSON Web Token) is a short-lived credential used to authenticate a single user session against the inference endpoint.

    Minting: Call POST /auth/v1/auth/mint with your API key in the Authorization: Bearer header and a user_id in the request body. Behest verifies the API key, then signs a JWT using RS256.
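As a sketch, the mint call can be assembled as plain request parts (actually sending it with urllib or requests is left out so the example stays self-contained; the base URL is assumed from the iss claim below):

```python
import json

# Build the parts of a mint request per the description above.
# base_url is an assumption taken from the JWT iss claim.

def build_mint_request(api_key, user_id, base_url="https://api.behest.app"):
    return {
        "method": "POST",
        "url": f"{base_url}/auth/v1/auth/mint",
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({"user_id": user_id}),
    }
```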

    JWT claims:

    Claim    Description
    tid      Tenant ID
    pid      Project ID
    uid      End-user ID (from your user_id field)
    role     Role (user, dashboard-service, admin)
    scp      Scopes array (empty by default)
    tier     Optional tier name for per-tier settings
    iss      Issuer (https://api.behest.app)
    aud      Audience (behest)
    iat      Issued-at (Unix timestamp)
    nbf      Not-before (Unix timestamp)
    exp      Expiry (Unix timestamp)
    jti      Unique token ID

    Lifespan: 60 seconds minimum, 86400 seconds (24h) maximum. Default: 3600 seconds (1h). Design your application to mint fresh tokens proactively before the current one expires.
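Proactive refresh can be as simple as comparing the exp claim against the clock with a safety margin. The margin here is a hypothetical choice, not a Behest requirement:

```python
import time

# Mint a fresh token once the current one is within a safety margin of
# its exp claim. The 60-second margin is an illustrative choice.

REFRESH_MARGIN_SECONDS = 60

def needs_refresh(exp, now=None):
    now = time.time() if now is None else now
    return now >= exp - REFRESH_MARGIN_SECONDS
```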

    Verification: Kong verifies the JWT signature using the JWKS endpoint (GET /.well-known/jwks.json) and rejects expired or malformed tokens before the request reaches LiteLLM.


    BYOK (Bring Your Own Keys)

    BYOK lets you connect your own LLM provider accounts so that:

    • Inference costs appear on your provider bills, not Behest's
    • You control your rate limits and spend directly with each provider
    • Behest is entirely out of the billing loop

    How it works:

    1. You store your provider API key via PUT /auth/v1/tenants/:tenantId/providers/:providerType
    2. Behest validates the key against the provider API, then encrypts it with AES-256-GCM and stores the ciphertext
    3. Each project selects which model to use via PUT /auth/v1/projects/:projectId/settings/model
    4. On every inference request, LiteLLM's custom_auth.py reads the encrypted key from Redis, decrypts it in-process, and uses it for that single call
    5. The plaintext key exists only in memory for the duration of one upstream call — it is never logged or returned
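Steps 2 and 4 amount to an AES-256-GCM encrypt/decrypt roundtrip. A minimal sketch using the third-party cryptography package; key management and storage layout here are assumptions, not Behest's actual implementation:

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM  # pip install cryptography

# Encrypt a provider key with AES-256-GCM for storage at rest, and
# decrypt it in-process just before the upstream call. The nonce is
# generated fresh per encryption and stored alongside the ciphertext.

def encrypt_provider_key(master_key, plaintext):
    nonce = os.urandom(12)  # 96-bit nonce, unique per encryption
    ct = AESGCM(master_key).encrypt(nonce, plaintext.encode(), None)
    return nonce + ct

def decrypt_provider_key(master_key, blob):
    nonce, ct = blob[:12], blob[12:]
    return AESGCM(master_key).decrypt(nonce, ct, None).decode()
```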

    Supported providers: openai, anthropic, google, openrouter, mistral, cohere

    Plan requirement: BYOK requires a Pro subscription or higher. Free plan tenants receive a 403 BYOK_REQUIRES_PRO error.


    Tiers

    Tiers let you define different settings profiles for different user segments within a single project. For example, your "premium" users might get a higher RPM limit and a different system prompt than your "free" users.

    • Up to 3 tiers per project
    • Each tier has a name and an overrides object (subset of project settings)
    • At mint time, pass tier: "premium" in the request body to get a JWT tagged with that tier
    • Kong and LiteLLM resolve the effective settings by merging the tier overrides on top of the project defaults
    • Fields that can be overridden: rpm_limit, tokens_per_day, pii_mode, pii_entities, sentinel_mode, sentinel_blocklist, memory_enabled, memory_window, retention_days, store_tool_calls, provider_model, system_prompt
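The merge described above can be sketched as a shallow dict merge of tier overrides on top of project defaults (the shallow-merge semantics are an assumption; the settings values are examples):

```python
# Resolve effective settings by merging tier overrides on top of
# project defaults. A shallow merge is assumed here.

def effective_settings(project_defaults, tier_overrides):
    return {**project_defaults, **(tier_overrides or {})}
```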

    Rate Limiting

    Rate limiting is enforced at the Kong gateway level before any request reaches LiteLLM.

    Per-project RPM limit
    Configured as rpm_limit in project settings (default: 60 RPM). Kong counts requests per project per minute using a Redis counter keyed on rpm:{tenantId}:{projectId}:{YYYYMMDDHHMM}. Requests over the limit return 429 Too Many Requests.

    Per-user RPM limit
    A fraction of the project RPM limit (default: 10% of the project limit). Requests carrying the same uid JWT claim are rejected once that user exceeds this threshold. Set the per-user limit to 0 to disable it.

    Per-user daily token budget
    Configured as tokens_per_day in project settings (default: 1,000,000 tokens/day per user). Tracked via a Redis counter. On budget exhaustion, requests return 429.

    Per-project aggregate token budget
    Separate from the per-user budget. Default: 10,000,000 tokens/day for the entire project. Configurable via config:{pid}:project_tokens_per_day in Redis.
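The fixed-minute-window counter used for the RPM limit can be sketched as below. A plain dict stands in for Redis, and the key shape mirrors the documented rpm:{tenantId}:{projectId}:{minute} format:

```python
import time

# Fixed-window RPM counter sketch: one counter per project per UTC minute.
# An in-memory dict stands in for the Redis counter described above.

counters = {}

def allow_request(tenant_id, project_id, rpm_limit, now=None):
    now = time.time() if now is None else now
    minute = time.strftime("%Y%m%d%H%M", time.gmtime(now))
    key = f"rpm:{tenant_id}:{project_id}:{minute}"
    counters[key] = counters.get(key, 0) + 1
    return counters[key] <= rpm_limit   # False would map to a 429
```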


    Kill Switches

    Kill switches let you immediately halt all inference requests without waiting for key revocation or deployment changes.

    Scope      Redis key                          Effect
    Global     killswitch:global                  Blocks all inference on the entire platform
    Tenant     killswitch:tenant:{tenantId}       Blocks all inference for one tenant
    Project    killswitch:project:{projectId}     Blocks all inference for one project

    Kill switches are checked in order (global → tenant → project) on every request. They take effect within milliseconds — there is no caching delay. Manage them via the admin endpoints at POST /v1/admin/killswitch/*.
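The per-request evaluation order can be sketched as below, with a set of active keys standing in for Redis existence checks:

```python
# Evaluate kill switches in the documented order: global, then tenant,
# then project. Returns the first matching key, or None if no switch
# is active. A set stands in for Redis EXISTS checks.

def kill_switch_hit(active_keys, tenant_id, project_id):
    for key in ("killswitch:global",
                f"killswitch:tenant:{tenant_id}",
                f"killswitch:project:{project_id}"):
        if key in active_keys:
            return key
    return None
```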
