Core Concepts
Understanding the concepts below is enough to use Behest effectively. The rest of the docs fill in the details.
Tenants
A tenant is your organization. It is the top-level container for all of your Behest resources.
- Created once when you sign up
- Holds projects, API keys, and BYOK provider credentials
- Has a plan field (free, pro, business, enterprise) that gates certain features
- A tenant on plan: "free" cannot add BYOK provider keys and is limited to 3 active projects
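Your application can mirror these plan rules on the client, for example to hide controls that the API would reject. The sketch below assumes your app already knows the tenant's plan string. The function names are hypothetical, and the API remains the source of truth. The sketch also assumes no project cap on paid plans, because none is documented here.

```python
# Hypothetical client-side mirror of the documented plan rules.
# Behest's API remains authoritative; this only avoids calls that would fail.
PLANS = ("free", "pro", "business", "enterprise")
FREE_ACTIVE_PROJECT_LIMIT = 3  # active = non-suspended


def _check_plan(plan: str) -> None:
    if plan not in PLANS:
        raise ValueError(f"unknown plan: {plan!r}")


def can_add_byok(plan: str) -> bool:
    """Free tenants cannot store BYOK provider keys."""
    _check_plan(plan)
    return plan != "free"


def can_create_project(plan: str, active_projects: int) -> bool:
    """Free tenants are capped at 3 active projects."""
    _check_plan(plan)
    if plan == "free":
        return active_projects < FREE_ACTIVE_PROJECT_LIMIT
    return True  # assumption: no cap is documented for paid plans
```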
Projects
A project is a deployment unit. Each of your AI-powered features (e.g., "support chatbot", "code assistant", "summarization pipeline") should be its own project.
What a project gives you:
- A unique slug (e.g., amber-fox-042) used for routing
- Two FQDNs for sending inference requests
- Its own settings: system prompt, CORS origins, rate limits, token budgets, guardrails
- A set of API keys for minting JWTs
Project lifecycle:
- dns_status: PENDING → DNS record being provisioned (async, ~30s)
- dns_status: READY → accepting inference requests
- dns_status: FAILED → provisioning failed; retry via POST /v1/projects/:id/provisioning/retry
- status: suspended → all API keys revoked, no new requests accepted
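A client that creates projects programmatically usually has to wait for provisioning before sending traffic. The sketch below maps the lifecycle states above to a next action. Only "suspended" is a documented status value, so any other status is treated as usable. The helper names and action strings are hypothetical.

```python
def accepts_inference(dns_status: str, status: str) -> bool:
    """True only when DNS is READY and the project is not suspended."""
    return dns_status == "READY" and status != "suspended"


def next_action(dns_status: str, status: str) -> str:
    """Suggest what a client should do next for a given lifecycle state."""
    if status == "suspended":
        return "blocked"  # all API keys are revoked; retrying will not help
    if dns_status == "PENDING":
        return "wait"  # provisioning is async, roughly 30 seconds
    if dns_status == "FAILED":
        return "retry"  # POST /v1/projects/:id/provisioning/retry
    if dns_status == "READY":
        return "ready"
    raise ValueError(f"unknown dns_status: {dns_status!r}")
```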
Free plan limit: 3 active (non-suspended) projects.
Slugs and FQDNs
When you create a project, Behest generates a random human-readable slug in the form {adjective}-{noun}-{3 digits}, for example amber-fox-042 or silent-brook-117.
The slug is used to derive two FQDNs:
| Environment | Pattern | Example |
|---|---|---|
| Development | {slug}.dev.internal.behest.ai | amber-fox-042.dev.internal.behest.ai |
| Production | {slug}.behest.app | amber-fox-042.behest.app |
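The following is a minimal sketch of the slug shape and the FQDN derivation in the table above. The word lists are placeholders and generate_slug is hypothetical; Behest's actual generator and vocabulary are not documented. Note that secrets.randbelow(1000) can return values below 100. The :03d format pads these to three digits, so 42 becomes 042.

```python
import re
import secrets
from typing import Dict

# Placeholder vocabularies; Behest's real word lists are not documented.
ADJECTIVES = ("amber", "silent", "brisk")
NOUNS = ("fox", "brook", "heron")

# {adjective}-{noun}-{3 digits}, e.g. amber-fox-042
SLUG_RE = re.compile(r"[a-z]+-[a-z]+-[0-9]{3}")


def generate_slug() -> str:
    """Hypothetical generator producing the documented slug shape."""
    number = secrets.randbelow(1000)  # 0..999, zero-padded to 3 digits below
    return f"{secrets.choice(ADJECTIVES)}-{secrets.choice(NOUNS)}-{number:03d}"


def fqdns(slug: str) -> Dict[str, str]:
    """Derive both hostnames from a slug, per the table above."""
    if not SLUG_RE.fullmatch(slug):
        raise ValueError(f"not a valid slug: {slug!r}")
    return {
        "development": f"{slug}.dev.internal.behest.ai",
        "production": f"{slug}.behest.app",
    }
```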
How routing works: When a request arrives at Kong, the behest-tenant-auth plugin parses the slug from the Host header, then does a Redis lookup (slug:{slug} → {project_id, tenant_id}) to resolve the tenant and project. This means requests are routed by hostname, not by a path prefix or an API key claim.
You can also route using the legacy pattern {tenantId}.{projectId}.api.behest.io, but slug-based routing is preferred.
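The hostname-based lookup can be sketched as follows, with a dict standing in for Redis. This is not the behest-tenant-auth Kong plugin; it only illustrates how resolution could proceed. The sample IDs are made up. The sketch also assumes that legacy hostnames carry the tenant and project IDs directly in their first two labels, with no Redis lookup.

```python
from typing import Dict, Optional

# Stand-in for Redis entries of the form slug:{slug} -> {project_id, tenant_id}.
SLUG_INDEX: Dict[str, Dict[str, str]] = {
    "slug:amber-fox-042": {"project_id": "proj_123", "tenant_id": "ten_456"},
}
SLUG_SUFFIXES = (".dev.internal.behest.ai", ".behest.app")
LEGACY_SUFFIX = ".api.behest.io"


def resolve_host(host: str) -> Optional[Dict[str, str]]:
    """Resolve tenant and project from a Host header value, or return None."""
    host = host.split(":", 1)[0].lower()  # strip an optional :port
    for suffix in SLUG_SUFFIXES:
        if host.endswith(suffix):
            slug = host[: -len(suffix)]
            return SLUG_INDEX.get(f"slug:{slug}")
    if host.endswith(LEGACY_SUFFIX):
        # Legacy pattern: {tenantId}.{projectId}.api.behest.io
        labels = host[: -len(LEGACY_SUFFIX)].split(".")
        if len(labels) == 2:
            return {"tenant_id": labels[0], "project_id": labels[1]}
    return None
```

Because the slug lives in the hostname, the same lookup works for both FQDNs. An unknown slug simply resolves to nothing.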
API Keys
An API key is a long-lived secret scoped to a single project. Format: behest_sk_live_{32 hex chars}.
API keys are never sent directly to an LLM. Their sole purpose is to be exchanged for short-lived JWTs at the /auth/v1/auth/mint endpoint.
Security model:
- Stored as an Argon2id hash plus a SHA-256 lookup index — the plaintext is never persisted
- Shown exactly once at creation time
- Can be rotated (old key revoked, new key issued atomically) or permanently revoked
- Each key carries a role (user by default); the minted JWT inherits this role unless overridden at mint time
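Here is a minimal sketch of the key format and the SHA-256 lookup index, using only the standard library. It assumes lowercase hex digits and uses a dict in place of the database. The Argon2id check is described in a comment rather than implemented, because it needs a third-party package. The function names are illustrative.

```python
import hashlib
import re
import secrets
from typing import Dict, Optional

PREFIX = "behest_sk_live_"
KEY_RE = re.compile(r"behest_sk_live_[0-9a-f]{32}")  # assumes lowercase hex


def generate_api_key() -> str:
    """behest_sk_live_ followed by 32 hex chars (16 random bytes)."""
    return PREFIX + secrets.token_hex(16)


def lookup_index(api_key: str) -> str:
    """SHA-256 hex digest: lets a server find the record without storing plaintext."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


def find_record(api_key: str, records: Dict[str, dict]) -> Optional[dict]:
    """Locate a stored key record by its lookup index."""
    if not KEY_RE.fullmatch(api_key):
        return None  # reject malformed input before touching storage
    record = records.get(lookup_index(api_key))
    # A real server would next verify the key against the record's Argon2id
    # hash (e.g. with the third-party argon2-cffi package); omitted here.
    return record
```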
Key management:
- Multiple keys per project are supported
- Keys are soft-deleted via a revoked_at timestamp; once revoked they cannot be un-revoked, only rotated or replaced
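Rotation and soft deletion can be modeled on an in-memory table. This is only a sketch:

- A process-local lock stands in for whatever atomicity Behest's storage provides.
- The key ID format is invented.
- The replacement key is assumed to keep the old key's role.

```python
import secrets
import threading
from datetime import datetime, timezone
from typing import Dict

_rotation_lock = threading.Lock()  # stand-in for a database transaction


def rotate_key(keys: Dict[str, dict], old_key_id: str) -> str:
    """Issue a replacement for old_key_id and revoke the old one together.

    keys maps key_id -> {"role": str, "revoked_at": datetime or None}.
    Records are never deleted, and revoked_at is never cleared once set.
    """
    with _rotation_lock:
        old = keys[old_key_id]
        new_key_id = "key_" + secrets.token_hex(8)  # hypothetical ID format
        keys[new_key_id] = {"role": old["role"], "revoked_at": None}
        if old["revoked_at"] is None:
            old["revoked_at"] = datetime.now(timezone.utc)
        return new_key_id
```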
JWTs
A JWT (JSON Web Token) is a short-lived credential used to authenticate a single user session against the inference endpoint.
Minting: Call POST /auth/v1/auth/mint with your API key in the Authorization: Bearer header and a user_id in the request body. Behest verifies the API key, then signs a JWT using RS256.
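The mint call can be prepared with the standard library's urllib.request. The base URL is an assumption; substitute the host that serves /auth/v1/auth/mint for your account. The helper name is hypothetical. The optional tier field follows the Tiers section.

```python
import json
import urllib.request
from typing import Optional

# Assumption: substitute the host that serves /auth/v1/auth/mint for your account.
BASE_URL = "https://api.behest.app"


def build_mint_request(api_key: str, user_id: str,
                       tier: Optional[str] = None) -> urllib.request.Request:
    """Prepare (but do not send) a mint call: API key as Bearer, user_id in the body."""
    body = {"user_id": user_id}
    if tier is not None:
        body["tier"] = tier  # optional; see Tiers
    return urllib.request.Request(
        f"{BASE_URL}/auth/v1/auth/mint",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Send it with urllib.request.urlopen(req). The API key is a long-lived secret, so make this call from your backend rather than from client code. Parse the response according to the API reference, since its schema is not described here.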
JWT claims:
| Claim | Description |
|---|---|
| tid | Tenant ID |
| pid | Project ID |
| uid | End-user ID (from your user_id field) |
| role | Role (user, dashboard-service, admin) |
| scp | Scopes array (empty by default) |
| tier | Optional tier name for per-tier settings |
| iss | Issuer (https://api.behest.app) |
| aud | Audience (behest) |
| iat | Issued-at (Unix timestamp) |
| nbf | Not-before (Unix timestamp) |
| exp | Expiry (Unix timestamp) |
| jti | Unique token ID |
Lifespan: 60 seconds minimum, 86400 seconds (24h) maximum. Default: 3600 seconds (1h). Design your application to mint fresh tokens proactively before the current one expires.
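One way to follow that advice is a small cache that re-mints when the token is within a safety margin of its exp claim. This sketch assumes you supply the mint function. The class, its parameters and the 30-second default margin are illustrative, not part of any Behest SDK.

```python
import time
from typing import Callable, Optional, Tuple


class TokenCache:
    """Hold one JWT and mint a replacement shortly before it expires.

    mint: any callable returning (token, exp), where exp is the Unix-seconds
    expiry, e.g. a wrapper around POST /auth/v1/auth/mint.
    refresh_margin: seconds before exp at which to refresh (an app-side choice;
    keep it well under the 60-second minimum lifespan). Not thread-safe.
    """

    def __init__(self, mint: Callable[[], Tuple[str, int]], refresh_margin: int = 30,
                 clock: Callable[[], float] = time.time) -> None:
        self._mint = mint
        self._margin = refresh_margin
        self._clock = clock
        self._token: Optional[str] = None
        self._exp = 0

    def get(self) -> str:
        # Mint when no token is held or the current one is inside the margin.
        if self._token is None or self._clock() >= self._exp - self._margin:
            self._token, self._exp = self._mint()
        return self._token
```

Injecting the clock keeps the refresh logic testable without waiting for real expiry.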
Verification: Kong verifies the JWT signature using the JWKS endpoint (GET /.well-known/jwks.json) and rejects expired or malformed tokens before the request reaches LiteLLM.
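The gateway's time-based checks can be made concrete with a sketch of the claim validation that follows a successful signature check. It is illustrative only: it is not Kong's implementation, and the leeway parameter is an assumption. It deliberately skips the signature step. In practice you would delegate that step to a JWT library that can fetch the JWKS.

```python
import time
from typing import Optional

EXPECTED_ISS = "https://api.behest.app"
EXPECTED_AUD = "behest"


def claim_error(claims: dict, now: Optional[float] = None, leeway: int = 0) -> Optional[str]:
    """Return why a decoded claim set would be rejected, or None if it passes.

    Covers iss, aud, nbf and exp only. RS256 signature verification against
    /.well-known/jwks.json needs a crypto library and is NOT performed here.
    """
    now = time.time() if now is None else now
    if claims.get("iss") != EXPECTED_ISS:
        return "unexpected issuer"
    aud = claims.get("aud")
    if EXPECTED_AUD not in (aud if isinstance(aud, list) else [aud]):
        return "unexpected audience"
    for name in ("nbf", "exp"):
        if not isinstance(claims.get(name), (int, float)):
            return f"missing or non-numeric {name}"
    if now + leeway < claims["nbf"]:
        return "token not yet valid"
    if now - leeway >= claims["exp"]:
        return "token expired"
    return None
```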
BYOK (Bring Your Own Keys)
BYOK lets you connect your own LLM provider accounts so that:
- Inference costs appear on your provider bills, not Behest's
- You control your rate limits and spend directly with each provider
- Behest is entirely out of the billing loop
How it works:
- You store your provider API key via PUT /auth/v1/tenants/:tenantId/providers/:providerType
- Behest validates the key against the provider API, then encrypts it with AES-256-GCM and stores the ciphertext
- Each project selects which model to use via PUT /auth/v1/projects/:projectId/settings/model
- On every inference request, LiteLLM's custom_auth.py reads the encrypted key from Redis, decrypts it in-process, and uses it for that single call
- The plaintext key exists only in memory for the duration of one upstream call; it is never logged or returned
Supported providers: openai, anthropic, google, openrouter, mistral, cohere
Plan requirement: BYOK requires a Pro subscription or higher. Free plan tenants receive a 403 BYOAK_REQUIRES_PRO error.
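A small helper can keep the two BYOK endpoints and the supported-provider list in one place. This sketch builds paths only. The request body, the authentication header, and the response field that carries the error code are not documented in this section, so the sketch leaves them to you. The function names are hypothetical.

```python
from urllib.parse import quote

SUPPORTED_PROVIDERS = ("openai", "anthropic", "google", "openrouter", "mistral", "cohere")


def provider_key_path(tenant_id: str, provider_type: str) -> str:
    """Path for PUT /auth/v1/tenants/:tenantId/providers/:providerType."""
    if provider_type not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unsupported provider: {provider_type!r}")
    return f"/auth/v1/tenants/{quote(tenant_id, safe='')}/providers/{provider_type}"


def model_settings_path(project_id: str) -> str:
    """Path for PUT /auth/v1/projects/:projectId/settings/model."""
    return f"/auth/v1/projects/{quote(project_id, safe='')}/settings/model"


def is_pro_required(status: int, error_code: str) -> bool:
    """True for the documented rejection of BYOK on the free plan."""
    return status == 403 and error_code == "BYOAK_REQUIRES_PRO"
```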
Tiers
Tiers let you define different settings profiles for different user segments within a single project. For example, your "premium" users might get a higher RPM limit and a different system prompt than your "free" users.
- Up to 3 tiers per project
- Each tier has a name and an overrides object (a subset of project settings)
- At mint time, pass tier: "premium" in the request body to get a JWT tagged with that tier
- Kong and LiteLLM resolve the effective settings by merging the tier overrides on top of the project defaults
- Fields that can be overridden: rpm_limit, tokens_per_day, pii_mode, pii_entities, sentinel_mode, sentinel_blocklist, memory_enabled, memory_window, retention_days, store_tool_calls, provider_model, system_prompt
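The merge can be sketched as a dictionary update. Two assumptions are worth flagging, because the docs specify neither behavior:

- The merge is shallow, so an overridden list such as pii_entities replaces the default rather than extending it.
- An unknown tier name falls back to the project defaults.

```python
from typing import Any, Dict, Optional

MAX_TIERS = 3
OVERRIDABLE = frozenset({
    "rpm_limit", "tokens_per_day", "pii_mode", "pii_entities", "sentinel_mode",
    "sentinel_blocklist", "memory_enabled", "memory_window", "retention_days",
    "store_tool_calls", "provider_model", "system_prompt",
})


def validate_tiers(tiers: Dict[str, Dict[str, Any]]) -> None:
    """Enforce the 3-tier cap and the list of overridable fields."""
    if len(tiers) > MAX_TIERS:
        raise ValueError(f"at most {MAX_TIERS} tiers per project")
    for name, overrides in tiers.items():
        unknown = set(overrides) - OVERRIDABLE
        if unknown:
            raise ValueError(f"tier {name!r} overrides non-overridable fields: {sorted(unknown)}")


def effective_settings(defaults: Dict[str, Any], tiers: Dict[str, Dict[str, Any]],
                       tier: Optional[str]) -> Dict[str, Any]:
    """Shallow-merge the tier's overrides on top of the project defaults."""
    overrides = tiers.get(tier, {}) if tier is not None else {}
    return {**defaults, **overrides}
```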
Rate Limiting
Rate limiting is enforced at the Kong gateway level before any request reaches LiteLLM.
Per-project RPM limit
Configured as rpm_limit in project settings (default: 60 RPM). Kong counts requests per project per minute using a Redis counter keyed on rpm:{tenantId}:{projectId}:{YYYYMMDDHHM}. Requests that exceed the limit return 429 Too Many Requests.
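A fixed-window counter of this kind needs only INCR and EXPIRE. In the sketch below, a small dict-backed class stands in for Redis. Its incr and expire methods mirror the redis-py client methods of the same names, so a real client could be passed instead.

The sketch assumes that the documented timestamp placeholder denotes a minute-resolution bucket, and formats it as YYYYMMDDHHMM. The 120-second TTL is also an assumption.

```python
from datetime import datetime, timezone
from typing import Dict, Optional


class FakeRedis:
    """Dict-backed stand-in for the two Redis commands used below."""

    def __init__(self) -> None:
        self.data: Dict[str, int] = {}
        self.ttls: Dict[str, int] = {}

    def incr(self, key: str) -> int:
        self.data[key] = self.data.get(key, 0) + 1
        return self.data[key]

    def expire(self, key: str, seconds: int) -> bool:
        self.ttls[key] = seconds
        return True


def allow_project_request(r, tenant_id: str, project_id: str, rpm_limit: int = 60,
                          now: Optional[datetime] = None) -> bool:
    """Count this request in the current minute; False means respond 429."""
    if now is None:
        now = datetime.now(timezone.utc)
    key = f"rpm:{tenant_id}:{project_id}:{now:%Y%m%d%H%M}"
    count = r.incr(key)
    if count == 1:
        r.expire(key, 120)  # let old windows expire on their own
    return count <= rpm_limit
```

A fixed window is cheap, since each request costs one counter increment. However, it lets a client send up to twice the limit across a minute boundary.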
Per-user RPM limit
Each user gets a fraction of the project RPM limit (default: 10% of the project limit). Requests carrying the same uid JWT claim above this threshold are rejected. Set the per-user limit to 0 to disable it.
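With the defaults, a 60 RPM project allows each uid 6 requests per minute (10% of 60). The sketch below derives that threshold. The following details are illustrative, not documented behavior:

- The per_user_percent name.
- The rounding rule (integer floor, with a minimum of 1).
- Treating the 0 setting as a percentage.

```python
from typing import Optional


def per_user_rpm_limit(project_rpm: int, per_user_percent: int = 10) -> Optional[int]:
    """Per-user RPM threshold derived from the project limit; None means disabled."""
    if per_user_percent == 0:
        return None  # 0 disables the per-user limit
    # Assumption: integer floor, but never below 1 request per minute.
    return max(1, project_rpm * per_user_percent // 100)


def allow_user_request(user_count: int, project_rpm: int, per_user_percent: int = 10) -> bool:
    """user_count is this uid's request count in the current minute, including this one."""
    limit = per_user_rpm_limit(project_rpm, per_user_percent)
    return limit is None or user_count <= limit
```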
Per-user daily token budget
Configured as tokens_per_day in project settings (default: 1,000,000 tokens/day per user). Tracked via a Redis counter. When the budget is exhausted, requests return 429.
Per-project aggregate token budget
Separate from the per-user budget. Default: 10,000,000 tokens/day for the entire project. Configurable via the config:{pid}:project_tokens_per_day key in Redis.
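Both daily budgets must have room for a request to proceed. The sketch below makes these assumptions:

- Dicts stand in for Redis.
- Days are UTC calendar days.
- The counter key names are invented; only config:{pid}:project_tokens_per_day comes from the docs.

Usage is recorded after each call, once the token count is known.

```python
from datetime import date
from typing import Dict, Tuple

USER_TOKENS_PER_DAY = 1_000_000      # documented default, per user
PROJECT_TOKENS_PER_DAY = 10_000_000  # documented default, whole project


def project_daily_limit(config: Dict[str, str], pid: str) -> int:
    """Read config:{pid}:project_tokens_per_day, falling back to the default."""
    return int(config.get(f"config:{pid}:project_tokens_per_day", PROJECT_TOKENS_PER_DAY))


def usage_keys(pid: str, uid: str, day: date) -> Tuple[str, str]:
    # Hypothetical counter names; the docs do not give them.
    stamp = day.strftime("%Y%m%d")
    return f"tokens:{pid}:{uid}:{stamp}", f"tokens:{pid}:{stamp}"


def within_budgets(counters: Dict[str, int], config: Dict[str, str], pid: str, uid: str,
                   day: date, user_limit: int = USER_TOKENS_PER_DAY) -> bool:
    """False means a budget is exhausted (the docs specify 429 for the per-user case)."""
    user_key, project_key = usage_keys(pid, uid, day)
    return (counters.get(user_key, 0) < user_limit
            and counters.get(project_key, 0) < project_daily_limit(config, pid))


def record_usage(counters: Dict[str, int], pid: str, uid: str, day: date, tokens: int) -> None:
    """Add consumed tokens to both the per-user and the per-project counter."""
    for key in usage_keys(pid, uid, day):
        counters[key] = counters.get(key, 0) + tokens
```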
Kill Switches
Kill switches let you immediately halt all inference requests without waiting for key revocation or deployment changes.
| Scope | Redis key | Effect |
|---|---|---|
| Global | killswitch:global | Blocks all inference on the entire platform |
| Tenant | killswitch:tenant:{tenantId} | Blocks all inference for one tenant |
| Project | killswitch:project:{projectId} | Blocks all inference for one project |
Kill switches are checked in order (global → tenant → project) on every request. They take effect within milliseconds — there is no caching delay. Manage them via the admin endpoints at POST /v1/admin/killswitch/*.
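The ordered check amounts to three key lookups that stop at the first hit. This sketch assumes a switch is engaged when its key exists; the docs do not say what value it holds. It uses a set in place of Redis, and the function name is hypothetical.

```python
from typing import Optional, Set


def engaged_killswitch(active_keys: Set[str], tenant_id: str, project_id: str) -> Optional[str]:
    """Return the first engaged kill switch in global, tenant, project order."""
    for key in (
        "killswitch:global",
        f"killswitch:tenant:{tenant_id}",
        f"killswitch:project:{project_id}",
    ):
        if key in active_keys:  # with redis-py this would be r.exists(key)
            return key
    return None
```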