
    Core Concepts

    Understanding the core concepts below is enough to use Behest effectively. The rest of the docs fill in details.


    Tenants

    A tenant is your organization. It is the top-level container for all of your Behest resources.

    • Created once when you sign up
    • Holds projects, API keys, and BYOK provider credentials
    • Has a plan field (free, pro, business, enterprise) that gates certain features
    • plan: "free" cannot add BYOK provider keys and is limited to 3 active projects
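The plan gating above can be sketched as two small checks. This is an illustrative sketch, not Behest's actual implementation; the function names and limit constant are assumptions.

```python
# Illustrative sketch of plan gating: function names and layout are
# hypothetical, but the rules mirror the docs above.

FREE_PLAN_MAX_PROJECTS = 3

def can_add_byok_key(plan):
    """BYOK provider keys require a paid plan (pro or higher)."""
    return plan in {"pro", "business", "enterprise"}

def can_create_project(plan, active_projects):
    """Free tenants are capped at 3 active (non-suspended) projects."""
    if plan == "free":
        return active_projects < FREE_PLAN_MAX_PROJECTS
    return True
```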

    Projects

    A project is a deployment unit. Each of your AI-powered features (e.g., "support chatbot", "code assistant", "summarization pipeline") should be its own project.

    What a project gives you:

    • A unique slug (e.g., amber-fox-042) used for routing
    • Two FQDNs for sending inference requests
    • Its own settings: system prompt, CORS origins, rate limits, token budgets, guardrails
    • A set of API keys for minting JWTs

    Project lifecycle:

    • dns_status: PENDING → DNS record being provisioned (async, ~30s)
    • dns_status: READY → accepting inference requests
    • dns_status: FAILED → provisioning failed; retry via POST /v1/projects/:id/provisioning/retry
    • status: suspended → all API keys revoked, no new requests accepted

    Free plan limit: 3 active (non-suspended) projects.
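Putting the lifecycle rules together: a project serves inference only when DNS provisioning has finished and the project is not suspended. A minimal (hypothetical) helper:

```python
# Hypothetical helper mirroring the lifecycle rules above: a project
# accepts inference requests only when dns_status is READY and the
# project has not been suspended.

def accepts_requests(dns_status, status):
    return dns_status == "READY" and status != "suspended"
```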


    Slugs and FQDNs

    When you create a project, Behest generates a random human-readable slug in the form {adjective}-{noun}-{3 digits}, for example amber-fox-042 or silent-brook-117.

    The slug is used to derive two FQDNs:

    Environment    Pattern                          Example
    Development    {slug}.dev.internal.behest.ai    amber-fox-042.dev.internal.behest.ai
    Production     {slug}.behest.app                amber-fox-042.behest.app
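Deriving both hostnames from a slug is a pure string operation, sketched here:

```python
# Derive the two FQDNs from a project slug, per the patterns above.

def project_fqdns(slug):
    return {
        "development": f"{slug}.dev.internal.behest.ai",
        "production": f"{slug}.behest.app",
    }
```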

    How routing works: When a request arrives at Kong, the behest-tenant-auth plugin parses the slug from the Host header, then does a Redis lookup (slug:{slug} → {project_id, tenant_id}) to resolve the tenant and project. This means requests are routed by hostname, not by a path prefix or an API key claim.

    You can also route using the legacy pattern {tenantId}.{projectId}.api.behest.io, but slug-based routing is preferred.
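Slug-based routing can be sketched as follows. The in-memory dict stands in for the Redis lookup, and the route data is made up for illustration:

```python
# Sketch of hostname-based routing: pull the slug off the Host header,
# then resolve it via a slug -> {project_id, tenant_id} mapping.
# ROUTES is a stand-in for the Redis lookup; the values are examples.

ROUTES = {"amber-fox-042": {"project_id": "p_123", "tenant_id": "t_456"}}

def resolve(host):
    slug = host.split(".", 1)[0]   # "amber-fox-042.behest.app" -> "amber-fox-042"
    return ROUTES.get(slug)        # None if the slug is unknown
```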


    API Keys

    An API key is a long-lived secret scoped to a single project. Format: behest_sk_live_{32 hex chars}.

    API keys are never sent directly to an LLM. Their sole purpose is to be exchanged for short-lived JWTs at the /auth/v1/auth/mint endpoint.

    Security model:

    • Stored as an Argon2id hash plus a SHA-256 lookup index — the plaintext is never persisted
    • Shown exactly once at creation time
    • Can be rotated (old key revoked, new key issued atomically) or permanently revoked
    • Each key carries a role (user by default); the minted JWT inherits this role unless overridden at mint time

    Key management:

    • Multiple keys per project are supported
    • Keys are soft-deleted via revoked_at timestamp; once revoked they cannot be un-revoked, only rotated or replaced
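The key format and the SHA-256 lookup index can be sketched as below. The Argon2id hash would come from a library such as argon2-cffi; it is omitted here to keep the example dependency-free, and the function names are illustrative:

```python
import hashlib
import re

# Sketch of the storage scheme described above: validate the documented
# key format, then derive the SHA-256 lookup index. The Argon2id hash of
# the plaintext (the actual credential check) is intentionally omitted.

KEY_RE = re.compile(r"^behest_sk_live_[0-9a-f]{32}$")

def lookup_index(api_key):
    if not KEY_RE.match(api_key):
        raise ValueError("malformed API key")
    return hashlib.sha256(api_key.encode()).hexdigest()
```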

    JWTs

    A JWT (JSON Web Token) is a short-lived credential used to authenticate a single user session against the inference endpoint.

    Minting: Call POST /auth/v1/auth/mint with your API key in the Authorization: Bearer header and a user_id in the request body. Behest verifies the API key, then signs a JWT using RS256.
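As a sketch, the mint call can be assembled as plain request parts (actually sending it with urllib or requests is left out so the example stays self-contained; the base URL is assumed from the iss claim below):

```python
import json

# Build the parts of a mint request per the description above.
# base_url is an assumption taken from the JWT iss claim.

def build_mint_request(api_key, user_id, base_url="https://api.behest.app"):
    return {
        "method": "POST",
        "url": f"{base_url}/auth/v1/auth/mint",
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({"user_id": user_id}),
    }
```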

    JWT claims:

    Claim    Description
    tid      Tenant ID
    pid      Project ID
    uid      End-user ID (from your user_id field)
    role     Role (user, dashboard-service, admin)
    scp      Scopes array (empty by default)
    tier     Optional tier name for per-tier settings
    iss      Issuer (https://api.behest.app)
    aud      Audience (behest)
    iat      Issued-at (Unix timestamp)
    nbf      Not-before (Unix timestamp)
    exp      Expiry (Unix timestamp)
    jti      Unique token ID

    Lifespan: 60 seconds minimum, 86400 seconds (24h) maximum. Default: 3600 seconds (1h). Design your application to mint fresh tokens proactively before the current one expires.
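Proactive refresh can be as simple as comparing the exp claim against the clock with a safety margin. The margin here is a hypothetical choice, not a Behest requirement:

```python
import time

# Mint a fresh token once the current one is within a safety margin of
# its exp claim. The 60-second margin is an illustrative choice.

REFRESH_MARGIN_SECONDS = 60

def needs_refresh(exp, now=None):
    now = time.time() if now is None else now
    return now >= exp - REFRESH_MARGIN_SECONDS
```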

    Verification: Kong verifies the JWT signature using the JWKS endpoint (GET /.well-known/jwks.json) and rejects expired or malformed tokens before the request reaches LiteLLM.


    BYOK (Bring Your Own Keys)

    BYOK lets you connect your own LLM provider accounts so that:

    • Inference costs appear on your provider bills, not Behest's
    • You control your rate limits and spend directly with each provider
    • Behest is entirely out of the billing loop

    How it works:

    1. You store your provider API key via PUT /auth/v1/tenants/:tenantId/providers/:providerType
    2. Behest validates the key against the provider API, then encrypts it with AES-256-GCM and stores the ciphertext
    3. Each project selects which model to use via PUT /auth/v1/projects/:projectId/settings/model
    4. On every inference request, LiteLLM's custom_auth.py reads the encrypted key from Redis, decrypts it in-process, and uses it for that single call
    5. The plaintext key exists only in memory for the duration of one upstream call — it is never logged or returned
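Steps 2 and 4 amount to an AES-256-GCM encrypt/decrypt roundtrip. A minimal sketch using the third-party cryptography package; key management and storage layout here are assumptions, not Behest's actual implementation:

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM  # pip install cryptography

# Encrypt a provider key with AES-256-GCM for storage at rest, and
# decrypt it in-process just before the upstream call. The nonce is
# generated fresh per encryption and stored alongside the ciphertext.

def encrypt_provider_key(master_key, plaintext):
    nonce = os.urandom(12)  # 96-bit nonce, unique per encryption
    ct = AESGCM(master_key).encrypt(nonce, plaintext.encode(), None)
    return nonce + ct

def decrypt_provider_key(master_key, blob):
    nonce, ct = blob[:12], blob[12:]
    return AESGCM(master_key).decrypt(nonce, ct, None).decode()
```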

    Supported providers: openai, anthropic, google, openrouter, mistral, cohere

    Plan requirement: BYOK requires a Pro subscription or higher. Free plan tenants receive a 403 BYOK_REQUIRES_PRO error.


    Tiers

    Tiers let you define different settings profiles for different user segments within a single project. For example, your "premium" users might get a higher RPM limit and a different system prompt than your "free" users.

    • Up to 3 tiers per project
    • Each tier has a name and an overrides object (subset of project settings)
    • At mint time, pass tier: "premium" in the request body to get a JWT tagged with that tier
    • Kong and LiteLLM resolve the effective settings by merging the tier overrides on top of the project defaults
    • Fields that can be overridden: rpm_limit, tokens_per_day, pii_mode, pii_entities, sentinel_mode, sentinel_blocklist, memory_enabled, memory_window, retention_days, store_tool_calls, provider_model, system_prompt
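The merge described above can be sketched as a shallow dict merge of tier overrides on top of project defaults (the shallow-merge semantics are an assumption; the settings values are examples):

```python
# Resolve effective settings by merging tier overrides on top of
# project defaults. A shallow merge is assumed here.

def effective_settings(project_defaults, tier_overrides):
    return {**project_defaults, **(tier_overrides or {})}
```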

    Rate Limiting

    Rate limiting is enforced at the Kong gateway level before any request reaches LiteLLM.

    Per-project RPM limit
    Configured as rpm_limit in project settings (default: 60 RPM). Kong counts requests per project per minute using a Redis counter keyed on rpm:{tenantId}:{projectId}:{YYYYMMDDHHMM}. Requests over the limit return 429 Too Many Requests.

    Per-user RPM limit
    A fraction of the project RPM limit (default: 10% of the project limit). Requests carrying the same uid JWT claim are rejected once that user exceeds this threshold. Set the per-user limit to 0 to disable it.

    Per-user daily token budget
    Configured as tokens_per_day in project settings (default: 1,000,000 tokens/day per user). Tracked via a Redis counter. On budget exhaustion, requests return 429.

    Per-project aggregate token budget
    Separate from the per-user budget. Default: 10,000,000 tokens/day for the entire project. Configurable via config:{pid}:project_tokens_per_day in Redis.
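The fixed-minute-window counter used for the RPM limit can be sketched as below. A plain dict stands in for Redis, and the key shape mirrors the documented rpm:{tenantId}:{projectId}:{minute} format:

```python
import time

# Fixed-window RPM counter sketch: one counter per project per UTC minute.
# An in-memory dict stands in for the Redis counter described above.

counters = {}

def allow_request(tenant_id, project_id, rpm_limit, now=None):
    now = time.time() if now is None else now
    minute = time.strftime("%Y%m%d%H%M", time.gmtime(now))
    key = f"rpm:{tenant_id}:{project_id}:{minute}"
    counters[key] = counters.get(key, 0) + 1
    return counters[key] <= rpm_limit   # False would map to a 429
```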


    Kill Switches

    Kill switches let you immediately halt all inference requests without waiting for key revocation or deployment changes.

    Scope      Redis key                          Effect
    Global     killswitch:global                  Blocks all inference on the entire platform
    Tenant     killswitch:tenant:{tenantId}       Blocks all inference for one tenant
    Project    killswitch:project:{projectId}     Blocks all inference for one project

    Kill switches are checked in order (global → tenant → project) on every request. They take effect within milliseconds — there is no caching delay. Manage them via the admin endpoints at POST /v1/admin/killswitch/*.
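The per-request evaluation order can be sketched as below, with a set of active keys standing in for Redis existence checks:

```python
# Evaluate kill switches in the documented order: global, then tenant,
# then project. Returns the first matching key, or None if no switch
# is active. A set stands in for Redis EXISTS checks.

def kill_switch_hit(active_keys, tenant_id, project_id):
    for key in ("killswitch:global",
                f"killswitch:tenant:{tenant_id}",
                f"killswitch:project:{project_id}"):
        if key in active_keys:
            return key
    return None
```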
