
    Tiers and Usage

    Behest models end-user entitlements as tiers. Every JWT carries a tier claim; Kong and LiteLLM enforce the limits associated with that tier. Usage is exposed per-user and per-project via GET /v1/billing/usage.


    Built-in tiers

    Tier | Typical use                                  | Default limits (configurable in dashboard)
    free | Demos, signed-out trials, generous free plan | 10 RPM, 100 req/day, 50k tokens/day
    pro  | Paid users, normal product use               | 60 RPM, 10k req/day, 2M tokens/day
    max  | Power users, agents, enterprise seats        | 300 RPM, 100k req/day, 20M tokens/day

    Limits are per-(project, uid) — two users on the same project do not share a quota. Defaults live in project_tiers.overrides (JSONB) and can be tuned in the dashboard without redeploying.
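A minimal sketch of how per-tier limits and per-project overrides could combine, assuming overrides are a partial per-tier map merged over the defaults in the table above. The type names, field names, and the `quotaKey` format are hypothetical and are not Behest's actual schema.

```typescript
// Hypothetical shapes for illustration; not the real project_tiers schema.
type TierName = "free" | "pro" | "max";

interface TierLimits {
  rpm: number;
  requestsPerDay: number;
  tokensPerDay: number;
}

// Default values copied from the built-in tiers table.
const DEFAULT_LIMITS: Record<TierName, TierLimits> = {
  free: { rpm: 10, requestsPerDay: 100, tokensPerDay: 50_000 },
  pro: { rpm: 60, requestsPerDay: 10_000, tokensPerDay: 2_000_000 },
  max: { rpm: 300, requestsPerDay: 100_000, tokensPerDay: 20_000_000 },
};

// Assumption: a project stores only the fields it wants to change.
type TierOverrides = Partial<Record<TierName, Partial<TierLimits>>>;

// Merge a project's overrides over the defaults for one tier.
function effectiveLimits(tier: TierName, overrides: TierOverrides = {}): TierLimits {
  return { ...DEFAULT_LIMITS[tier], ...(overrides[tier] ?? {}) };
}

// Quotas are per (project, uid), so a counter key must include both parts.
function quotaKey(projectId: string, uid: string): string {
  return `${projectId}:${uid}`;
}
```

For example, `effectiveLimits("free", { free: { rpm: 20 } })` raises only the RPM limit and leaves the daily caps at their defaults.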


    Assigning a tier

    Tier is set at mint time on your backend:

    ts
    import { Behest } from "@behest/client-ts";
    const behest = new Behest(); // reads BEHEST_KEY from env
     
    const { token, ttl } = await behest.auth.mint({
      user_id: user.id,
      tier: user.plan, // numeric tier id, e.g. 1 = free, 2 = pro, 3 = max
    });

    In local-signing mode (behest_pk_* key), the tier rides along in the JWT claims automatically. A token's tier cannot be changed without re-minting, so a short TTL (typically 5–15 min) bounds how long an upgraded user can remain on their old tier.

    After a user upgrades

    1. Update your DB (users.plan = 2).
    2. Mint a fresh JWT on the next request; the current short-lived token will expire on its own.
    3. No Behest-side action needed.
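If your backend caches minted tokens, the steps above imply a re-mint check along these lines. This is a sketch under assumptions: `CachedToken`, `needsRemint`, and the 30-second skew are hypothetical, and how the SDK's `ttl` maps to `expiresAtMs` depends on its units, which are not stated here.

```typescript
// Hypothetical cache entry kept by your backend for each user.
interface CachedToken {
  token: string;
  tier: number;        // tier the token was minted with
  expiresAtMs: number; // expiry as epoch milliseconds, derived from the mint ttl
}

// Decide whether to mint a fresh JWT instead of reusing the cached one.
function needsRemint(
  cached: CachedToken | undefined,
  currentTier: number,
  nowMs: number,
  skewMs: number = 30_000,
): boolean {
  if (!cached) return true;                      // nothing cached yet
  if (cached.tier !== currentTier) return true;  // plan changed in your DB
  return nowMs >= cached.expiresAtMs - skewMs;   // expired or about to expire
}
```

Checking the tier on every request means an upgrade takes effect on the next mint rather than waiting for the old token to expire.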

    Reading usage

    GET /v1/billing/usage

    http
    GET https://amber-fox-042.behest.app/v1/billing/usage?from=2026-04-01&to=2026-04-30
    Authorization: Bearer <JWT>

    Returns platform + BYOK buckets separately:

    json
    {
      "project_id": "663e...",
      "window": { "from": "2026-04-01T00:00:00Z", "to": "2026-04-30T00:00:00Z" },
      "platform": {
        "requests": 12450,
        "input_tokens": 3200000,
        "output_tokens": 980000,
        "cost_usd": 12.47,
        "by_model": [
          { "model": "gpt-4o-mini", "requests": 9800, "cost_usd": 4.2 },
          { "model": "claude-3-5-sonnet", "requests": 2650, "cost_usd": 8.27 }
        ]
      },
      "byok": {
        "requests": 840,
        "input_tokens": 120000,
        "output_tokens": 45000,
        "cost_usd": 0,
        "by_provider": [{ "provider": "openai", "requests": 840 }]
      },
      "by_user": [
        { "user_id": "user_123", "requests": 420, "cost_usd": 1.14, "tier": "pro" }
      ]
    }
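Because the platform and BYOK buckets arrive separately, a dashboard that wants one headline number has to add them itself. The sketch below assumes only the four numeric fields shown in the sample response; the `UsageBucket` type and `combinedTotals` helper are illustrative names, not part of the SDK.

```typescript
// Numeric fields shared by the "platform" and "byok" buckets in the sample response.
interface UsageBucket {
  requests: number;
  input_tokens: number;
  output_tokens: number;
  cost_usd: number;
}

// Sum the two buckets field by field.
function combinedTotals(platform: UsageBucket, byok: UsageBucket): UsageBucket {
  return {
    requests: platform.requests + byok.requests,
    input_tokens: platform.input_tokens + byok.input_tokens,
    output_tokens: platform.output_tokens + byok.output_tokens,
    cost_usd: platform.cost_usd + byok.cost_usd,
  };
}
```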

    What /v1/billing/usage returns is determined by the role on your API key, not a value the caller picks. An admin-roled API key mints JWTs that see the whole project; a user-roled key (the default) mints JWTs whose response is filtered server-side to the authenticated uid. Role comes from the key record on the server: if you omit role in the mint body you get the key's role; if you pass one, the server only honors it when it matches the key's role, or when it's an explicit downgrade to "user" (e.g., an admin-roled service key minting a per-end-user JWT). Any other caller-supplied role — including escalation to "admin" from a user-roled key — is rejected. See authentication.md.
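The role rules in the paragraph above can be written as a small decision function. This is an illustrative model of the described behavior, not Behest's server code; `resolveRole` and `RoleDecision` are hypothetical names.

```typescript
type Role = "admin" | "user";

type RoleDecision =
  | { ok: true; role: Role }
  | { ok: false; reason: string };

// keyRole comes from the key record on the server; requested is the optional
// role passed in the mint body.
function resolveRole(keyRole: Role, requested?: Role): RoleDecision {
  if (requested === undefined) return { ok: true, role: keyRole }; // omitted: inherit the key's role
  if (requested === keyRole) return { ok: true, role: requested }; // matches the key's role
  if (requested === "user") return { ok: true, role: "user" };     // explicit downgrade
  return {
    ok: false,
    reason: `role '${requested}' is not allowed for a ${keyRole}-roled key`,
  };
}
```

Under this model, `resolveRole("admin", "user")` succeeds with role `"user"`, while `resolveRole("user", "admin")` is rejected.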

    Per-user usage from your backend

    Show a "12 / 100 today" meter:

    ts
    const report = await behest.usage.get({
      from: new Date(Date.now() - 24 * 60 * 60 * 1000),
      to: new Date(),
    });
    // report.totals.tokens, report.breakdown[] — scoped to the JWT's uid when the key is user-roled
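One way to turn a usage count and a tier limit into the meter text is sketched below. `buildMeter` and the `Meter` shape are hypothetical; which report field you pass as `used` (requests or tokens) depends on the limit you want to display.

```typescript
interface Meter {
  label: string;     // text such as "12 / 100 today"
  percent: number;   // 0 to 100, clamped, for a progress bar
  remaining: number; // never negative
}

// Build display values from a usage count and the tier's daily limit.
function buildMeter(used: number, limit: number): Meter {
  const safeLimit = Math.max(limit, 1); // avoid dividing by zero
  const percent = Math.min(100, Math.round((used / safeLimit) * 100));
  return {
    label: `${used} / ${limit} today`,
    percent,
    remaining: Math.max(0, limit - used),
  };
}
```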

    Handling 402 (over quota)

    When a user exceeds their daily cap, Behest returns:

    http
    HTTP/1.1 402 Payment Required
    Content-Type: application/json
    X-Trace-Id: 4b2c...
     
    {
      "error": {
        "code": "quota_exceeded",
        "message": "Daily token limit reached for tier 'free'.",
        "details": {
          "tier": "free",
          "limit": { "tokens_per_day": 50000 },
          "usage": { "tokens_today": 50123 }
        }
      }
    }

    Client pattern (backend, using the SDK):

    ts
    import { Behest, BehestQuotaError } from "@behest/client-ts";
     
    try {
      await behest.chat.completions.create({ messages });
    } catch (err) {
      if (err instanceof BehestQuotaError) {
        const body = err.raw as {
          error?: { details?: { tier?: string; limit?: unknown; usage?: unknown } };
        };
        showUpgradeModal({
          tier: body?.error?.details?.tier,
          usage: body?.error?.details?.usage,
          limit: body?.error?.details?.limit,
        });
        return;
      }
      throw err;
    }

    Client pattern (browser talking directly to Behest with a minted JWT):

    ts
    const resp = await fetch(`https://${SLUG}.behest.app/v1/chat/completions`, {
      method: "POST",
      headers: {
        Authorization: `Bearer ${token}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ messages }),
    });
    if (resp.status === 402) {
      const body = await resp.json();
      showUpgradeModal(body.error.details);
      return;
    }

    Do not retry 402. It is a deliberate denial, so retrying will not help; route the user to upgrade or wait for the daily quota to reset.
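A retry helper that respects this rule might look like the sketch below. Only the "never retry 402" branch comes from this page; retrying 429 and 5xx responses with exponential backoff, the attempt cap, and the delay values are assumptions chosen for illustration.

```typescript
type RetryDecision = { retry: false } | { retry: true; delayMs: number };

// attempt is 1-based: pass 1 after the first failed request.
function retryDecision(status: number, attempt: number, maxAttempts: number = 3): RetryDecision {
  if (status === 402) return { retry: false };                  // quota exceeded: never retry
  if (status !== 429 && status < 500) return { retry: false };  // success or other client error
  if (attempt >= maxAttempts) return { retry: false };          // give up after maxAttempts tries
  // Assumed backoff: 500 ms, 1 s, 2 s, ... capped at 8 s.
  return { retry: true, delayMs: Math.min(8000, 500 * 2 ** (attempt - 1)) };
}
```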


    Tier enforcement internals (for mental model)

    Request arrives at Kong with JWT { uid, tier: "pro" }
      ↓ Kong resolves per-project overrides from Redis (tiers:{pid})
      ↓ Kong applies RPM limit → 429 if exceeded
      ↓ LiteLLM custom_auth checks daily buckets in Redis → 402 if exceeded
      ↓ Request proceeds
    

    Both RPM (Kong) and daily token cap (LiteLLM) are configurable per tier per project in the dashboard → Tiers page.
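The ordering of the two checks can be modeled in a few lines. This is an in-memory toy for building intuition only: the real checks run in Kong and LiteLLM against Redis, and the `Limits` and `Counters` shapes here are hypothetical.

```typescript
interface Limits {
  rpm: number;
  tokensPerDay: number;
}

interface Counters {
  requestsThisMinute: number; // requests already counted in the current minute
  tokensToday: number;        // tokens already consumed today
}

// Returns the HTTP status the pipeline would produce for the next request.
function enforce(limits: Limits, counters: Counters): 200 | 402 | 429 {
  if (counters.requestsThisMinute >= limits.rpm) return 429;   // rate-limit stage runs first
  if (counters.tokensToday >= limits.tokensPerDay) return 402; // daily-quota stage runs second
  return 200;                                                  // request proceeds
}
```

Because the rate check comes first, a user who is over both limits sees a 429 before a 402.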


    Upgrade flow pattern

    1. User hits 402 → backend SDK throws BehestQuotaError (or browser receives a 402 response).
    2. Frontend shows modal with current vs next-tier limits + CTA.
    3. User upgrades through your billing provider (Stripe/Paddle/etc.).
    4. Your webhook updates users.plan in your DB.
    5. Next token mint uses the new tier; current token expires within 15 min.

    See Streaming UI for how to cancel an in-flight stream when a 402 happens mid-response.


    See also

    Enterprise Token FinOps: Enforce hard budgets and attribute costs per session.
