Tiers and Usage

Behest models end-user entitlements as tiers. Every JWT carries a tier claim; Kong and LiteLLM enforce the limits associated with that tier. Usage is exposed per-user and per-project via GET /v1/billing/usage.

Built-in tiers

Tier	Typical use	Default limits (configurable in dashboard)
`free`	Demos, signed-out trials, generous free plan	10 RPM, 100 req/day, 50k tokens/day
`pro`	Paid users, normal product use	60 RPM, 10k req/day, 2M tokens/day
`max`	Power users, agents, enterprise seats	300 RPM, 100k req/day, 20M tokens/day

Limits are per-(project, uid) — two users on the same project do not share a quota. Defaults live in project_tiers.overrides (JSONB) and can be tuned in the dashboard without redeploying.
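To make the override model concrete, here is a minimal sketch of how per-project overrides could be layered on the built-in defaults, and how a counter key scoped to (project, uid) might look. The type names, the `Overrides` shape, and the key format are assumptions for illustration; the source only states that overrides are stored as JSONB.

```typescript
// Hypothetical shapes; only the numbers come from the built-in tiers table.
type TierName = "free" | "pro" | "max";

interface TierLimits {
  rpm: number;
  requestsPerDay: number;
  tokensPerDay: number;
}

// Built-in defaults, copied from the table above.
const BUILT_IN: Record<TierName, TierLimits> = {
  free: { rpm: 10, requestsPerDay: 100, tokensPerDay: 50_000 },
  pro: { rpm: 60, requestsPerDay: 10_000, tokensPerDay: 2_000_000 },
  max: { rpm: 300, requestsPerDay: 100_000, tokensPerDay: 20_000_000 },
};

// Assumed override document: any subset of tiers, any subset of fields.
type Overrides = Partial<Record<TierName, Partial<TierLimits>>>;

// Overlay one project's overrides on the built-in defaults for a tier.
function resolveLimits(tier: TierName, overrides: Overrides = {}): TierLimits {
  return { ...BUILT_IN[tier], ...(overrides[tier] ?? {}) };
}

// Counters are scoped to the (project, uid) pair, never to the project alone.
// The "project:uid" format is an assumption, not Behest's actual key layout.
function quotaKey(projectId: string, uid: string): string {
  return `${projectId}:${uid}`;
}
```

With this shape, a project that only raises the free tier's RPM keeps every other default untouched.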

Assigning a tier

Tier is set at mint time on your backend:

import { Behest } from "@behest/client-ts";
const behest = new Behest(); // reads BEHEST_KEY from env
 
const { token, ttl } = await behest.auth.mint({
  user_id: user.id,
  tier: user.plan, // numeric tier id, e.g. 1 = free, 2 = pro, 3 = max
});

In local-signing mode (behest_pk_* key), the tier rides along in the JWT claims automatically. The tier cannot be changed without re-minting, so keep the TTL short: a typical TTL of 5–15 min means an upgrade takes effect within minutes.

After a user upgrades

1. Update your DB (users.plan = 2).
2. Mint a fresh JWT on the next request; the current short-lived token will expire on its own.
3. No Behest-side action needed.
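If your backend caches minted tokens rather than minting on every request, the steps above imply that a plan change should invalidate the cache. The sketch below shows one way to make that decision. `CachedToken`, `needsRemint`, and the 30-second skew are hypothetical names and values, not part of the SDK.

```typescript
// Hypothetical cache record kept by your backend for each user.
interface CachedToken {
  token: string;
  tier: number; // tier the token was minted with
  expiresAtMs: number; // epoch milliseconds
}

// Decide whether to mint a fresh JWT instead of reusing the cached one.
function needsRemint(
  cached: CachedToken | undefined,
  currentTier: number,
  nowMs: number,
  skewMs: number = 30_000, // assumed safety margin before expiry
): boolean {
  if (cached === undefined) return true; // nothing cached yet
  if (cached.tier !== currentTier) return true; // plan changed since mint
  return cached.expiresAtMs - skewMs <= nowMs; // expired or about to expire
}
```

When `needsRemint` returns true, call `behest.auth.mint` with the user's current plan and store the result; otherwise reuse the cached token.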

Reading usage

`GET /v1/billing/usage`

GET https://amber-fox-042.behest.app/v1/billing/usage?from=2026-04-01&to=2026-04-30
Authorization: Bearer <JWT>

Returns platform + BYOK buckets separately:

{
  "project_id": "663e...",
  "window": { "from": "2026-04-01T00:00:00Z", "to": "2026-04-30T00:00:00Z" },
  "platform": {
    "requests": 12450,
    "input_tokens": 3200000,
    "output_tokens": 980000,
    "cost_usd": 12.47,
    "by_model": [
      { "model": "gpt-4o-mini", "requests": 9800, "cost_usd": 4.2 },
      { "model": "claude-3-5-sonnet", "requests": 2650, "cost_usd": 8.27 }
    ]
  },
  "byok": {
    "requests": 840,
    "input_tokens": 120000,
    "output_tokens": 45000,
    "cost_usd": 0,
    "by_provider": [{ "provider": "openai", "requests": 840 }]
  },
  "by_user": [
    { "user_id": "user_123", "requests": 420, "cost_usd": 1.14, "tier": "pro" }
  ]
}

What /v1/billing/usage returns is determined by the role on your API key, not a value the caller picks. An admin-roled API key mints JWTs that see the whole project; a user-roled key (the default) mints JWTs whose response is filtered server-side to the authenticated uid. Role comes from the key record on the server: if you omit role in the mint body you get the key's role; if you pass one, the server only honors it when it matches the key's role, or when it's an explicit downgrade to "user" (e.g., an admin-roled service key minting a per-end-user JWT). Any other caller-supplied role — including escalation to "admin" from a user-roled key — is rejected. See authentication.md.
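The role rules in the paragraph above can be summarized as a small decision function. This is a sketch of the described behavior, not Behest's server code; `resolveRole` and the error message are hypothetical.

```typescript
type Role = "admin" | "user";

// keyRole comes from the API key record; requested is the optional role in the mint body.
function resolveRole(keyRole: Role, requested?: Role): Role {
  if (requested === undefined) return keyRole; // omitted: inherit the key's role
  if (requested === keyRole) return requested; // matches the key's role
  if (requested === "user") return "user"; // explicit downgrade is allowed
  // Anything else is an escalation attempt and is rejected.
  throw new Error(`role "${requested}" not permitted for a ${keyRole}-roled key`);
}
```

For example, an admin-roled service key can pass `"user"` to mint a per-end-user JWT, while a user-roled key passing `"admin"` is rejected.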

Per-user usage from your backend

Show a "12 / 100 today" meter:

const report = await behest.usage.get({
  from: new Date(Date.now() - 24 * 60 * 60 * 1000),
  to: new Date(),
});
// report.totals.tokens, report.breakdown[] — scoped to the JWT's uid when the key is user-roled
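As an illustration of turning a usage report into the meter text, the sketch below works from the `by_user` rows shown in the REST response earlier. The function name and the idea of passing the daily limit in from your own tier configuration are assumptions; the SDK report shape (`totals`, `breakdown`) may differ from the REST body.

```typescript
// Row shape taken from the "by_user" array in the sample response.
interface UserUsage {
  user_id: string;
  requests: number;
  cost_usd: number;
  tier: string;
}

interface Meter {
  used: number;
  limit: number;
  label: string;
}

// Build a "used / limit today" meter for one user; missing users count as zero.
function usageMeter(byUser: UserUsage[], userId: string, dailyLimit: number): Meter {
  const row = byUser.find((u) => u.user_id === userId);
  const used = row?.requests ?? 0;
  return { used, limit: dailyLimit, label: `${used} / ${dailyLimit} today` };
}
```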

Handling 402 (over quota)

When a user exceeds their daily cap, Behest returns:

HTTP/1.1 402 Payment Required
Content-Type: application/json
X-Trace-Id: 4b2c...
 
{
  "error": {
    "code": "quota_exceeded",
    "message": "Daily token limit reached for tier 'free'.",
    "details": {
      "tier": "free",
      "limit": { "tokens_per_day": 50000 },
      "usage": { "tokens_today": 50123 }
    }
  }
}
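Because the error body arrives as untyped JSON, it can help to validate it before reading nested fields. The following sketch narrows a body of the shape shown above; `parseQuotaError` and `tokensOver` are hypothetical helpers, and the optional fields reflect the assumption that other limits (for example a request cap) may carry different keys.

```typescript
interface QuotaDetails {
  tier: string;
  limit: { tokens_per_day?: number };
  usage: { tokens_today?: number };
}

// Return the details of a quota_exceeded error, or null for any other body.
function parseQuotaError(body: unknown): QuotaDetails | null {
  if (typeof body !== "object" || body === null) return null;
  const error = (body as { error?: unknown }).error;
  if (typeof error !== "object" || error === null) return null;
  const { code, details } = error as { code?: unknown; details?: unknown };
  if (code !== "quota_exceeded") return null;
  if (typeof details !== "object" || details === null) return null;
  const d = details as Partial<QuotaDetails>;
  if (typeof d.tier !== "string") return null;
  return { tier: d.tier, limit: d.limit ?? {}, usage: d.usage ?? {} };
}

// How far past the daily token cap the user is, or null if the fields are absent.
function tokensOver(d: QuotaDetails): number | null {
  const limit = d.limit.tokens_per_day;
  const used = d.usage.tokens_today;
  if (limit === undefined || used === undefined) return null;
  return Math.max(0, used - limit);
}
```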

Client pattern (backend, using the SDK):

import { Behest, BehestQuotaError } from "@behest/client-ts";
const behest = new Behest(); // reads BEHEST_KEY from env

try {
  await behest.chat.completions.create({ messages });
} catch (err) {
  if (err instanceof BehestQuotaError) {
    const body = err.raw as {
      error?: { details?: { tier?: string; limit?: unknown; usage?: unknown } };
    };
    showUpgradeModal({
      tier: body?.error?.details?.tier,
      usage: body?.error?.details?.usage,
      limit: body?.error?.details?.limit,
    });
    return;
  }
  throw err;
}

Client pattern (browser talking directly to Behest with a minted JWT):

const resp = await fetch(`https://${SLUG}.behest.app/v1/chat/completions`, {
  method: "POST",
  headers: {
    Authorization: `Bearer ${token}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ messages }),
});
if (resp.status === 402) {
  const body = await resp.json();
  showUpgradeModal(body.error.details);
  return;
}

Do not retry 402. It is a deliberate deny. Retrying will not help; route the user to upgrade or wait.
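One way to encode this rule is a single function that maps a response status to the next step. In the sketch below, the 402 branch follows the rule above; retrying 429 with exponential backoff is an assumption about sensible client behavior, not something Behest prescribes, and all names are hypothetical.

```typescript
type NextAction =
  | { kind: "ok" }
  | { kind: "upgrade" } // 402: show the upgrade path, never retry
  | { kind: "retry"; delayMs: number }
  | { kind: "fail" };

// attempt is zero-based: 0 for the first try.
function nextAction(status: number, attempt: number, maxAttempts: number = 3): NextAction {
  if (status >= 200 && status < 300) return { kind: "ok" };
  if (status === 402) return { kind: "upgrade" };
  if (status === 429 && attempt + 1 < maxAttempts) {
    // Assumed backoff schedule: 1s, 2s, 4s, ...
    return { kind: "retry", delayMs: 1000 * 2 ** attempt };
  }
  return { kind: "fail" };
}
```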

Tier enforcement internals (for mental model)

Request arrives at Kong with JWT { uid, tier: "pro" }
  ↓ Kong resolves per-project overrides from Redis (tiers:{pid})
  ↓ Kong applies RPM limit → 429 if exceeded
  ↓ LiteLLM custom_auth checks daily buckets in Redis → 402 if exceeded
  ↓ Request proceeds
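The ordering in the flow above matters: the per-minute check runs before the daily check, so a request that violates both receives 429. A simplified in-memory sketch, with hypothetical names and no Redis, looks like this:

```typescript
interface Limits {
  rpm: number;
  tokensPerDay: number;
}

interface Counters {
  requestsThisMinute: number;
  tokensToday: number;
}

// Returns the status the request would receive under the two-stage check.
function enforce(limits: Limits, counters: Counters): 200 | 402 | 429 {
  // Stage 1 (Kong in the flow above): per-minute rate limit.
  if (counters.requestsThisMinute >= limits.rpm) return 429;
  // Stage 2 (LiteLLM custom_auth in the flow above): daily bucket.
  if (counters.tokensToday >= limits.tokensPerDay) return 402;
  return 200;
}
```

Whether the real comparison is strict or inclusive at the boundary is not stated in the source; the `>=` here is an assumption.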

Both RPM (Kong) and daily token cap (LiteLLM) are configurable per tier per project in the dashboard → Tiers page.

Upgrade flow pattern

1. User hits 402 → backend SDK throws BehestQuotaError (or browser receives a 402 response).
2. Frontend shows modal with current vs next-tier limits + CTA.
3. User upgrades through your billing provider (Stripe/Paddle/etc.).
4. Your webhook updates users.plan in your DB.
5. Next token mint uses the new tier; the current token expires when its TTL runs out (within 15 min for a typical TTL).
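For the modal in the flow above, the "current vs next-tier limits" comparison can be derived from the tier table. This sketch uses the built-in default token caps; if your project overrides them in the dashboard, substitute your own values. `upgradeOffer` and the tier ordering array are hypothetical.

```typescript
const TIER_ORDER = ["free", "pro", "max"] as const;
type Tier = (typeof TIER_ORDER)[number];

// Default daily token caps from the built-in tiers table.
const TOKENS_PER_DAY: Record<Tier, number> = {
  free: 50_000,
  pro: 2_000_000,
  max: 20_000_000,
};

interface Offer {
  current: { tier: Tier; tokensPerDay: number };
  next: { tier: Tier; tokensPerDay: number } | null; // null when already on the top tier
}

// Pair the user's current tier with the next one up, if any.
function upgradeOffer(current: Tier): Offer {
  const idx = TIER_ORDER.indexOf(current);
  const nextTier = idx + 1 < TIER_ORDER.length ? TIER_ORDER[idx + 1] : undefined;
  return {
    current: { tier: current, tokensPerDay: TOKENS_PER_DAY[current] },
    next: nextTier === undefined ? null : { tier: nextTier, tokensPerDay: TOKENS_PER_DAY[nextTier] },
  };
}
```

A user on `max` has no next tier, so the modal should fall back to a "wait until tomorrow" message instead of an upgrade CTA.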

See Streaming UI for how to cancel an in-flight stream when a 402 happens mid-response.