# Model Selection and Routing

## How Model Selection Works
In Behest, model selection is a project-level setting. Each project has exactly one active model at a time. You set it once in the dashboard or API; every inference request from that project uses that model without you specifying it per-request.
The architecture separates concerns cleanly:
- Tenant level — holds provider API keys (one per provider, shared across all projects)
- Project level — selects which model to use (references the tenant's provider key)
This means you configure your OpenAI key once for your account, then each project can independently choose gpt-4o, gpt-4o-mini, or any other OpenAI model. Swapping a model is a project settings change — no code changes, no redeploy of your application.
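As a minimal sketch of that separation, here are two projects under one tenant pointed at different OpenAI models via the project settings endpoint shown later in this section (the base URL, project IDs, and `serviceJwt` are placeholders):

```ts
// Both projects reference the tenant's single OpenAI key; each
// independently selects its own model via project settings.
const BASE = "https://api.behest.example"; // placeholder base URL
declare const serviceJwt: string; // placeholder service JWT

async function setProjectModel(projectId: string, model: string, jwt: string) {
  const res = await fetch(`${BASE}/v1/projects/${projectId}/settings`, {
    method: "PUT",
    headers: {
      Authorization: `Bearer ${jwt}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ provider_model: model }),
  });
  if (!res.ok) throw new Error(`settings update failed: ${res.status}`);
}

// Project A gets the flagship model, project B the cheaper one.
await setProjectModel("proj-a", "gpt-4o", serviceJwt);
await setProjectModel("proj-b", "gpt-4o-mini", serviceJwt);
```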
## Platform Default Model
When a project has no model configured, or when BYOK is not available on your plan, Behest routes to:
```
gemini-2.5-flash
```

This uses Behest's platform Google API key. Requests are routed as `gemini/gemini-2.5-flash` in LiteLLM. You can send `"model": "default"` in your request body and Kong will substitute the project's configured model (or `gemini-2.5-flash` if none is set).
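For example, the OpenAI SDK pointed at the gateway can send `"model": "default"` and let Kong do the substitution (a sketch; the base URL is a placeholder, and the assumption is that your project token serves as the SDK's API key):

```ts
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.behest.example/v1", // placeholder gateway URL
  apiKey: process.env.BEHEST_PROJECT_TOKEN, // assumed project credential
});

// "default" is replaced by the project's configured model,
// or gemini-2.5-flash when none is set.
const res = await client.chat.completions.create({
  model: "default",
  messages: [{ role: "user", content: "Hello" }],
});
```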
## Selecting a Model for a Project

### Dashboard
- Open your project
- Go to Settings → Model
- Choose a provider from the list of configured providers
- Choose a model from the discovered model list
- Save as draft — test with the test-token before deploying
- Click Deploy to publish
### API
```
PUT /v1/projects/:projectId/settings
Authorization: Bearer <service-JWT>
Content-Type: application/json

{
  "provider_model": "gpt-4o"
}
```

The `provider_model` field accepts any model ID that:
- Is recognized by Behest (exists in the `provider_models` table or matches a known prefix)
- Belongs to a provider for which your tenant has a configured key
If you try to set a model for a provider with no key, you'll receive:
```
{
  "error": "No openai API key configured for this tenant",
  "code": "PROVIDER_NOT_CONFIGURED"
}
```

Model selection is saved as a draft — it does not take effect until you deploy.
## Draft vs. Deployed Model Configuration

Behest uses a two-stage configuration model: draft and deployed.
| State | Redis key prefix | How to set | Lifetime |
|---|---|---|---|
| Draft | `draft:config:{pid}:*` | Test-token endpoint | 300 seconds (TTL) |
| Deployed | `config:{pid}:*` | Deploy endpoint | Permanent (no TTL) |
When you save a model selection it is stored in `project_settings.draft_provider_model`. On deploy, it is copied to `project_settings.provider_model` and written to `config:{pid}:provider_model` in Redis.

For live testing before deploying, call `POST /v1/projects/:projectId/settings/test-token` to get a short-lived JWT. Requests using that JWT read from `draft:config:{pid}:provider_model`. This lets you validate the full BYOK path — your key, your model, your prompts — without affecting production traffic.
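A sketch of that draft-validation loop, assuming an OpenAI-compatible chat completions route on the gateway and a `token` field in the test-token response (both assumptions; `BASE`, `projectId`, and `serviceJwt` are placeholders):

```ts
declare const BASE: string, projectId: string, serviceJwt: string; // placeholders

// 1. After saving a draft model via PUT settings, mint a test token.
const tokenRes = await fetch(
  `${BASE}/v1/projects/${projectId}/settings/test-token`,
  { method: "POST", headers: { Authorization: `Bearer ${serviceJwt}` } },
);
const { token } = await tokenRes.json(); // response field name assumed

// 2. Exercise the draft config: X-Behest-Draft-Mode: 1 makes the
//    gateway read draft:config:{pid}:provider_model instead of the
//    deployed config, so production traffic is untouched.
const res = await fetch(`${BASE}/v1/chat/completions`, {
  method: "POST",
  headers: {
    Authorization: `Bearer ${token}`,
    "X-Behest-Draft-Mode": "1",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "default",
    messages: [{ role: "user", content: "Smoke test on the draft model" }],
  }),
});
```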
Inference flow priority (sketched in code below):

1. If the `X-Behest-Draft-Mode: 1` header is present → read `draft:config:{pid}:provider_model`
2. Otherwise → read `config:{pid}:provider_model`
3. If neither exists → use the platform default (`gemini-2.5-flash`)
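In code, that priority reads roughly as follows (a sketch, not Behest's actual implementation; the Redis client shape is illustrative, and the draft-to-deployed fall-through follows the "if neither exists" clause above):

```ts
const PLATFORM_DEFAULT = "gemini/gemini-2.5-flash";

type RedisLike = { get(key: string): Promise<string | null> };

// Resolve the model for an inference request per the priority above.
async function resolveModel(
  redis: RedisLike,
  projectId: string,
  draftMode: boolean, // true when X-Behest-Draft-Mode: 1 is present
): Promise<string> {
  if (draftMode) {
    const draft = await redis.get(`draft:config:${projectId}:provider_model`);
    if (draft) return draft;
  }
  const deployed = await redis.get(`config:${projectId}:provider_model`);
  return deployed ?? PLATFORM_DEFAULT;
}
```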
## Model Discovery
When you save a provider key, Behest automatically discovers available models by calling the provider's models list endpoint. Discovered models are written to four places:

- `provider_models` table — global catalog (all tenants see the same model metadata)
- `tenant_provider_models` table — junction table for per-tenant visibility (only models you discovered with your key are available to you)
- `provider:models:{providerId}` in Redis — model metadata cache
- `tenant:models:{tenantId}` in Redis — aggregate of all models available to your tenant
This means tenant isolation is enforced: if Tenant A discovers gpt-4o, that model entry appears in Tenant A's model list. Tenant B with no OpenAI key cannot see it, even though the model exists in the global catalog.
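A sketch of how that visibility lookup might be structured, reading the per-tenant Redis aggregate first and falling back to the junction-table join (the client shapes and column names are assumptions, not Behest's actual schema):

```ts
type RedisLike = { get(key: string): Promise<string | null> };
type DbLike = { query(sql: string, params: unknown[]): Promise<unknown[]> };

// Models visible to a tenant: only rows joined through
// tenant_provider_models are returned, which is what keeps
// Tenant B from seeing models Tenant A discovered.
async function tenantModels(redis: RedisLike, db: DbLike, tenantId: string) {
  const cached = await redis.get(`tenant:models:${tenantId}`);
  if (cached) return JSON.parse(cached);

  return db.query(
    `SELECT pm.*
       FROM provider_models pm
       JOIN tenant_provider_models tpm ON tpm.model_id = pm.id
      WHERE tpm.tenant_id = $1`,
    [tenantId],
  );
}
```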
## Available Models
Models available to your account depend on which provider keys you have configured. Use the models API to list them:
```
GET /v1/tenants/:tenantId/providers/models
Authorization: Bearer <service-JWT>
```

Response:
```
{
  "models": [
    {
      "modelId": "gpt-4o",
      "displayName": "GPT-4o",
      "providerSlug": "openai",
      "providerDisplayName": "OpenAI",
      "capabilities": { ... },
      "contextWindow": 128000,
      "maxOutputTokens": 16384,
      "supportsStreaming": true,
      "supportsToolUse": true,
      "supportsVision": true
    }
  ]
}
```

## Model Capabilities Matrix
Capabilities are stored in the `provider_models.capabilities` JSONB column and populated by model discovery. The following fields are tracked:
| Capability | Field name | Type |
|---|---|---|
| Context window | context_window | integer (tokens) |
| Max output tokens | max_output_tokens | integer (tokens) |
| Streaming | supports_streaming | boolean |
| Function calling / tool use | supports_tool_use or supports_function_calling | boolean |
| Vision (image input) | supports_vision | boolean |
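For example, to select only vision-capable, large-context models from the models API response shown above (the type is abbreviated to the documented fields):

```ts
type ModelInfo = {
  modelId: string;
  providerSlug: string;
  contextWindow: number;
  supportsStreaming: boolean;
  supportsToolUse: boolean;
  supportsVision: boolean;
};

// Keep models that accept image input and have >= 128K context.
function visionModels(models: ModelInfo[]): ModelInfo[] {
  return models.filter((m) => m.supportsVision && m.contextWindow >= 128_000);
}
```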
### Known model capabilities (from `PROVIDER_REGISTRY.md` and fallback tables)
| Model | Context | Streaming | Vision | Tool use | Provider |
|---|---|---|---|---|---|
| `gpt-4o` | 128K | Yes | Yes | Yes | openai |
| `gpt-4o-mini` | 128K | Yes | Yes | Yes | openai |
| `gpt-4.1` | 1M | Yes | Yes | Yes | openai |
| `o3-mini` | 200K | Yes | No | No | openai |
| `o4-mini` | 200K | Yes | No | Yes | openai |
| `claude-opus-4-20250514` | 200K | Yes | Yes | Yes | anthropic |
| `claude-sonnet-4-20250514` | 200K | Yes | Yes | Yes | anthropic |
| `claude-haiku-4-5-20251001` | 200K | Yes | Yes | Yes | anthropic |
| `gemini-2.5-pro` | 1M | Yes | Yes | Yes | google |
| `gemini-2.5-flash` | 1M | Yes | Yes | Yes | google |
| `gemini-2.0-flash` | 1M | Yes | Yes | Yes | google |
| `mistral-large-latest` | 128K | Yes | Yes | Yes | mistral |
| `mistral-small-latest` | 128K | Yes | No | Yes | mistral |
| `codestral-latest` | 256K | Yes | No | Yes | mistral |
| `command-r-plus-08-2024` | 128K | Yes | No | Yes | cohere |
| `command-r-08-2024` | 128K | Yes | No | Yes | cohere |
## Cost Comparison (Approximate, per 1M tokens)

Source: `PROVIDER_REGISTRY.md`. Prices change; verify at the provider's pricing page.
| Model | Input | Output | Notes |
|---|---|---|---|
| `gpt-4.1-nano` | $0.10 | $0.40 | Fastest/cheapest OpenAI |
| `gpt-4o-mini` | $0.15 | $0.60 | Best value OpenAI general |
| `gemini-2.5-flash` | $0.15 | $0.60 | Platform default; large context |
| `gemini-2.0-flash` | $0.10 | $0.40 | Cost-optimized |
| `mistral-small-latest` | $0.10 | $0.30 | Most affordable Mistral |
| `command-r-08-2024` | $0.15 | $0.60 | RAG-optimized Cohere |
| `gpt-4o` | $2.50 | $10.00 | Flagship OpenAI |
| `gpt-4.1` | $2.00 | $8.00 | Long-context OpenAI |
| `claude-sonnet-4-20250514` | $3.00 | $15.00 | Flagship Anthropic |
| `mistral-large-latest` | $2.00 | $6.00 | Flagship Mistral |
| `command-r-plus-08-2024` | $2.50 | $10.00 | RAG flagship Cohere |
| `gemini-2.5-pro` | $1.25 | $10.00 | Advanced Google reasoning |
| `claude-opus-4-20250514` | $15.00 | $75.00 | Highest capability Anthropic |
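As a worked example of applying these rates (a sketch using the approximate `gpt-4o-mini` figures above):

```ts
// Cost = input_tokens/1M * input_rate + output_tokens/1M * output_rate.
function estimateCost(
  inputTokens: number,
  outputTokens: number,
  inputPerMillion: number,
  outputPerMillion: number,
): number {
  return (
    (inputTokens / 1e6) * inputPerMillion +
    (outputTokens / 1e6) * outputPerMillion
  );
}

// 100K input + 20K output on gpt-4o-mini:
// 0.1 * $0.15 + 0.02 * $0.60 = $0.015 + $0.012 = $0.027
estimateCost(100_000, 20_000, 0.15, 0.6); // => 0.027
```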
## Switching Models Without Code Changes
Because "model": "default" is valid in your request body, your application code never needs to hard-code a model name. Kong substitutes the project's deployed model before the request reaches LiteLLM.
```js
// Your application code — never changes
const response = await openai.chat.completions.create({
  model: "default",
  messages: [{ role: "user", content: "Hello" }],
});
```

To switch from `gpt-4o` to `claude-sonnet-4-20250514`:
- Configure your Anthropic key (one-time, if not already done)
- Change the project's model to `claude-sonnet-4-20250514` in the dashboard
- Deploy
Zero application code changes. Zero redeployment of your service.
## Model Allowlists and Blocklists
Behest enforces model access at two levels:
- Tenant-scoped visibility — a model is only selectable if it appears in your `tenant_provider_models` junction table. This table is populated when you save a provider key and model discovery runs. You cannot select a model your key hasn't discovered.
- Global model registry — the `provider_models` table and `model-registry.ts` track known model prefixes. Models with recognized prefixes (`gpt-`, `claude-`, `gemini`, `mistral`, `command`, `o1`, `o3`, `o4`, `codestral`, `open-mistral`) are accepted even before model discovery runs (backward-compatible fallback). Unknown prefixes are rejected with a 422, as sketched below.
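A sketch of that fallback check (the prefix list mirrors this section; the function and the set of discovered IDs are illustrative):

```ts
const KNOWN_PREFIXES = [
  "gpt-", "claude-", "gemini", "mistral", "command",
  "o1", "o3", "o4", "codestral", "open-mistral",
];

// Accept a model ID if discovery has recorded it for this tenant,
// or if it matches a known prefix (backward-compatible fallback);
// anything else is rejected with a 422.
function isAcceptedModel(modelId: string, discovered: Set<string>): boolean {
  return (
    discovered.has(modelId) ||
    KNOWN_PREFIXES.some((prefix) => modelId.startsWith(prefix))
  );
}
```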
Per-project allowlists/blocklists for end users (restricting which models their end users can request) are on the roadmap.
## Per-Request Model Override
End users can specify any model ID in the request body as long as it belongs to the same provider as the project's configured model. Cross-provider overrides are silently ignored (the project's configured model is used instead) to prevent credential misrouting.
```python
# If your project is configured for openai/gpt-4o, this override works:
client.chat.completions.create(model="gpt-4o-mini", messages=[...])

# This override is silently ignored (different provider):
client.chat.completions.create(model="claude-sonnet-4-20250514", messages=[...])
```