
    Model Selection and Routing

    How Model Selection Works

    In Behest, model selection is a project-level setting. Each project has exactly one active model at a time. You set it once in the dashboard or via the API; every inference request from that project uses that model without you specifying it per-request.

    The architecture separates concerns cleanly:

    • Tenant level — holds provider API keys (one per provider, shared across all projects)
    • Project level — selects which model to use (references the tenant's provider key)

    This means you configure your OpenAI key once for your account, then each project can independently choose gpt-4o, gpt-4o-mini, or any other OpenAI model. Swapping a model is a project settings change — no code changes, no redeploy of your application.
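    As a concrete illustration of this split, the two levels can be pictured as plain data. The shapes and field names below are illustrative, not Behest's actual schema.

    python
    # Illustrative sketch of the tenant/project split; not Behest's actual schema.

    # Tenant level: one key per provider, shared by every project in the tenant.
    tenant = {
        "provider_keys": {
            "openai": "sk-...",         # placeholder
            "anthropic": "sk-ant-...",  # placeholder
        }
    }

    # Project level: each project picks a model, which resolves to a tenant key.
    projects = {
        "proj_a": {"provider_model": "gpt-4o"},       # uses the tenant's openai key
        "proj_b": {"provider_model": "gpt-4o-mini"},  # same key, different model
    }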


    Platform Default Model

    When a project has no model configured, or when BYOK (bring your own key) is not available on your plan, Behest routes to:

    gemini-2.5-flash
    

    This uses Behest's platform Google API key. Requests are routed as gemini/gemini-2.5-flash in LiteLLM. You can send "model": "default" in your request body and Kong will substitute the project's configured model (or gemini-2.5-flash if none is set).
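    For example, with an OpenAI-compatible client pointed at your Behest endpoint (the base URL and token below are placeholders), a request can stay unpinned:

    python
    # Minimal sketch: send "model": "default" and let Kong resolve it.
    from openai import OpenAI

    # Placeholders: substitute your Behest gateway URL and project token.
    client = OpenAI(base_url="https://<behest-gateway>/v1", api_key="<project-token>")

    response = client.chat.completions.create(
        model="default",  # substituted with the project's model, or gemini-2.5-flash
        messages=[{"role": "user", "content": "Hello"}],
    )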


    Selecting a Model for a Project

    Dashboard

    1. Open your project
    2. Go to Settings → Model
    3. Choose a provider from the list of configured providers
    4. Choose a model from the discovered model list
    5. Save as draft — test with the test-token before deploying
    6. Click Deploy to publish

    API

    http
    PUT /v1/projects/:projectId/settings
    Authorization: Bearer <service-JWT>
    Content-Type: application/json
     
    {
      "provider_model": "gpt-4o"
    }

    The provider_model field accepts any model ID that:

    1. Is recognized by Behest (exists in the provider_models table or matches a known prefix)
    2. Belongs to a provider for which your tenant has a configured key

    If you try to set a model for a provider with no key, you'll receive:

    json
    {
      "error": "No openai API key configured for this tenant",
      "code": "PROVIDER_NOT_CONFIGURED"
    }
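    Putting the endpoint and the error shape together, a minimal client-side sketch (using the requests library; the base URL and IDs are placeholders) might look like:

    python
    # Sketch: save a draft model selection and handle PROVIDER_NOT_CONFIGURED.
    import requests

    BASE = "https://<behest-gateway>"  # placeholder
    project_id = "proj_123"            # placeholder
    service_jwt = "<service-JWT>"      # placeholder

    resp = requests.put(
        f"{BASE}/v1/projects/{project_id}/settings",
        headers={"Authorization": f"Bearer {service_jwt}"},
        json={"provider_model": "gpt-4o"},
    )

    if resp.ok:
        print("Draft saved; deploy to publish.")
    elif resp.json().get("code") == "PROVIDER_NOT_CONFIGURED":
        print("Configure an OpenAI key for this tenant first.")
    else:
        resp.raise_for_status()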

    Model selection is saved as a draft — it does not take effect until you deploy.


    Draft vs. Deployed Model Configuration

    Behest uses a two-stage configuration model: draft and deployed.

    | State    | Redis key prefix      | How to set          | Lifetime           |
    |----------|-----------------------|---------------------|--------------------|
    | Draft    | draft:config:{pid}:*  | Test-token endpoint | 300 seconds (TTL)  |
    | Deployed | config:{pid}:*        | Deploy endpoint     | Permanent (no TTL) |

    When you save a model selection, it is stored in project_settings.draft_provider_model. On deploy, it is copied to project_settings.provider_model and written to config:{pid}:provider_model in Redis.

    For live testing before deploying, call POST /v1/projects/:projectId/settings/test-token to get a short-lived JWT. Requests using that JWT read from draft:config:{pid}:provider_model. This lets you validate the full BYOK path — your key, your model, your prompts — without affecting production traffic.
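    A minimal sketch of that flow follows. The base URL, the token field name in the response, and sending X-Behest-Draft-Mode yourself are assumptions; only the test-token path comes from this page.

    python
    # Sketch: fetch a test token, then exercise the draft config before deploying.
    import requests

    BASE = "https://<behest-gateway>"  # placeholder
    project_id = "proj_123"            # placeholder
    service_jwt = "<service-JWT>"      # placeholder

    # Assumed response shape: {"token": "..."}
    test_jwt = requests.post(
        f"{BASE}/v1/projects/{project_id}/settings/test-token",
        headers={"Authorization": f"Bearer {service_jwt}"},
    ).json()["token"]

    # Requests under this JWT read draft:config:{pid}:* instead of config:{pid}:*
    draft_resp = requests.post(
        f"{BASE}/v1/chat/completions",  # assumed inference path
        headers={"Authorization": f"Bearer {test_jwt}", "X-Behest-Draft-Mode": "1"},
        json={"model": "default", "messages": [{"role": "user", "content": "ping"}]},
    )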

    Inference flow priority:

    1. If X-Behest-Draft-Mode: 1 header is present → read draft:config:{pid}:provider_model
    2. Otherwise → read config:{pid}:provider_model
    3. If neither exists → use platform default (gemini-2.5-flash)
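    Gateway-side, that order might look like the sketch below. The key names come from the table above; the Redis wiring, and falling back to the deployed key when a draft has expired, are assumptions.

    python
    # Sketch of the resolution order, using the Redis key names from this page.
    import redis

    r = redis.Redis()
    PLATFORM_DEFAULT = "gemini-2.5-flash"

    def resolve_model(pid: str, draft_mode: bool) -> str:
        """Return the model an inference request should use for project `pid`."""
        if draft_mode:  # X-Behest-Draft-Mode: 1 was present
            draft = r.get(f"draft:config:{pid}:provider_model")
            if draft:
                return draft.decode()
        deployed = r.get(f"config:{pid}:provider_model")
        if deployed:
            return deployed.decode()
        return PLATFORM_DEFAULT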

    Model Discovery

    When you save a provider key, Behest automatically discovers available models by calling the provider's models list endpoint. Discovered models are written to the database and to Redis:

    • provider_models table — global catalog (all tenants see the same model metadata)
    • tenant_provider_models table — junction table for per-tenant visibility (only models you discovered with your key are available to you)
    • provider:models:{providerId} in Redis — model metadata cache
    • tenant:models:{tenantId} in Redis — aggregate of all models available to your tenant

    This means tenant isolation is enforced: if Tenant A discovers gpt-4o, that model entry appears in Tenant A's model list. Tenant B with no OpenAI key cannot see it, even though the model exists in the global catalog.
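    In outline, the isolation rule behaves like the in-memory sketch below. The data structures and function names are illustrative stand-ins, not Behest's actual code; only the table names come from this page.

    python
    # Illustrative in-memory model of discovery writes and tenant isolation.
    provider_models = {}             # global catalog (provider_models), keyed by model id
    tenant_provider_models = set()   # junction rows: (tenant_id, model_id) pairs

    def record_discovery(tenant_id: str, discovered: list[dict]) -> None:
        for m in discovered:
            provider_models[m["id"]] = m                      # all tenants share metadata
            tenant_provider_models.add((tenant_id, m["id"]))  # visibility is per tenant

    def visible_models(tenant_id: str) -> list[str]:
        # Tenant isolation: only models this tenant discovered with its own key.
        return [mid for (tid, mid) in tenant_provider_models if tid == tenant_id]

    record_discovery("tenant_a", [{"id": "gpt-4o"}])
    print(visible_models("tenant_a"))  # ['gpt-4o']
    print(visible_models("tenant_b"))  # [] (no OpenAI key, no visibility)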


    Available Models

    Models available to your account depend on which provider keys you have configured. Use the models API to list them:

    http
    GET /v1/tenants/:tenantId/providers/models
    Authorization: Bearer <service-JWT>

    Response:

    json
    {
      "models": [
        {
          "modelId": "gpt-4o",
          "displayName": "GPT-4o",
          "providerSlug": "openai",
          "providerDisplayName": "OpenAI",
          "capabilities": { ... },
          "contextWindow": 128000,
          "maxOutputTokens": 16384,
          "supportsStreaming": true,
          "supportsToolUse": true,
          "supportsVision": true
        }
      ]
    }
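    For example, to pull only the vision-capable models available to your tenant (the base URL and IDs are placeholders):

    python
    # Sketch: list the tenant's models and filter on a capability flag.
    import requests

    BASE = "https://<behest-gateway>"  # placeholder
    tenant_id = "ten_123"              # placeholder
    service_jwt = "<service-JWT>"      # placeholder

    models = requests.get(
        f"{BASE}/v1/tenants/{tenant_id}/providers/models",
        headers={"Authorization": f"Bearer {service_jwt}"},
    ).json()["models"]

    vision_models = [m["modelId"] for m in models if m.get("supportsVision")]
    print(vision_models)  # e.g. ['gpt-4o']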

    Model Capabilities Matrix

    Capabilities are stored in the provider_models.capabilities JSONB column and populated by model discovery. The following fields are tracked:

    | Capability                  | Field name                                     | Type             |
    |-----------------------------|------------------------------------------------|------------------|
    | Context window              | context_window                                 | integer (tokens) |
    | Max output tokens           | max_output_tokens                               | integer (tokens) |
    | Streaming                   | supports_streaming                             | boolean          |
    | Function calling / tool use | supports_tool_use or supports_function_calling | boolean          |
    | Vision (image input)        | supports_vision                                | boolean          |

    Known model capabilities (from PROVIDER_REGISTRY.md and fallback tables)

    | Model                     | Context | Streaming | Vision | Tool use | Provider  |
    |---------------------------|---------|-----------|--------|----------|-----------|
    | gpt-4o                    | 128K    | Yes       | Yes    | Yes      | openai    |
    | gpt-4o-mini               | 128K    | Yes       | Yes    | Yes      | openai    |
    | gpt-4.1                   | 1M      | Yes       | Yes    | Yes      | openai    |
    | o3-mini                   | 200K    | Yes       | No     | No       | openai    |
    | o4-mini                   | 200K    | Yes       | No     | Yes      | openai    |
    | claude-opus-4-20250514    | 200K    | Yes       | Yes    | Yes      | anthropic |
    | claude-sonnet-4-20250514  | 200K    | Yes       | Yes    | Yes      | anthropic |
    | claude-haiku-4-5-20251001 | 200K    | Yes       | Yes    | Yes      | anthropic |
    | gemini-2.5-pro            | 1M      | Yes       | Yes    | Yes      | google    |
    | gemini-2.5-flash          | 1M      | Yes       | Yes    | Yes      | google    |
    | gemini-2.0-flash          | 1M      | Yes       | Yes    | Yes      | google    |
    | mistral-large-latest      | 128K    | Yes       | Yes    | Yes      | mistral   |
    | mistral-small-latest      | 128K    | Yes       | No     | Yes      | mistral   |
    | codestral-latest          | 256K    | Yes       | No     | Yes      | mistral   |
    | command-r-plus-08-2024    | 128K    | Yes       | No     | Yes      | cohere    |
    | command-r-08-2024         | 128K    | Yes       | No     | Yes      | cohere    |

    Cost Comparison (Approximate, per 1M tokens)

    Source: PROVIDER_REGISTRY.md. Prices change; verify at the provider's pricing page.

    | Model                    | Input  | Output | Notes                           |
    |--------------------------|--------|--------|---------------------------------|
    | gpt-4.1-nano             | $0.10  | $0.40  | Fastest/cheapest OpenAI         |
    | gpt-4o-mini              | $0.15  | $0.60  | Best value OpenAI general       |
    | gemini-2.5-flash         | $0.15  | $0.60  | Platform default; large context |
    | gemini-2.0-flash         | $0.10  | $0.40  | Cost-optimized                  |
    | mistral-small-latest     | $0.10  | $0.30  | Most affordable Mistral         |
    | command-r-08-2024        | $0.15  | $0.60  | RAG-optimized Cohere            |
    | gpt-4o                   | $2.50  | $10.00 | Flagship OpenAI                 |
    | gpt-4.1                  | $2.00  | $8.00  | Long-context OpenAI             |
    | claude-sonnet-4-20250514 | $3.00  | $15.00 | Flagship Anthropic              |
    | mistral-large-latest     | $2.00  | $6.00  | Flagship Mistral                |
    | command-r-plus-08-2024   | $2.50  | $10.00 | RAG flagship Cohere             |
    | gemini-2.5-pro           | $1.25  | $10.00 | Advanced Google reasoning       |
    | claude-opus-4-20250514   | $15.00 | $75.00 | Highest capability Anthropic    |
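    As a worked example from the table: a workload of 200K input tokens and 50K output tokens costs roughly 0.2 × $0.15 + 0.05 × $0.60 = $0.06 on gpt-4o-mini, versus 0.2 × $2.50 + 0.05 × $10.00 = $1.00 on gpt-4o.

    python
    # Rough cost estimate from the per-1M-token prices above.
    def estimate_cost(input_tokens: int, output_tokens: int,
                      input_per_m: float, output_per_m: float) -> float:
        return (input_tokens / 1e6) * input_per_m + (output_tokens / 1e6) * output_per_m

    print(estimate_cost(200_000, 50_000, 0.15, 0.60))   # gpt-4o-mini: ~0.06
    print(estimate_cost(200_000, 50_000, 2.50, 10.00))  # gpt-4o: ~1.00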

    Switching Models Without Code Changes

    Because "model": "default" is valid in your request body, your application code never needs to hard-code a model name. Kong substitutes the project's deployed model before the request reaches LiteLLM.

    javascript
    // Your application code — never changes
    import OpenAI from "openai";

    // Point the SDK at your Behest project endpoint (placeholder values)
    const openai = new OpenAI({ baseURL: "<behest-gateway-url>", apiKey: "<project-token>" });

    const response = await openai.chat.completions.create({
      model: "default",
      messages: [{ role: "user", content: "Hello" }],
    });

    To switch from gpt-4o to claude-sonnet-4-20250514:

    1. Configure your Anthropic key (one-time, if not already done)
    2. Change the project's model to claude-sonnet-4-20250514 in the dashboard
    3. Deploy

    Zero application code changes. Zero redeployment of your service.


    Model Allowlists and Blocklists

    Behest enforces model access at two levels:

    Tenant-scoped visibility — a model is only selectable if it appears in your tenant_provider_models junction table. This table is populated when you save a provider key and model discovery runs. You cannot select a model your key hasn't discovered.

    Global model registry — the provider_models table and model-registry.ts track known model prefixes. Models with recognized prefixes (gpt-, claude-, gemini, mistral, command, o1, o3, o4, codestral, open-mistral) are accepted even before model discovery runs (backward-compatible fallback). Unknown prefixes are rejected with a 422.
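    A simplified sketch of that acceptance check follows. The prefix list comes from the paragraph above; the function itself is illustrative, not the code in model-registry.ts.

    python
    # Sketch: accept discovered models, or known prefixes as a fallback; else 422.
    KNOWN_PREFIXES = ("gpt-", "claude-", "gemini", "mistral", "command",
                      "o1", "o3", "o4", "codestral", "open-mistral")

    def is_recognized(model_id: str, discovered: set[str]) -> bool:
        """True if the model can be selected; unrecognized IDs get a 422."""
        return model_id in discovered or model_id.startswith(KNOWN_PREFIXES)

    print(is_recognized("gpt-4o", set()))         # True (prefix fallback)
    print(is_recognized("unknown-model", set()))  # False (rejected with 422)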

    Per-project allowlists/blocklists for end users (restricting which models their end users can request) are on the roadmap.


    Per-Request Model Override

    End users can specify any model ID in the request body as long as it belongs to the same provider as the project's configured model. Cross-provider overrides are silently ignored (the project's configured model is used instead) to prevent credential misrouting.

    python
    # Assumes an OpenAI-compatible client pointed at your Behest endpoint (placeholders)
    from openai import OpenAI
    client = OpenAI(base_url="<behest-gateway-url>", api_key="<project-token>")

    # If your project is configured for openai/gpt-4o, this override works:
    client.chat.completions.create(model="gpt-4o-mini", messages=[...])

    # This override is silently ignored (different provider):
    client.chat.completions.create(model="claude-sonnet-4-20250514", messages=[...])
