Changelog
What we shipped this week. Follow along as we build the AI backend you don't have to.
Usage Tiers go live
Custom per-project usage tiers are now generally available. Author tiers via REST API, with each tier overriding rate, spend, security, memory, and routing controls. The tier name is signed into every end-user JWT for chargeback and per-tier monetization.
- REST API: GET / POST / PUT / DELETE /v1/projects/{id}/tiers and GET /tiers/{id}/resolved
- Sparse override fields per tier: rpm_limit, tokens_per_day, tokens_per_month, pii_mode, pii_entities, sentinel_mode, sentinel_blocklist, memory_enabled, memory_window, retention_days, store_tool_calls, provider_model, system_prompt
- Tier name signed into the end-user JWT (`tier` claim) and emitted as the X-Tier header on proxied requests
- Up to 3 custom tiers per project; authoring custom tiers requires the Pro plan or higher
- Tier sync to Redis for sub-millisecond lookup at the gateway
- Tier-aware feature gating for BYOAK and advanced PII / Sentinel modes
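To illustrate how sparse overrides might resolve against project settings, here is a minimal sketch. The field names come from the override list above; the default values, the `resolve_tier` helper, the "gold" tier, and the rule that an absent or `None` field inherits the project default are all assumptions for illustration, not the documented behavior of `GET /tiers/{id}/resolved`.

```python
from typing import Any, Dict

# Hypothetical project-level defaults. Field names are taken from the tier
# override list; the values are invented for this example.
PROJECT_DEFAULTS: Dict[str, Any] = {
    "rpm_limit": 60,
    "tokens_per_day": 1_000_000,
    "pii_mode": "shadow",
    "memory_enabled": True,
    "memory_window": 10,
}


def resolve_tier(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Overlay a sparse tier onto project defaults.

    Assumption: a field that is missing from the tier, or set to None,
    inherits the project default. False and 0 count as real overrides.
    """
    resolved = dict(defaults)  # copy so the defaults are never mutated
    for field_name, value in overrides.items():
        if value is not None:
            resolved[field_name] = value
    return resolved


# A hypothetical "gold" tier that overrides only two fields.
gold = {"rpm_limit": 600, "pii_mode": "enforce"}
resolved_gold = resolve_tier(PROJECT_DEFAULTS, gold)
```

In this sketch `resolved_gold` carries the tier's `rpm_limit` and `pii_mode` while every other field falls through to the project value, which is what keeps a tier definition short.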
BYOAK v2 + Stripe Billing
Bring Your Own API Keys (BYOAK v2) ships with support for 8 LLM providers, AES-256-GCM encryption at rest, and per-tenant routing. Self-serve Stripe billing goes live alongside it, with three platform plans (Free / Pro / Max) and token metering.
- BYOAK v2: tenant-scoped keys for OpenAI, Anthropic, Gemini, Vertex AI, Bedrock, Mistral, Cohere, OpenRouter
- AES-256-GCM encryption at rest; per-project routing through your own provider accounts
- Stripe self-serve checkout, upgrade / downgrade / cancel; monthly or annual (15% discount)
- Token-based metering: Free 5M / Pro 50M / Max 500M tokens per month
- Tenant signing keys: per-tenant RSA key CRUD + JWKS, dashboard management UI
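As a rough worked example of the token metering above, the sketch below encodes the three monthly allowances and computes what is left in the current month. The allowances are from this entry; the lowercase plan keys and the `remaining_tokens` and `is_over_quota` helpers are hypothetical, and how the platform actually handles overage is not covered here.

```python
# Monthly token allowances per platform plan, as listed in this release.
PLAN_MONTHLY_TOKENS = {
    "free": 5_000_000,
    "pro": 50_000_000,
    "max": 500_000_000,
}


def remaining_tokens(plan: str, used_this_month: int) -> int:
    """Tokens left in the current month, never below zero."""
    return max(PLAN_MONTHLY_TOKENS[plan] - used_this_month, 0)


def is_over_quota(plan: str, used_this_month: int) -> bool:
    """True once metered usage has reached the plan allowance."""
    return used_this_month >= PLAN_MONTHLY_TOKENS[plan]
```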
Public Beta Launch
Shipped the initial Behest AI platform with the core feature set for multi-tenant AI backend infrastructure.
- Auth & tenant isolation with JWT support and API key management
- Three-tier rate limiting (per-IP, per-project, per-user)
- CORS-ready API with per-project origin configuration
- PII Shield with mask, redact, and block modes (Presidio-powered)
- Sentinel prompt injection defense with pattern detection and custom blocklists
- Conversation memory with configurable session windows
- Token budgets with per-user and per-project daily limits
- Full observability stack (OpenTelemetry, Grafana, Loki, Tempo)
- Self-hosted deployment via Helm charts on GKE
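The three-tier rate limiting listed above can be pictured as one request being checked against three independent counters. This is a minimal in-memory sketch using fixed windows; the `LayeredRateLimiter` class, the limits, and the choice of a fixed-window algorithm are assumptions for illustration and say nothing about how the gateway implements it.

```python
import time
from collections import defaultdict
from typing import Dict, Optional, Tuple


class LayeredRateLimiter:
    """Fixed-window limiter checked per IP, per project, and per user."""

    def __init__(self, limits: Dict[str, int], window_seconds: int = 60) -> None:
        self.limits = limits  # e.g. {"ip": 2, "project": 100, "user": 5}
        self.window_seconds = window_seconds
        # Old windows are never evicted here; a real store would expire them.
        self._counts: Dict[Tuple[str, str, int], int] = defaultdict(int)

    def check(self, ip: str, project: str, user: str,
              now: Optional[float] = None) -> Optional[str]:
        """Return the scope that rejected the request, or None if allowed."""
        now = time.time() if now is None else now
        window = int(now // self.window_seconds)
        idents = {"ip": ip, "project": project, "user": user}
        buckets = {scope: (scope, ident, window) for scope, ident in idents.items()}
        # Check every scope first so a rejected request consumes no quota.
        for scope, bucket in buckets.items():
            if self._counts[bucket] >= self.limits[scope]:
                return scope
        for bucket in buckets.values():
            self._counts[bucket] += 1
        return None


limiter = LayeredRateLimiter({"ip": 2, "project": 100, "user": 5})
results = [limiter.check("203.0.113.7", "proj_1", "user_a", now=0.0) for _ in range(3)]
```

With a per-IP limit of 2, the third call in the same window is rejected by the `ip` scope while the project and user counters stay untouched.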
PII Shield & Sentinel
Added enterprise-grade security features to protect sensitive data and block prompt injection attacks.
- PII Shield: three enforcement modes (disabled, shadow, enforce) with reversible masking
- Sentinel: multiple detection patterns for common jailbreak techniques
- Custom blocklist support per project
- Shadow mode for monitoring without blocking, so you can test before enforcing
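The difference between shadow and enforce can be sketched as a single screening function that always records matches but only blocks in enforce mode. The two regular expressions, the `Verdict` type, and the `screen_prompt` helper are illustrative assumptions, not the product's detection rules; the sketch also assumes Sentinel uses the same disabled / shadow / enforce modes listed for PII Shield.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Illustrative patterns only; not the actual Sentinel rule set.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"reveal (your|the) system prompt", re.IGNORECASE),
]


@dataclass
class Verdict:
    allowed: bool
    matches: List[str] = field(default_factory=list)


def screen_prompt(prompt: str, mode: str, blocklist: List[str]) -> Verdict:
    """Screen a prompt against built-in patterns and a per-project blocklist."""
    if mode not in ("disabled", "shadow", "enforce"):
        raise ValueError(f"unknown mode: {mode}")
    if mode == "disabled":
        return Verdict(allowed=True)
    matches = [p.pattern for p in INJECTION_PATTERNS if p.search(prompt)]
    lowered = prompt.lower()
    matches += [term for term in blocklist if term.lower() in lowered]
    # Shadow records matches but never blocks; enforce blocks on any match.
    allowed = mode == "shadow" or not matches
    return Verdict(allowed=allowed, matches=matches)
```

Running the same traffic through `"shadow"` first yields the list of matches a project would have blocked, which is the information needed before switching to `"enforce"`.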
Token Budgets
Per-user and per-project daily token budget enforcement to prevent runaway AI costs.
- Configurable daily token budgets per user (default 1M) and per project (default 10M)
- Pre-check at gateway with post-request reconciliation
- Actual token counts from LLM provider responses for accurate tracking
- Budget exceeded responses with clear error messages
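The pre-check and reconciliation steps above could look roughly like this: reserve an estimate before the request is forwarded, then replace it with the count the provider reports. The `DailyTokenBudget` class and its method names are hypothetical, the daily reset is omitted, and the 1M limit mirrors the per-user default mentioned in this entry.

```python
class DailyTokenBudget:
    """Tracks token usage against a daily limit (reset logic omitted)."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.used = 0

    def reserve(self, estimated_tokens: int) -> bool:
        """Pre-check: charge the estimate, or reject if it would exceed the limit."""
        if self.used + estimated_tokens > self.limit:
            return False
        self.used += estimated_tokens
        return True

    def reconcile(self, estimated_tokens: int, actual_tokens: int) -> None:
        """Post-request: swap the estimate for the provider-reported count."""
        self.used += actual_tokens - estimated_tokens


user_budget = DailyTokenBudget(limit=1_000_000)
accepted = user_budget.reserve(2_000)   # estimate made at the gateway
user_budget.reconcile(2_000, 1_500)     # provider reported 1,500 tokens
```

A request would presumably need to pass both the per-user and the per-project budget; the sketch shows only one of them for brevity.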
CORS & Self-Hosted Deployment
Per-project CORS configuration and production-ready Helm chart deployment for Kubernetes.
- Per-project CORS origin allowlists with preflight handling
- Helm chart for GKE Autopilot deployment
- Docker Compose for local development
- Cloud SQL Auth Proxy integration for production databases
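A per-project origin allowlist with preflight handling might be sketched as follows. The response header names are the standard CORS headers; the `ALLOWED_ORIGINS` table, the project ID, the allowed methods and request headers, the max-age value, and the choice of a 403 for disallowed origins are assumptions for illustration.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical per-project allowlists.
ALLOWED_ORIGINS: Dict[str, Set[str]] = {
    "proj_1": {"https://app.example.com"},
}


def cors_headers(project_id: str, origin: Optional[str]) -> Dict[str, str]:
    """Return CORS headers if the origin is allowlisted, else an empty dict."""
    allowed = ALLOWED_ORIGINS.get(project_id, set())
    if origin is None or origin not in allowed:
        return {}
    # Echo the exact origin and vary on it so caches keep origins separate.
    return {"Access-Control-Allow-Origin": origin, "Vary": "Origin"}


def handle_preflight(project_id: str, origin: Optional[str]) -> Tuple[int, Dict[str, str]]:
    """Answer an OPTIONS preflight with a status code and response headers."""
    headers = cors_headers(project_id, origin)
    if not headers:
        return 403, {}
    headers["Access-Control-Allow-Methods"] = "GET, POST, PUT, DELETE, OPTIONS"
    headers["Access-Control-Allow-Headers"] = "Authorization, Content-Type"
    headers["Access-Control-Max-Age"] = "600"
    return 204, headers
```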