Capability-driven AI model routing with automatic failover
Capability-driven AI routing library for Python and TypeScript. A single integration point for multiple AI providers with automatic rotation to aggregate free tiers, minimize cost, and maintain service continuity.
Applications request capabilities. ModelMesh Lite manages providers, quotas, costs, and failover.
AI applications need capabilities (text generation, image generation, embeddings, speech) and do not care about specific providers. Coupling to one provider means quota exhaustion halts the app, rate limits cause interruptions, outages remove entire capabilities, and each provider needs its own integration code.
ModelMesh Lite separates what the application needs from who delivers it: provider rotation avoids downtime, free-tier aggregation combines quotas, capability-based routing prevents lock-in, and a unified API simplifies development.
ModelMesh Lite does not replace high-level AI frameworks such as LangChain or LangGraph. It operates one layer below, providing the infrastructure for efficient, reliable access to the cloud APIs that those frameworks depend on. Provider rotation, quota aggregation, failover, and credential management become shared plumbing that any framework or application can build upon.
Each request resolves in two stages: capability resolution (choose a pool matching the requested capability) and model selection (select the best active model and its provider). Applications remain stable even when providers change.
System layers: Application → Router → Pool → Model → Provider. The application requests a capability; the router resolves it to a pool; the pool groups models that fulfill it; each model declares capabilities and constraints; providers expose models and manage quotas. The router selects the best active model, routes through its provider, and handles rotation on failure.
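The two-stage flow can be sketched as follows. This is an illustrative sketch, not the library's API; the pool table and model records below are hypothetical examples.

```python
# Illustrative sketch of two-stage resolution: capability -> pool -> model.
# The data and function names are hypothetical, not the library's real API.

POOLS = {
    "generation.text-generation": [
        {"id": "openai.gpt-4o-mini", "provider": "openai.llm.v1", "active": True},
        {"id": "google.gemini-2.0-flash", "provider": "google.gemini.v1", "active": True},
    ],
}

def resolve(capability: str) -> dict:
    # Stage 1: capability resolution -- find the pool for the requested capability.
    pool = POOLS[capability]
    # Stage 2: model selection -- pick the first active model (a trivial strategy;
    # real selection is governed by the pool's rotation policy).
    for model in pool:
        if model["active"]:
            return model
    raise RuntimeError(f"no active model for {capability}")

print(resolve("generation.text-generation")["id"])  # prints "openai.gpt-4o-mini"
```

If the first model is deactivated, the same call transparently resolves to the next active one, which is the essence of rotation.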
Every integration point is a connector, a class or function implementing a defined interface (providers, rotation policies, secret stores, storage backends, observability outputs, discovery). Each connector type has a connector catalogue, a registry of available implementations with code, metadata, and a configuration schema. Interface definitions are in ConnectorInterfaces.md.
Custom connectors are first-class citizens; they register in the same catalogue and receive the same treatment as pre-shipped ones. Connectors can be bundled with the application or loaded at runtime from connector packages (zip archives) referenced in configuration. The library ships with broad pre-built coverage (full catalogue in ConnectorCatalogue.md); standard capabilities require zero configuration beyond API keys.
| Connector type | Function | Interface | Pre-shipped |
|---|---|---|---|
| Provider | Expose AI models and web APIs through an OpenAI-compatible interface | complete, check_quota, report_usage, list_models | openai.llm.v1, google.gemini.v1, huggingface.inference.v1, openrouter.gateway.v1, cloudflare.workers-ai.v1 |
| Rotation policy | Govern model lifecycle: deactivation, recovery, selection | deactivate, recover, select | modelmesh.stick-until-failure.v1, modelmesh.priority-selection.v1, modelmesh.round-robin.v1, modelmesh.cost-first.v1, modelmesh.latency-first.v1, modelmesh.session-stickiness.v1, modelmesh.rate-limit-aware.v1, modelmesh.load-balanced.v1 |
| Secret store | Resolve API keys and tokens from secure backends | get, set, list, delete | modelmesh.env.v1, modelmesh.dotenv.v1, aws.secrets-manager.v1, google.secret-manager.v1, microsoft.key-vault.v1, 1password.connect.v1 |
| Storage | Persist state, configuration, and logs to external backends | load, save, list, delete, stat, exists, acquire, release | modelmesh.local-file.v1, aws.s3.v1, google.drive.v1, redis.redis.v1 |
| Observability | Export routing decisions, request logs, and statistics | emit, log, flush | modelmesh.console.v1, modelmesh.local-file.v1, modelmesh.webhook.v1 |
| Discovery | Sync model catalogues and monitor provider health | sync, probe | modelmesh.registry-sync.v1, modelmesh.health-monitor.v1 |
The Connector Development Kit is the class library used to build both pre-shipped and custom connectors. It provides generic base classes with sensible default behavior for each connector type, so a new connector can be created with minimal code. Users who need specialized behavior can derive from an existing connector and override only the methods that differ, inheriting everything else. The same CDK classes that the library uses internally are available to users, making custom connectors indistinguishable from pre-shipped ones. CDK architecture, base classes, and tutorials are in cdk/Overview.md.
Every object in the system has a unique dot-notated ID string that establishes its identity, scope, and lineage. Dot-notation ensures global uniqueness across object types, makes relationships explicit, and provides a human-readable namespace. The general form is scope.name or scope.name.qualifier, where each segment narrows the context from left to right.
| Object type | Pattern | Examples |
|---|---|---|
| Model | vendor.model-name | openai.gpt-4o, anthropic.claude-sonnet-4, google.gemini-2.0-flash, deepseek.deepseek-chat |
| Provider connector | vendor.service.version | openai.llm.v1, google.gemini.v1, anthropic.claude.v1, huggingface.inference.v1 |
| Rotation policy | vendor.policy-name.version | modelmesh.stick-until-failure.v1, modelmesh.cost-first.v1, modelmesh.round-robin.v1 |
| Secret store | vendor.store-type.version | modelmesh.env.v1, aws.secrets-manager.v1, google.secret-manager.v1 |
| Storage | vendor.backend.version | modelmesh.local-file.v1, aws.s3.v1, redis.redis.v1 |
| Observability | vendor.output.version | modelmesh.console.v1, modelmesh.webhook.v1, modelmesh.local-file.v1 |
| Discovery | vendor.strategy.version | modelmesh.registry-sync.v1, modelmesh.health-monitor.v1 |
| Capability | category.subcategory.leaf | generation.text-generation.chat-completion, representation.embeddings.text-embeddings |
| Pool | category.subcategory or category.leaf | generation.text-generation, representation.embeddings, understanding.vision-understanding |
| System service | system.service-name | system.router, system.state-manager, system.capability-resolver, system.event-emitter |
| Configuration section | config.section.subsection | config.providers, config.pools.rotation, config.observability.routing |
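The ID patterns above can be checked with a simple regular expression. This is a sketch only; the library's actual validation rules may differ.

```python
import re

# Hypothetical validator for the dot-notated ID scheme described above:
# lowercase segments from [a-z0-9-], joined by dots, at least two segments.
ID_PATTERN = re.compile(r"^[a-z0-9-]+(\.[a-z0-9-]+)+$")

def is_valid_id(object_id: str) -> bool:
    return ID_PATTERN.fullmatch(object_id) is not None

assert is_valid_id("openai.gpt-4o-mini")
assert is_valid_id("modelmesh.rate-limit-aware.v1")
assert not is_valid_id("OpenAI.GPT")   # uppercase is not allowed
assert not is_valid_id("standalone")   # at least scope.name is required
```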
Rules:

- IDs are unique within each object type; the same vendor prefix can appear across types (the model openai.gpt-4o and the connector openai.llm.v1 both start with openai), but the type context disambiguates them.
- The first segment names the vendor or organization (openai, google, modelmesh, aws). Custom objects use the user's organization name (e.g., acmecorp.internal-llm.v1).
- Versioned objects end in a version qualifier (.v1, .v2). This enables side-by-side versioning: openai.llm.v1 and openai.llm.v2 can coexist in the catalogue.
- Pools use the capability path as their ID: the pool for generation.text-generation has ID generation.text-generation.
- IDs use only the characters [a-z0-9\-\.]. Examples: openai.gpt-4o-mini, modelmesh.rate-limit-aware.v1.

Model capabilities form a hierarchical tree. Parent nodes are categories; leaf nodes are concrete, routable capabilities (full tree in ModelCapabilities.md).
| Category | Produces | Example leaves |
|---|---|---|
| generation | new content | chat-completion, text-to-image, text-to-speech |
| understanding | analysis of input | summarization, ocr, speech-to-text |
| transformation | converted content | translation, background-removal, voice-cloning |
| representation | encoded data | text-embeddings, image-embeddings |
| retrieval | found information | semantic-search, grounded-generation, reranking |
| interaction | multi-step behavior | tool-calling, agent-execution |
| evaluation | quality assessment | content-moderation, factuality-checking |
Rules: Models register at leaf nodes. Pools can target any node and include all descendants. Requesting understanding matches all understanding models; requesting ocr matches only that leaf. A model with multiple leaves appears in multiple ancestor pools automatically.
The hierarchy is extensible. Users can add custom categories, subcategories, and leaf nodes (e.g., compliance → pii-detection, regulatory-review). Custom nodes follow the same routing, pooling, and inheritance rules as pre-shipped ones.
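The membership rules above reduce to prefix matching on dotted capability paths. The registrations below are hypothetical examples, not the shipped catalogue.

```python
# Sketch of hierarchical pool matching: a pool targeting a capability node
# includes every model registered at that node or any of its descendants.
# The registrations below are hypothetical examples.

REGISTRATIONS = {
    "openai.gpt-4o": ["generation.text-generation.chat-completion",
                      "interaction.tool-calling"],
    "google.gemini-2.0-flash": ["generation.text-generation.chat-completion",
                                "understanding.vision-understanding.ocr"],
}

def pool_members(pool_node: str) -> set:
    """Models with at least one leaf at the node or anywhere below it."""
    return {
        model
        for model, leaves in REGISTRATIONS.items()
        if any(leaf == pool_node or leaf.startswith(pool_node + ".")
               for leaf in leaves)
    }

print(pool_members("generation"))      # both models: each has a generation leaf
print(pool_members("understanding"))   # only the model registered at an understanding leaf
```

Requesting a broad node like understanding matches all models below it; requesting a specific leaf matches only registrations at that exact path.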
A model definition is a capability contract: a declaration of what an application can expect when routing through that model. Its attributes fall into four categories. Capabilities and delivery modes are orthogonal: chat-completion supports sync, streaming, and batch, while web-search supports only sync. Full attribute reference in SystemConfiguration.md — Models.
A capability pool groups models that fulfill the same type of task. Pools are defined by a capability node, not by provider, and collect all models registered at that node or its descendants.
The library ships with predefined pools for common capabilities (ModelCapabilities.md — Predefined Pools). Users add custom pools (e.g., code-review, medical-summarization, long-context-analysis) with the same rotation and failover logic.
Pool membership is automatic: a model definition registered at chat-completion, ocr, and tool-calling leaf nodes joins the generation, understanding, interaction, and all ancestor pools without manual assignment.
Pools can be defined in two ways: dynamically, by targeting a capability node (membership is computed from registrations as described above), or statically, by listing member models explicitly.
A provider exposes one or more models through a specific API via a provider connector that implements a uniform, OpenAI-compatible interface. The model definition describes what the AI can do; the provider connector describes how it is accessed and managed: authentication, quota, rate limits, cost, availability. A provider registers a single connector for all its models (the default) or per-model connectors when distinct handling is required.
Two responsibility areas: AI capabilities (the models and tasks the provider exposes) and infrastructure capabilities (authentication, quota, rate limits, cost, availability).
Infrastructure capabilities feed routing directly: discovery, quota, and pricing data inform model selection and proactive rotation. Providers report operational data; the pool’s rotation policy acts on it. Details in ConnectorInterfaces.md — Provider and Provider Interface.
Within each pool, every model is classified as Active (eligible for routing) or Standby (temporarily excluded). A rotation policy governs transitions through three components (deactivation, recovery, and selection), configured independently per pool. Rotation operates at model level (individual model moves to standby) or provider level (provider-wide issue deactivates all its models across all pools). The library tracks each model’s state (failure counts, cooldown timers, quota usage) and persists it through storage connectors. All policy attributes in SystemConfiguration.md — Pools.
A model moves to standby based on: error-based (failure count, error rate, specific HTTP codes), request-count-based (request cap, token limit, cost budget), or time-based (quota period expiry, maintenance window). Request-count-based triggers enable free-tier aggregation: when one provider’s free quota is exhausted, rotation selects the next active provider, chaining quotas automatically.
A standby model returns to active through: startup probe, cooldown (fixed delay), calendar (aligned with quota resets), periodic probe, or manual API command.
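The deactivation and recovery mechanics can be sketched as a small state machine. This is an illustrative sketch of one policy (failure threshold plus fixed cooldown), not the library's internals; class and parameter names are hypothetical.

```python
# Minimal sketch of model-level rotation state: a model moves to standby
# after `max_failures` consecutive errors and recovers after a fixed
# cooldown. Names and defaults are hypothetical, not the library's API.

class ModelState:
    def __init__(self, max_failures: int = 3, cooldown_s: float = 60.0):
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.standby_until = 0.0  # epoch seconds; 0 means active

    def is_active(self, now: float) -> bool:
        if self.standby_until and now >= self.standby_until:
            # Cooldown elapsed: return to active and reset the counter.
            self.standby_until = 0.0
            self.failures = 0
        return self.standby_until == 0.0

    def record_failure(self, now: float) -> None:
        self.failures += 1
        if self.failures >= self.max_failures:
            self.standby_until = now + self.cooldown_s  # deactivate

    def record_success(self) -> None:
        self.failures = 0

state = ModelState(max_failures=2, cooldown_s=30.0)
state.record_failure(now=100.0)
state.record_failure(now=101.0)
print(state.is_active(now=102.0))   # False: in cooldown until t=131
print(state.is_active(now=140.0))   # True: cooldown elapsed, counters reset
```

A calendar-based recovery trigger would replace the fixed cooldown with the provider's quota reset time; the surrounding logic stays the same.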
Pre-shipped: modelmesh.stick-until-failure.v1 (default), modelmesh.priority-selection.v1, modelmesh.round-robin.v1, modelmesh.cost-first.v1, modelmesh.latency-first.v1, modelmesh.session-stickiness.v1, modelmesh.rate-limit-aware.v1, modelmesh.load-balanced.v1. Rate-limit-aware switches models preemptively before hitting limits; load-balanced distributes requests proportionally to each model’s rate-limit headroom. Strategy details in SystemConfiguration.md — Pools.
Before rotating, the router retries the same model with configurable backoff (fixed, exponential with jitter, or provider Retry-After). Retryable errors (timeouts, 500, 503) are retried; non-retryable errors (400, 401, 403) trigger immediate rotation. Retry attempts count toward the deactivation threshold. Scope is configurable: same model, same provider, or cross-provider.
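The exponential-with-jitter schedule mentioned above can be sketched as follows; the parameter names and the status-code sets are illustrative, drawn from the examples in this section.

```python
import random

# Sketch of an exponential backoff schedule with full jitter, capped at
# `max_delay`. Seeded RNG keeps this sketch reproducible; a real retry
# loop would use an unseeded source.
def backoff_delays(base: float, max_delay: float, attempts: int, seed: int = 0):
    rng = random.Random(seed)
    delays = []
    for attempt in range(attempts):
        cap = min(max_delay, base * (2 ** attempt))
        delays.append(rng.uniform(0, cap))  # full jitter: draw from [0, cap]
    return delays

# Classification per the text: transient server errors are retried on the
# same model; client errors rotate immediately. (Timeouts, which carry no
# HTTP status, are also treated as retryable.)
RETRYABLE = {500, 503}
NON_RETRYABLE = {400, 401, 403}

def should_retry(status: int) -> bool:
    return status in RETRYABLE

print(backoff_delays(base=0.5, max_delay=8.0, attempts=4))
```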
The library ships pre-built policy connectors for each component. Users can replace individual components or the entire policy, via configuration or at runtime.
Each request passes through: capability resolution → pool selection → delivery mode filter → state filter (exclude standby models) → selection strategy → intelligent retry. The pipeline combines capability hierarchy, model attributes, pool state, and rotation policy to select the best available model. Example in SystemConfiguration.md — Routing Pipeline.
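The filter stages can be pictured as a chain over candidate models. The records and the priority-based strategy below are hypothetical; the real pipeline also applies capability resolution and the pool's configured policy.

```python
# Sketch of the tail of the routing pipeline: delivery mode filter ->
# state filter -> selection strategy. Candidate records are hypothetical.

CANDIDATES = [
    {"id": "openai.gpt-4o-mini",     "modes": {"sync", "streaming"}, "active": True,  "priority": 1},
    {"id": "google.gemini-2.0-flash", "modes": {"sync"},             "active": True,  "priority": 2},
    {"id": "deepseek.deepseek-chat",  "modes": {"sync", "streaming"}, "active": False, "priority": 3},
]

def route(models, mode: str) -> dict:
    # Keep only models that support the requested delivery mode and are
    # active (standby models are excluded), then pick by priority.
    eligible = [m for m in models if mode in m["modes"] and m["active"]]
    if not eligible:
        raise RuntimeError("no eligible model")
    return min(eligible, key=lambda m: m["priority"])

print(route(CANDIDATES, "streaming")["id"])  # prints "openai.gpt-4o-mini"
```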
The library exposes an OpenAI-compatible interface. Applications interact with it using standard OpenAI SDK calls (chat.completions, embeddings, audio.speech, etc.). Requests use virtual model names that map to configured pools; a call to text-generation resolves to the best active real model and provider, with format translation, failover, and state management handled transparently.
The library provides a ChatOpenAI-compatible interface, so LangChain and LangGraph pipelines connect directly. The interface is not limited to AI models; web API services such as document parsing, content moderation, or search APIs can be wrapped as provider connectors, gaining rotation, quota management, and failover through the same unified interface.
The library comes with a build script that packages it with selected connectors, policies, and a YAML configuration into a Docker image. The resulting container exposes standard OpenAI API endpoints (/v1/chat/completions, /v1/embeddings, etc.) that any application or framework can connect to without embedding the library.
This decouples routing from application code: multiple applications (LangChain pipelines, IDE assistants, internal tools) share a single proxy with centralized configuration, credential management, and state. The proxy supports all library features and can be deployed alongside applications or as a shared service.
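At the wire level, a client talks to the proxy exactly as it would to the OpenAI API, with a virtual model name in place of a concrete one. The sketch below builds such a request with the standard library; the endpoint URL and the virtual model name text-generation are illustrative.

```python
import json
from urllib import request

def chat_request(base_url: str, virtual_model: str, prompt: str) -> request.Request:
    """Build a standard OpenAI-style chat completion request.

    The virtual model name maps to a configured pool; the router resolves
    it to a real model and provider at request time.
    """
    payload = {
        "model": virtual_model,  # pool name, not a concrete provider model
        "messages": [{"role": "user", "content": prompt}],
    }
    return request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = chat_request("http://localhost:8080", "text-generation", "Hello!")
print(req.full_url)  # prints "http://localhost:8080/v1/chat/completions"
```

Because the request shape is standard, any OpenAI SDK can be pointed at the proxy's base URL instead of hand-building requests like this.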
| Mode | Description | Best for |
|---|---|---|
| Embedded library | Server-side or desktop. Calls provider APIs and exposes a drop-in OpenAI SDK replacement. | backends, CLI tools, desktop apps, LangChain/LangGraph pipelines |
| Client library | Browser client. Calls provider APIs directly with client-side routing. | single-page apps, browser-based tools |
| OpenAI-compatible proxy | Standalone Docker container exposing standard OpenAI API endpoints. Any OpenAI SDK or REST client connects without embedding the library. | multi-app environments, IDE integrations, shared infrastructure |
ModelMesh Lite runs in web browsers via the BrowserBaseProvider class and a dedicated browser entry point. BrowserBaseProvider mirrors the full BaseProvider interface but uses the Fetch API and ReadableStream instead of Node.js http/https modules and Node streams. The same protected hooks (_buildHeaders, _buildRequestPayload, _parseResponse, _parseSseChunk, _getCompletionEndpoint) are available for subclassing, so browser providers are built with the same patterns as server-side providers.
Most AI provider APIs do not send CORS headers, so browsers block direct requests. BrowserBaseProvider accepts an optional proxyUrl configuration field. When set, all API URLs are prefixed with the proxy URL, routing requests through a lightweight CORS proxy that adds the required headers. The library ships with a minimal transparent CORS proxy in tools/cors-proxy/ (Node.js script, Docker Compose included).
| Mode | When to use | Configuration |
|---|---|---|
| With CORS proxy | Standard web pages calling any AI provider API | Set proxyUrl in BrowserBaseProvider config |
| Direct access | Browser extensions with host_permissions, providers that send CORS headers (e.g., Anthropic), non-browser runtimes | Omit proxyUrl |
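Conceptually, a transparent CORS proxy does two things: prefixes the target URL and adds the response headers browsers require. The shipped proxy is a Node.js script; the Python sketch below only illustrates the idea, and the header values shown are typical, not the shipped defaults.

```python
# Conceptual sketch of a transparent CORS proxy's two responsibilities.
# Header values are typical examples, not the shipped proxy's defaults.

CORS_HEADERS = {
    "Access-Control-Allow-Origin": "*",
    "Access-Control-Allow-Methods": "GET, POST, OPTIONS",
    "Access-Control-Allow-Headers": "Authorization, Content-Type",
}

def with_cors(upstream_headers: dict) -> dict:
    """Merge CORS headers into the headers returned by the provider."""
    merged = dict(upstream_headers)
    merged.update(CORS_HEADERS)
    return merged

def proxied_url(proxy_url: str, api_url: str) -> str:
    """The prefixing behavior described above: proxyUrl + original API URL."""
    return proxy_url.rstrip("/") + "/" + api_url

print(proxied_url("http://localhost:8010/proxy",
                  "https://api.openai.com/v1/chat/completions"))
```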
For bundlers (Webpack, Vite, esbuild), import from @nistrapa/modelmesh-core/browser to exclude Node.js-dependent modules (ProxyServer, FileSecretStore, HttpHealthDiscovery, file-backed KeyValueStorage). The createBrowser() convenience function provides a browser-optimized equivalent of modelmesh.create().
Full browser usage guide in guides/BrowserUsage.md.
The system is configured declaratively via YAML, programmatically via API, or both. Configuration can be serialized to and deserialized from persistent storage connectors, enabling centralized management and sharing across instances. Full YAML reference in SystemConfiguration.md.
API keys and tokens must never be hardcoded in configuration or source code. Secret store connectors resolve credentials from secure backends (environment variables, cloud secret managers, vaults) at runtime. Configuration references secrets by name (${secrets:openai-key}); the library resolves them at initialization through the configured store. A CLI utility publishes and manages credentials across stores. Pre-shipped stores and deployment patterns in ConnectorCatalogue.md.
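A hypothetical configuration fragment illustrating the pattern; section and key names here are illustrative only, and the real schema is in SystemConfiguration.md.

```yaml
# Illustrative only -- section and key names may differ from the real schema.
secrets:
  store: modelmesh.env.v1          # resolve from environment variables

providers:
  - id: openai.llm.v1
    api_key: ${secrets:openai-key} # resolved at initialization, never inline
  - id: google.gemini.v1
    api_key: ${secrets:gemini-key}

pools:
  generation.text-generation:
    rotation:
      policy: modelmesh.stick-until-failure.v1
```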
Storage connectors serialize and deserialize library data to external backends. Three data types flow through them: runtime state, configuration, and logs.
Sync policies: in-memory (no persistence), sync-on-boundary (load/save at startup/shutdown), periodic (configurable interval), immediate (every state change).
Pre-shipped connectors include modelmesh.local-file.v1, aws.s3.v1, google.drive.v1, and redis.redis.v1 (full table in ConnectorCatalogue.md). Custom connectors implement the same interface and register in the connector catalogue. Details in SystemConfiguration.md.
Three levels of visibility into routing and provider behavior: routing decisions, request logs, and aggregate statistics.
Data exports through pluggable observability connectors; multiple can be active simultaneously (e.g., webhook for alerts + file for dashboards). Full metrics, API, and configuration in SystemConfiguration.md — Observability.
Two connectors keep the model catalogue accurate and provider health visible without manual intervention.
The registry sync connector (modelmesh.registry-sync.v1) synchronizes the local model catalogue with provider APIs on a configurable schedule. It detects new models, deprecated models, and pricing changes, updating the catalogue automatically. Sync frequency and auto-registration behavior are configurable per provider. Changes are logged through observability connectors.
The health monitor connector (modelmesh.health-monitor.v1) is a background process that probes configured providers at a configurable interval. It records latency, success/failure, and error codes; maintains rolling availability scores; and feeds results into the rotation policy for proactive deactivation. Probe frequency, timeout, and failure thresholds are configurable per provider.
Both are pluggable; pre-shipped implementations and extension points in ConnectorCatalogue.md.
ModelMesh Lite ships with a User Manual (integration, configuration, deployment), a Developer Manual (custom connectors, extending the hierarchy, contributing), and an extensive sample collection.
| Feature | Description |
|---|---|
| OpenAI-compatible interface | drop-in replacement for any OpenAI SDK client; virtual model names abstract pools, policies, and providers |
| Unified API | single integration point for multiple providers |
| Model capability hierarchy | structured, extensible AI task taxonomy |
| Capability-based model pools | group models by capability, static or dynamic membership |
| Model rotation policies | pluggable per-pool lifecycle: deactivation, recovery, and selection with model- and provider-level actions |
| Rate-limit-aware selection | proactive throttling avoidance and load balancing across models by absolute or relative capacity |
| Intelligent retry | configurable backoff before rotation; reduces false rotations on transient errors |
| Discovery connectors | automatic model catalogue sync and continuous provider health monitoring |
| Delivery modes | synchronous, streaming, batch |
| Free-tier aggregation | combine free quotas across providers |
| Declarative configuration | YAML + runtime API with per-pool policies |
| Credential management | pluggable secure API key and token resolution |
| Persistent storage | pluggable backends for state, configuration, and logs (local, S3, Google Drive, Redis) |
| Pluggable architecture | extensible connector interfaces at every point, with runtime package loading |
| OpenAI-compatible proxy | build script packages library and configuration into a Docker container exposing standard OpenAI API |
| Observability | routing decisions, logging, aggregate statistics |
| Documentation and samples | user manual, developer manual, and extensive sample collection |
| Connector Development Kit | base classes, mixins, and test utilities for building custom connectors with minimal code |
| Cross-language | Python and TypeScript |
The capability taxonomy and predefined pools are in ModelCapabilities.md. Runtime objects and services (Router, CapabilityPool, Model, Provider, and others) are in SystemServices.md with individual service docs in system/. Connector interface definitions are in ConnectorInterfaces.md with full interface specifications in interfaces/. The full connector and provider catalogue is in ConnectorCatalogue.md with individual connector docs in connectors/. YAML configuration reference is in SystemConfiguration.md. The Connector Development Kit documentation is in cdk/Overview.md with base class reference in cdk/BaseClasses.md and tutorials in cdk/DeveloperGuide.md.
For a hands-on introduction with working code samples, see the FAQ and Quick Start guides.