Capability-driven AI model routing with automatic failover
One integration point for all your AI providers.
Automatic failover, free-tier aggregation, and capability-based routing.
Your application requests a capability (e.g. “chat completion”). ModelMesh picks the best available provider, rotates on failure, and chains free quotas across providers – all behind a standard OpenAI SDK interface.
Python:
```bash
pip install modelmesh-lite          # core (zero dependencies)
pip install "modelmesh-lite[yaml]"  # + YAML config support
```
TypeScript / Node.js:
```bash
npm install @nistrapa/modelmesh-core
```
Docker Proxy (any language):
```bash
# Option A: Pull pre-built image from GitHub Container Registry
docker pull ghcr.io/apartsinprojects/modelmesh:latest

# Option B: Build from source
git clone https://github.com/ApartsinProjects/ModelMesh.git
cd ModelMesh
cp .env.example .env   # add your API keys
docker compose up --build

# Proxy at http://localhost:8080 — speaks the OpenAI REST API
```
Set an API key and go:
```bash
export OPENAI_API_KEY="sk-..."
```
Python:

```python
import modelmesh

client = modelmesh.create("chat-completion")
response = client.chat.completions.create(
    model="chat-completion",  # virtual model name = capability pool
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```
TypeScript / Node.js:

```typescript
import { create } from "@nistrapa/modelmesh-core";

const client = create("chat-completion");
const response = await client.chat.completions.create({
  model: "chat-completion",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(response.choices[0].message.content);
```
```
client.chat.completions.create(model="chat-completion", ...)
      |
      v
+-----------+     +-----------+     +----------+
|  Router   | --> |   Pool    | --> |  Model   | --> Provider API
+-----------+     +-----------+     +----------+
Resolves the      Groups models     Selects best      Sends request,
capability to     that can do       active model      handles retry
a pool            the task          (rotation policy) and failover
```
"chat-completion" resolves to a pool containing all models that support chat. The pool’s rotation policy picks the best active model. If it fails, the router retries with backoff, then rotates to the next model. When a provider’s free quota runs out, rotation automatically moves to the next provider.
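The stick-until-failure behavior described above can be sketched in a few lines of plain Python. This is an illustrative stand-in, not ModelMesh's actual internals: the `Pool` class, its `complete` method, and the callable "models" are all hypothetical.

```python
# Illustrative sketch (not ModelMesh's real implementation): a pool that
# sticks with the current model until it fails, then rotates to the next.
class Pool:
    def __init__(self, models):
        self.models = list(models)   # ordered candidate models (callables)
        self.index = 0               # currently active model

    def complete(self, prompt):
        attempts = 0
        while attempts < len(self.models):
            model = self.models[self.index]
            try:
                return model(prompt)
            except Exception:
                # Deactivate the failing model and rotate onward
                self.index = (self.index + 1) % len(self.models)
                attempts += 1
        raise RuntimeError("all models in the pool failed")

# Usage: the second model answers after the first one raises.
def flaky(prompt):
    raise ConnectionError("quota exhausted")

pool = Pool([flaky, lambda p: f"echo: {p}"])
print(pool.complete("Hello!"))  # → echo: Hello!
```

The caller sees one successful response; the retry-and-rotate loop stays invisible, which is the same contract the real router offers behind the OpenAI-style interface.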
Add more API keys – ModelMesh chains them automatically:
```bash
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GOOGLE_API_KEY="AI..."
```
```python
client = modelmesh.create("chat-completion")

# Inspect the providers behind the virtual model
print(client.describe())
# Pool "chat-completion" (strategy: stick-until-failure)
#   capability: generation.text-generation.chat-completion
#   → openai.gpt-4o             [openai.llm.v1]       (active)
#     openai.gpt-4o-mini        [openai.llm.v1]       (active)
#     anthropic.claude-sonnet-4 [anthropic.claude.v1] (active)
#     google.gemini-2.0-flash   [google.gemini.v1]    (active)
```
Same client.chat.completions.create() call – but now if OpenAI is down or its quota is exhausted, the request routes to Anthropic, then Gemini.
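The free-quota chaining can be pictured as a simple counter per provider: use one until its allowance is spent, then move to the next. The `QuotaChain` class below is a hypothetical sketch for intuition, not ModelMesh's API.

```python
# Illustrative sketch (hypothetical, not the library's API): chain free
# tiers by tracking remaining calls per provider and rotating when one
# provider's allowance runs out.
class QuotaChain:
    def __init__(self, providers):
        # providers: list of (name, remaining_free_calls)
        self.providers = list(providers)

    def pick(self):
        for i, (name, remaining) in enumerate(self.providers):
            if remaining > 0:
                self.providers[i] = (name, remaining - 1)
                return name
        raise RuntimeError("all free quotas exhausted")

chain = QuotaChain([("openai", 2), ("anthropic", 1)])
print([chain.pick() for _ in range(3)])  # → ['openai', 'openai', 'anthropic']
```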
For full control, use a configuration file:
```yaml
# modelmesh.yaml
providers:
  openai.llm.v1:
    connector: openai.llm.v1
    config:
      api_key: "${secrets:OPENAI_API_KEY}"
  anthropic.claude.v1:
    connector: anthropic.claude.v1
    config:
      api_key: "${secrets:ANTHROPIC_API_KEY}"

models:
  openai.gpt-4o:
    provider: openai.llm.v1
    capabilities:
      - generation.text-generation.chat-completion
  anthropic.claude-sonnet-4:
    provider: anthropic.claude.v1
    capabilities:
      - generation.text-generation.chat-completion

pools:
  chat:
    capability: generation.text-generation.chat-completion
    strategy: stick-until-failure
```
```python
client = modelmesh.create(config="modelmesh.yaml")
```
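Conceptually, a config like this resolves into pools by matching each pool's capability string against each model's capability list. The sketch below shows that matching step on a plain-dict stand-in for the YAML; `resolve_pool` is a hypothetical helper for illustration, not part of ModelMesh.

```python
# A plain-dict stand-in for the YAML config above (illustrative only).
config = {
    "models": {
        "openai.gpt-4o": {
            "provider": "openai.llm.v1",
            "capabilities": ["generation.text-generation.chat-completion"],
        },
        "anthropic.claude-sonnet-4": {
            "provider": "anthropic.claude.v1",
            "capabilities": ["generation.text-generation.chat-completion"],
        },
    },
    "pools": {
        "chat": {
            "capability": "generation.text-generation.chat-completion",
            "strategy": "stick-until-failure",
        },
    },
}

def resolve_pool(config, pool_name):
    """Return the model names whose capabilities match the pool's."""
    wanted = config["pools"][pool_name]["capability"]
    return [
        name
        for name, spec in config["models"].items()
        if wanted in spec["capabilities"]
    ]

print(resolve_pool(config, "chat"))
# → ['openai.gpt-4o', 'anthropic.claude-sonnet-4']
```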
Ten reasons to add ModelMesh to your next project.
| # | Value | Feature | How It Delivers |
|---|---|---|---|
| 1 | Integrate in two minutes, scale the configuration as you grow | Progressive Configuration | Env vars for instant start. YAML for providers, pools, strategies, budgets, secrets. Programmatic for dynamic setups. All three compose seamlessly |
| 2 | One familiar API across every provider you will ever use | Uniform OpenAI-Compatible API | Same client.chat.completions.create() for OpenAI, Anthropic, Gemini, DeepSeek, Mistral, Ollama, or custom models. Chat, embeddings, TTS, STT, image generation. Swap providers in config, never in code |
| 3 | Chain free tiers so you never hit a quota wall | Free-Tier Aggregation | Set free API keys, call create("chat"). The library detects providers, pools them by capability, and rotates silently when a quota exhausts. Your code sees one provider; ModelMesh manages the rotation |
| 4 | Provider goes down, your app stays up | Resilient Routing | Multiple rotation strategies: cost-first, latency-first, round-robin, sticky, rate-limit-aware. On failure the router deactivates the model, selects the next candidate, and retries within the same request |
| 5 | Request capabilities, not model names | Capability Discovery | Ask for "chat-completion", not "gpt-4o". ModelMesh resolves to the best available model. New models appear, old ones deprecate, your code stays the same |
| 6 | Spending caps enforced before the overage, not after | Budget Enforcement | Real-time cost tracking per model and provider. Set daily or monthly limits in config. BudgetExceededError fires before the breaching request |
| 7 | One library for Python backend, TypeScript frontend, Docker proxy | Full-Stack Deployment | pip install, npm install, or docker run. Each exposes the same API with zero core dependencies. One config file drives all deployment modes |
| 8 | Test AI code like regular code | Mock Client and Testing | mock_client(responses=[...]) returns an identical API with zero network calls and millisecond execution. Typed exceptions carry structured metadata. client.explain() dry-runs routing decisions |
| 9 | Production-grade observability without extra plumbing | Observability Connectors | Pre-built sinks for console, file, JSON-log, Prometheus, and webhooks. Structured traces across routing, failover, and budget events. Plug in custom callbacks for existing dashboards |
| 10 | When pre-built doesn’t fit, extend without forking | CDK | Base classes for providers, rotation policies, secret stores, storage backends, and observability sinks. Inherit, override what you need, ship as a reusable package |
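Row 6's "enforced before the overage" claim is worth making concrete: the cost check must run before the request is sent, not after the bill arrives. The sketch below illustrates that ordering; `BudgetTracker` and its `charge` method are hypothetical stand-ins, and only the `BudgetExceededError` name comes from the table above.

```python
# Illustrative sketch of pre-request budget enforcement (hypothetical
# tracker; only the BudgetExceededError name is from ModelMesh).
class BudgetExceededError(Exception):
    pass

class BudgetTracker:
    def __init__(self, daily_limit_usd):
        self.daily_limit = daily_limit_usd
        self.spent = 0.0

    def charge(self, estimated_cost):
        # Reject BEFORE the request that would breach the limit is sent
        if self.spent + estimated_cost > self.daily_limit:
            raise BudgetExceededError(
                f"spent ${self.spent:.2f}; a ${estimated_cost:.2f} request "
                f"would exceed the ${self.daily_limit:.2f} daily limit"
            )
        self.spent += estimated_cost

budget = BudgetTracker(daily_limit_usd=1.00)
budget.charge(0.75)          # fine
try:
    budget.charge(0.50)      # would total $1.25, so it is refused up front
except BudgetExceededError as e:
    print("blocked:", e)
```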
| Document | Description |
|---|---|
| FAQ | Ten questions developers ask before adopting, each with a working code tutorial |
| Developer Quick Start | Get productive in five minutes: a walkthrough of every feature, with a cheat sheet |
| Document | Description |
|---|---|
| System Concept | Architecture, design, and full feature overview |
| Model Capabilities | Capability hierarchy tree and predefined pools |
| System Configuration | Full YAML configuration reference |
| Connector Catalogue | All pre-shipped connectors with config schemas |
| Document | Description |
|---|---|
| Error Handling | Exception hierarchy, catch patterns, retry guidance |
| Middleware | Write custom middleware: logging, transforms, caching, error fallbacks |
| Testing | Unit testing with mock_client() — no API keys needed |
| Capabilities | Discover, resolve, and search capability aliases |
| Audio (TTS/STT) | AudioRequest/AudioResponse types, client.audio namespace |
| Document | Description |
|---|---|
| Proxy Guide | Deploy as OpenAI-compatible proxy: Docker, CLI, config, browser access |
| Browser Usage | BrowserBaseProvider, CORS proxy setup, and browser-specific patterns |
| AI Agent Integration | Guide for AI coding agents (Claude Code, Cursor, etc.) to integrate ModelMesh |
| Document | Description |
|---|---|
| Connector Interfaces | Interface definitions for all connector types |
| Provider | Provider connector interface spec |
| Rotation Policy | Rotation policy interface spec |
| Secret Store | Secret store interface spec |
| Storage | Storage backend interface spec |
| Observability | Observability connector interface spec |
| Discovery | Discovery connector interface spec |
| Document | Description |
|---|---|
| Overview | Runtime architecture and object graph |
| System Services | Router, Pool, Model, and State runtime objects |
| Router | Request routing and retry logic |
| Capability Pool | Pool lifecycle and model selection |
| State Manager | State persistence and recovery |
| Event Emitter | Event system for routing, failover, and budget events |
| Document | Description |
|---|---|
| CDK Overview | Architecture and class hierarchy |
| Base Classes | Reference for all CDK base classes |
| Developer Guide | Tutorials: build your own connectors |
| Convenience Layer | QuickProvider and zero-config setup |
| Mixins | Cache, metrics, rate limiter, HTTP client |
| Collection | Description |
|---|---|
| Quickstart | 12 progressive examples in Python and TypeScript |
| System Integration | Multi-provider, streaming, embeddings, cost optimization |
| CDK Tutorials | Build providers, rotation policies, and more |
| Custom Connectors | Full custom connector examples for all 6 types |
| Proxy Test | Vanilla JS browser test page for the OpenAI proxy |
```bash
# Clone the repository
git clone https://github.com/ApartsinProjects/ModelMesh.git
cd ModelMesh

# Run the test suite
pip install pytest
cd src/python && python -m pytest ../../tests/ -v
```
Created by Sasha Apartsin