Capability-driven AI model routing with automatic failover
Deploy ModelMesh as an OpenAI-compatible HTTP proxy. The proxy exposes a standard OpenAI REST API on a configurable port. Internally it leverages all ModelMesh capabilities: multi-provider routing, automatic failover, pool strategies, budget controls, and free-tier aggregation. Any OpenAI SDK client, curl, or plain fetch() call can talk to the proxy without modification.
```
Browser / SDK / curl
       |
       v  (OpenAI REST API)
 +-----------+
 |   Proxy   |   port 8080 (default)
 +-----------+
       |
       v  (ModelMesh routing)
 +-----------+     +-----------+     +----------+
 |  Router   | --> |   Pool    | --> |  Model   | --> Provider API
 +-----------+     +-----------+     +----------+
```
The proxy translates incoming requests to ModelMesh routing calls. When the client sends model: "text-generation", ModelMesh resolves the pool, picks the best active model using the configured rotation strategy, retries with backoff on failure, and rotates to the next provider when one is down or rate-limited.
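The retry-and-rotate behaviour described above can be sketched in a few lines. This is an illustrative toy, not the actual ModelMesh router: the `route` function, the pool shape, and the backoff constants are all assumptions made for this example.

```python
import time

def route(pool, request, max_attempts=3, base_delay=0.5):
    """Toy failover loop: try a model, back off on failure, rotate providers.

    `pool` is a list of dicts with a "call" callable that forwards the
    request to one provider and raises RuntimeError when it is down or
    rate-limited.
    """
    last_error = None
    for attempt in range(max_attempts):
        model = pool[attempt % len(pool)]          # rotate through the pool
        try:
            return model["call"](request)          # forward to the provider
        except RuntimeError as exc:                # provider down / rate-limited
            last_error = exc
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff
    raise RuntimeError(f"all providers failed: {last_error}")
```

The real router also weighs capabilities and the pool's configured strategy when picking the next model; this sketch only shows the retry-with-backoff skeleton.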
Python (pip):

```shell
pip install "modelmesh-lite[yaml]"
```

From source:

```shell
git clone https://github.com/ApartsinProjects/ModelMesh.git
cd ModelMesh
pip install -e ".[yaml]"
```
```shell
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GROQ_API_KEY="gsk_..."
```
With no config file, the proxy auto-detects providers from environment variables:
```shell
python -m modelmesh.proxy
```
The proxy starts on http://localhost:8080 and creates a default chat-completion pool from all detected providers.
```shell
# List available models (pool IDs)
curl http://localhost:8080/v1/models

# Send a chat completion
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "chat-completion",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
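The same requests can be made from Python with nothing but the standard library. A minimal sketch, assuming the proxy is running on its default port; `chat_request` and `send` are helper names invented for this example.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"

def chat_request(pool_id, content):
    """Build an OpenAI-style chat-completion payload; the pool ID
    doubles as the virtual model name."""
    return {
        "model": pool_id,
        "messages": [{"role": "user", "content": content}],
    }

def send(payload):
    """POST the payload to the proxy and decode the JSON response."""
    req = urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# With the proxy running:
#   reply = send(chat_request("chat-completion", "Hello!"))
#   print(reply["choices"][0]["message"]["content"])
```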
For full control over providers, models, pools, and strategies, use a YAML configuration file.
```yaml
# modelmesh.yaml
secrets:
  store: modelmesh.env.v1   # Read API keys from environment variables

providers:
  openai.llm.v1:
    api_key: ${secrets:OPENAI_API_KEY}

models:
  gpt-4o-mini:
    provider: openai.llm.v1
    capabilities:
      - generation.text-generation.chat-completion

pools:
  chat:
    capability: generation.text-generation.chat-completion
    strategy: stick-until-failure
```
```yaml
# modelmesh.yaml
secrets:
  store: modelmesh.env.v1

providers:
  openai.llm.v1:
    api_key: ${secrets:OPENAI_API_KEY}
    budget:
      daily_limit: 5.00
  anthropic.claude.v1:
    api_key: ${secrets:ANTHROPIC_API_KEY}
    budget:
      daily_limit: 5.00
  groq.api.v1:
    api_key: ${secrets:GROQ_API_KEY}

models:
  gpt-4o-mini:
    provider: openai.llm.v1
    capabilities:
      - generation.text-generation.chat-completion
    delivery:
      synchronous: true
      streaming: true
    features:
      tool_calling: true
      structured_output: true
      json_mode: true
      system_prompt: true
    constraints:
      context_window: 128000
      max_output_tokens: 16384
  claude-3-5-haiku:
    provider: anthropic.claude.v1
    capabilities:
      - generation.text-generation.chat-completion
    delivery:
      synchronous: true
      streaming: true
    features:
      tool_calling: true
      system_prompt: true
    constraints:
      context_window: 200000
      max_output_tokens: 8192
  llama-3.3-70b:
    provider: groq.api.v1
    capabilities:
      - generation.text-generation.chat-completion
    delivery:
      synchronous: true
      streaming: true
    features:
      tool_calling: true
      system_prompt: true
    constraints:
      context_window: 131072
      max_output_tokens: 32768

pools:
  text-generation:
    strategy: modelmesh.stick-until-failure.v1
    capability: generation.text-generation
```
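To see how the sections fit together, here is a sketch of the cross-checks a loader might run on a parsed config (the dict `yaml.safe_load` would return for the file above). The `validate` helper is hypothetical, not part of ModelMesh; it treats the pool's capability as a prefix, matching how the pool above (`generation.text-generation`) collects models that advertise the longer `generation.text-generation.chat-completion`.

```python
def validate(config):
    """Cross-check a parsed config dict: every model must name a
    registered provider, and every pool's capability must be a prefix
    of at least one model capability. Returns a list of error strings."""
    providers = set(config.get("providers", {}))
    errors = []
    for name, model in config.get("models", {}).items():
        if model["provider"] not in providers:
            errors.append(f"model {name}: unknown provider {model['provider']}")
    capabilities = {c for m in config.get("models", {}).values()
                    for c in m.get("capabilities", [])}
    for name, pool in config.get("pools", {}).items():
        prefix = pool["capability"]
        if not any(c.startswith(prefix) for c in capabilities):
            errors.append(f"pool {name}: no model provides {prefix}")
    return errors
```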
Start with config:
```shell
python -m modelmesh.proxy --config modelmesh.yaml
```
| Section | Purpose | Reference |
|---|---|---|
| `secrets` | Secret store backend (env vars, dotenv, cloud vaults) | SystemConfiguration |
| `providers` | Provider registration, API keys, budgets, rate limits | SystemConfiguration |
| `models` | Model definitions with capabilities, features, constraints | SystemConfiguration |
| `pools` | Capability pools with rotation strategies | SystemConfiguration |
| Connector ID | Provider | Capabilities |
|---|---|---|
| `openai.llm.v1` | OpenAI | Chat, embeddings, audio |
| `anthropic.claude.v1` | Anthropic | Chat |
| `groq.api.v1` | Groq | Chat (fast inference) |
| `google.gemini.v1` | Google Gemini | Chat |
| `deepseek.api.v1` | DeepSeek | Chat |
| `mistral.api.v1` | Mistral | Chat, embeddings |
| `together.api.v1` | Together AI | Chat |
| `openrouter.api.v1` | OpenRouter | Chat (multi-model gateway) |
| `xai.api.v1` | xAI (Grok) | Chat |
| `cohere.api.v1` | Cohere | Chat, embeddings |
See ConnectorCatalogue for all connectors and config schemas.
| Strategy | Connector ID | Behaviour |
|---|---|---|
| Stick-until-failure | `modelmesh.stick-until-failure.v1` | Use one model until it fails, then rotate (default) |
| Round-robin | `modelmesh.round-robin.v1` | Cycle through models in sequence |
| Cost-first | `modelmesh.cost-first.v1` | Always pick the model with the lowest accumulated cost |
| Latency-first | `modelmesh.latency-first.v1` | Always pick the model with the lowest observed latency |
| Priority | `modelmesh.priority-selection.v1` | Follow an ordered preference list with fallback |
| Session-stickiness | `modelmesh.session-stickiness.v1` | Route same-session requests to the same model |
| Rate-limit-aware | `modelmesh.rate-limit-aware.v1` | Track per-model quotas, switch before exhaustion |
| Load-balanced | `modelmesh.load-balanced.v1` | Distribute requests using weighted round-robin |
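The default strategy is simple enough to sketch. A toy version for illustration, not the actual `modelmesh.stick-until-failure.v1` implementation:

```python
class StickUntilFailure:
    """Keep returning the same model until a failure is reported for it,
    then advance to the next model in the pool (wrapping around)."""

    def __init__(self, models):
        self.models = list(models)
        self.index = 0

    def pick(self):
        return self.models[self.index % len(self.models)]

    def report_failure(self, model):
        # Rotate only if the currently sticky model is the one that failed;
        # a stale failure report for another model is ignored.
        if model == self.pick():
            self.index += 1
```

Round-robin differs only in advancing the index on every `pick` rather than on failure.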
For custom strategies, see the FAQ — CDK extension guide. For full strategy config options, see the Connector Catalogue.
| Store | Connector ID | Source |
|---|---|---|
| Environment variables | `modelmesh.env.v1` | `process.env` / `os.environ` |
| Dotenv file | `modelmesh.dotenv.v1` | `.env` file |
| AWS Secrets Manager | `aws.secrets-manager.v1` | AWS cloud |
| Google Secret Manager | `google.secret-manager.v1` | GCP cloud |
| Azure Key Vault | `microsoft.key-vault.v1` | Azure cloud |
| 1Password Connect | `1password.connect.v1` | 1Password vault |
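The `${secrets:NAME}` placeholders in the YAML examples resolve against whichever store is configured. A minimal sketch of the env-var case (`modelmesh.env.v1`); the regex and the `resolve` helper are assumptions for illustration, and a cloud store would swap `os.environ` for a vault lookup.

```python
import os
import re

# Matches placeholders of the form ${secrets:SOME_NAME}
_PLACEHOLDER = re.compile(r"\$\{secrets:([A-Z0-9_]+)\}")

def resolve(value, env=os.environ):
    """Replace every ${secrets:NAME} in `value` with env[NAME],
    raising KeyError when a referenced secret is missing."""
    def lookup(match):
        name = match.group(1)
        if name not in env:
            raise KeyError(f"secret {name} not found in environment")
        return env[name]
    return _PLACEHOLDER.sub(lookup, value)
```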
```shell
python -m modelmesh.proxy [OPTIONS]
```

| Flag | Default | Description |
|---|---|---|
| `--config PATH` | (auto-detect) | Path to YAML configuration file |
| `--host HOST` | `0.0.0.0` | Bind address |
| `--port PORT` | `8080` | Listen port |
| `--token TOKEN` | (none) | Bearer token for authentication |
| `--log-level LEVEL` | `INFO` | Logging level: DEBUG, INFO, WARNING, ERROR |
Examples:
```shell
# Auto-detect providers from env vars
python -m modelmesh.proxy

# Custom config, port, and auth token
python -m modelmesh.proxy --config modelmesh.yaml --port 9090 --token my-secret

# Debug logging
python -m modelmesh.proxy --config modelmesh.yaml --log-level DEBUG
```
All endpoints follow the OpenAI REST API specification.
List available models (pool IDs exposed as virtual model names).
```shell
curl http://localhost:8080/v1/models
```
Response:
```json
{
  "object": "list",
  "data": [
    { "id": "text-generation", "object": "model", "created": 1710000000, "owned_by": "modelmesh" }
  ]
}
```
Send a chat completion request. Use the pool ID as the model parameter.
Non-streaming:

```shell
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-generation",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is ModelMesh?"}
    ],
    "temperature": 0.7,
    "max_tokens": 512
  }'
```

Streaming:

```shell
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-generation",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'
```
Streaming responses use Server-Sent Events (SSE) with chunked transfer encoding. Each chunk is a `data: {json}\n\n` line, and the stream is terminated by `data: [DONE]`.
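In Python, that framing can be parsed in a few lines. `iter_tokens` is a hypothetical helper assuming each chunk carries an OpenAI-style `choices[0].delta.content` field:

```python
import json

def iter_tokens(sse_text):
    """Yield content tokens from raw SSE text, stopping at data: [DONE]."""
    for line in sse_text.split("\n"):
        line = line.strip()
        if line == "data: [DONE]":
            return                       # end-of-stream sentinel
        if not line.startswith("data: "):
            continue                     # skip blank separator lines
        chunk = json.loads(line[len("data: "):])
        token = chunk["choices"][0].get("delta", {}).get("content")
        if token:
            yield token
```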
```shell
# Embeddings
curl -X POST http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "embeddings", "input": "Hello world"}'

# Text-to-speech
curl -X POST http://localhost:8080/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"model": "tts", "input": "Hello world"}'

# Transcription
curl -X POST http://localhost:8080/v1/audio/transcriptions \
  -H "Content-Type: application/json" \
  -d '{"model": "stt", "text": "audio data..."}'
```
Server health and status.
```shell
curl http://localhost:8080/health
```
Response:
```json
{
  "running": true,
  "host": "0.0.0.0",
  "port": 8080,
  "uptime_seconds": 120.5,
  "active_connections": 0,
  "total_requests": 42
}
```
When the proxy is started with --token, all requests must include a bearer token:
```shell
curl -H "Authorization: Bearer my-secret" http://localhost:8080/v1/models
```
The proxy includes full CORS support (Access-Control-Allow-Origin: *) for browser access. No additional CORS proxy is needed.
| File | Purpose |
|---|---|
| `Dockerfile` | Python 3.12-slim image with ModelMesh + PyYAML |
| `docker-compose.yaml` | Service definition with port mapping and config mount |
| `modelmesh.yaml` | Pool and provider configuration |
| `.env` | API keys (gitignored, never committed) |
Option A: Pull pre-built image from GHCR (fastest):
```shell
docker pull ghcr.io/apartsinprojects/modelmesh:latest

# Run with environment variables
docker run -p 8080:8080 \
  -e OPENAI_API_KEY="sk-..." \
  -e ANTHROPIC_API_KEY="sk-ant-..." \
  ghcr.io/apartsinprojects/modelmesh:latest \
  --host 0.0.0.0 --port 8080

# Or run with config file and env file
docker run -p 8080:8080 \
  --env-file .env \
  -v ./modelmesh.yaml:/app/modelmesh.yaml:ro \
  ghcr.io/apartsinprojects/modelmesh:latest \
  --config /app/modelmesh.yaml --host 0.0.0.0 --port 8080
```
Option B: Docker Compose (recommended for development):
```shell
# 1. Create .env with your API keys
cat > .env << 'EOF'
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GROQ_API_KEY=gsk_...
EOF

# 2. Build and start
docker compose up --build

# 3. Or run in background
docker compose up --build -d

# 4. View logs
docker compose logs -f

# 5. Stop
docker compose down
```
Option C: Build Docker image locally:
```shell
# Build
docker build -t modelmesh-proxy .

# Run
docker run -p 8080:8080 \
  -e OPENAI_API_KEY="sk-..." \
  -e ANTHROPIC_API_KEY="sk-ant-..." \
  -v ./modelmesh.yaml:/app/modelmesh.yaml:ro \
  modelmesh-proxy \
  --config /app/modelmesh.yaml --host 0.0.0.0 --port 8080
```
```shell
# Build + run (interactive, Ctrl+C to stop)
./scripts/proxy-up.sh

# Build + run in background
./scripts/proxy-up.sh --detach

# Stop
./scripts/proxy-down.sh

# Run smoke tests against running proxy
./scripts/proxy-test.sh

# Full cycle: build, start, test, stop
./scripts/proxy-test.sh --full

# Health check
curl http://localhost:8080/health

# List models
curl http://localhost:8080/v1/models

# Chat completion
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"text-generation","messages":[{"role":"user","content":"Hi"}]}'
```
```yaml
# docker-compose.yaml
services:
  modelmesh-proxy:
    build: .
    ports:
      - "8080:8080"
    env_file: .env
    volumes:
      - ./modelmesh.yaml:/app/modelmesh.yaml:ro
    command: ["--config", "/app/modelmesh.yaml", "--host", "0.0.0.0", "--port", "8080"]
```
```dockerfile
FROM python:3.12-slim
WORKDIR /app
COPY src/python/ ./
COPY pyproject.toml ./
RUN pip install . && pip install "pyyaml>=6.0"
EXPOSE 8080
ENTRYPOINT ["python", "-m", "modelmesh.proxy"]
CMD ["--host", "0.0.0.0", "--port", "8080"]
```
The proxy includes CORS support, so you can call it directly from browser JavaScript using fetch().
A complete browser test page is provided at samples/proxy-test/index.html. Open it directly in a browser (no build step needed).
Non-streaming request:
```javascript
const response = await fetch('http://localhost:8080/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'text-generation',
    messages: [{ role: 'user', content: 'Hello!' }],
  }),
});
const data = await response.json();
console.log(data.choices[0].message.content);
```
Streaming request:
```javascript
const response = await fetch('http://localhost:8080/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'text-generation',
    messages: [{ role: 'user', content: 'Hello!' }],
    stream: true,
  }),
});

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop();
  for (const line of lines) {
    const trimmed = line.trim();
    if (trimmed === 'data: [DONE]') break;
    if (!trimmed.startsWith('data: ')) continue;
    const chunk = JSON.parse(trimmed.slice(6));
    const token = chunk.choices?.[0]?.delta?.content;
    if (token) process.stdout.write(token); // or append to DOM
  }
}
```
Any OpenAI SDK client works with the proxy by changing the base URL:
Python (openai SDK):
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="not-needed",  # or your proxy token
)

response = client.chat.completions.create(
    model="text-generation",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```
TypeScript (openai SDK):
```typescript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:8080/v1',
  apiKey: 'not-needed',
});

const response = await client.chat.completions.create({
  model: 'text-generation',
  messages: [{ role: 'user', content: 'Hello!' }],
});
console.log(response.choices[0].message.content);
```
You can also embed the proxy server in your own Python application:
```python
from modelmesh.proxy.server import ProxyServer

# Start in background
server = ProxyServer(
    config="modelmesh.yaml",
    host="0.0.0.0",
    port=8080,
    token="my-secret",
)
server.start(block=False)

# Check status
print(server.get_status())

# Stop
server.stop()
```
| Symptom | Cause | Fix |
|---|---|---|
| `ModuleNotFoundError: yaml` | PyYAML not installed | `pip install pyyaml` or `pip install "modelmesh-lite[yaml]"` |
| Connection refused | Proxy not running or wrong port | Check `docker compose logs` or start with `--log-level DEBUG` |
| 502 Routing error | All providers failed | Check API keys in `.env`, verify provider connectivity |
| 401 Invalid token | Bearer token mismatch | Pass the correct `--token` or remove the auth requirement |
| CORS error in browser | Proxy not reachable | Verify the proxy URL and that the proxy is running |
| Empty request body | Missing Content-Type header | Add `-H "Content-Type: application/json"` to curl |
| Streaming not working | Client not reading SSE | Use the ReadableStream API, not `response.json()` |
See also: FAQ · Quick Start · System Configuration