Capability-driven AI model routing with automatic failover
ModelMesh middleware lets you intercept requests and responses without modifying library internals. Use middleware for logging, request transforms, response enrichment, caching, or custom error handling. Middleware runs inside the router’s request pipeline, after pool selection and before/after provider execution.
```python
import modelmesh
from modelmesh import Middleware, MiddlewareContext
from modelmesh.interfaces.provider import CompletionRequest, CompletionResponse

class LoggingMiddleware(Middleware):
    async def before_request(self, request, context):
        print(f"Routing to {context.model_id}")
        return request

    async def after_response(self, response, context):
        print(f"Got {response.usage.total_tokens} tokens")
        return response

client = modelmesh.create("chat", middleware=[LoggingMiddleware()])
```
```typescript
import { create, Middleware, MiddlewareContext } from '@nistrapa/modelmesh-core';

class LoggingMiddleware extends Middleware {
  async beforeRequest(request, context) {
    console.log(`Routing to ${context.modelId}`);
    return request;
  }

  async afterResponse(response, context) {
    console.log(`Got ${response.usage?.totalTokens} tokens`);
    return response;
  }
}

const client = create('chat', { middleware: [new LoggingMiddleware()] });
```
Middleware hooks are called in a pipeline around each provider call:
```
beforeRequest (A → B → C)    ← forward order
          │
          ▼
   provider.complete()
          │
          ▼
afterResponse (C → B → A)    ← reverse order (onion model)
```

If the provider throws an error:

```
onError (A → B → C)          ← forward order, first handler wins
```
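The onion ordering can be sketched in a few lines of plain Python. This is a toy pipeline, not ModelMesh internals; the `Recorder` class and `run` helper are illustrative names:

```python
import asyncio

class Recorder:
    """Toy middleware that records the order in which its hooks fire."""
    def __init__(self, name, log):
        self.name, self.log = name, log

    async def before_request(self, request, context):
        self.log.append(f"before:{self.name}")
        return request

    async def after_response(self, response, context):
        self.log.append(f"after:{self.name}")
        return response

async def run(middleware, provider_call, request, context):
    # before_request hooks run in registration order (A → B → C)...
    for mw in middleware:
        request = await mw.before_request(request, context)
    response = await provider_call(request)
    # ...and after_response hooks unwind in reverse (C → B → A).
    for mw in reversed(middleware):
        response = await mw.after_response(response, context)
    return response

log = []
mws = [Recorder(n, log) for n in "ABC"]

async def provider(request):
    log.append("provider")
    return {"ok": True}

asyncio.run(run(mws, provider, {}, {}))
print(log)
# -> ['before:A', 'before:B', 'before:C', 'provider',
#     'after:C', 'after:B', 'after:A']
```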
Override any combination of these three hooks:

- `before_request(request, context) → request`: Called before the provider receives the request. Return the (possibly modified) request to proceed.
- `after_response(response, context) → response`: Called after a successful response. Return the (possibly modified) response.
- `on_error(error, context) → response`: Called when the provider raises an error. Either return a `CompletionResponse` to suppress the error, or re-raise to let the router continue retrying.

Every hook receives a context object with metadata about the current routing decision:
| Field | Type | Description |
|---|---|---|
| `model_id` | `str` | Real model identifier selected |
| `provider_id` | `str` | Connector ID of the provider |
| `pool_name` | `str` | Virtual model / pool name |
| `attempt` | `int` | Current retry attempt (1-based) |
| `timestamp` | `float` | When the request was initiated |
| `metadata` | `dict` | Arbitrary key-value store for chaining |
Add custom headers or modify parameters:
```python
import uuid

class AddMetadata(Middleware):
    async def before_request(self, request, context):
        # Tag the routing context so later hooks can correlate this request
        context.metadata["request_id"] = str(uuid.uuid4())
        return request
```
Add computed fields to responses:
```python
import time

class TimingMiddleware(Middleware):
    async def before_request(self, request, context):
        context.metadata["start_time"] = time.time()
        return request

    async def after_response(self, response, context):
        elapsed = time.time() - context.metadata["start_time"]
        context.metadata["latency_ms"] = elapsed * 1000
        return response
```
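The timing pattern can be exercised without the router at all. In this toy harness, a plain object with a `metadata` dict stands in for the real context, and the `demo` helper and 50 ms sleep stand in for a provider call:

```python
import asyncio
import time
from types import SimpleNamespace

class TimingMiddleware:
    async def before_request(self, request, context):
        context.metadata["start_time"] = time.time()
        return request

    async def after_response(self, response, context):
        elapsed = time.time() - context.metadata["start_time"]
        context.metadata["latency_ms"] = elapsed * 1000
        return response

async def demo():
    mw = TimingMiddleware()
    ctx = SimpleNamespace(metadata={})   # stand-in for MiddlewareContext
    await mw.before_request({}, ctx)
    await asyncio.sleep(0.05)            # stand-in for the provider call
    await mw.after_response({}, ctx)
    return ctx.metadata["latency_ms"]

print(f"{asyncio.run(demo()):.0f} ms")   # roughly 50 ms
```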
Return a cached or default response on failure:
```python
class FallbackMiddleware(Middleware):
    def __init__(self, default_response):
        self._default = default_response

    async def on_error(self, error, context):
        if context.attempt >= 3:
            return self._default  # return fallback after 3 failures
        raise error               # re-raise so the router retries
```
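To see how `on_error` interacts with retries, here is a toy retry loop around a provider that always fails. The loop and the dict-based context are illustrative; ModelMesh's real router logic is more involved:

```python
import asyncio

class FallbackMiddleware:
    def __init__(self, default_response):
        self._default = default_response

    async def on_error(self, error, context):
        if context["attempt"] >= 3:
            return self._default  # suppress: the caller gets the fallback
        raise error               # re-raise: let the loop try again

async def call_with_retries(mw, provider):
    context = {"attempt": 0}
    while True:
        context["attempt"] += 1
        try:
            return await provider(context)
        except Exception as err:
            try:
                return await mw.on_error(err, context)
            except Exception:
                continue  # middleware re-raised; retry

async def always_fails(context):
    raise RuntimeError("provider down")

mw = FallbackMiddleware({"text": "(cached answer)"})
result = asyncio.run(call_with_retries(mw, always_fails))
print(result)  # -> {'text': '(cached answer)'}
```

Attempts 1 and 2 re-raise, so the loop retries; on attempt 3 the middleware swallows the error and the fallback is returned.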
Stack multiple middleware — they execute in registration order:
```python
client = modelmesh.create("chat", middleware=[
    AuthMiddleware(),     # runs first (before_request)
    LoggingMiddleware(),  # runs second (before_request)
    CachingMiddleware(),  # runs third (before_request)
])
# after_response runs in reverse: Caching → Logging → Auth
```
For advanced use, you can create a MiddlewareStack directly:
```python
from modelmesh import MiddlewareStack

stack = MiddlewareStack()
stack.add(LoggingMiddleware())
stack.add(CachingMiddleware())

# Use programmatically
result = await stack.run_before_request(request, context)
result = await stack.run_after_response(response, context)
result = await stack.run_on_error(error, context)
```
Middleware is entirely optional. Existing code that doesn’t use middleware continues to work identically. The middleware parameter defaults to None, and the router skips middleware hooks when none are configured.
See also: FAQ · Quick Start · System Configuration