Capability-driven AI model routing with automatic failover
YAML configuration reference for ModelMesh Lite. The system is configured declaratively via YAML, programmatically via API, or both. Configuration can be serialized to and deserialized from storage connectors for centralized management and sharing across instances. For the runtime objects that consume this configuration, see SystemServices.md. For a tutorial-style introduction, see the FAQ and Quick Start.
| Section | Purpose |
|---|---|
| `secrets` | Secret store backend and credential resolution |
| `providers` | Provider registration, authentication, quotas, and budgets |
| `models` | Explicit model definitions (capabilities, delivery, features, constraints) |
| `pools` | Capability pool definitions with per-pool rotation and retry configuration |
| `storage` | Persistent storage backend and sync policy |
| `observability` | Routing events, request logging, and aggregate statistics |
| `discovery` | Model registry sync and provider health monitoring |
| `connectors` | Custom connector package loading |
| `proxy` | OpenAI-compatible proxy deployment settings |
Configures the secret store backend. All credentials elsewhere in the configuration are referenced by name (`${secrets:key-name}`) and resolved at initialization through the configured store.
| Attribute | Type | Description |
|---|---|---|
| `store` | string | Secret store connector type. Pre-shipped: `modelmesh.env.v1` (default), `modelmesh.dotenv.v1`, `aws.secrets-manager.v1`, `google.secret-manager.v1`, `microsoft.key-vault.v1`, `1password.connect.v1`. |
| `path` | string | File path for the `modelmesh.dotenv.v1` store. |
| `region` | string | Cloud region for cloud secret managers. |
Store-specific attributes (bucket, vault name, project ID, etc.) are passed through to the connector.
```yaml
secrets:
  store: aws.secrets-manager.v1
  region: us-east-1
```
See ConnectorCatalogue.md — Secret Store for pre-shipped stores and deployment patterns.
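To illustrate the resolution rule, here is a minimal sketch of a `${secrets:...}` resolver. This is a hypothetical helper (the library's actual resolver may differ), using a plain dict in place of a real secret store connector:

```python
import re

_REF = re.compile(r"\$\{secrets:([^}]+)\}")

def resolve_secrets(value: str, store: dict) -> str:
    """Replace each ${secrets:key-name} reference with the value
    looked up from the configured secret store (here a plain dict)."""
    return _REF.sub(lambda m: store[m.group(1)], value)

print(resolve_secrets("${secrets:openai-api-key}",
                      {"openai-api-key": "sk-test-123"}))  # sk-test-123
```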
Registers AI model providers and web API services. Each provider entry configures authentication, quota tracking, rate limits, budgets, and infrastructure capabilities. A provider can be enabled or disabled without removing its configuration.
| Attribute | Type | Description |
|---|---|---|
| `enabled` | boolean | Enable or disable the provider. Default: `true`. |
| `api_key` | string | API key or token. Use secret references: `${secrets:key-name}`. |
| `base_url` | string | Custom API base URL (for self-hosted or proxy endpoints). |
| `connector` | string | Provider connector type. Defaults to the provider name. |
| Attribute | Type | Description |
|---|---|---|
| `auth.method` | string | Authentication method: `api_key`, `oauth`, `service_account`. |
| `auth.key_rotation` | boolean | Enable automatic key rotation. |
| Attribute | Type | Description |
|---|---|---|
| `quota.query_current` | boolean | Provider API supports querying current usage. |
| `quota.query_remaining` | boolean | Provider API supports querying remaining capacity. |
| `quota.reset_schedule` | string | Quota reset frequency: `monthly`, `daily`, `rolling`. |
| `budget.daily_limit` | number | Daily spend cap in USD. |
| `budget.monthly_limit` | number | Monthly spend cap in USD. |
| Attribute | Type | Description |
|---|---|---|
| `discovery.enumerate_models` | boolean | Auto-discover models at startup. |
| `discovery.model_details` | boolean | Query model metadata (context window, pricing). |
| `discovery.capability_query` | boolean | Query which capabilities models support. |
| Attribute | Type | Description |
|---|---|---|
| `batch.supported` | boolean | Provider supports batch submissions. |
| `batch.max_items` | integer | Maximum requests per batch. |
| `files.upload` | boolean | Provider supports file uploads. |
| `files.max_size` | string | Maximum file size (e.g., `512MB`). |
| `fine_tuning.supported` | boolean | Provider supports fine-tuning. |
```yaml
providers:
  openai.llm.v1:
    enabled: true
    api_key: ${secrets:openai-api-key}
    budget:
      daily_limit: 5.00
      monthly_limit: 50.00
    discovery:
      enumerate_models: true
  huggingface.inference.v1:
    enabled: true
    api_key: ${secrets:hf-api-key}
  anthropic.claude.v1:
    enabled: false
```
See ConnectorCatalogue.md — Provider for pre-shipped provider connectors and capability matrix.
Explicit model definitions supplement auto-discovered models. Each entry is a capability contract declaring what an application can expect. Models register at leaf nodes of the capability hierarchy and automatically join ancestor pools.
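The ancestor-pool rule can be sketched as follows (illustrative only; `ancestor_pools` is a hypothetical helper, not a library API):

```python
def ancestor_pools(capability: str) -> list[str]:
    """A model registered at a leaf joins the pool at that node
    and at every ancestor node of the capability hierarchy."""
    parts = capability.split(".")
    return [".".join(parts[:i]) for i in range(len(parts), 0, -1)]

# A model registered at the chat-completion leaf also serves the
# generation.text-generation and generation pools:
ancestor_pools("generation.text-generation.chat-completion")
# → ['generation.text-generation.chat-completion',
#    'generation.text-generation', 'generation']
```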
| Attribute | Type | Description |
|---|---|---|
| `provider` | string | Provider that serves this model. |
| `capabilities` | list | Capability leaf nodes (e.g., `chat-completion`, `ocr`, `tool-calling`). |
| Attribute | Type | Description |
|---|---|---|
| `delivery.synchronous` | boolean | Supports synchronous requests. |
| `delivery.streaming` | boolean | Supports streaming responses. |
| `delivery.batch` | boolean | Supports batch submissions. |
| Attribute | Type | Description |
|---|---|---|
| `batch.max_items` | integer | Maximum requests per batch. |
| `batch.max_payload` | string | Maximum total batch size. |
| `batch.completion_window` | duration | Expected turnaround time (e.g., `24h`). |
| `batch.cost_discount` | float | Batch pricing relative to sync (e.g., `0.5` = 50% off). |
| `batch.callback` | boolean | Supports webhook notification on completion. |
| `batch.polling` | boolean | Supports status polling. |
| `batch.partial_results` | boolean | Can return completed items before the batch finishes. |
| Attribute | Type | Description |
|---|---|---|
| `features.tool_calling` | boolean | Supports tool/function calling. |
| `features.structured_output` | boolean | Supports structured (JSON schema) output. |
| `features.json_mode` | boolean | Supports JSON mode. |
| `features.system_prompt` | boolean | Supports system prompts. |
| `features.grounding` | boolean | Supports grounded generation. |
| `features.logprobs` | boolean | Returns log probabilities. |
| `features.fine_tunable` | boolean | Can be fine-tuned. |
| Attribute | Type | Description |
|---|---|---|
| `constraints.context_window` | integer | Maximum context window in tokens. |
| `constraints.max_output_tokens` | integer | Maximum output tokens per request. |
| `constraints.max_images` | integer | Maximum images per request. |
| `constraints.max_file_size` | string | Maximum input file size. |
| `constraints.supported_languages` | list | Supported languages (ISO codes). |
Not all delivery modes are available for every capability. The matrix below shows supported combinations.
| Capability | Sync | Stream | Batch |
|---|---|---|---|
| chat-completion | yes | yes | yes |
| text-to-image | yes | — | yes |
| text-embeddings | yes | — | yes |
| speech-to-text | yes | — | yes |
| text-to-speech | yes | yes | yes |
| document-parsing | yes | — | yes |
| web-search | yes | — | — |
| content-moderation | yes | — | yes |
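Encoded as a lookup table, the matrix above might be checked like this (an illustrative sketch, not a library API):

```python
# capability -> delivery modes supported, per the matrix above
DELIVERY_MODES = {
    "chat-completion":    {"sync", "stream", "batch"},
    "text-to-image":      {"sync", "batch"},
    "text-embeddings":    {"sync", "batch"},
    "speech-to-text":     {"sync", "batch"},
    "text-to-speech":     {"sync", "stream", "batch"},
    "document-parsing":   {"sync", "batch"},
    "web-search":         {"sync"},
    "content-moderation": {"sync", "batch"},
}

def supports(capability: str, mode: str) -> bool:
    """True if the capability/delivery-mode combination is available."""
    return mode in DELIVERY_MODES.get(capability, set())

supports("web-search", "batch")  # → False: web-search is sync-only
```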
```yaml
models:
  gpt-4o:
    provider: openai.llm.v1
    capabilities:
      - generation.text-generation.chat-completion
      - generation.structured-generation.json-generation
      - understanding.vision-understanding.image-captioning
      - interaction.tool-calling
    delivery:
      synchronous: true
      streaming: true
      batch: true
    batch:
      max_items: 50000
      completion_window: 24h
      cost_discount: 0.5
    features:
      tool_calling: true
      structured_output: true
      json_mode: true
      system_prompt: true
      fine_tunable: true
    constraints:
      context_window: 128000
      max_output_tokens: 16384
      max_images: 20
```
Defines capability pools and their per-pool rotation, selection, and retry configuration. Each pool targets a node in the capability hierarchy and automatically includes all models registered at that node or its descendants.
Note: The current implementation supports the `capability`, `models`, `providers`, and `strategy` pool fields. Additional fields shown below are reserved for future releases.
| Attribute | Type | Description |
|---|---|---|
| `capability` | string | Capability node to target (e.g., `generation.text-generation`). Defaults to the pool name. |
| `providers` | list | Restrict the pool to specific providers. |
| `excluded_providers` | list | Exclude specific providers from the pool. |
| `model_priority` | list | Ordered model preference list. |
| `provider_priority` | list | Ordered provider preference list. |
| Attribute | Type | Description |
|---|---|---|
| `strategy` | string | Model selection strategy. Pre-shipped: `modelmesh.stick-until-failure.v1` (default), `modelmesh.priority-selection.v1`, `modelmesh.round-robin.v1`, `modelmesh.cost-first.v1`, `modelmesh.latency-first.v1`, `modelmesh.session-stickiness.v1`, `modelmesh.rate-limit-aware.v1`, `modelmesh.load-balanced.v1`. |
| `fallback_strategy` | string | Strategy to use when the primary list is exhausted. |
| `balance_mode` | string | For `modelmesh.load-balanced.v1`: distribute by `absolute` or `relative` capacity. |
Error-based:
| Attribute | Type | Description |
|---|---|---|
| `deactivation.retry_limit` | integer | Consecutive failures before deactivation. |
| `deactivation.error_rate_threshold` | float | Error rate over a sliding window (0.0–1.0). |
| `deactivation.error_codes` | list | HTTP codes that count toward deactivation (e.g., `[429, 500, 503]`). |
Request-count-based:
| Attribute | Type | Description |
|---|---|---|
| `deactivation.request_limit` | integer | Maximum requests before deactivation. |
| `deactivation.token_limit` | integer | Maximum tokens before deactivation. |
| `deactivation.budget_limit` | number | Maximum spend (USD) before deactivation. |
Time-based:
| Attribute | Type | Description |
|---|---|---|
| `deactivation.quota_window` | string | Deactivate when the quota period expires: `monthly`, `daily`. |
| `deactivation.maintenance_window` | string | Scheduled deactivation (cron expression). |
| Attribute | Type | Description |
|---|---|---|
| `recovery.cooldown` | duration | Time before a standby model is reconsidered (e.g., `60s`). |
| `recovery.probe_on_start` | boolean | Test standby models at library startup. |
| `recovery.probe_interval` | duration | Periodically test standby models (e.g., `300s`). |
| `recovery.on_quota_reset` | boolean | Reactivate when the provider quota resets. |
| `recovery.quota_reset_schedule` | string | Calendar schedule for quota resets: `monthly`, `daily_utc`. |
| Attribute | Type | Description |
|---|---|---|
| `retry.max_attempts` | integer | Retries on the same model before rotating. |
| `retry.backoff` | string | Backoff strategy: `fixed`, `exponential_jitter`, `retry_after`. |
| `retry.initial_delay` | duration | First retry delay (e.g., `500ms`). |
| `retry.max_delay` | duration | Maximum backoff delay (e.g., `10s`). |
| `retry.retryable_codes` | list | HTTP codes eligible for retry (e.g., `[429, 500, 502, 503]`). |
| `retry.non_retryable_codes` | list | HTTP codes that skip retry and rotate immediately (e.g., `[400, 401, 403]`). |
| `retry.scope` | string | Retry scope: `same_model`, `same_provider`, `any`. |
| `retry.honor_retry_after` | boolean | Use the provider's `Retry-After` header when present. |
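A sketch of the `exponential_jitter` strategy using the example values above (`initial_delay` 500ms, `max_delay` 10s). The library's exact jitter formula is an assumption; full jitter is shown here:

```python
import random

def backoff_delay(attempt: int, initial: float = 0.5,
                  max_delay: float = 10.0) -> float:
    """Exponential backoff with full jitter: double the base delay per
    attempt, cap it at max_delay, then draw uniformly from [0, delay]."""
    delay = min(initial * (2 ** attempt), max_delay)
    return random.uniform(0.0, delay)

# Successive attempts are bounded by 0.5s, 1s, 2s, 4s, 8s, then 10s:
[backoff_delay(a) for a in range(6)]
```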
| Attribute | Type | Description |
|---|---|---|
| `rate_limit.threshold` | float | Switch models at this fraction of the limit (0.0–1.0, e.g., `0.8`). |
| `rate_limit.min_delta` | duration | Minimum time between requests to the same model. |
| `rate_limit.max_rpm` | integer | Maximum requests per minute before switching models. |
Rate-limit-aware switches models preemptively when usage approaches a configurable threshold (rate_limit.threshold), with no deactivation — just a proactive switch. Load-balanced distributes requests by rate-limit headroom: absolute mode distributes evenly across models; relative mode distributes proportionally to each model’s known limit. Both use provider-reported rate data when available, falling back to local counting.
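The two balance modes can be sketched as follows (illustrative only; the record shape and helper name are assumptions, not the library's internals):

```python
def pick_by_headroom(models: dict, mode: str = "relative") -> str:
    """models maps name -> (used, limit), taken from provider-reported
    rate-limit headers when available, else local request counting.

    absolute: prefer the model with the most remaining requests
    relative: prefer the largest remaining fraction of the model's own limit
    """
    def headroom(item):
        _, (used, limit) = item
        remaining = limit - used
        return remaining / limit if mode == "relative" else remaining
    return max(models.items(), key=headroom)[0]

usage = {"gpt-4o": (80, 100), "claude-sonnet-4": (5, 20)}
pick_by_headroom(usage, "absolute")  # 20 vs 15 remaining -> "gpt-4o"
pick_by_headroom(usage, "relative")  # 0.20 vs 0.75 fraction -> "claude-sonnet-4"
```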
Provider-level actions deactivate or reactivate all models from a provider across all pools simultaneously.
| Attribute | Type | Description |
|---|---|---|
| `provider_deactivation` | string | Deactivate all models of a provider across all pools. Values: `on_auth_failure`, `on_api_outage`. |
| `provider_recovery` | string | Reactivate all models when the provider recovers. Values: `on_probe_success`, `on_manual`. |
```yaml
pools:
  text-generation:
    strategy: modelmesh.cost-first.v1
    deactivation:
      retry_limit: 3
      error_codes: [429, 500, 503]
    recovery:
      cooldown: 60s
      on_quota_reset: true
    retry:
      max_attempts: 2
      backoff: exponential_jitter
      initial_delay: 500ms
      scope: same_provider
  image-generation:
    strategy: modelmesh.stick-until-failure.v1
    provider_priority: [huggingface.inference.v1, openrouter.gateway.v1, openai.llm.v1]
  code-review:
    capability: generation.text-generation.code-generation
    strategy: modelmesh.priority-selection.v1
    model_priority: [gpt-4o, claude-sonnet-4]
    fallback_strategy: modelmesh.cost-first.v1
```
See ConnectorCatalogue.md — Rotation Policies for pre-shipped strategies.
Configures the persistent storage backend and sync policy. State, configuration, and observability logs flow through this connector.
| Attribute | Type | Description |
|---|---|---|
| `connector` | string | Storage connector type. Pre-shipped: `modelmesh.local-file.v1` (default), `aws.s3.v1`, `google.drive.v1`, `redis.redis.v1`. |
| `sync_policy` | string | When to persist: `in-memory`, `sync-on-boundary`, `periodic`, `immediate`. |
| `sync_interval` | duration | Interval for periodic sync (e.g., `300s`). |
Connector-specific attributes (path, bucket, credentials, etc.) are passed through.
```yaml
storage:
  connector: modelmesh.local-file.v1
  path: ./mesh-state.json
  sync_policy: sync-on-boundary
```

```yaml
storage:
  connector: aws.s3.v1
  bucket: my-modelmesh-state
  key: state.json
  region: us-east-1
  sync_policy: periodic
  sync_interval: 300s
```
See ConnectorCatalogue.md — Storage for pre-shipped backends.
Configures routing event export, request logging, and aggregate statistics. Each sub-section can use a different connector; multiple connectors can be active simultaneously.
| Attribute | Type | Description |
|---|---|---|
| `routing.connector` | string | Observability connector: `modelmesh.console.v1` (default), `modelmesh.local-file.v1`, `modelmesh.webhook.v1`. |
| `routing.url` | string | Webhook URL (for the `modelmesh.webhook.v1` connector). |
| `routing.path` | string | File path (for the `modelmesh.local-file.v1` connector). |
| Attribute | Type | Description |
|---|---|---|
| `logging.connector` | string | Observability connector type. |
| `logging.level` | string | Detail level: `metadata`, `summary`, `full`. |
| `logging.path` | string | File path (for the `modelmesh.local-file.v1` connector). |
Levels:
- `metadata` — timestamps, model, provider, token counts, latency, status
- `summary` — metadata plus truncated prompt/response
- `full` — metadata plus complete payloads

| Attribute | Type | Description |
|---|---|---|
| `statistics.connector` | string | Observability connector type. |
| `statistics.path` | string | File path (for the `modelmesh.local-file.v1` connector). |
| `statistics.flush_interval` | duration | Interval to flush buffered metrics (e.g., `60s`). |
Recorded metrics (per model, provider, and pool): `requests_total`, `requests_success`, `requests_failed`, `tokens_in`, `tokens_out`, `cost_total`, `latency_avg`, `latency_p95`, `downtime_total`, `standby_events`, `quota_resets`, `rotation_events`.
```yaml
observability:
  routing:
    connector: modelmesh.webhook.v1
    url: https://my-app.com/hooks/mesh
  logging:
    connector: modelmesh.local-file.v1
    level: metadata
    path: ./requests.jsonl
  statistics:
    connector: modelmesh.local-file.v1
    path: ./stats.json
    flush_interval: 60s
```
Each routing decision records: requested capability, resolved pool, selected model/provider, delivery mode, replaced provider (if rotated), rotation reason, fallback chain, and routing latency.
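An illustrative routing event covering those fields. The field names here are hypothetical; consult the connector's actual output for the real schema:

```json
{
  "type": "routing",
  "capability": "generation.text-generation.chat-completion",
  "pool": "text-generation",
  "model": "gpt-4o",
  "provider": "openai.llm.v1",
  "delivery": "synchronous",
  "replaced_provider": "deepseek.api.v1",
  "rotation_reason": "rate_limit",
  "fallback_chain": ["deepseek.api.v1", "openai.llm.v1"],
  "routing_latency_ms": 2
}
```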
Note: The `mesh.stats()` API is planned for a future release. Statistics are currently available through the observability connector's raw output (JSONL records with `"type": "stats"`).
See ConnectorCatalogue.md — Observability for pre-shipped connectors.
Configures automatic model catalogue synchronization and provider health monitoring. Both run as background processes on configurable schedules.
| Attribute | Type | Description |
|---|---|---|
| `sync.enabled` | boolean | Enable registry synchronization. |
| `sync.interval` | duration | Sync frequency (e.g., `1h`). |
| `sync.auto_register` | boolean | Automatically register discovered models. |
| `sync.providers` | list | Providers to sync (default: all enabled). |
| Attribute | Type | Description |
|---|---|---|
| `health.enabled` | boolean | Enable health monitoring. |
| `health.interval` | duration | Probe frequency (e.g., `60s`). |
| `health.timeout` | duration | Probe timeout (e.g., `10s`). |
| `health.failure_threshold` | integer | Consecutive failures before deactivation. |
| `health.providers` | list | Providers to probe (default: all enabled). |
```yaml
discovery:
  sync:
    enabled: true
    interval: 1h
    auto_register: true
  health:
    enabled: true
    interval: 60s
    timeout: 10s
    failure_threshold: 3
```
See ConnectorCatalogue.md — Discovery for pre-shipped connectors.
Configures custom connector loading. Connector packages are zip archives containing connector code, metadata, and configuration schema.
| Attribute | Type | Description |
|---|---|---|
| `packages` | list | Paths or URLs to connector packages (zip archives). |
```yaml
connectors:
  packages:
    - ./connectors/my-custom-provider.zip
    - https://registry.example.com/connectors/pg-storage-1.0.zip
```
Custom connectors register in the same catalogue and receive the same treatment as pre-shipped ones. See SystemConcept.md — Connector-Based Extensibility.
Configures the OpenAI-compatible proxy deployment. The build script packages the library with selected connectors, policies, and this configuration into a Docker image.
| Attribute | Type | Description |
|---|---|---|
| `host` | string | Bind address (e.g., `0.0.0.0`). |
| `port` | integer | Listen port (e.g., `8080`). |
| `endpoints` | list | OpenAI API endpoints to expose (e.g., `/v1/chat/completions`, `/v1/embeddings`). Default: all supported. |
| `auth` | object | Proxy-level authentication for incoming requests. |
| `cors` | object | CORS settings for browser clients. |
```yaml
proxy:
  host: 0.0.0.0
  port: 8080
  endpoints:
    - /v1/chat/completions
    - /v1/embeddings
    - /v1/audio/speech
  auth:
    method: bearer
    tokens:
      - ${secrets:proxy-token}
```
See SystemConcept.md — OpenAI-Compatible Proxy and Deployment Modes.
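Since the proxy speaks the OpenAI API, any OpenAI-compatible client can call it. A stdlib-only sketch of building such a request (the base URL, token, and use of a pool name in the `model` field are assumptions about your deployment):

```python
import json
import urllib.request

def chat_request(base_url: str, token: str,
                 messages: list) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request aimed at the proxy."""
    payload = {"model": "text-generation", "messages": messages}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )

req = chat_request("http://localhost:8080", "my-proxy-token",
                   [{"role": "user", "content": "hello"}])
# urllib.request.urlopen(req) would send it to the running proxy
```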
```yaml
secrets:
  store: modelmesh.dotenv.v1
  path: ./.env

providers:
  openai.llm.v1:
    api_key: ${secrets:OPENAI_API_KEY}
    budget:
      daily_limit: 5.00
    discovery:
      enumerate_models: true
  huggingface.inference.v1:
    api_key: ${secrets:HF_API_KEY}
  deepseek.api.v1:
    api_key: ${secrets:DEEPSEEK_API_KEY}

pools:
  text-generation:
    strategy: modelmesh.cost-first.v1
    deactivation:
      retry_limit: 3
    recovery:
      cooldown: 60s
      on_quota_reset: true
    retry:
      max_attempts: 2
      backoff: exponential_jitter
      scope: same_provider
  image-generation:
    strategy: modelmesh.round-robin.v1
    provider_priority: [huggingface.inference.v1, openai.llm.v1]

storage:
  connector: modelmesh.local-file.v1
  path: ./mesh-state.json
  sync_policy: sync-on-boundary

observability:
  routing:
    connector: modelmesh.console.v1
  logging:
    connector: modelmesh.local-file.v1
    level: metadata
    path: ./requests.jsonl
  statistics:
    connector: modelmesh.local-file.v1
    path: ./stats.json
    flush_interval: 60s

discovery:
  sync:
    enabled: true
    interval: 1h
    auto_register: true
  health:
    enabled: true
    interval: 60s
    timeout: 10s
    failure_threshold: 3
```
Configuration is loaded at initialization. The runtime API provides read-only introspection of the mesh state.
```python
# Initialize with a configuration dict (typically loaded from YAML)
import yaml

with open("config.yaml") as f:
    config = yaml.safe_load(f)

mesh.initialize(config)

# Introspect runtime state
mesh.pool_status()       # Per-pool health and model counts
mesh.active_providers()  # Currently active provider connectors
mesh.list_pools()        # Configured pool names and capabilities
mesh.list_models()       # All registered models with status
```
Planned (not yet implemented): `ModelMesh.from_yaml()`, `mesh.add_provider()`, `mesh.save_config()`, `ModelMesh.from_storage()`, `mesh.export_state()`, `mesh.import_state()`, `mesh.stats()`. These APIs are reserved for future releases.
Custom connectors are registered through the connector catalogue and referenced by ID in configuration. See SystemConcept.md – Connector-Based Extensibility and the Connector Development Kit for details.
```python
# Provider
class MyProvider(ProviderConnector):
    def complete(self, request): ...
    def check_quota(self): ...

# Secret store
class VaultStore(SecretStore):
    def get(self, name): ...

# Storage
class PgStorage(StorageConnector):
    def load(self): ...
    def save(self, data): ...
```
```shell
modelmesh secrets set openai-api-key "sk-..." --store aws.secrets-manager.v1
modelmesh secrets import .env --store aws.secrets-manager.v1
modelmesh secrets list --store aws.secrets-manager.v1
```
Note: `mesh.export_state()` and `mesh.import_state()` are planned for a future release. State persistence is currently handled automatically through the configured storage connector and sync policy.
Request: "parse 500 invoice PDFs, return structured JSON"
1. Capability resolution → document-understanding.document-parsing
2. Pool selection → models at document-parsing leaf
3. Delivery mode filter → batch-capable models on batch-capable providers only
4. Provider state filter → exclude standby providers
5. Strategy application → cost-first → Claude Sonnet (Anthropic)
6. Intelligent retry → on transient failure, retry with backoff → rotate to GPT-4o (OpenAI)
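The steps above can be sketched as a filter chain (illustrative only, assuming a simple model-record shape; costs shown are made up for the example):

```python
def route(models, capability, mode, standby_providers):
    """Filter by capability, delivery mode, and provider state,
    then apply a cost-first strategy to what remains."""
    candidates = [
        m for m in models
        if capability in m["capabilities"]          # steps 1-2
        and m["delivery"].get(mode)                 # step 3
        and m["provider"] not in standby_providers  # step 4
    ]
    return min(candidates, key=lambda m: m["cost"])["name"]  # step 5

models = [
    {"name": "claude-sonnet", "provider": "anthropic",
     "capabilities": {"document-parsing"}, "delivery": {"batch": True}, "cost": 3.0},
    {"name": "gpt-4o", "provider": "openai",
     "capabilities": {"document-parsing"}, "delivery": {"batch": True}, "cost": 5.0},
]
route(models, "document-parsing", "batch", set())          # -> "claude-sonnet"
route(models, "document-parsing", "batch", {"anthropic"})  # -> "gpt-4o" (step 6 failover)
```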
See also: FAQ · Quick Start · Connector Catalogue · Connector Interfaces · System Concept