ModelMesh Lite

Capability-driven AI model routing with automatic failover

View the Project on GitHub ApartsinProjects/ModelMesh

Connector Interfaces

Interface overview for every ModelMesh Lite connector type. Each section describes the connector’s purpose and the interfaces it exposes. This is a conceptual overview, not a full specification. Full interface definitions with code are in interfaces/. Pre-shipped implementations are listed in ConnectorCatalogue.md. For a tutorial on building custom connectors, see the FAQ — CDK guide.

CDK: Base classes with sensible defaults for each interface are available in the Connector Development Kit. See cdk/BaseClasses.md for implementations.


Provider

A provider connector exposes one or more AI models (or web API services) through a uniform, OpenAI-compatible API. It bridges the gap between the library’s abstract capability model and the provider’s concrete API: translating requests, managing authentication, tracking usage, and reporting operational data that drives routing and rotation decisions.

Interfaces:

| Interface | Purpose | Key methods |
| --- | --- | --- |
| Model Execution | Execute requests through an OpenAI-compatible API (chat, embeddings, audio, images). Handle authentication, format translation, and streaming. | `complete`, `stream` |
| Capabilities | Declare which capabilities, delivery modes, and features the provider supports. The router uses this to match requests to eligible providers. | `get_capabilities`, `supports` |
| Model Catalogue | List available models with their attributes (context window, pricing, supported modalities). Feeds pool membership and model definitions. | `list_models`, `get_model_info` |
| Quota & Rate Limits | Report current usage, remaining capacity, and rate-limit headroom. Enables proactive rotation before limits are hit. | `check_quota`, `get_rate_limits` |
| Cost & Pricing | Provide per-token and per-request cost metadata. Feeds cost-first selection and budget-based deactivation. | `get_pricing`, `report_usage` |
| Error Classification | Classify provider errors as retryable (timeout, 500, 503) or non-retryable (400, 401, 403). Feeds intelligent retry and rotation decisions. | `classify_error`, `is_retryable` |
| Infrastructure (optional) | Batch processing, file upload, fine-tuning, and model discovery. Not all providers support these; connectors declare which are available. | `submit_batch`, `upload_file`, `create_fine_tune`, `discover_models` |
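To make the division of labour concrete, here is a minimal sketch of a connector covering three of these interfaces (Model Execution, Capabilities, Error Classification). The class name, method bodies, and return shapes are illustrative assumptions; the real base classes and full signatures live in cdk/BaseClasses.md and interfaces/.

```python
# Illustrative provider connector sketch. Everything here is an
# assumption except the interface method names, which come from the
# table above; see cdk/BaseClasses.md for the shipped base classes.

RETRYABLE = {429, 500, 502, 503}
NON_RETRYABLE = {400, 401, 403}


class ExampleProvider:
    """Sketches Model Execution, Capabilities, and Error Classification."""

    def get_capabilities(self) -> set:
        # Declared capabilities drive the router's eligibility checks.
        return {"generation.text", "generation.embeddings"}

    def supports(self, capability: str) -> bool:
        return capability in self.get_capabilities()

    def classify_error(self, status_code: int) -> str:
        if status_code in RETRYABLE:
            return "retryable"
        if status_code in NON_RETRYABLE:
            return "non-retryable"
        return "unknown"

    def is_retryable(self, status_code: int) -> bool:
        return self.classify_error(status_code) == "retryable"

    def complete(self, request: dict) -> dict:
        # A real connector would translate `request` into the provider's
        # wire format, authenticate, call the API, and translate the
        # response back. Stubbed here with a fixed OpenAI-shaped reply.
        return {"choices": [{"message": {"content": "ok"}}],
                "usage": {"total_tokens": 1}}
```

The router only ever sees this uniform surface; everything provider-specific stays behind `complete` and `classify_error`.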

Infrastructure Capability Categories

| Category | Examples |
| --- | --- |
| Discovery | enumerate models, query model details |
| Quota & Usage | query current quota, usage history, remaining budget |
| Pricing | query model pricing, batch discounts |
| Batch Operations | submit, cancel, query status, retrieve results |
| File Management | upload, list, delete files |
| Fine-Tuning | create job, monitor training, deploy |
| Authentication | API key, OAuth, service account |

Routing integration: `enumerate_models` feeds auto-discovery at startup; `query_current_quota` enables proactive rotation; `query_pricing` drives cost-first strategy; `batch.supported` enables batch-only routing; `files.upload` supports document workflows.

Common Configuration

Parameters shared by all provider connectors. Individual connectors may add connector-specific parameters (see ConnectorCatalogue.md).

| Parameter | Type | Description |
| --- | --- | --- |
| `provider.execution.base_url` | string | Custom API endpoint URL. Overrides the default for self-hosted or proxy deployments. |
| `provider.execution.timeout` | duration | Request timeout (e.g., `30s`). Applied to all API calls. |
| `provider.execution.max_retries` | integer | Provider-level retries before reporting failure to the router. |
| `provider.auth.method` | string | Authentication method: `api_key`, `oauth`, `service_account`. |
| `provider.auth.api_key` | string | API key or secret reference (`${secrets:key-name}`). |
| `provider.auth.key_rotation` | boolean | Enable automatic key rotation. |
| `provider.catalogue.auto_discover` | boolean | Enumerate models from the provider API at startup. |
| `provider.catalogue.refresh_interval` | duration | Re-sync model catalogue on this schedule (e.g., `1h`). |
| `provider.quota.query_current` | boolean | Provider API supports querying current usage. |
| `provider.quota.query_remaining` | boolean | Provider API supports querying remaining capacity. |
| `provider.quota.reset_schedule` | string | Quota reset frequency: `monthly`, `daily`, `rolling`. |
| `provider.budget.daily_limit` | number | Daily spend cap in USD. Triggers deactivation when exceeded. |
| `provider.budget.monthly_limit` | number | Monthly spend cap in USD. |
| `provider.pricing.query` | boolean | Provider API supports pricing queries. |
| `provider.error.retryable_codes` | list | HTTP status codes eligible for retry (e.g., `[429, 500, 502, 503]`). |
| `provider.error.non_retryable_codes` | list | HTTP codes that skip retry and trigger immediate rotation (e.g., `[400, 401, 403]`). |
| `provider.infrastructure.batch` | boolean | Provider supports batch submissions. |
| `provider.infrastructure.files` | boolean | Provider supports file upload/management. |
| `provider.infrastructure.fine_tuning` | boolean | Provider supports fine-tuning. |
| `provider.enabled` | boolean | Enable or disable the provider. Default: `true`. |
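As an illustration only, a provider block using these parameters might look like the fragment below. The nesting (dotted keys expanded under a `providers:` key) and the provider name `openai` are assumptions; the authoritative file layout is in SystemConfiguration.md.

```yaml
# Illustrative fragment; exact top-level structure per SystemConfiguration.md.
providers:
  openai:                        # provider name is illustrative
    execution:
      base_url: https://api.openai.com/v1
      timeout: 30s
      max_retries: 2
    auth:
      method: api_key
      api_key: ${secrets:openai-key}
    catalogue:
      auto_discover: true
      refresh_interval: 1h
    budget:
      daily_limit: 10.00
    error:
      retryable_codes: [429, 500, 502, 503]
      non_retryable_codes: [400, 401, 403]
    enabled: true
```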

Rotation Policy

A rotation policy governs model lifecycle within a pool through three independently replaceable components. Rotation operates at model level (individual model moves to standby) or provider level (all models from a provider deactivated across pools). Each component receives the current model state (failure counts, cooldown timers, quota usage, latency history) and makes decisions accordingly.

Interfaces:

| Interface | Purpose | Key methods |
| --- | --- | --- |
| Deactivation | Evaluate whether an active model should move to standby. Triggered after each request or on state change (quota exhausted, error threshold, maintenance window). | `should_deactivate`, `get_reason` |
| Recovery | Evaluate whether a standby model should return to active. Triggered on timer, calendar event, probe result, or manual command. | `should_recover`, `get_recovery_schedule` |
| Selection | Choose the best model from active candidates for a given request. Considers cost, latency, rate-limit headroom, session affinity, or custom scoring. | `select`, `score` |
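A minimal Deactivation component could look like the sketch below: it trips after a fixed number of consecutive failures (compare `rotation.deactivation.retry_limit`). The dict shape of the model state is an assumption; the real state object is defined in interfaces/.

```python
# Illustrative Deactivation component. The state-dict shape is assumed;
# only the method names come from the interface table above.

class RetryLimitDeactivation:
    """Moves a model to standby after `retry_limit` consecutive failures."""

    def __init__(self, retry_limit: int = 3):
        self.retry_limit = retry_limit

    def should_deactivate(self, state: dict) -> bool:
        return state.get("consecutive_failures", 0) >= self.retry_limit

    def get_reason(self, state: dict) -> str:
        failures = state.get("consecutive_failures", 0)
        return f"{failures} consecutive failures (limit {self.retry_limit})"
```

Because the three components are independently replaceable, a policy like this can be swapped in without touching Recovery or Selection.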

Common Configuration

Parameters shared by all rotation policies. Configured per pool; policies receive these through the pool context. Full YAML reference in SystemConfiguration.md — Pools.

| Parameter | Type | Description |
| --- | --- | --- |
| `rotation.deactivation.retry_limit` | integer | Consecutive failures before deactivation. |
| `rotation.deactivation.error_rate_threshold` | float | Error rate over sliding window (0.0–1.0) before deactivation. |
| `rotation.deactivation.error_codes` | list | HTTP codes that count toward deactivation (e.g., `[429, 500, 503]`). |
| `rotation.deactivation.request_limit` | integer | Max requests before deactivation (free-tier cap). |
| `rotation.deactivation.token_limit` | integer | Max tokens before deactivation. |
| `rotation.deactivation.budget_limit` | number | Max spend (USD) before deactivation. |
| `rotation.deactivation.quota_window` | string | Deactivate when quota period expires: `monthly`, `daily`. |
| `rotation.deactivation.maintenance_window` | string | Scheduled deactivation (cron expression). |
| `rotation.recovery.cooldown` | duration | Time from deactivation before reactivation (e.g., `60s`). |
| `rotation.recovery.probe_on_start` | boolean | Test standby models at library startup. |
| `rotation.recovery.probe_interval` | duration | Periodically test standby models (e.g., `300s`). |
| `rotation.recovery.on_quota_reset` | boolean | Reactivate when provider quota resets. |
| `rotation.recovery.quota_reset_schedule` | string | Calendar schedule for quota resets: `monthly`, `daily_utc`. |
| `rotation.selection.model_priority` | list | Ordered model preference list. |
| `rotation.selection.provider_priority` | list | Ordered provider preference list. |
| `rotation.selection.fallback_strategy` | string | Strategy after priority list exhausted. |
| `rotation.selection.balance_mode` | string | For load-balanced: `absolute` or `relative` distribution. |
| `rotation.selection.rate_limit.threshold` | float | Switch models at this fraction of the limit (0.0–1.0). |
| `rotation.selection.rate_limit.min_delta` | duration | Minimum time between requests to the same model. |
| `rotation.selection.rate_limit.max_rpm` | integer | Max requests per minute before switching models. |
| `rotation.provider_deactivation` | string | Deactivate all models of a provider across all pools: `on_auth_failure`, `on_api_outage`. |
| `rotation.provider_recovery` | string | Reactivate all models when provider recovers: `on_probe_success`, `on_manual`. |
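Since these parameters are configured per pool, a rotation block might look like the fragment below. The pool name `chat`, the provider names, and the `fallback_strategy` value are illustrative; nesting is an assumption, with the authoritative YAML reference in SystemConfiguration.md.

```yaml
# Illustrative per-pool rotation fragment; names and values are examples.
pools:
  chat:
    rotation:
      deactivation:
        retry_limit: 3
        error_codes: [429, 500, 503]
        budget_limit: 5.00
      recovery:
        cooldown: 60s
        probe_interval: 300s
        on_quota_reset: true
      selection:
        provider_priority: [groq, openai]
        rate_limit:
          threshold: 0.8
```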

Secret Store

A secret store connector resolves API keys and tokens from a secure backend at runtime. Configuration references secrets by name (${secrets:openai-key}); the library resolves them through the configured store at initialization and on rotation (when a new provider is activated).

Interfaces:

| Interface | Purpose | Key methods |
| --- | --- | --- |
| Resolution | Retrieve a secret value by name. The only required interface — all stores must support this. | `get` |
| Management (optional) | Store, list, and remove secrets. Used by the CLI utility for credential provisioning across environments. | `set`, `list`, `delete` |
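The resolution step can be sketched as a regex substitution over configuration values, backed by any store that implements `get`. The in-memory store and helper function below are illustrative, not the library's implementation.

```python
# Sketch of ${secrets:name} resolution. Only the `get` method name comes
# from the Resolution interface; the rest is an illustrative assumption.
import re


class InMemorySecretStore:
    def __init__(self, secrets: dict):
        self._secrets = secrets

    def get(self, name: str) -> str:
        if name not in self._secrets:
            # Mirrors fail_on_missing behaviour: unknown references fail loudly.
            raise KeyError(f"secret not found: {name}")
        return self._secrets[name]


def resolve_references(value: str, store) -> str:
    """Replace every ${secrets:name} reference with the stored value."""
    return re.sub(r"\$\{secrets:([^}]+)\}",
                  lambda m: store.get(m.group(1)), value)
```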

Common Configuration

Parameters shared by all secret store connectors. Individual stores may add connector-specific parameters (see ConnectorCatalogue.md).

| Parameter | Type | Description |
| --- | --- | --- |
| `secret-store.resolution.cache_enabled` | boolean | Cache resolved secrets in memory. Default: `true`. |
| `secret-store.resolution.cache_ttl` | duration | Time-to-live for cached secrets (e.g., `300s`). |
| `secret-store.resolution.reload_on_rotation` | boolean | Re-resolve secrets when a new provider is activated during rotation. Default: `true`. |
| `secret-store.resolution.fail_on_missing` | boolean | Fail initialization if a referenced secret is not found. Default: `true`. |

Storage

A storage connector serializes and deserializes library data to an external backend. Three data types flow through it: state (model health, failure counts, cooldown timers, quota usage), configuration (providers, pools, policies, credential references), and observability logs (routing decisions, request records, statistics). Sync policies control when persistence occurs: in-memory, sync-on-boundary, periodic, or immediate.

Interfaces:

| Interface | Purpose | Key methods |
| --- | --- | --- |
| Persistence | Read and write serialized data. The connector handles format and transport; the library handles serialization logic. | `load`, `save` |
| Inventory | Enumerate and remove stored entries. Used for cleanup, migration, and administrative tooling. | `list`, `delete` |
| Stat Query | Query metadata about stored entries (existence, size, last modified) without loading full content. Used for cache validation and change detection. | `stat`, `exists` |
| Locking | Acquire and release advisory locks on stored entries. Prevents concurrent writes in multi-instance deployments. Required for periodic and immediate sync policies. | `acquire`, `release`, `is_locked` |
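The four interfaces compose naturally; the dict-backed sketch below shows their shapes side by side. Only the method names come from the table; a real connector would target a file system, database, or object store, and its locking would block with a timeout rather than fail immediately.

```python
# Illustrative storage connector backed by an in-memory dict. Method
# names come from the interface table; signatures are assumptions.
import time


class DictStorage:
    def __init__(self):
        self._entries = {}   # key -> (data, last_modified)
        self._locks = set()

    # Persistence: the library serializes; the connector just moves bytes.
    def save(self, key: str, data: bytes) -> None:
        self._entries[key] = (data, time.time())

    def load(self, key: str) -> bytes:
        return self._entries[key][0]

    # Inventory
    def list(self):
        return sorted(self._entries)

    def delete(self, key: str) -> None:
        self._entries.pop(key, None)

    # Stat Query: metadata without loading full content.
    def exists(self, key: str) -> bool:
        return key in self._entries

    def stat(self, key: str) -> dict:
        data, mtime = self._entries[key]
        return {"size": len(data), "last_modified": mtime}

    # Locking: advisory only; no blocking or timeout in this sketch.
    def acquire(self, key: str) -> bool:
        if key in self._locks:
            return False
        self._locks.add(key)
        return True

    def release(self, key: str) -> None:
        self._locks.discard(key)

    def is_locked(self, key: str) -> bool:
        return key in self._locks
```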

Common Configuration

Parameters shared by all storage connectors. Individual connectors may add connector-specific parameters (see ConnectorCatalogue.md).

| Parameter | Type | Description |
| --- | --- | --- |
| `storage.persistence.sync_policy` | string | When to persist: `in-memory`, `sync-on-boundary`, `periodic`, `immediate`. |
| `storage.persistence.sync_interval` | duration | Interval for periodic sync (e.g., `300s`). |
| `storage.persistence.format` | string | Serialization format: `json` (default), `yaml`, `msgpack`. |
| `storage.persistence.compression` | boolean | Compress serialized data before writing. Default: `false`. |
| `storage.persistence.encryption` | boolean | Encrypt data at rest using the configured secret store. Default: `false`. |
| `storage.locking.enabled` | boolean | Enable advisory locking for concurrent access. Default: `true` for multi-instance sync policies. |
| `storage.locking.timeout` | duration | Maximum time to wait for a lock (e.g., `30s`). |
| `storage.locking.retry_interval` | duration | Interval between lock acquisition attempts (e.g., `1s`). |

Observability

An observability connector exports routing activity to an external output. Multiple connectors can be active simultaneously (e.g., webhook for alerts + file for dashboards). The library pushes data through the connector at four levels of detail.

Interfaces:

| Interface | Purpose | Key methods |
| --- | --- | --- |
| Events | Publish routing decisions and state changes (model activated, deactivated, rotated, provider health change). | `emit` |
| Logging | Record request/response data at a configurable detail level (metadata only, truncated summary, or full payloads). | `log` |
| Statistics | Buffer and flush aggregate metrics (request counts, token usage, cost, latency, downtime per model/provider/pool). | `flush` |
| Tracing | Structured trace reporting with severity levels. All core components (Router, Pool, Mesh, BaseProvider) emit traces through this interface. | `trace` |
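A console-backed connector sketching these four interfaces might look like this. The `record` buffering helper and the severity list are assumptions added for illustration; only `emit`, `log`, `flush`, and `trace` come from the table.

```python
# Illustrative observability connector. Method names per the table above;
# the `record` helper and severity ordering are assumptions.

SEVERITIES = ["debug", "info", "warning", "error", "critical"]


class ConsoleObservability:
    def __init__(self, min_severity: str = "info"):
        self.min_severity = min_severity
        self._stats = []

    def emit(self, event: dict) -> None:
        print(f"event: {event}")

    def log(self, record: dict) -> None:
        print(f"request: {record}")

    def trace(self, severity: str, message: str) -> None:
        # Entries below min_severity are discarded, per
        # observability.tracing.min_severity.
        if SEVERITIES.index(severity) >= SEVERITIES.index(self.min_severity):
            print(f"[{severity}] {message}")

    def record(self, metric: dict) -> None:
        # Hypothetical helper: buffer a metric until the next flush.
        self._stats.append(metric)

    def flush(self) -> list:
        flushed, self._stats = self._stats, []
        return flushed
```

Because multiple connectors can be active at once, an instance like this can run alongside a webhook connector without either knowing about the other.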

Common Configuration

Parameters shared by all observability connectors. Individual connectors may add connector-specific parameters (see ConnectorCatalogue.md).

| Parameter | Type | Description |
| --- | --- | --- |
| `observability.events.filter` | list | Event types to emit (e.g., `[rotation, deactivation, recovery, health]`). Default: all. |
| `observability.events.include_metadata` | boolean | Include model and provider metadata in event payloads. Default: `true`. |
| `observability.logging.level` | string | Detail level: `metadata`, `summary`, `full`. Default: `metadata`. |
| `observability.logging.redact_secrets` | boolean | Redact API keys and tokens from logged payloads. Default: `true`. |
| `observability.logging.max_payload_size` | integer | Truncate logged payloads exceeding this byte count. |
| `observability.tracing.min_severity` | string | Minimum severity level for trace entries: `debug`, `info`, `warning`, `error`, `critical`. Default: `info`. Entries below this threshold are discarded. |
| `observability.statistics.flush_interval` | duration | Interval to flush buffered metrics (e.g., `60s`). |
| `observability.statistics.retention` | duration | Retention window for in-memory statistics (e.g., `7d`). |
| `observability.statistics.scopes` | list | Aggregation scopes: `model`, `provider`, `pool`. Default: all. |

Discovery

A discovery connector keeps the model catalogue accurate and provider health visible without manual intervention. Discovery connectors run as background processes on configurable schedules and feed results into the rotation policy for proactive deactivation.

Interfaces:

| Interface | Purpose | Key methods |
| --- | --- | --- |
| Registry Sync | Synchronize the local model catalogue with provider APIs. Detect new models, deprecated models, and pricing changes. | `sync`, `get_sync_status` |
| Health Monitoring | Probe provider availability and performance. Record latency, error codes, and rolling availability scores. | `probe`, `get_health_report` |
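The Health Monitoring half can be sketched as a consecutive-failure counter whose report the rotation policy reads for proactive deactivation (compare `discovery.health.failure_threshold`). The class below is illustrative; only `probe` and `get_health_report` come from the interface table.

```python
# Illustrative health monitor. Method names per the table above; the
# probe_fn callback and report shape are assumptions.

class HealthMonitor:
    def __init__(self, failure_threshold: int = 3):
        self.failure_threshold = failure_threshold
        self._failures = {}   # provider -> consecutive probe failures

    def probe(self, provider: str, probe_fn) -> bool:
        """Run one probe; probe_fn returns True when the provider answered."""
        ok = probe_fn()
        self._failures[provider] = 0 if ok else self._failures.get(provider, 0) + 1
        return ok

    def get_health_report(self, provider: str) -> dict:
        failures = self._failures.get(provider, 0)
        return {
            "consecutive_failures": failures,
            "should_deactivate": failures >= self.failure_threshold,
        }
```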

Common Configuration

Parameters shared by all discovery connectors. Individual connectors may add connector-specific parameters (see ConnectorCatalogue.md).

| Parameter | Type | Description |
| --- | --- | --- |
| `discovery.sync.enabled` | boolean | Enable registry synchronization. Default: `true`. |
| `discovery.sync.interval` | duration | Sync frequency (e.g., `1h`). |
| `discovery.sync.auto_register` | boolean | Automatically register newly discovered models. Default: `true`. |
| `discovery.sync.providers` | list | Providers to sync. Default: all enabled providers. |
| `discovery.sync.on_new_model` | string | Action on new model: `register`, `notify`, `ignore`. Default: `register`. |
| `discovery.sync.on_deprecated_model` | string | Action on deprecated model: `deactivate`, `notify`, `ignore`. Default: `notify`. |
| `discovery.health.enabled` | boolean | Enable health monitoring. Default: `true`. |
| `discovery.health.interval` | duration | Probe frequency (e.g., `60s`). |
| `discovery.health.timeout` | duration | Probe timeout (e.g., `10s`). |
| `discovery.health.failure_threshold` | integer | Consecutive failures before deactivation. |
| `discovery.health.providers` | list | Providers to probe. Default: all enabled providers. |

Audio

Audio capabilities (text-to-speech, speech-to-text) are integrated into the provider interface through dedicated request and response types that bridge into the existing CompletionRequest/CompletionResponse pipeline. This allows audio providers (ElevenLabs TTS, AssemblyAI STT) to participate in the same rotation, failover, and pool routing as text-generation providers.

AudioRequest and AudioResponse Types

| Type | Purpose | Key fields |
| --- | --- | --- |
| `AudioRequest` | Wraps a TTS or STT request for routing through the provider pipeline. | `input` (text for TTS, audio buffer for STT), `voice`, `format`, `model`, `language` |
| `AudioResponse` | Wraps audio provider output. | `audio` (binary data or stream for TTS), `text` (transcript for STT), `duration`, `usage` |

Audio connectors bridge these types into CompletionRequest/CompletionResponse internally. The provider’s complete() method receives a CompletionRequest whose extra field carries the audio-specific parameters; the response’s extra field carries audio-specific output. This preserves the uniform provider interface while supporting audio-specific data.
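The bridging step above can be sketched as follows. The field names mirror the table and the `extra`-field convention described in the text; the dataclass shapes and the `to_completion_request` helper are assumptions for illustration.

```python
# Illustrative bridge from AudioRequest into the completion pipeline.
# Field names follow the table; everything else is an assumption.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AudioRequest:
    input: str                 # text for TTS, audio buffer for STT
    voice: str = "default"
    format: str = "mp3"
    model: Optional[str] = None


@dataclass
class CompletionRequest:
    model: str
    messages: list = field(default_factory=list)
    extra: dict = field(default_factory=dict)


def to_completion_request(req: AudioRequest) -> CompletionRequest:
    """Wrap an audio request so it routes like any other completion;
    the audio-specific parameters ride in the extra field."""
    return CompletionRequest(
        model=req.model or "tts-default",   # fallback name is illustrative
        extra={"audio": {"input": req.input,
                         "voice": req.voice,
                         "format": req.format}},
    )
```

From the router's point of view the result is an ordinary `CompletionRequest`, which is what lets audio models share rotation and failover with text models.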

MeshClient Audio Namespace

The MeshClient exposes audio through an OpenAI SDK-compatible namespace:

| Method | Capability | Description |
| --- | --- | --- |
| `client.audio.speech.create()` | `generation.audio.text-to-speech` | Generate speech from text. Routes to TTS providers (ElevenLabs, OpenAI, Google Cloud). |
| `client.audio.transcriptions.create()` | `understanding.audio.speech-to-text` | Transcribe audio to text. Routes to STT providers (AssemblyAI, OpenAI Whisper, Groq). |

Audio requests follow the same routing pipeline as text requests: capability resolution, pool selection, rotation policy, retry, and failover. Pools targeting generation.audio or understanding.audio collect all audio-capable models automatically.


See also: FAQ · Connector Catalogue · CDK Base Classes · System Configuration