Capability-driven AI model routing with automatic failover
ModelMesh Lite consists of cooperating runtime services that together implement capability-based routing, model lifecycle management, provider abstraction, and observability. This document describes the overall architecture, initialization sequence, request flow, and service groupings. For conceptual foundations, see SystemConcept.md; for YAML configuration, see SystemConfiguration.md.
ModelMesh (facade)
├── Router
│   ├── RoutingPipeline
│   │   ├── CapabilityResolver → CapabilityTree
│   │   ├── DeliveryFilter
│   │   ├── StateFilter
│   │   └── SelectionStrategy
│   ├── RetryPolicy
│   └── CapabilityPool[]
│       ├── Model[] → ModelState
│       ├── Provider[] → ProviderState
│       └── RotationPolicy
│           ├── DeactivationEvaluator
│           ├── RecoveryEvaluator
│           └── SelectionStrategy
├── ModelRegistry
├── ConnectorRegistry
├── OpenAIClient → Router
├── ProxyServer → Router
├── EventEmitter → ObservabilityConnector[]
├── RequestLogger → ObservabilityConnector[]
├── StatisticsCollector → ObservabilityConnector[]
├── SecretResolver → SecretStoreConnector
└── StateManager → StorageConnector
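
The nesting under RoutingPipeline can be read as a chain of stages, each narrowing a set of candidate models. The sketch below is a minimal illustration in Python, not ModelMesh Lite's actual API. The `Candidate` record, the stage signatures, the sample capability hierarchy, and the fewest-failures selection rule are all assumptions made for the example; only the stage names come from the tree above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class Candidate:
    """Hypothetical record for one routable model (field names are assumptions)."""
    model: str
    capability: str
    supports_streaming: bool
    active: bool
    failures: int = 0


Stage = Callable[[List[Candidate]], List[Candidate]]

# Hypothetical capability hierarchy: parent capability -> child capabilities.
CAPABILITY_TREE: Dict[str, List[str]] = {
    "generation": ["text-generation", "image-generation"],
    "text-generation": [],
    "image-generation": [],
}


def descendants(capability: str) -> Set[str]:
    """Return the capability plus everything below it in the tree."""
    found = {capability}
    for child in CAPABILITY_TREE.get(capability, []):
        found |= descendants(child)
    return found


def capability_resolver(capability: str) -> Stage:
    wanted = descendants(capability)
    return lambda cs: [c for c in cs if c.capability in wanted]


def delivery_filter(streaming: bool) -> Stage:
    # Non-streaming requests keep every candidate; streaming ones require support.
    return lambda cs: [c for c in cs if c.supports_streaming or not streaming]


def state_filter(cs: List[Candidate]) -> List[Candidate]:
    return [c for c in cs if c.active]


def selection_strategy(cs: List[Candidate]) -> List[Candidate]:
    # Illustrative rule: fewest recorded failures wins; ties keep input order.
    return sorted(cs, key=lambda c: c.failures)[:1]


def run_pipeline(stages: List[Stage], candidates: List[Candidate]) -> List[Candidate]:
    """Apply each stage in order to the surviving candidates."""
    for stage in stages:
        candidates = stage(candidates)
    return candidates


candidates = [
    Candidate("model-a", "text-generation", True, True, failures=2),
    Candidate("model-b", "text-generation", False, True, failures=0),
    Candidate("model-c", "image-generation", True, True),
    Candidate("model-d", "text-generation", True, False),
]
stages = [
    capability_resolver("text-generation"),
    delivery_filter(streaming=True),
    state_filter,
    selection_strategy,
]
selected = run_pipeline(stages, candidates)
print(selected[0].model)
```

Modelling each stage as a plain function keeps the pipeline order explicit and makes it easy to swap one stage without touching the others.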
The following steps execute in order when ModelMesh.initialize(config) is called:
1. Parse the configuration into a MeshConfig structure.
2. Initialize ConnectorRegistry and load all built-in and custom connector packages.
3. Initialize SecretResolver with the configured secret store connector. Resolve all ${secrets:name} references in the configuration.
4. Initialize StateManager with the configured storage connector. Call load() to restore ModelState, ProviderState, and pool memberships from the previous session.
5. Build CapabilityTree from the default hierarchy plus any custom extensions declared in configuration.
6. Populate ModelRegistry with model definitions from configuration. Instantiate Provider wrappers for each configured provider.
7. Create CapabilityPool instances for each configured pool, assign models based on capability node membership, and attach RotationPolicy components.
8. Assemble RoutingPipeline with default stages (CapabilityResolver, pool selection, DeliveryFilter, StateFilter, SelectionStrategy, RetryPolicy).
9. Create Router with the assembled pipeline and pool set.
10. Initialize EventEmitter, RequestLogger, and StatisticsCollector with their configured observability connectors.

A typical synchronous completion request follows this path:
1. The application calls OpenAIClient.chat.completions.create(model="text-generation", messages=[...]) or sends an HTTP request to ProxyServer at POST /v1/chat/completions.
2. The model name "text-generation" is passed to Router.complete() as the capability identifier.
3. Router invokes RoutingPipeline.execute(), which runs each stage in sequence:
   - CapabilityResolver maps "text-generation" to matching CapabilityPool instances using the CapabilityTree.
   - The remaining default stages (pool selection, DeliveryFilter, StateFilter, SelectionStrategy) then run in that order.
4. Router sends the request to the selected Provider.execute(), which delegates to the underlying provider connector.
5. The response is returned through Router. RequestLogger records the request, StatisticsCollector buffers metrics, and ModelState is updated.
6. On failure, RetryPolicy determines whether to retry the same model (with backoff) or rotate to the next candidate. If deactivation thresholds are reached, the RotationPolicy moves the model to standby and EventEmitter publishes a model_deactivated event.

| Group | Services | Purpose |
|---|---|---|
| Facade | ModelMesh | Library entry point; initializes and wires all subsystems |
| Routing | Router, RoutingPipeline, CapabilityResolver, DeliveryFilter, StateFilter, RetryPolicy | Request orchestration, pipeline stages, and retry logic |
| Pools & Models | CapabilityPool, Model, ProviderService, ModelState, ProviderState | Model grouping, runtime state, and provider abstraction |
| Rotation | RotationPolicyService, DeactivationEvaluator, RecoveryEvaluator, SelectionStrategy | Deactivation, recovery, and selection governance |
| Registries | ModelRegistry, ConnectorRegistry, CapabilityTree | Model catalogue, connector catalogue, capability hierarchy |
| External Interfaces | OpenAIClient, ProxyServer | Application-facing API surfaces |
| Observability | EventEmitter, RequestLogger, StatisticsCollector | Events, logging, and metrics |
| Infrastructure | SecretResolver, StateManager | Secret resolution and state persistence |
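
The retry-then-rotate behaviour from the request flow can be sketched as a small loop. This is an illustration under stated assumptions rather than the library's implementation: `complete_with_failover`, `call_provider`, the three constants, and the plain `events` list standing in for EventEmitter are all hypothetical. Only the model_deactivated event name and the overall behaviour (retry with backoff, rotate to the next candidate, move to standby once a threshold is reached) come from the text.

```python
import time
from typing import Callable, Dict, List, Set

MAX_RETRIES = 2              # assumed: attempts per model before rotating
DEACTIVATION_THRESHOLD = 3   # assumed: total failures before a model goes to standby
BACKOFF_SECONDS = 0.0        # assumed base delay; zero keeps the demo instant


def complete_with_failover(
    candidates: List[str],
    call_provider: Callable[[str], str],
    failures: Dict[str, int],
    standby: Set[str],
    events: List[dict],
) -> str:
    """Try each candidate in order, retrying with backoff before rotating."""
    for model in candidates:
        if model in standby:
            continue  # deactivated models are skipped
        for attempt in range(MAX_RETRIES):
            try:
                return call_provider(model)
            except RuntimeError:
                failures[model] = failures.get(model, 0) + 1
                if failures[model] >= DEACTIVATION_THRESHOLD:
                    # Threshold reached: move to standby and publish the event.
                    standby.add(model)
                    events.append({"type": "model_deactivated", "model": model})
                    break  # rotate to the next candidate immediately
                time.sleep(BACKOFF_SECONDS * (2 ** attempt))  # exponential backoff
    raise RuntimeError("no candidate model could serve the request")


def flaky_provider(model: str) -> str:
    """Hypothetical provider call: model-a always fails, others succeed."""
    if model == "model-a":
        raise RuntimeError("provider unavailable")
    return f"response from {model}"


failures = {"model-a": 1}   # model-a already failed once in an earlier request
standby: Set[str] = set()
events: List[dict] = []
result = complete_with_failover(
    ["model-a", "model-b"], flaky_provider, failures, standby, events
)
print(result)
print(events)
```

In this run, model-a fails twice more, reaches the assumed threshold of three failures, is moved to standby, and the request is served by model-b.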