ModelMesh Lite

Capability-driven AI model routing with automatic failover

Rotation Policy Interface

A rotation policy governs the lifecycle of models within a pool through three independently replaceable components: deactivation, recovery, and selection. Rotation operates at the model level (an individual model moves to standby) or at the provider level (all models from a provider are deactivated across pools). Each component receives the current model state (failure counts, cooldown timers, quota usage, latency history) and bases its decision on that state.

Reference: ConnectorInterfaces.md – Rotation Policy; ConnectorCatalogue.md – Rotation Policies

Sub-Interfaces

| Sub-Interface | Required | Purpose |
| --- | --- | --- |
| Deactivation | yes | Evaluate whether an active model should move to standby |
| Recovery | yes | Evaluate whether a standby model should return to active |
| Selection | yes | Choose the best model from active candidates for a given request |
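
As a rough illustration of how the three sub-interfaces fit together, the sketch below runs one lifecycle pass over a pool and then picks a model. The `Model` class, the `rotate_and_select` function, and the plain callables standing in for the three components are hypothetical simplifications, not part of the ModelMesh API.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Model:
    """Hypothetical, trimmed-down stand-in for a pool member."""
    model_id: str
    active: bool
    failure_count: int = 0
    cooldown_remaining: float = 0.0


def rotate_and_select(
    pool: list[Model],
    should_deactivate: Callable[[Model], bool],
    should_recover: Callable[[Model], bool],
    select: Callable[[list[Model]], Model],
) -> Optional[Model]:
    """Run one lifecycle pass, then pick a model for the next request."""
    for model in pool:
        if model.active and should_deactivate(model):
            model.active = False  # Deactivation: active -> standby
        elif not model.active and should_recover(model):
            model.active = True   # Recovery: standby -> active
    # Selection only ever sees the models that are active after the pass.
    candidates = [m for m in pool if m.active]
    return select(candidates) if candidates else None


pool = [
    Model("model-a", active=True, failure_count=3),
    Model("model-b", active=True, failure_count=0),
    Model("model-c", active=False, cooldown_remaining=0.0),
]
chosen = rotate_and_select(
    pool,
    should_deactivate=lambda m: m.failure_count >= 3,
    should_recover=lambda m: m.cooldown_remaining <= 0,
    select=lambda active: min(active, key=lambda m: m.failure_count),
)
```

Because each component is passed in separately, any one of them can be swapped without touching the other two, which is the property the table above describes.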

Supporting Types

Python

from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class ModelStatus(Enum):
    """Lifecycle status of a model within a pool."""
    ACTIVE = "active"
    STANDBY = "standby"


class DeactivationReason(Enum):
    """Reason a model was moved from active to standby."""
    ERROR_THRESHOLD = "error_threshold"
    QUOTA_EXHAUSTED = "quota_exhausted"
    BUDGET_EXCEEDED = "budget_exceeded"
    TOKEN_LIMIT = "token_limit"
    REQUEST_LIMIT = "request_limit"
    MAINTENANCE_WINDOW = "maintenance_window"
    MANUAL = "manual"


class RecoveryTrigger(Enum):
    """Trigger that caused a standby model to return to active."""
    COOLDOWN_EXPIRED = "cooldown_expired"
    QUOTA_RESET = "quota_reset"
    PROBE_SUCCESS = "probe_success"
    MANUAL = "manual"
    STARTUP_PROBE = "startup_probe"


@dataclass
class ModelSnapshot:
    """Point-in-time snapshot of a model's operational state.

    Passed to deactivation, recovery, and selection policies so they can
    make decisions based on current health, usage, and performance data.
    """
    model_id: str
    provider_id: str
    status: ModelStatus
    failure_count: int
    error_rate: float
    cooldown_remaining: Optional[float] = None
    quota_used: int = 0
    tokens_used: int = 0
    cost_accumulated: float = 0.0
    latency_avg: Optional[float] = None
    last_request: Optional[datetime] = None
    last_failure: Optional[datetime] = None


@dataclass
class SelectionResult:
    """Result of the selection strategy choosing a model for a request."""
    model_id: str
    provider_id: str
    score: float
    reason: str

TypeScript

/** Lifecycle status of a model within a pool. */
enum ModelStatus {
    ACTIVE = "active",
    STANDBY = "standby",
}

/** Reason a model was moved from active to standby. */
enum DeactivationReason {
    ERROR_THRESHOLD = "error_threshold",
    QUOTA_EXHAUSTED = "quota_exhausted",
    BUDGET_EXCEEDED = "budget_exceeded",
    TOKEN_LIMIT = "token_limit",
    REQUEST_LIMIT = "request_limit",
    MAINTENANCE_WINDOW = "maintenance_window",
    MANUAL = "manual",
}

/** Trigger that caused a standby model to return to active. */
enum RecoveryTrigger {
    COOLDOWN_EXPIRED = "cooldown_expired",
    QUOTA_RESET = "quota_reset",
    PROBE_SUCCESS = "probe_success",
    MANUAL = "manual",
    STARTUP_PROBE = "startup_probe",
}

/** Point-in-time snapshot of a model's operational state. */
interface ModelSnapshot {
    model_id: string;
    provider_id: string;
    status: ModelStatus;
    failure_count: number;
    error_rate: number;
    cooldown_remaining?: number;
    quota_used: number;
    tokens_used: number;
    cost_accumulated: number;
    latency_avg?: number;
    last_request?: Date;
    last_failure?: Date;
}

/** Result of the selection strategy choosing a model for a request. */
interface SelectionResult {
    model_id: string;
    provider_id: string;
    score: number;
    reason: string;
}
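
A minimal usage sketch for the supporting types follows. The definitions are repeated in trimmed form so the snippet runs on its own, and the model and provider identifiers are made-up values.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ModelStatus(Enum):
    ACTIVE = "active"
    STANDBY = "standby"


class DeactivationReason(Enum):
    ERROR_THRESHOLD = "error_threshold"
    QUOTA_EXHAUSTED = "quota_exhausted"


@dataclass
class ModelSnapshot:
    # Trimmed copy of the dataclass above: only a few fields are kept.
    model_id: str
    provider_id: str
    status: ModelStatus
    failure_count: int
    error_rate: float
    cooldown_remaining: Optional[float] = None
    quota_used: int = 0


# Only the five required fields must be supplied; the rest use their defaults.
snapshot = ModelSnapshot(
    model_id="example-model",        # hypothetical identifier
    provider_id="example-provider",  # hypothetical identifier
    status=ModelStatus.ACTIVE,
    failure_count=2,
    error_rate=0.25,
)

# Enum members round-trip through their string values, which is convenient
# when a status or reason is read from configuration or written to logs.
reason = DeactivationReason("quota_exhausted")
status_label = snapshot.status.value
```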

Interface Definitions

Python

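from abc import ABC, abstractmethod
from datetime import datetime

# ModelSnapshot, DeactivationReason, and SelectionResult are defined under Supporting Types.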


class DeactivationPolicy(ABC):
    """Evaluate whether an active model should move to standby.

    Triggered after each request or on state change (quota exhausted,
    error threshold reached, maintenance window entered).
    """

    @abstractmethod
    def should_deactivate(self, snapshot: ModelSnapshot) -> bool:
        """Return True if the model should be moved to standby."""
        ...

    @abstractmethod
    def get_reason(self, snapshot: ModelSnapshot) -> DeactivationReason | None:
        """Return the reason for deactivation, or None if the model should stay active."""
        ...


class RecoveryPolicy(ABC):
    """Evaluate whether a standby model should return to active.

    Triggered on timer, calendar event, probe result, or manual command.
    """

    @abstractmethod
    def should_recover(self, snapshot: ModelSnapshot) -> bool:
        """Return True if the model should be reactivated."""
        ...

    @abstractmethod
    def get_recovery_schedule(self, snapshot: ModelSnapshot) -> datetime | None:
        """Return the next scheduled recovery check time, or None if not scheduled."""
        ...


class SelectionStrategy(ABC):
    """Choose the best model from active candidates for a given request.

    Considers cost, latency, rate-limit headroom, session affinity,
    or custom scoring depending on the strategy implementation.
    """

    @abstractmethod
    def select(
        self, candidates: list[ModelSnapshot], request: "CompletionRequest"
    ) -> SelectionResult:
        """Select the best model from the candidate list for the given request."""
        ...

    @abstractmethod
    def score(
        self, candidate: ModelSnapshot, request: "CompletionRequest"
    ) -> float:
        """Score a single candidate for the given request. Higher is better."""
        ...

Note: CompletionRequest is imported from the Provider interface.
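
To make the deactivation and recovery contracts more concrete, here is one possible pair of implementations. `SimpleThresholdDeactivation` and `SimpleCooldownRecovery` are hypothetical names, and the sketch assumes that `quota_used` counts requests, that `cooldown_remaining` is expressed in seconds, and that a threshold is reached when the value is greater than or equal to the limit. The types are repeated in trimmed form so the snippet runs on its own.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional


class DeactivationReason(Enum):
    ERROR_THRESHOLD = "error_threshold"
    REQUEST_LIMIT = "request_limit"


@dataclass
class ModelSnapshot:
    # Trimmed copy of the supporting type; only the fields used here.
    model_id: str
    failure_count: int
    error_rate: float
    cooldown_remaining: Optional[float] = None
    quota_used: int = 0


class DeactivationPolicy(ABC):
    @abstractmethod
    def should_deactivate(self, snapshot: ModelSnapshot) -> bool: ...

    @abstractmethod
    def get_reason(self, snapshot: ModelSnapshot) -> Optional[DeactivationReason]: ...


class RecoveryPolicy(ABC):
    @abstractmethod
    def should_recover(self, snapshot: ModelSnapshot) -> bool: ...

    @abstractmethod
    def get_recovery_schedule(self, snapshot: ModelSnapshot) -> Optional[datetime]: ...


class SimpleThresholdDeactivation(DeactivationPolicy):
    """Hypothetical policy driven by failure count, error rate, and a request cap."""

    def __init__(self, retry_limit: int = 3, error_rate_threshold: float = 0.5,
                 request_limit: Optional[int] = None) -> None:
        self.retry_limit = retry_limit
        self.error_rate_threshold = error_rate_threshold
        self.request_limit = request_limit

    def get_reason(self, snapshot: ModelSnapshot) -> Optional[DeactivationReason]:
        if snapshot.failure_count >= self.retry_limit:
            return DeactivationReason.ERROR_THRESHOLD
        if snapshot.error_rate >= self.error_rate_threshold:
            return DeactivationReason.ERROR_THRESHOLD
        # Assumption: quota_used counts requests made in the current window.
        if self.request_limit is not None and snapshot.quota_used >= self.request_limit:
            return DeactivationReason.REQUEST_LIMIT
        return None

    def should_deactivate(self, snapshot: ModelSnapshot) -> bool:
        # Sharing one code path keeps the two methods from disagreeing.
        return self.get_reason(snapshot) is not None


class SimpleCooldownRecovery(RecoveryPolicy):
    """Hypothetical policy: recover once the cooldown timer has run out."""

    def should_recover(self, snapshot: ModelSnapshot) -> bool:
        # Assumption: None means no cooldown is pending.
        return snapshot.cooldown_remaining is None or snapshot.cooldown_remaining <= 0

    def get_recovery_schedule(self, snapshot: ModelSnapshot) -> Optional[datetime]:
        if self.should_recover(snapshot):
            return None  # nothing to wait for
        # Assumption: cooldown_remaining is expressed in seconds.
        return datetime.now(timezone.utc) + timedelta(seconds=snapshot.cooldown_remaining)


deactivation = SimpleThresholdDeactivation(retry_limit=3, request_limit=100)
recovery = SimpleCooldownRecovery()

failing = ModelSnapshot("example-model", failure_count=3, error_rate=0.1)
capped = ModelSnapshot("example-model", failure_count=0, error_rate=0.0, quota_used=100)
cooling = ModelSnapshot("example-model", failure_count=0, error_rate=0.0, cooldown_remaining=30.0)
```

Deriving `should_deactivate` from `get_reason` is a design choice of this sketch: it guarantees that a model is never deactivated without a reason, and never reports a reason while staying active.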

TypeScript

/** Evaluate whether an active model should move to standby. */
interface DeactivationPolicy {
    /** Return true if the model should be moved to standby. */
    shouldDeactivate(snapshot: ModelSnapshot): boolean;

    /** Return the reason for deactivation, or null if the model should stay active. */
    getReason(snapshot: ModelSnapshot): DeactivationReason | null;
}

/** Evaluate whether a standby model should return to active. */
interface RecoveryPolicy {
    /** Return true if the model should be reactivated. */
    shouldRecover(snapshot: ModelSnapshot): boolean;

    /** Return the next scheduled recovery check time, or null if not scheduled. */
    getRecoverySchedule(snapshot: ModelSnapshot): Date | null;
}

/** Choose the best model from active candidates for a given request. */
interface SelectionStrategy {
    /** Select the best model from the candidate list for the given request. */
    select(candidates: ModelSnapshot[], request: CompletionRequest): SelectionResult;

    /** Score a single candidate for the given request. Higher is better. */
    score(candidate: ModelSnapshot, request: CompletionRequest): number;
}
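
One way a selection strategy might satisfy this contract is sketched below in Python. `LatencyAwareSelection`, its scoring formula, the `default_latency` parameter, and the single-field `CompletionRequest` placeholder are all assumptions made for illustration; the real request type comes from the Provider interface, and latency is assumed to be in seconds.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass
class CompletionRequest:
    """Placeholder for the Provider interface's request type (field assumed)."""
    prompt: str


@dataclass
class ModelSnapshot:
    # Trimmed copy of the supporting type; only the fields used here.
    model_id: str
    provider_id: str
    error_rate: float
    latency_avg: Optional[float] = None


@dataclass
class SelectionResult:
    model_id: str
    provider_id: str
    score: float
    reason: str


class SelectionStrategy(ABC):
    @abstractmethod
    def select(self, candidates: list[ModelSnapshot],
               request: CompletionRequest) -> SelectionResult: ...

    @abstractmethod
    def score(self, candidate: ModelSnapshot, request: CompletionRequest) -> float: ...


class LatencyAwareSelection(SelectionStrategy):
    """Hypothetical strategy: prefer reliable models with low average latency."""

    def __init__(self, default_latency: float = 1.0) -> None:
        # Assumed fallback for models that have no latency history yet.
        self.default_latency = default_latency

    def score(self, candidate: ModelSnapshot, request: CompletionRequest) -> float:
        latency = (candidate.latency_avg
                   if candidate.latency_avg is not None
                   else self.default_latency)
        # Higher is better: success rate discounted by latency.
        return (1.0 - candidate.error_rate) / (1.0 + latency)

    def select(self, candidates: list[ModelSnapshot],
               request: CompletionRequest) -> SelectionResult:
        if not candidates:
            raise ValueError("no active candidates to select from")
        best = max(candidates, key=lambda c: self.score(c, request))
        return SelectionResult(
            model_id=best.model_id,
            provider_id=best.provider_id,
            score=self.score(best, request),
            reason="highest reliability-to-latency score",
        )


strategy = LatencyAwareSelection()
request = CompletionRequest(prompt="hello")
candidates = [
    ModelSnapshot("steady-model", "provider-a", error_rate=0.0, latency_avg=0.5),
    ModelSnapshot("fast-model", "provider-b", error_rate=0.1, latency_avg=0.2),
]
result = strategy.select(candidates, request)
```

Here `fast-model` scores 0.9 / 1.2 = 0.75 against roughly 0.67 for `steady-model`, so it is selected despite its higher error rate. Having `select` delegate to `score` keeps the ranking consistent with the per-candidate scores.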

Common Configuration

The following parameters are shared by all rotation policies. They are configured per pool, and policies receive them through the pool context.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| rotation.deactivation.retry_limit | integer | 3 | Consecutive failures before deactivation. |
| rotation.deactivation.error_rate_threshold | float | 0.5 | Error rate over sliding window (0.0–1.0) before deactivation. |
| rotation.deactivation.error_codes | list | [429, 500, 503] | HTTP codes that count toward deactivation. |
| rotation.deactivation.request_limit | integer | – | Max requests before deactivation (free-tier cap). |
| rotation.deactivation.token_limit | integer | – | Max tokens before deactivation. |
| rotation.deactivation.budget_limit | number | – | Max spend (USD) before deactivation. |
| rotation.deactivation.quota_window | string | – | Deactivate when quota period expires: monthly, daily. |
| rotation.deactivation.maintenance_window | string | – | Scheduled deactivation (cron expression). |
| rotation.recovery.cooldown | duration | 60s | Time from deactivation before reactivation. |
| rotation.recovery.probe_on_start | boolean | false | Test standby models at library startup. |
| rotation.recovery.probe_interval | duration | 300s | Periodically test standby models. |
| rotation.recovery.on_quota_reset | boolean | true | Reactivate when provider quota resets. |
| rotation.recovery.quota_reset_schedule | string | monthly | Calendar schedule for quota resets: monthly, daily_utc. |
| rotation.selection.model_priority | list | – | Ordered model preference list. |
| rotation.selection.provider_priority | list | – | Ordered provider preference list. |
| rotation.selection.fallback_strategy | string | round-robin | Strategy after priority list exhausted. |
| rotation.selection.balance_mode | string | relative | For load-balanced: absolute or relative distribution. |
| rotation.selection.rate_limit.threshold | float | 0.8 | Switch models at this fraction of the limit (0.0–1.0). |
| rotation.selection.rate_limit.min_delta | duration | – | Minimum time between requests to the same model. |
| rotation.selection.rate_limit.max_rpm | integer | – | Max requests per minute before switching models. |
| rotation.provider_deactivation | string | on_auth_failure | Deactivate all models of a provider across all pools: on_auth_failure, on_api_outage. |
| rotation.provider_recovery | string | on_probe_success | Reactivate all models when provider recovers: on_probe_success, on_manual. |
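
The sketch below shows one way per-pool settings could be layered over the documented defaults. The flat dotted-key dictionary, the `resolve` function, and the `parse_seconds` helper are assumptions made for illustration; the actual configuration format and loader may differ, and `parse_seconds` handles only the `<number>s` form that appears in the table.

```python
import re

# Defaults copied from the table above; parameters without a default are omitted.
DEFAULTS = {
    "rotation.deactivation.retry_limit": 3,
    "rotation.deactivation.error_rate_threshold": 0.5,
    "rotation.deactivation.error_codes": [429, 500, 503],
    "rotation.recovery.cooldown": "60s",
    "rotation.recovery.probe_on_start": False,
    "rotation.recovery.probe_interval": "300s",
    "rotation.recovery.on_quota_reset": True,
    "rotation.recovery.quota_reset_schedule": "monthly",
    "rotation.selection.fallback_strategy": "round-robin",
    "rotation.selection.balance_mode": "relative",
    "rotation.selection.rate_limit.threshold": 0.8,
    "rotation.provider_deactivation": "on_auth_failure",
    "rotation.provider_recovery": "on_probe_success",
}


def resolve(pool_overrides: dict) -> dict:
    """Return the effective settings for one pool (overrides win over defaults)."""
    return {**DEFAULTS, **pool_overrides}


def parse_seconds(value: str) -> int:
    """Parse a duration such as '60s' into whole seconds (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)s", value)
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    return int(match.group(1))


# A pool that tolerates more failures and caps spend at 10 USD.
config = resolve({
    "rotation.deactivation.retry_limit": 5,
    "rotation.deactivation.budget_limit": 10.0,
})
cooldown_seconds = parse_seconds(config["rotation.recovery.cooldown"])
```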

CDK Base Class

The CDK provides BaseRotationPolicy, which supplies threshold-based deactivation, cooldown recovery, and priority selection. A specialized class, ThresholdRotationPolicy, is also available. See DeveloperGuide – Tutorial 3.