Expert guide for deploying and managing inference services with Odin, the MoAI Inference Framework operator. Use this skill when working with InferenceService, InferenceServiceTemplate, templateRefs, parallelism (tensor, pipeline, data, expert), workerTemplate, runtime-bases, presets, LeaderWorkerSet, rolloutStrategy, or troubleshooting Odin workloads in Kubernetes clusters. Also use when questions involve deploying a model on Kubernetes, serving LLMs with vLLM, GPU resource configuration for inference, multi-node model deployment, or HuggingFace model serving.
Odin is the Kubernetes operator at the core of the MoAI Inference Framework (MIF). It manages the lifecycle of inference workloads by reconciling InferenceService custom resources into Kubernetes-native Deployments or LeaderWorkerSets, depending on the parallelism configuration.
Odin introduces a template composition system (InferenceServiceTemplate) that allows reusable configurations — runtime-bases and model-specific presets — to be layered and merged using Kubernetes strategic merge patch semantics. This enables a separation of concerns: platform teams maintain runtime-bases, model teams maintain presets, and end users compose them with minimal configuration.
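As a sketch of how this layering might look in practice (the `templateRefs` field name comes from this skill's scope; the template names, model, and exact spec layout below are illustrative assumptions, not the authoritative schema — consult `website/docs/reference/odin/api-reference.mdx` for the real fields):

```yaml
# Illustrative sketch only: template names and spec layout are assumptions.
apiVersion: odin.moreh.io/v1alpha1
kind: InferenceService
metadata:
  name: my-llm-service          # placeholder name
spec:
  # Templates are fetched and merged in order using strategic merge
  # patch semantics: the runtime-base (platform team) comes first,
  # the model-specific preset (model team) layers on top, and any
  # inline fields in this InferenceService override both.
  templateRefs:
    - name: vllm-runtime-base   # runtime-base (assumed name)
    - name: my-model-preset     # model-specific preset (assumed name)
```

The ordering matters: later entries in `templateRefs` win on conflicting fields, which is what lets end users compose a working service with minimal configuration.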
This skill covers:
Out of scope: Heimdall plugin configuration (see guide-heimdall), vLLM engine internals, Gateway controller setup, cluster-level infrastructure.
Key codebase paths:
- `website/docs/reference/odin/api-reference.mdx` — API field reference
- `website/docs/features/preset.mdx` — template composition guide
- `website/docs/getting-started/quickstart.mdx` — end-to-end deployment
- `deploy/helm/moai-inference-preset/templates/runtime-bases/` — runtime-base definitions
- `deploy/helm/moai-inference-preset/templates/presets/` — model-specific presets
- `test/e2e/*/config/inference-service.yaml.tmpl` — E2E test InferenceService patterns

Additional references — when to consult:
When you need field-level details beyond this guide (e.g., exact CRD validation rules, all supported env variables, template variable list), consult the reference docs below. Prefer the local file path when filesystem access is available (faster, complete). Use the URL as a fallback when filesystem access is unavailable.
| Topic | Local path | URL (fallback) |
|---|---|---|
| API field reference (CRD spec) | website/docs/reference/odin/api-reference.mdx | https://test-docs.moreh.io/dev/reference/odin/api-reference/ |
| Template composition & presets | website/docs/features/preset.mdx | https://test-docs.moreh.io/dev/features/preset/ |
| End-to-end quickstart | website/docs/getting-started/quickstart.mdx | https://test-docs.moreh.io/dev/getting-started/quickstart/ |
| PV-based model management | website/docs/operations/hf-model-management-with-pv.mdx | https://test-docs.moreh.io/dev/operations/hf-model-management-with-pv/ |
| Monitoring & metrics | website/docs/operations/monitoring/metrics/index.mdx | https://test-docs.moreh.io/dev/operations/monitoring/metrics/ |
```mermaid
flowchart TD
    IS[InferenceService] --> TR[Template Resolution]
    TR -->|fetch & merge templateRefs in order| VS[Variable Substitution]
    VS -->|replace Go templates| WD{workerTemplate?}
    WD -->|nil| Deploy[Create Deployment]
    WD -->|not nil| LWS[Create LeaderWorkerSet]
    Deploy --> IP[InferencePool Integration]
    LWS --> IP
    IP -->|inject pool selector labels| SP[Status Propagation]
    SP -->|reflect workload readiness| IS
```
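The `workerTemplate` branch in the reconciliation flow can be sketched as follows (the field path and resource layout below are assumptions inferred from the flow description, not the verified CRD schema):

```yaml
# Sketch only: exact field paths are assumptions based on the flow above.
spec:
  # When workerTemplate is nil, Odin reconciles the InferenceService
  # into a plain Kubernetes Deployment.
  # When workerTemplate is set, Odin creates a LeaderWorkerSet instead,
  # enabling multi-node parallelism (tensor, pipeline, data, expert).
  workerTemplate:
    spec:
      containers:
        - name: worker
          resources:
            limits:
              nvidia.com/gpu: 8   # per-worker GPU count (illustrative)
```

In other words, the presence or absence of `workerTemplate` is the switch between single-node and multi-node topologies.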
| CRD | API Group | Short Names | Purpose |
|---|---|---|---|
| InferenceService | odin.moreh.io/v1alpha1 | is, isvc | User-facing resource for deploying inference workloads |
| InferenceServiceTemplate | odin.moreh.io/v1alpha1 | ist, isvctmpl | Reusable template for composable configurations |
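A hedged sketch of the template-side resource (the `spec` layout, image, and variable name below are assumptions for illustration; the actual fields are defined in `website/docs/reference/odin/api-reference.mdx`):

```yaml
# Sketch only: spec layout, image, and template variable are assumptions.
apiVersion: odin.moreh.io/v1alpha1
kind: InferenceServiceTemplate
metadata:
  name: vllm-runtime-base       # illustrative runtime-base name
spec:
  # Assumed structure: a pod template fragment that presets and user
  # InferenceServices layer on top of via strategic merge patch.
  template:
    spec:
      containers:
        - name: server
          image: vllm/vllm-openai:latest       # placeholder image
          args: ["--model", "{{ .Model }}"]    # Go-template variable,
                                               # resolved during the
                                               # variable substitution step
```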
apiVersion: odin.moreh.io/v1alpha1