Expert guide for deploying and managing inference services with Odin, the MoAI Inference Framework operator. Use this skill when working with InferenceService, InferenceServiceTemplate, templateRefs, parallelism (tensor, pipeline, data, expert), workerTemplate, runtime-bases, presets, LeaderWorkerSet, rolloutStrategy, or troubleshooting Odin workloads in Kubernetes clusters. Also use when questions involve deploying a model on Kubernetes, serving LLMs with vLLM, GPU resource configuration for inference, multi-node model deployment, or HuggingFace model serving.
Odin is the Kubernetes operator at the core of the MoAI Inference Framework (MIF). It manages the lifecycle of inference workloads by reconciling InferenceService custom resources into Kubernetes-native Deployments or LeaderWorkerSets, depending on the parallelism configuration.
Odin introduces a template composition system (InferenceServiceTemplate) that allows reusable configurations — runtime-bases and model-specific presets — to be layered and merged using Kubernetes strategic merge patch semantics. This enables a separation of concerns: platform teams maintain runtime-bases, model teams maintain presets, and end users compose them with minimal configuration.
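As a sketch of how this layering might look in practice (the `templateRefs` field name comes from this skill's scope; the template names, model, and exact spec layout below are illustrative assumptions, not the authoritative schema — consult `website/docs/reference/odin/api-reference.mdx` for the real fields):

```yaml
# Illustrative sketch only: template names and spec layout are assumptions.
apiVersion: odin.moreh.io/v1alpha1
kind: InferenceService
metadata:
  name: my-llm-service          # placeholder name
spec:
  # Templates are fetched and merged in order using strategic merge
  # patch semantics: the runtime-base (platform team) comes first,
  # the model-specific preset (model team) layers on top, and any
  # inline fields in this InferenceService override both.
  templateRefs:
    - name: vllm-runtime-base   # runtime-base (assumed name)
    - name: my-model-preset     # model-specific preset (assumed name)
```

The ordering matters: later entries in `templateRefs` win on conflicting fields, which is what lets end users compose a working service with minimal configuration.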
This skill covers:
Out of scope: Heimdall plugin configuration (see guide-heimdall), vLLM engine internals, Gateway controller setup, cluster-level infrastructure.
Key codebase paths:
- `website/docs/reference/odin/api-reference.mdx` — API field reference
- `website/docs/features/preset.mdx` — template composition guide
- `website/docs/getting-started/quickstart.mdx` — end-to-end deployment
- `deploy/helm/moai-inference-preset/templates/runtime-bases/` — runtime-base definitions
- `deploy/helm/moai-inference-preset/templates/presets/` — model-specific presets
- `test/e2e/*/config/inference-service.yaml.tmpl` — E2E test InferenceService patterns

Additional references — when to consult:
When you need field-level details beyond this guide (e.g., exact CRD validation rules, all supported env variables, template variable list), consult the reference docs below. Prefer the local file path when filesystem access is available (faster, complete). Use the URL as a fallback when filesystem access is unavailable.
| Topic | Local path | URL (fallback) |
|---|---|---|
| API field reference (CRD spec) | website/docs/reference/odin/api-reference.mdx | https://test-docs.moreh.io/dev/reference/odin/api-reference/ |
| Template composition & presets | website/docs/features/preset.mdx | https://test-docs.moreh.io/dev/features/preset/ |
| End-to-end quickstart | website/docs/getting-started/quickstart.mdx | https://test-docs.moreh.io/dev/getting-started/quickstart/ |
| PV-based model management | website/docs/operations/hf-model-management-with-pv.mdx | https://test-docs.moreh.io/dev/operations/hf-model-management-with-pv/ |
| Monitoring & metrics | website/docs/operations/monitoring/metrics/index.mdx | https://test-docs.moreh.io/dev/operations/monitoring/metrics/ |
```mermaid
flowchart TD
    IS[InferenceService] --> TR[Template Resolution]
    TR -->|fetch & merge templateRefs in order| VS[Variable Substitution]
    VS -->|replace Go templates| WD{workerTemplate?}
    WD -->|nil| Deploy[Create Deployment]
    WD -->|not nil| LWS[Create LeaderWorkerSet]
    Deploy --> IP[InferencePool Integration]
    LWS --> IP
    IP -->|inject pool selector labels| SP[Status Propagation]
    SP -->|reflect workload readiness| IS
```
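The `workerTemplate` branch in the reconciliation flow can be sketched as follows (the field path and resource layout below are assumptions inferred from the flow description, not the verified CRD schema):

```yaml
# Sketch only: exact field paths are assumptions based on the flow above.
spec:
  # When workerTemplate is nil, Odin reconciles the InferenceService
  # into a plain Kubernetes Deployment.
  # When workerTemplate is set, Odin creates a LeaderWorkerSet instead,
  # enabling multi-node parallelism (tensor, pipeline, data, expert).
  workerTemplate:
    spec:
      containers:
        - name: worker
          resources:
            limits:
              nvidia.com/gpu: 8   # per-worker GPU count (illustrative)
```

In other words, the presence or absence of `workerTemplate` is the switch between single-node and multi-node topologies.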
| CRD | API Group | Short Names | Purpose |
|---|---|---|---|
| InferenceService | odin.moreh.io/v1alpha1 | is, isvc | User-facing resource for deploying inference workloads |
| InferenceServiceTemplate | odin.moreh.io/v1alpha1 | ist, isvctmpl | Reusable template for composable configurations |
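A hedged sketch of the template-side resource (the `spec` layout, image, and variable name below are assumptions for illustration; the actual fields are defined in `website/docs/reference/odin/api-reference.mdx`):

```yaml
# Sketch only: spec layout, image, and template variable are assumptions.
apiVersion: odin.moreh.io/v1alpha1
kind: InferenceServiceTemplate
metadata:
  name: vllm-runtime-base       # illustrative runtime-base name
spec:
  # Assumed structure: a pod template fragment that presets and user
  # InferenceServices layer on top of via strategic merge patch.
  template:
    spec:
      containers:
        - name: server
          image: vllm/vllm-openai:latest       # placeholder image
          args: ["--model", "{{ .Model }}"]    # Go-template variable,
                                               # resolved during the
                                               # variable substitution step
```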
apiVersion: odin.moreh.io/v1alpha1