Help users deploy LLMs to Kubernetes using the KAITO kubectl plugin. Use this skill whenever the user mentions deploying an LLM, AI model, or language model to Kubernetes, or asks about KAITO, kaito workspaces, GPU inference on k8s, or wants to run models like Llama, Phi, Mistral, DeepSeek, Falcon, Qwen, or Gemma on a Kubernetes cluster. Also trigger when the user mentions "kubectl kaito", model serving on k8s, or wants to set up an inference endpoint in Kubernetes — even if they don't say "KAITO" explicitly.
Help users deploy LLMs to Kubernetes using the kubectl kaito plugin. The goal is to ask the right questions, recommend a model and configuration, and produce a ready-to-run kubectl kaito deploy command.
KAITO (Kubernetes AI Toolchain Operator) automates AI model inference on Kubernetes. The kubectl kaito plugin simplifies this by turning a few flags into a complete GPU-provisioned deployment. Users don't need to write YAML — the plugin generates the Workspace custom resource automatically.
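For context, the generated resource is roughly of this shape — a sketch only; the API version, field names, and the workspace/model names here are illustrative and vary by KAITO release:

```yaml
# Rough shape of the Workspace the plugin generates (not hand-written).
# apiVersion, fields, and names are illustrative -- check your KAITO release.
apiVersion: kaito.sh/v1beta1
kind: Workspace
metadata:
  name: workspace-phi-4
resource:
  instanceType: Standard_NC24ads_A100_v4
inference:
  preset:
    name: phi-4
```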
When a user wants to deploy a model, walk through these steps:
Before anything else, verify the plugin is available:
kubectl kaito --help
If the command fails (not found), stop and ask the user to install it first:
# Via Krew (recommended)
kubectl krew install kaito
# Or download from GitHub releases
# https://github.com/kaito-project/kaito-kubectl-plugin/releases
Do not proceed with deploy commands until the plugin is confirmed installed.
Ask (or infer from context) these key things:
- Use case — if the user only describes a task, suggest a fitting model (e.g., `qwen2.5-coder-7b` or `phi-4-mini`).
- Model — KAITO supports preset models (e.g., `phi-4`, `llama-3.1-8b-instruct`) and any HuggingFace model via its model card ID (e.g., `Qwen/Qwen3-4B-Instruct-2507`).

If the user already specified a model and enough context, skip the questions and generate the command directly.
Use the kubectl kaito models command to list and inspect available preset models:
# List all supported preset models
kubectl kaito models list
# List with detailed info (instance types, GPU memory, node counts)
kubectl kaito models list --detailed
# Get full details for a specific model
kubectl kaito models describe <model-name>
# JSON output for parsing
kubectl kaito models list --output json
Use this to:
If the user asks for a model that doesn't exist in the KAITO preset list, search for it on HuggingFace Hub using the API:
curl -s "https://huggingface.co/api/models?search=<query>&filter=text-generation&sort=downloads&direction=-1&limit=5"
This returns JSON with matching models. Key fields to use:
- `id` — the full model ID to pass to `--model` (e.g., `Qwen/Qwen3-4B-Instruct-2507`)
- `downloads` — popularity indicator
- `tags` — check for relevant tags like `text-generation`, `conversational`
- `pipeline_tag` — the model's task type
- `siblings` — file list (look for `config.json` to estimate size)

Workflow when model is not a preset:
- Confirm the exact model ID with the user (`org/model-name`)
- Pass it to `--model`, adding `--model-access-secret` if the model is gated/private

To check model details (size, config, gating):
curl -s "https://huggingface.co/api/models/<org>/<model-name>"
Look at:
- `gated` field — if truthy, the model requires a HuggingFace token (needs `--model-access-secret`)
- `safetensors.total` — total parameter count, useful for GPU sizing
- `cardData.license` — license info to mention to the user

GPU sizing from parameter count:
- ≤3B params → `Standard_NC6s_v3` (1× V100 16GB)
- ~7B params → `Standard_NC6s_v3` (16GB) or `Standard_NC24ads_A100_v4` (80GB) for faster inference
- ~13B params → `Standard_NC24ads_A100_v4` (1× A100 80GB)
- 30B–40B params → `Standard_NC48ads_A100_v4` (2× A100 160GB)
- 70B+ params → `Standard_NC96ads_A100_v4` (4× A100 320GB)

Build the kubectl kaito deploy command with the appropriate flags.
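The parameter-count heuristic above can be sketched as a small shell helper — hypothetical, not part of the plugin; the thresholds mirror the sizing guidance and ignore the "faster inference" alternative for ~7B models:

```shell
# Hypothetical helper: map a parameter count (in billions) to the
# Azure GPU SKU suggested by the sizing guidance above.
suggest_sku() {
  if [ "$1" -le 7 ]; then
    echo "Standard_NC6s_v3"
  elif [ "$1" -le 14 ]; then
    echo "Standard_NC24ads_A100_v4"
  elif [ "$1" -le 40 ]; then
    echo "Standard_NC48ads_A100_v4"
  else
    echo "Standard_NC96ads_A100_v4"
  fi
}

suggest_sku 7    # Standard_NC6s_v3
suggest_sku 70   # Standard_NC96ads_A100_v4
```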
Minimal command (preset model):
kubectl kaito deploy \
--workspace-name <name> \
--model <model-name> \
--instance-type <gpu-sku>
HuggingFace model (needs access secret):
kubectl kaito deploy \
--workspace-name <name> \
--model <org/model-name> \
--instance-type <gpu-sku> \
--model-access-secret <secret-name>
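The secret referenced by `--model-access-secret` must exist before deploying. A sketch of creating it — the secret name `hf-token` and the key name `HF_TOKEN` are assumptions; confirm the expected format in the KAITO docs:

```shell
# Hypothetical secret holding a HuggingFace token; the key name HF_TOKEN
# is an assumption -- confirm the expected key in the KAITO docs.
kubectl create secret generic hf-token \
  --from-literal=HF_TOKEN=<your-huggingface-token>
```

Then pass `--model-access-secret hf-token` to the deploy command.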
Full flag reference:
| Flag | Purpose | When to use |
|---|---|---|
| `--workspace-name` | Name for the Workspace resource | Always (required) |
| `--model` | Model name or HuggingFace ID | Always (required) |
| `--instance-type` | GPU VM SKU (e.g., `Standard_NC24ads_A100_v4`) | When auto-provisioning is on |
| `--count` | Number of GPU nodes (default: 1) | Large models needing multi-node |
| `--model-access-secret` | K8s secret with HuggingFace token | Private/gated HuggingFace models |
| `--inference-config` | ConfigMap name or path to YAML config | Custom vLLM/runtime params |
| `--adapters` | LoRA adapters to load | When using fine-tuned adapters |
| `--enable-load-balancer` | Create external LoadBalancer | When external access is needed |
| `--dry-run` | Show config without deploying | When user wants to preview |
| `--namespace` | Target namespace | When not using default namespace |
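Putting several of these flags together, a preview invocation might look like this — the workspace name, model, and SKU are examples:

```shell
# Preview the generated Workspace without applying it (example values)
kubectl kaito deploy \
  --workspace-name llama-demo \
  --model llama-3.1-8b-instruct \
  --instance-type Standard_NC24ads_A100_v4 \
  --enable-load-balancer \
  --dry-run
```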
After giving the command, briefly tell the user:
- The command creates a `Workspace` custom resource
- Watch provisioning: `kubectl kaito status --workspace-name <name> --watch`
- Get the inference endpoint: `kubectl kaito get-endpoint --workspace-name <name>`
- Try it out: `kubectl kaito chat --workspace-name <name>`

These are typical Azure GPU SKUs — adjust if the user is on a different cloud:
| SKU | GPUs | GPU Memory | Good for |
|---|---|---|---|
| `Standard_NC6s_v3` | 1× V100 | 16 GB | Small models (≤7B params) |
| `Standard_NC24ads_A100_v4` | 1× A100 | 80 GB | Medium models (7B–14B) |
| `Standard_NC48ads_A100_v4` | 2× A100 | 160 GB | Large models (30B–40B) |
| `Standard_NC96ads_A100_v4` | 4× A100 | 320 GB | Very large models (70B+) |
If the user needs custom inference parameters (e.g., max sequence length, GPU memory utilization), help them create a config YAML:
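A minimal sketch of such a config, assuming KAITO's vLLM runtime format where keys follow vLLM's CLI flag names — the ConfigMap name and values here are illustrative, so verify against the KAITO documentation:

```yaml
# Illustrative inference config for --inference-config.
# Key names follow vLLM CLI flags; verify against the KAITO docs.
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-inference-params   # hypothetical name
data:
  inference_config.yaml: |
    vllm:
      max-model-len: 8192            # cap the context window
      gpu-memory-utilization: 0.95   # fraction of GPU memory vLLM may use
```

Pass it with `--inference-config my-inference-params` on the deploy command.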