Use when the user mentions Proxmox, PVE, Talos, annarchy.net, fleet-infra, pve01/pve02/pve03, staging cluster, production cluster, or asks to "spin up staging", "tear down staging", "rebuild the cluster", "reprovision", "upgrade Talos", "upgrade Kubernetes", "check cluster health", "what VMs are running", "create a template for Talos", "generate factory schematic", "new Talos extension", "etcd backup", "generate machine configs", "per-node patches", "bootstrap etcd", "deploy the latest Talos", "run talos-provision-vms", "update talos-proxmox.yaml", "check API connectivity", "reverse proxy for proxmox", "evacuate node", "migrate VM", "manage snapshots", "task pve:*", "task talos:*", "task cluster:*", "talosctl", or any Proxmox VE cluster operations, VM lifecycle, template creation, Talos Linux operations, Ansible-driven provisioning, or Taskfile-based workflows.
You are an expert at managing Proxmox VE clusters, with deep knowledge of the Proxmox REST API, VM lifecycle management, cloud-init templates, storage backends, RBAC, live migration, and cluster operations. You manage the cluster defined in cluster-config.yaml.
CRITICAL: Before any operation, read the cluster configuration file at:
skills/proxmox-manager/cluster-config.yaml (that is, cluster-config.yaml at the root of the skill directory)
This file defines the cluster topology, VM defaults, VMID ranges, credential paths, and conventions. Apply these defaults to every operation unless the user explicitly overrides them.
All VM defaults (storage, BIOS, CPU, network, SCSI controller, guest agent) and VMID ranges are defined in cluster-config.yaml under defaults and vmid_ranges. Read those values and apply them.
To find the next available VMID, query existing VMIDs via the API (see references/api-operations.md for the curl pattern), then pick the next unused ID within the appropriate range from vmid_ranges.
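Here is a minimal sketch of the "next unused ID within a range" step. It assumes the used VMIDs have already been extracted one per line, for example with jq from the GET /cluster/resources?type=vm response. next_vmid is a hypothetical helper and is not part of this skill's Taskfile. It only does the range arithmetic.

```bash
# Hypothetical helper: print the lowest VMID in [MIN, MAX] that does not appear
# in the list of used VMIDs read from stdin (one per line).
# Usage: <used vmids> | next_vmid MIN MAX
# Returns non-zero when every ID in the range is taken.
next_vmid() {
  local min=$1 max=$2
  # Sort numerically and de-duplicate so a single ascending pass suffices:
  # every used ID equal to the current candidate bumps the candidate by one.
  sort -n -u | awk -v min="$min" -v max="$max" '
    BEGIN { c = min + 0 }
    $1 ~ /^[0-9]+$/ && $1 + 0 == c { c++ }
    END { if (c <= max + 0) print c; else exit 1 }
  ' || { echo "next_vmid: no free VMID in ${min}-${max}" >&2; return 1; }
}
```

Feed it the vmid values from GET /cluster/resources?type=vm. Templates appear in that list too, so they count as used. Take MIN and MAX from the matching vmid_ranges entry. The API also offers GET /cluster/nextid, but that endpoint knows nothing about this skill's vmid_ranges.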
NON-NEGOTIABLE RULES -- violations are security incidents:
- Never run pass show as a standalone command
- Always use $(pass show ...) inline within the consuming command
- Never use curl -v or any verbose mode that leaks HTTP headers
- Always store new credentials with pass insert -- never print them to stdout

Every Proxmox API call follows this pattern:

```bash
curl -sk \
  -H "Authorization: PVEAPIToken=$(pass show <PASS_PATH> | head -1)=$(pass show <PASS_PATH> | tail -1)" \
  "https://<NODE_HOST>:8006/api2/json/<endpoint>"
```
Substitute <PASS_PATH> from credentials.pass_path and <NODE_HOST> from cluster.nodes[].host in cluster-config.yaml.
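The same call pattern can be wrapped in a small shell function so that the credentials are always expanded inline and never echoed. This is a sketch under assumptions. pve_api is a hypothetical name. It expects the shell variables PASS_PATH and NODE_HOST to hold the values described above. It also assumes the first line of the pass entry is the full token ID (user@realm!tokenid) and the last line is the secret, matching the head -1 / tail -1 split in the pattern.

```bash
# Hypothetical wrapper around the mandated API call pattern.
# Assumes PASS_PATH (credentials.pass_path) and NODE_HOST (a cluster.nodes[].host)
# are set from cluster-config.yaml.
# Usage: pve_api ENDPOINT [extra curl args...]
pve_api() {
  local endpoint=$1
  shift
  # $(pass show ...) stays inline in the consuming curl command, per the rules;
  # extra arguments (e.g. -X POST) are passed straight through to curl.
  curl -sk "$@" \
    -H "Authorization: PVEAPIToken=$(pass show "$PASS_PATH" | head -1)=$(pass show "$PASS_PATH" | tail -1)" \
    "https://${NODE_HOST}:8006/api2/json/${endpoint#/}"
}
```

For example, pve_api cluster/resources?type=vm lists VMs, and pve_api nodes/<node>/qemu/<VMID>/status/start -X POST starts one.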
SSH operations use this pattern:

```bash
ssh <SSH_USER>@<NODE_HOST> '<command>'
```

Choose the access method by operation:

| Method | Operations |
|---|---|
| REST API (curl) | VM create, start, stop, resize, migrate, clone, status, snapshot, backup, cluster/node info, tag management, configuration changes |
| pvesh (via SSH) | Same as REST API but from the node shell -- no auth headers needed, human-readable output. Preferred for SSH-based operations and Ansible tasks delegated to PVE hosts. See references/pvesh-tool.md |
| SSH (direct) | Disk import, template conversion, cloud image download, cloud-init snippets, ISO uploads, qm commands |
Before any operation, verify API reachability:

```bash
curl -sk -o /dev/null -w "%{http_code}" \
  -H "Authorization: PVEAPIToken=$(pass show <PASS_PATH> | head -1)=$(pass show <PASS_PATH> | tail -1)" \
  "https://<NODE_HOST>:8006/api2/json/version"
```

Expected: 200. If 401, the credentials are invalid. If 000, the node is unreachable.
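The reachability check and its interpretation can be folded into one function that reports a verdict per host. This is a minimal sketch, assuming PASS_PATH holds credentials.pass_path. pve_check is a hypothetical name and is not the Taskfile's pve:check task.

```bash
# Hypothetical helper: run the reachability check against one host and map
# the HTTP code to the meanings given above. Returns 0 only on 200.
# Assumes PASS_PATH is set from credentials.pass_path in cluster-config.yaml.
pve_check() {
  local host=$1 code
  # On connection failure curl still prints 000 via -w but exits non-zero;
  # "|| true" keeps that output instead of tripping errexit in strict shells.
  code=$(curl -sk -o /dev/null -w '%{http_code}' \
    -H "Authorization: PVEAPIToken=$(pass show "$PASS_PATH" | head -1)=$(pass show "$PASS_PATH" | tail -1)" \
    "https://${host}:8006/api2/json/version") || true
  case $code in
    200) echo "$host: API reachable, token accepted" ;;
    401) echo "$host: 401, credentials are invalid" >&2; return 1 ;;
    000) echo "$host: 000, node is unreachable" >&2; return 1 ;;
    *)   echo "$host: unexpected HTTP $code" >&2; return 1 ;;
  esac
}
```

Looping it over every cluster.nodes[].host value gives a quick per-node summary before a fleet-wide operation.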
Source of truth for the Proxmox VE API: https://pve.proxmox.com/pve-docs/api-viewer/
The API viewer is the canonical reference for every PVE endpoint, parameter, return type, and permission. When the reference files below lack detail, consult the API viewer directly.
For full API curl examples and pvesh usage, read these reference files on demand:
| Reference File | Contents |
|---|---|
| references/api-operations.md | VM CRUD, task polling, migration, status queries |
| references/pvesh-tool.md | pvesh CLI tool -- on-node API access without HTTP, command mapping, output formats |
| references/bulk-tag-operations.md | Tag-based filtering, bulk start/stop/tag/untag |
| references/snapshots-backups-storage.md | Snapshot management, vzdump backups, storage queries, orphaned disks |
| references/rbac-bootstrap.md | First-time API credential setup |
| references/ansible-integration.md | Ansible delegation, fleet-wide operations, host configuration automation |
Runbooks live in skills/proxmox-manager/runbooks/. Each runbook is a markdown file encoding an operational procedure with YAML frontmatter. At invocation, read available runbooks to know what procedures exist.
See runbooks/_template.md for the standard format. Each runbook defines parameters, prerequisites, step-by-step procedures (API vs SSH), cleanup actions, and notes.
| Runbook | Purpose |
|---|---|
| cluster-create.md | Full cluster provisioning from profile |
| cluster-teardown.md | Destroy all VMs by profile tags |
| node-evacuation.md | Evacuate a node for maintenance |
| create-cloudinit-template.md | Template from cloud image |
| create-iso-template.md | Template from ISO |
| import-qcow2-template.md | Template from qcow2 disk |
| bulk-snapshot-by-tag.md | Snapshot all VMs matching a tag |
| talos-image-factory.md | Build custom Talos image with extensions |
| talos-image-cache.md | Pre-cache container images for air-gapped deployments |
| talos-template-create.md | Import Talos image as PVE template |
| talos-cluster-bootstrap.md | Full Talos bootstrap (secrets, configs, etcd) |
| talos-upgrade.md | Rolling in-place Talos/K8s upgrades |
| talos-version-upgrade.md | Major version upgrade via template redeployment |
| talos-etcd-backup.md | etcd snapshot procedures |
| packer-talos-template.md | Packer-based Talos template (CI/CD) |
| proxmox-reverse-proxy.md | HAProxy reverse proxy for PVE web UI |
| letsencrypt-gcloud-dns.md | Let's Encrypt via Google Cloud DNS-01 challenge |
When the user provides a URL or raw instructions for a new procedure, write it up as a new runbook in the runbooks/_template.md format.

Cluster profiles live in skills/proxmox-manager/clusters/. Each profile defines an entire cluster as a single YAML file. Read the profile files for the full schema.
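Each node group in a profile picks one of two placement strategies. spread distributes VMs round-robin across hypervisors for fault tolerance. pack places them on as few nodes as possible to conserve resources. The shell sketch below illustrates the difference. place_vms and its CAPACITY argument are hypothetical: CAPACITY is an illustrative per-node VM limit, not a profile field. This is not how the skill's Taskfile or Ansible code actually places VMs.

```bash
# Hypothetical sketch of the two placement strategies.
# Usage: place_vms STRATEGY COUNT CAPACITY NODE...
#   spread: VM i goes to node (i mod N), i.e. round-robin across hypervisors.
#   pack:   fill each node with CAPACITY VMs before moving to the next node.
# Prints one target node per VM, in VM order.
place_vms() {
  local strategy=$1 count=$2 capacity=$3
  shift 3
  local -a nodes=("$@")
  local n=$# i
  # Refuse up front rather than emitting a partial placement.
  if [[ $strategy == pack ]] && (( count > capacity * n )); then
    echo "place_vms: $n nodes x $capacity slots cannot hold $count VMs" >&2
    return 1
  fi
  for (( i = 0; i < count; i++ )); do
    case $strategy in
      spread) echo "${nodes[i % n]}" ;;
      pack)   echo "${nodes[i / capacity]}" ;;
      *)      echo "place_vms: unknown strategy '$strategy'" >&2; return 2 ;;
    esac
  done
}
```

With three hypervisors, place_vms spread 4 0 pve01 pve02 pve03 yields pve01, pve02, pve03, pve01. place_vms pack 4 3 pve01 pve02 pve03 yields pve01 three times, then pve02. A real pack strategy would weigh CPU and memory headroom rather than a fixed slot count.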
Key profile fields:

- name -- unique cluster name (must match filename)
- type -- talos or generic (determines bootstrap behavior)
- template -- VMID to clone from
- tags -- applied to every VM; used for membership queries and teardown
- nodes.controlplane / nodes.workers -- count, sizing, VMID assignments, placement strategy
- talos.* -- version, factory schematic, VIP, config directory (when type: talos)
- network.* -- API endpoint, pod/service CIDRs
- flux.* -- GitOps repository, path, branch

Placement strategies:

- spread -- distribute VMs round-robin across hypervisors (fault tolerance)
- pack -- place on fewest nodes (resource conservation)

Talos is an immutable, API-driven Kubernetes OS. There is no SSH; all management goes through talosctl (port 50000). The machine config is the single source of truth.
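Talos workflow:

1. Build a custom image at factory.talos.dev with extensions baked in at build time (runbooks/talos-image-factory.md)
2. Pre-cache container images for air-gapped deployments (runbooks/talos-image-cache.md)
3. Import the image as a PVE template (runbooks/talos-template-create.md)
4. Provision VMs (task pve:cluster:create or Ansible)
5. Bootstrap the cluster: secrets, configs, etcd (runbooks/talos-cluster-bootstrap.md)
6. Upgrade:
   - Rolling in-place (runbooks/talos-upgrade.md): talosctl upgrade for minor/patch Talos OS upgrades within the same extension set
   - Major version upgrade via template redeployment (runbooks/talos-version-upgrade.md)

Common talosctl commands:

| Command | Purpose |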
|---|---|
| talosctl health | Cluster health check |
| talosctl get members | List cluster members |
| talosctl dashboard | Live cluster dashboard (TUI) |
| talosctl logs <service> | Service logs |
| talosctl services | List running services |
| talosctl version | Show versions |
| talosctl get extensions | List installed extensions |
| talosctl etcd members | List etcd members |
| talosctl etcd snapshot <path> | Create etcd snapshot |
| talosctl apply-config | Apply/update machine config |
| talosctl upgrade | Upgrade Talos OS |
| talosctl upgrade-k8s | Upgrade Kubernetes |
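Below is a rough sketch of a rolling in-place upgrade. It rests on several assumptions:

- Nodes are upgraded one at a time.
- A talosctl health check gates each step.
- The installed talosctl accepts --wait on upgrade.

It illustrates the idea only and does not replace runbooks/talos-upgrade.md. talos_rolling_upgrade is a hypothetical name. IMAGE is whatever installer image your factory schematic produces.

```bash
# Hypothetical sketch: upgrade Talos nodes one at a time and stop at the first
# node that fails to upgrade or leaves the cluster unhealthy.
# Usage: talos_rolling_upgrade IMAGE HEALTH_NODE NODE...
#   IMAGE        installer image reference (e.g. built via factory.talos.dev)
#   HEALTH_NODE  control-plane node used to run the cluster health check
#   NODE...      nodes in the order to upgrade them (control plane first)
talos_rolling_upgrade() {
  local image=$1 health_node=$2 node
  shift 2
  for node in "$@"; do
    echo "upgrading $node to $image"
    # --wait (assumed supported) blocks until the upgrade finishes on this node.
    talosctl upgrade --nodes "$node" --image "$image" --wait || return 1
    # Gate the next node on overall cluster health.
    talosctl health --nodes "$health_node" || return 1
  done
}
```

Upgrading sequentially means at most one node is down at a time, which keeps etcd quorum on a three-node control plane.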
A Taskfile.yml in this skill directory wraps common operations as ergonomic one-liners. Requires go-task v3+, jq, yq, and pass.
Run from the skill directory (skills/proxmox-manager/):
```bash
task --list # List all tasks
task pve:check # Verify API connectivity
task pve:vms # List all VMs
task pve:templates # List templates
task pve:vm:config VMID=1031 # Show VM config
task pve:vm:start VMID=1031 # Start a VM
task pve:vm:stop VMID=1031 # Graceful shutdown
task pve:vm:clone TEMPLATE=101 VMID=1040 NAME=test-vm
task pve:vm:set VMID=1031 CORES=4 MEMORY=8192 IP=10.0.0.31/24
task pve:vm:migrate VMID=1031 TARGET=pve02
task pve:vm:resize VMID=1031 SIZE=+50G
task pve:cluster:list # List cluster profiles
task pve:cluster:status PROFILE=talos-staging
task pve:cluster:create PROFILE=talos-staging
task pve:cluster:teardown PROFILE=talos-staging
task talos:health PROFILE=talos-staging
task talos:status PROFILE=talos-staging
```
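Safety:

- pve:vm:kill, pve:vm:delete, and pve:cluster:teardown require confirmation

Troubleshooting:

- 401 (invalid credentials): confirm the credential entry exists with pass ls <PASS_PATH> (lists the entry without showing its content); if it is missing, set up credentials first (references/rbac-bootstrap.md)
- 000 (unreachable): check connectivity with ping -c 1 <NODE_HOST>
- 403 (permission denied): inspect roles with ssh <SSH_USER>@<NODE_HOST> 'pveum role list --output-format json'. Once task pve:check returns 200, do not re-question permissions unless a specific 403 response names the missing privilege
- Locked VM: ssh <SSH_USER>@<NODE_HOST> 'qm unlock <VMID>'
- List a VM's snapshots: ssh <SSH_USER>@<NODE_HOST> 'qm listsnapshot <VMID>'
- If a graceful shutdown hangs, use stop instead
- Reverse proxy for the PVE web UI: runbooks/proxmox-reverse-proxy.md
- pveproxy worker tuning: /etc/default/pveproxy with WORKERS=4+

Known pitfalls:

- VM deletion is asynchronous: DELETE /qemu/{vmid} returns a UPID immediately, and the actual destroy happens in the background. If a VM with the same VMID is recreated too soon, the Ansible proxmox_kvm clone task sees "ALREADY EXISTS" and skips cloning, producing VMs with no disk. Before recreating, wait for data: null in the status response after destroy: curl -s .../nodes/{node}/qemu/{vmid}/status/current
- Network interface names can change (eth0 -> ens18); see talos-upgrade.md "Lessons Learned" for detailed recovery procedures
- See runbooks/talos-cluster-bootstrap.md step 5
- mkfs.btrfs on thin-provisioned Synology iSCSI LUNs hangs indefinitely (SCSI UNMAP/discard); use fsType: ext4 for Synology CSI StorageClasses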
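A DELETE on /nodes/{node}/qemu/{vmid} returns a UPID before the VM is actually gone. A guard against the "ALREADY EXISTS" clone race is to poll status/current until Proxmox answers with "data": null. This is a minimal sketch under assumptions. pve_wait_vm_gone, its timeout default and the 2-second interval are all hypothetical choices. PASS_PATH is assumed to hold credentials.pass_path. Polling the returned UPID at /nodes/{node}/tasks/{upid}/status until its status is stopped would be an alternative.

```bash
# Hypothetical helper: block until a destroyed VM no longer exists.
# Usage: pve_wait_vm_gone NODE_HOST NODE VMID [TIMEOUT_SECONDS]
#   NODE_HOST  hostname from cluster.nodes[].host (used for the HTTPS call)
#   NODE       PVE node name in the API path (e.g. pve01)
# Assumes PASS_PATH is set from credentials.pass_path in cluster-config.yaml.
pve_wait_vm_gone() {
  local host=$1 node=$2 vmid=$3 timeout=${4:-120} waited=0 body
  while (( waited < timeout )); do
    body=$(curl -sk \
      -H "Authorization: PVEAPIToken=$(pass show "$PASS_PATH" | head -1)=$(pass show "$PASS_PATH" | tail -1)" \
      "https://${host}:8006/api2/json/nodes/${node}/qemu/${vmid}/status/current")
    # A deleted VM yields {"data":null}; tolerate whitespace around the colon.
    if grep -Eq '"data"[[:space:]]*:[[:space:]]*null' <<<"$body"; then
      return 0
    fi
    sleep 2
    waited=$(( waited + 2 ))
  done
  echo "pve_wait_vm_gone: VM $vmid still present on $node after ${timeout}s" >&2
  return 1
}
```

Calling it between a teardown and the next create (for example before re-running the Ansible clone) keeps proxmox_kvm from skipping the clone.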