Manage GPU compute jobs on the Qizhi (启智) platform using qzcli — a kubectl-style CLI tool. Use when user says "qzcli", "启智平台", "submit job", "stop job", "查计算组", "avail", "list jobs", "batch submit", or needs to manage distributed training jobs on a Qizhi instance.
41:T1bad,
A kubectl/docker-style CLI for managing GPU compute jobs on the Qizhi (启智) platform.
GitHub: tianyilt/qzcli_tool
pip install rich requests prompt_toolkit mcp
git clone https://github.com/tianyilt/qzcli_tool
cd qzcli_tool && pip install -e .
To use qzcli as an MCP tool directly from Claude Code or Codex:
# Claude Code
claude mcp add qzcli -- qzcli-mcp
# Codex
codex mcp add qzcli -- qzcli-mcp
Credentials are read in this priority order:
CLI args > --password-stdin > env vars > QZCLI_ENV_FILE (.env) > ~/.qzcli/config.json > interactive input
# Option A: env file (recommended)
mkdir -p ~/.qzcli
cat > ~/.qzcli/.env <<'EOF'
QZCLI_USERNAME="your_username"
QZCLI_PASSWORD="your_password"
EOF
# Option B: environment variables
export QZCLI_USERNAME="your_username"
export QZCLI_PASSWORD="your_password"
export QZCLI_API_URL="https://qz.yourorg.edu.cn"
Config files are stored in ~/.qzcli/: config.json, .cookie, resources.json, jobs.json.
# 1. Login
qzcli login
# 2. Discover and cache workspaces/compute groups (run once, re-run after joining new workspaces)
qzcli res -u
# 3. Check available nodes
qzcli avail
# 4. List running jobs
qzcli ls -c -r
# Interactive login
qzcli login
# With credentials
qzcli login -u YOUR_USERNAME -p 'YOUR_PASSWORD'
# Read password from stdin (for scripts)
echo 'YOUR_PASSWORD' | qzcli login -u YOUR_USERNAME --password-stdin
# Check current cookie
qzcli cookie --show
# Clear cookie
qzcli cookie --clear
Note: qzcli avail auto-refreshes the cookie if it expires and credentials are configured.
# List cached workspaces
qzcli res --list
# Refresh all workspace resource cache (run this first!)
qzcli res -u
# Refresh a specific workspace
qzcli res -w MY_WORKSPACE -u
# Set a human-readable alias for a workspace
qzcli res -w ws-xxxxxxxx --name "My Workspace"
# All workspaces
qzcli avail
# Including low-priority task nodes (slower but more accurate)
qzcli avail --lp
# Specific workspace
qzcli avail -w MY_WORKSPACE
# Find compute groups with N free nodes
qzcli avail -n 4
# Export IDs for scripting
qzcli avail -n 4 -e
# Show idle node names
qzcli avail -w MY_WORKSPACE -v
# Full interactive selection: workspace → project → compute group → spec
qzcli create -i
# Interactive for a specific workspace only
qzcli create -i -w "My Workspace"
The TUI shows GPU type, availability, and spec status at each level. Press Enter/→ to go deeper, ← to go back.
# Using names (resolved from qzcli res cache)
qzcli create \
--name "my-training-job" \
--command "bash /path/to/train.sh" \
--workspace "My Workspace" \
--compute-group "My Compute Group" \
--image YOUR_REGISTRY/team/image:tag \
--instances 4 \
--priority 10
# Using IDs directly
qzcli create \
--name "my-job" \
--command "bash /path/to/train.sh" \
--workspace ws-YOUR_WORKSPACE_ID \
--compute-group lcg-YOUR_LCG_ID \
--spec YOUR_SPEC_ID \
--image YOUR_REGISTRY/team/image:tag \
--instances 4
Key parameters:
| Parameter | Default | Description |
|---|---|---|
--name / -n | required | Job name |
--command / -c | required | Command to run |
--workspace / -w | Workspace name or ID (ws-...) | |
--compute-group / -g | auto | Compute group name or ID (lcg-...) |
--spec / -s | auto | Resource spec ID |
--image / -m | Docker image | |
--instances | 1 | Number of instances |
--shm | 1200 | Shared memory (GiB) |
--priority | 10 | Priority (1–10) |
--dry-run | Preview only, don't submit | |
--json | JSON output for scripting |
# Preview before submitting
qzcli create --name test --command "echo hi" --workspace "My Workspace" \
--image YOUR_IMAGE --dry-run
# Pass vars directly — do NOT use "export VAR; bash script.sh"
WORKSPACE_ID="ws-YOUR_WORKSPACE_ID" \
LCG_ID="lcg-YOUR_LCG_ID" \
SPEC_ID="YOUR_SPEC_ID" \
CHECKPOINT_DIR="/path/to/checkpoint" \
bash YOUR_SUBMIT_SCRIPT.sh
qzcli hpc \
--name "my-cpu-job" \
--workspace ws-YOUR_WORKSPACE_ID \
--compute-group lcg-YOUR_LCG_ID \
--predef-quota-id YOUR_QUOTA_ID \
--cpu 55 --mem-gi 300 --instances 30 \
--image YOUR_REGISTRY/team/cpu-image:tag \
--entrypoint "cd /path/to/dir && bash run.sh"
# Submit from config file
qzcli batch batch_config.json --delay 3
# Preview all jobs
qzcli batch batch_config.json --dry-run
# Continue on error
qzcli batch batch_config.json --continue-on-error
Config format (batch_config.json):
{
"defaults": {
"workspace": "ws-YOUR_WORKSPACE_ID",
"compute_group": "lcg-YOUR_LCG_ID",
"spec": "YOUR_SPEC_ID",
"image": "YOUR_REGISTRY/team/image:tag",
"instances": 4,
"priority": 10
},
"matrix": {
"checkpoint": ["/path/to/ckpt1", "/path/to/ckpt2"],
"step": [50000, 100000]
},
"name_template": "eval-{checkpoint_basename}-step{step}",
"command_template": "bash eval.sh --checkpoint {checkpoint} --step {step}"
}
Matrix keys are Cartesian-producted (2×2 = 4 jobs above). Use {key_basename} for path basenames.
for step in 040000 050000 060000; do
qzcli create \
--name "eval-step${step}" \
--command "bash eval.sh --step $step" \
--workspace "My Workspace" \
--compute-group "My Compute Group" \
--instances 4
sleep 3
done
# List jobs
qzcli ls -c -w MY_WORKSPACE # specific workspace
qzcli ls -c --all-ws # all workspaces
qzcli ls -c -w MY_WORKSPACE -r # running only
qzcli ls -c -w MY_WORKSPACE -n 50 # show 50
# Stop a job
qzcli stop JOB_ID
# Job status / details
qzcli status JOB_ID
# Watch all running jobs (refresh every 10s)
qzcli watch -i 10
# Workspace view with GPU utilization
qzcli ws
qzcli ws -a # all projects
qzcli ws -p "My Project"
| Problem | Cause | Fix |
|---|---|---|
| Cookie expired | Session gap | Re-run qzcli login |
未找到名称为 'xxx' 的工作空间 | Stale cache | Run qzcli res -u |
No resources in create -i | Cache empty | Run qzcli login && qzcli res -u |
qzcli-mcp not found | Not installed | cd qzcli_tool && pip install -e . |
| Spec not in workspace | ID mismatch | Match spec ID to the correct workspace |
| Silent job failure | Script sys.exit(0) | Check job logs directly |
| zsh glob errors | Remote shell is zsh | Wrap commands in bash -c or use Python |