Production-ready automation for deploying vLLM inference services on AMD ROCm GPUs using Docker Compose. It combines environment auto-checks, model parameter detection, Docker Compose deployment, health verification, and functional testing with comprehensive logging and security best practices.
Recommended (for production): Add to ~/.bash_profile:
# HuggingFace authentication token (required for gated models)
export HF_TOKEN="hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
# Model cache directory (optional)
export HF_HOME="$HOME/models"
# Apply changes
source ~/.bash_profile
Not required for testing: The skill will proceed without these set:
Priority order: explicit parameter (e.g. hf_token: "xxx") → environment variable → default (/root/.cache/huggingface/hub).

| Variable | Required | If Missing |
|---|---|---|
| HF_TOKEN | Conditional | Continue without token (public models work; gated models fail at download with a clear error) |
| HF_HOME | No | Warning + default: use /root/.cache/huggingface/hub |
Philosophy: Fail fast for configuration errors, fail at download time for authentication errors.
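This policy can be sketched as a few lines of shell (a hedged illustration, not the actual check-env.sh): a missing HF_HOME is defaulted with a warning, and HF_TOKEN is deliberately not validated here, because authentication can only be verified when the download is attempted.

```shell
# Missing HF_HOME is a configuration issue we can default (no failure).
if [ -z "${HF_HOME:-}" ]; then
  echo "WARN: HF_HOME not set, defaulting to /root/.cache/huggingface/hub" >&2
fi
# A missing or bad HF_TOKEN is only detectable at download time,
# so it produces a warning here rather than an error.
if [ -z "${HF_TOKEN:-}" ]; then
  echo "WARN: HF_TOKEN not set; gated models will fail at download time" >&2
fi
echo "environment check complete"
```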
Location: <skill-dir>/scripts/
Validate and load environment variables before deployment.
Usage:
# Basic check (HF_TOKEN optional, HF_HOME optional with default)
./scripts/check-env.sh
# Strict mode (HF_HOME required, fails if not set)
./scripts/check-env.sh --strict
# Quiet mode (minimal output, for automation)
./scripts/check-env.sh --quiet
# Test with environment variables
HF_TOKEN="hf_xxx" HF_HOME="/models" ./scripts/check-env.sh
Exit Codes:
| Code | Meaning |
|---|---|
| 0 | Environment check completed (variables loaded or defaulted) |
| 2 | Critical error (e.g., cannot source ~/.bash_profile) |
Note: This script is optional; you can also run source ~/.bash_profile directly.
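In automation, the exit codes above can be branched on directly. A minimal sketch (check_env is a stand-in function here so the example is self-contained; in practice you would call ./scripts/check-env.sh --quiet):

```shell
# Stand-in for ./scripts/check-env.sh; replace with the real script path.
check_env() { return 0; }

if check_env --quiet; then
  # Exit code 0: variables loaded or defaulted.
  echo "environment ready"
else
  # Exit code 2: critical error (e.g., cannot source ~/.bash_profile).
  rc=$?
  echo "check-env failed with exit code $rc" >&2
  exit "$rc"
fi
```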
Generate human-readable deployment report after successful deployment.
Usage:
./scripts/generate-report.sh <model-id> <container-name> <port> <status> [model-load-time] [memory-used]
# Example:
./scripts/generate-report.sh \
"Qwen-Qwen3-0.6B" \
"vllm-qwen3-0-6b" \
"8001" \
"✅ Success" \
"3.6" \
"1.2"
Parameters:
| Parameter | Required | Description |
|---|---|---|
| model-id | Yes | Model ID (with / replaced by -) |
| container-name | Yes | Docker container name |
| port | Yes | Host port for API endpoint |
| status | Yes | Deployment status (e.g., "✅ Success") |
| model-load-time | No | Model loading time in seconds |
| memory-used | No | Memory consumption in GiB |
Output: $HOME/vllm-compose/<model-id>/DEPLOYMENT_REPORT.md
Exit Codes:
| Code | Meaning |
|---|---|
| 0 | Report generated successfully |
| 1 | Missing required parameters |
| 2 | Output directory not found |
Integration: This script is automatically called in Phase 7 of the deployment workflow.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model_id | String | Yes | - | HuggingFace model ID |
| docker_image | String | No | rocm/vllm-dev:nightly | vLLM Docker image |
| tensor_parallel_size | Integer | No | 1 | Number of GPUs |
| port | Integer | No | 9999 | API server port |
| hf_home | String | No | ${HF_HOME} or /root/.cache/huggingface/hub | Model cache directory |
| hf_token | Secret | Conditional | ${HF_TOKEN} | HuggingFace token (optional for public models, required for gated models) |
| max_model_len | Integer | No | Auto-detect | Maximum sequence length |
| gpu_memory_utilization | Float | No | 0.85 | GPU memory utilization |
| auto_install | Boolean | No | true | Auto-install dependencies |
| log_level | String | No | INFO | Logging verbosity |
All deployment artifacts MUST be saved to:
$HOME/vllm-compose/<model-id-slash-to-dash>/
Convert model ID to directory name by replacing / with -:
openai/gpt-oss-20b → $HOME/vllm-compose/openai-gpt-oss-20b/
Qwen/Qwen3-Coder-Next-FP8 → $HOME/vllm-compose/Qwen-Qwen3-Coder-Next-FP8/

Per-model directory structure:
$HOME/vllm-compose/<model-id>/
├── deployment.log # Full deployment logs (stdout + stderr)
├── test-results.json # Functional test results (JSON format)
├── docker-compose.yml # Generated Docker Compose file
├── .env # HF_TOKEN environment (chmod 600, optional)
└── DEPLOYMENT_REPORT.md # Human-readable deployment summary
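The slash-to-dash conversion and per-model directory creation above can be sketched in shell (model_id here is just an illustrative value):

```shell
model_id="openai/gpt-oss-20b"
# Replace every "/" in the model ID with "-" to form the directory name.
model_dir="$HOME/vllm-compose/${model_id//\//-}"
mkdir -p "$model_dir"
echo "$model_dir"
```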
File requirements:
deployment.log — Capture ALL container logs during deployment
test-results.json — Save API response from functional test request
DEPLOYMENT_REPORT.md — Generated in Phase 7

Step 0.1: Load Environment Variables
# Source ~/.bash_profile to load HF_HOME and HF_TOKEN
source ~/.bash_profile
# If HF_HOME is not defined, it defaults to /root/.cache/huggingface/hub
If HF_HOME is not defined in ~/.bash_profile, it defaults to /root/.cache/huggingface/hub.
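The default can be applied with standard shell parameter expansion:

```shell
# Use HF_HOME if ~/.bash_profile set it; otherwise fall back to the default.
export HF_HOME="${HF_HOME:-/root/.cache/huggingface/hub}"
echo "Using model cache: $HF_HOME"
```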
Step 0.2: Create Output Directory
$HOME/vllm-compose/<model-id>/

Step 0.3: Initialize Logging
$HOME/vllm-compose/<model-id>/deployment.log

Step 0.4: System Checks
Use HF_HOME from Phase 0 (environment variable or default):
# Download model to HF_HOME
huggingface-cli download <model_id> --local-dir "$HF_HOME/hub/models--<org>--<model>"
# Or use snapshot_download via Python:
python -c "from huggingface_hub import snapshot_download; snapshot_download(repo_id='<model_id>', cache_dir='$HF_HOME')"
Authentication Handling:
| Scenario | Behavior |
|---|---|
| Public model + no token | ✅ Download succeeds |
| Public model + token provided | ✅ Download succeeds |
| Gated model + no token | ❌ Download fails with "authentication required" error |
| Gated model + invalid token | ❌ Download fails with "invalid token" error |
| Gated model + valid token | ✅ Download succeeds |
On Authentication Failure:
echo "ERROR: Model download failed - authentication required"
echo "This model requires a valid HF_TOKEN."
echo ""
echo "Please add to ~/.bash_profile:"
echo " export HF_TOKEN=\"hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\""
echo "Then run: source ~/.bash_profile"
exit 1
$HF_HOME/hub/models--<org>--<model-name>/

Generate files in output directory:
docker-compose.yml → $HOME/vllm-compose/<model-id>/docker-compose.yml
.env → $HOME/vllm-compose/<model-id>/.env (optional)
The .env file contains HF_TOKEN=<value> and must be protected with chmod 600.

Volume mount example:
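A minimal sketch of the relevant docker-compose.yml fragment, using the defaults from the parameters table (the service name "vllm" and the container-side port 8000 are assumptions for illustration, not mandated by the skill):

```yaml
services:
  vllm:
    image: rocm/vllm-dev:nightly
    env_file:
      - .env                 # holds HF_TOKEN; file is chmod 600
    volumes:
      # Mount the host model cache so downloads are reused across runs.
      - ${HF_HOME:-/root/.cache/huggingface/hub}:/root/.cache/huggingface/hub
    ports:
      - "9999:8000"          # host port from the "port" parameter default
```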