Translates a HuggingFace model into a prefill-only AutoDeploy custom model using reference custom ops, and validates it with hierarchical equivalence tests.
Input: HuggingFace model ID. Output: prefill-only custom model file + hierarchical tests + summary report.
Web/GitHub fetches require user approval and the user may leave. Do ALL network access now and save locally before proceeding.
Before anything else, check whether the model can fit on the current system.
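The fit check can be sketched with a small stdlib helper that sums the per-GPU values from the nvidia-smi query below (helper names are hypothetical; the fit threshold itself is model-dependent and not fixed here):

```python
import subprocess

def parse_total_vram_mib(csv_text: str) -> int:
    """Sum the CSV output of the nvidia-smi query: one MiB value per GPU line."""
    return sum(int(line.strip()) for line in csv_text.splitlines() if line.strip())

def query_total_vram_mib() -> int:
    """Run nvidia-smi and return total VRAM (MiB) across all GPUs."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_total_vram_mib(out)
```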
Run nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits to get the total VRAM (in MiB) across all GPUs on the system.

Step 1 — Check local transformers install first:
python -c "import transformers; print(transformers.__file__)"
Look for models/{model_type}/modeling_*.py under that path. If found, use it directly — no network needed.
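The local lookup can be sketched with a stdlib glob (the models/{model_type} layout matches current transformers releases, but treat the exact path as an assumption; pass os.path.dirname(transformers.__file__) as the root):

```python
import glob
import os

def find_local_modeling_files(transformers_dir: str, model_type: str) -> list[str]:
    """Look for modeling_*.py under <transformers_pkg>/models/<model_type>/.

    Returns an empty list when the architecture is not bundled locally,
    meaning the repo code must be downloaded in Step 2.
    """
    pattern = os.path.join(transformers_dir, "models", model_type, "modeling_*.py")
    return sorted(glob.glob(pattern))
```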
Step 2 — If not found, download the HF repo (code only, skip weights):
huggingface-cli download {org}/{model} --exclude "*.safetensors" "*.bin" "*.pt" "*.gguf"
This downloads config, code, and tokenizer files into the standard HF cache ($HF_HOME or ~/.cache/huggingface/) while skipping large weight files. Files cached here are automatically found by transformers.AutoConfig.from_pretrained and similar APIs — no extra path wiring needed. Once downloaded you can work fully offline — read config.json and modeling_*.py from the cache snapshot directory printed by the command.
Before writing anything, check if an AD custom model already covers this architecture:
- Read config.json to find its model_type and architectures fields.
- Check tensorrt_llm/_torch/auto_deploy/models/custom/ for existing modeling_*.py files that register the same config class name (grep for the architectures value or model_type).
- Check tensorrt_llm/_torch/auto_deploy/models/custom/__init__.py for existing registrations.

If existing code is found: reuse it instead of writing a new file.
If no existing code is found: proceed to write a new model file in Phase 2.
Check examples/auto_deploy/model_registry/models.yaml for other models from the same family (e.g., if asked to onboard Qwen/Qwen3-8B, look for Qwen/Qwen3-0.6B, Qwen/Qwen3-32B, Qwen/Qwen3-235B-A22B, etc.). Also check HuggingFace for the full set of model sizes/variants in the family.
Family members that share the same model_type / architectures in their config can all use a single modeling file.

Study the locally-available config.json and modeling_*.py (NOT from tensorrt_llm/_torch/models/). Identify attention type (MHA/GQA/MLA), MoE config, RoPE variant, normalization, activation, and any data-dependent ops that break torch.export (e.g. torch.nonzero, data-conditioned if).
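The attention-type triage can be sketched from config.json fields (a heuristic only: the field names follow common HF conventions, and detecting MLA via kv_lora_rank is a DeepSeek-style assumption, not a universal rule):

```python
def classify_attention(config: dict) -> str:
    """Rough MHA/GQA/MLA triage from standard HF config.json fields."""
    if "kv_lora_rank" in config:  # DeepSeek-style latent attention marker
        return "MLA"
    n_heads = config["num_attention_heads"]
    n_kv = config.get("num_key_value_heads", n_heads)  # absent => MHA
    return "MHA" if n_kv == n_heads else "GQA"
```

Usage: `classify_attention(json.load(open(snapshot_dir / "config.json")))`.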
Create tensorrt_llm/_torch/auto_deploy/models/custom/modeling_{name}.py. Use modeling_glm4_moe_lite.py as a structural template only (class layout, dataclass outputs, forward signature).
The goal is a minimal prefill-only model for torch.export with AD canonical IR ops. Keep the code as lean as possible — every line should serve the export path. Do not port HF features that AD doesn't need.
Strip: KV cache, training paths, dropout, flash attention variants, repeat_interleave/repeat_kv for GQA (AD attention ops handle this natively), fallback logic for generating position_ids (assert instead), optional code paths gated on config flags irrelevant to prefill export.
Keep: PreTrainedModel hierarchy, ModelOutput dataclass, minimal forward (input_ids, position_ids, inputs_embeds=None, **kwargs).
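The keep-list above can be sketched as a framework-free skeleton of the forward contract (class and field names are placeholders; the real file subclasses PreTrainedModel, uses a ModelOutput dataclass, and implements the body with canonical ops):

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class CausalLMOutput:  # stands in for the ModelOutput dataclass
    logits: Any

class PrefillOnlyForCausalLM:  # stands in for the PreTrainedModel subclass
    """Skeleton of the export-facing forward contract; framework details omitted."""

    def forward(self, input_ids, position_ids, inputs_embeds=None, **kwargs):
        # position_ids is required: no HF-style fallback generation from input_ids.
        assert position_ids is not None, "position_ids must be provided"
        # No attention_mask / past_key_values / use_cache args: AD manages
        # masking and caching via its own transforms and runtime.
        hidden = inputs_embeds if inputs_embeds is not None else input_ids
        return CausalLMOutput(logits=hidden)  # real model: embed -> layers -> lm_head
```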
Critical: Make sure the custom modeling code's nn.Module hierarchy matches what the checkpoint's safetensors index JSON expects.
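A quick stdlib sanity check against the sharded checkpoint's model.safetensors.index.json (the weight_map field maps parameter names to shard files; the helper name is hypothetical):

```python
import json

def missing_checkpoint_keys(index_json_path: str,
                            model_param_names: set[str]) -> set[str]:
    """Return checkpoint keys the custom model's state_dict does not expose.

    A non-empty result usually means the nn.Module hierarchy (attribute
    names, layer nesting) diverges from what the checkpoint expects.
    """
    with open(index_json_path) as f:
        ckpt_keys = set(json.load(f)["weight_map"])
    return ckpt_keys - model_param_names
```

Compare against `set(model.state_dict())` from the custom model.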
Critical rule: Do NOT import or reuse existing AD custom model code (e.g. from .modeling_deepseek import ...). Every modeling_{name}.py must be self-contained. Use the HF source ($CLONE_DIR/modeling_*.py) as the source of truth for the model's logic and translate it fresh — even if a structurally similar AD model already exists. This prevents hidden coupling, makes each model auditable on its own, and ensures model-specific quirks are captured correctly.
Use torch.ops.auto_deploy.torch_* canonical ops WHENEVER POSSIBLE. These are the IR nodes that AD transforms later replace with optimized backends (triton, flashinfer, trtllm) at deployment time. If a canonical op exists for an operation, you MUST use it — do not reimplement the logic in plain PyTorch.
Available canonical ops (see tensorrt_llm/_torch/auto_deploy/custom_ops/README.md for full list):
- Attention: torch_attention, torch_attention_sdpa, torch_attention_repeat_kv
- MLA: torch_mla
- RoPE: torch_rope_with_explicit_cos_sin, torch_rope_with_complex_freqs, torch_rope_with_qk_interleaving
- MoE: torch_moe, torch_moe_fused, torch_moe_router, torch_moe_dense_mlp
- Normalization: torch_rmsnorm, torch_rmsnorm_gated, torch_l2norm
- Linear: torch_linear_simple
- SSM/conv: torch_ssm, torch_causal_conv1d
- Delta rule: torch_gated_delta_rule
- Quantized linear: torch_quant_fp8_linear, torch_quant_nvfp4_linear, etc.

Never use triton_*/flashinfer_*/trtllm_* — backend selection happens later in AD transforms. Plain PyTorch is acceptable ONLY for operations where no canonical op exists (e.g., simple activation functions, embedding lookups, basic tensor arithmetic). If you find yourself writing manual attention, MoE routing, RoPE, or normalization in plain PyTorch, stop and use the canonical op instead.
Do NOT use repeat_interleave or repeat_kv for GQA. HF reference code often repeats K/V heads to match the Q head count before attention. The AD canonical attention ops (torch_attention, torch_attention_sdpa) handle GQA natively — they accept Q, K, V with different head counts and do the right thing internally. Manually repeating K/V heads is unnecessary bloat and prevents AD from optimizing the attention path.
Register the model: call AutoModelForCausalLMFactory.register_custom_model_cls("ConfigClassName", <Name>ForCausalLM) and add an __all__ entry in models/custom/__init__.py.

If the config class is loadable via AutoConfig.from_pretrained(model_id) (either from the installed transformers or from files in the HF cache downloaded in Phase 0), import it from transformers and use it directly. Do NOT recreate or copy the config class into the modeling file when it is already available. Note: AD's factory already calls AutoConfig.from_pretrained(model_id, trust_remote_code=True) and passes the result to your model, so you rarely need to import the config at all — if you find yourself doing so, sanity-check that it's genuinely needed.

If the config class is NOT available (i.e., not in transformers and not bundled with the checkpoint), define a minimal config class in the modeling file and AutoConfig.register(model_type, ConfigCls, exist_ok=True). A good sanity check: if the E2E test passes without a custom config class, you don't need one — AutoConfig.from_pretrained already picked up the right class.

The custom model's forward signature must follow these rules:
- input_ids — The top-level model always receives input_ids. A submodule graph may internally receive inputs_embeds (e.g., after the embedding layer), but the exported entry point takes token IDs.
- position_ids — Vanilla sequential position_ids are always provided. Assert position_ids is not None at the top of the forward method — it is a required input, never optional. Do not include fallback logic to generate position_ids from input_ids (HF models often do this; strip it). If the model uses a non-standard RoPE variant or custom position encoding, the model must compute it internally on top of the provided vanilla position_ids.
- No inputs beyond input_ids and position_ids — no attention_mask, past_key_values, use_cache, or similar HF-runtime arguments. AD manages masking and caching via its own transforms and runtime.

Create tests/unittest/_torch/auto_deploy/unit/singlegpu/models/test_{name}_modeling.py. Use test_glm4_moe_lite_modeling.py as template. No smoke tests. Small config (hidden=64, layers=2-3, vocab=1000). Use pytest.skip if HF class unavailable.
HF Reference Strategy: Equivalence tests compare our custom implementation against the HF reference with identical weights and inputs. Use actual HF classes if they exist — prefer importing directly over standalone HF-like implementations for unit tests. Standalone "reference" implementations are effectively alternative AD IR models and defeat the purpose of the reference test; they also tend to silently agree with whatever bugs exist in the custom model.
- If the HF classes are available in transformers: import them directly (e.g., from transformers.models.deepseek_v3.modeling_deepseek_v3 import DeepseekV3ForCausalLM). Wrap imports in _get_hf_*_class() try/except helpers that return None on ImportError, and use pytest.skip when None.
- If the HF classes are NOT available in transformers: copy the minimal module definitions from the HF modeling_*.py source into the test file as standalone reference classes. This keeps tests self-contained without requiring a specific transformers version or HF cache at test time. Important: make sure the copy is minimal and strictly faithful to the HF implementation only. Do NOT tweak the functionality of the reference. The same applies to config classes that use trust_remote_code (i.e., not available in transformers): copy a minimal faithful version into the test file. The modeling file should NOT import the config class — AD loads it at runtime via AutoConfig.from_pretrained(..., trust_remote_code=True). The test-only config copy lets you verify config-wrapping behavior (e.g., structure of state_dict).
- Weight loading: go through the load_state_dict pre-hooks already registered on the custom model.

Numerical comparison: For equivalence tests comparing custom ops against HF reference, use the shared assert_rmse_close utility from _model_test_utils:
from _model_test_utils import assert_rmse_close
This computes rmse(actual - expected) / rmse(expected) — more robust than per-element torch.testing.assert_close since a few outlier elements won't fail the test. Use torch.testing.assert_close only for blocks with identical math (e.g., plain MLP with no custom ops).
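The metric can be sketched in pure Python (only the math; the real assert_rmse_close lives in _model_test_utils and operates on tensors):

```python
import math

def rmse(xs) -> float:
    """Root mean square of a flat sequence of numbers."""
    return math.sqrt(sum(x * x for x in xs) / len(xs))

def rmse_ratio(actual, expected) -> float:
    """rmse(actual - expected) / rmse(expected): the relative error behind
    assert_rmse_close. A few outlier elements barely move it, unlike
    per-element rtol/atol checks."""
    diff = [a - e for a, e in zip(actual, expected)]
    return rmse(diff) / rmse(expected)
```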
Recommended rmse_ratio_tol values for bfloat16:
- Blocks with identical math: torch.testing.assert_close with tight rtol/atol (1e-3)
- Otherwise: rmse_ratio_tol of 0.02, 0.05, or 0.10, depending on the test level

Bottom-up levels (each must pass before next):
- Compare against the HF reference with assert_rmse_close (or torch.testing.assert_close for identical-math blocks).
- Export via torch_export_to_gm with Dim.DYNAMIC for batch+seq, verify finite output, test a second shape.

Invoke the ad-onboard-reviewer subagent with ONLY the following information:
Do NOT include your own assessment of correctness. Do NOT summarize what you did. Let the reviewer read the files and judge independently.
If the reviewer returns FAIL on any item:
Do NOT proceed to Phase 8 until the reviewer returns PASS.
Before running the model end-to-end, ensure it and all identified family members from Phase 1 have valid entries in the AutoDeploy model registry at examples/auto_deploy/model_registry/.
For each model (the requested model + any family members identified in Phase 1 Step 2):
- Check examples/auto_deploy/model_registry/models.yaml for an existing entry matching the model's HF id. If none exists, add one with a yaml_extra list:
  - dashboard_default.yaml first.
  - world_size_N.yaml based on model size (1 for <2B, 2 for 2-15B, 4 for 20-80B, 8 for 80B+). The world_size determines how many GPUs are needed for the run.
  - Any model-specific settings (e.g. model_kwargs, non-default transforms) go in a config YAML under examples/auto_deploy/model_registry/configs/. See existing configs for format examples.

Family members that share the same architecture should all use the same modeling code. Different sizes only need different world_size_N.yaml entries and maybe different sharding configurations.
See examples/auto_deploy/model_registry/README.md for full documentation on the registry format and best practices.
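A hypothetical shape for a models.yaml entry, assuming the yaml_extra pattern described above (field names are illustrative guesses; the README is authoritative):

```yaml
# Illustrative sketch only: verify field names against the registry README.
- model: Qwen/Qwen3-8B
  yaml_extra:
    - dashboard_default.yaml
    - world_size_2.yaml   # 8B falls in the 2-15B bucket
```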
⚠️ Run build_and_run_ad.py --use-registry EXACTLY AS-IS ⚠️

You MUST run the model using the model registry YAML configs. No exceptions. No workarounds. No manual --args.yaml-extra overrides. The command is:
CUDA_VISIBLE_DEVICES=<SELECTED_GPUS> python examples/auto_deploy/build_and_run_ad.py --model <MODEL-ID> --use-registry
The --use-registry flag resolves ALL configuration from the model's entry in examples/auto_deploy/model_registry/models.yaml and its referenced YAML files under examples/auto_deploy/model_registry/configs/. This is the production path. You MUST validate the model works through it.
If the run FAILS with --use-registry:
- Do NOT fall back to manual --args.yaml-extra flags.
- Fix the registry instead: update models.yaml, modify or create config YAMLs under configs/, and re-run with --use-registry again.
- The model must run successfully with --use-registry before you are done.

Invoke the ad-run-agent subagent to run the model through AutoDeploy on GPU. Pass it:
Step 1 (reduced num layers): Run with a reduced number of layers to flush out e2e-flow issues and iterate faster. Generation will be bad in step 1 because not all layers are loaded.
Step 2 (full layers): Run with the full number of layers. Generation should be coherent in step 2.
The model is run via:
CUDA_VISIBLE_DEVICES=<SELECTED_GPUS> python examples/auto_deploy/build_and_run_ad.py --model <MODEL-ID> --use-registry
The ad-run-agent will determine the required world_size from the registry, check GPU availability via nvidia-smi, select free GPUs, and wait if not enough are available.
The ad-run-agent will build+run the model, check generation quality, archive logs, and update its worklog.
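The GPU-selection step can be sketched with stdlib parsing of nvidia-smi output (the helper name and the 1 GiB used threshold for "free" are assumptions):

```python
def pick_free_gpus(csv_text: str, world_size: int,
                   max_used_mib: int = 1024) -> list[int]:
    """Pick `world_size` GPU indices whose memory.used is below the threshold.

    `csv_text` is the output of:
      nvidia-smi --query-gpu=index,memory.used --format=csv,noheader,nounits
    Returns [] when not enough GPUs are free (caller should wait and retry).
    """
    free = []
    for line in csv_text.splitlines():
        if not line.strip():
            continue
        idx, used = (int(v) for v in line.split(","))
        if used < max_used_mib:
            free.append(idx)
    return free[:world_size] if len(free) >= world_size else []
```

The selected indices feed CUDA_VISIBLE_DEVICES, e.g. `",".join(map(str, gpus))`.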
If the run fails or produces bad generation:
- Fix the issue and re-run with --use-registry. Never bypass the registry.

Do NOT proceed to Phase 10 until step 2 with full layers reports a successful run with coherent generation.
⚠️ Full log of the build_and_run_ad.py run ⚠️

Print (not file) after completion:
- Changes to models.yaml and any new config YAMLs created
- Generation output from the build_and_run_ad.py --use-registry run. Copy-paste the COMPLETE prompt→output pairs verbatim from the run log. Do NOT summarize, truncate, or paraphrase them. The user needs to see exactly what the model generated to judge quality.

GitHub CLI config: Before running any gh command, confirm which GH_CONFIG_DIR to use. The default is ~/.config/gh, but a different directory may be needed when targeting a fork (e.g., nv-auto-deploy/TensorRT-LLM vs NVIDIA/TensorRT-LLM). Check if the user has specified a custom GH_CONFIG_DIR (e.g., in CLAUDE.local.md or environment). If not, ask the user before proceeding. Prefix all gh commands with: GH_CONFIG_DIR=<path> gh ...
Prepare a pull request against upstream (https://github.com/NVIDIA/TensorRT-LLM) targeting branch main. Then ask the user to provide feedback on the PR and wait for the user to get back to you when the feedback has been posted. Then continue iterating according to the user's feedback. For any comment or other post, prepend your message with "[AGENT]" so that it is clear that a coding agent posted it.
When you post a PR, you MUST include:
- Generation output from the build_and_run_ad.py --use-registry run. Copy-paste the COMPLETE prompt→output pairs verbatim — do NOT summarize, truncate, or paraphrase. The reviewer needs to see exactly what the model generated.
- The exact command used: python examples/auto_deploy/build_and_run_ad.py --model <MODEL-ID> --use-registry
Every single time you push changes to the PR — whether it is a new commit, a rebase, an amendment, a fixup, or any other update — you MUST:
- Re-run build_and_run_ad.py --use-registry using the ad-run-agent subagent, exactly as in Phase 9. The code has changed, so previous run results are stale and invalid.
- Re-run the unit tests (pytest <test_file> -v) for the model's test file created in Phase 6. Previous test results are stale and invalid after any code change.
- Post the full raw logs from both pytest and build_and_run_ad.py verbatim — do NOT summarize, truncate, or paraphrase.

This is not optional. There are no exceptions. Even if the change seems trivial (a typo fix, a comment edit, a formatting change), both runs must be re-executed and the full raw logs must be posted. The reviewer cannot verify correctness without seeing generation output AND test results from the exact code that is currently on the branch.
Workflow for every PR update cycle:
1. Rebase: git fetch upstream && git rebase upstream/main. If there are conflicts, resolve them before proceeding. Do NOT push without rebasing first — the branch must be up-to-date with the target branch.
2. Invoke the ad-run-agent to run build_and_run_ad.py --model <MODEL-ID> --use-registry on the updated code.
3. Run pytest <test_file> -v.
4. Post the full raw logs from the pytest and build_and_run_ad.py run.

After opening the PR and after every PR update you post, you MUST set up a polling loop that checks for new PR comments every 5 minutes. Do not simply post and walk away — actively monitor the PR for reviewer feedback.
How to poll:
# Fetch all PR comments, sorted newest-first, and check for any posted after your last comment
GH_CONFIG_DIR=<path> gh api "repos/<owner>/<repo>/pulls/<PR_NUMBER>/comments?sort=created&direction=desc&per_page=10"
# Also check issue-level comments (top-level PR comments, not inline review comments)
GH_CONFIG_DIR=<path> gh api "repos/<owner>/<repo>/issues/<PR_NUMBER>/comments?sort=created&direction=desc&per_page=10"
# Also check the PR's review status
GH_CONFIG_DIR=<path> gh pr view <PR_NUMBER> --json reviews,state
Polling loop behavior:
Do NOT stop polling prematurely. The loop must continue until the PR is approved or a clear termination signal is received. If polling has been running for an extended period (e.g., >2 hours) with no new activity, inform the user that you are still monitoring and ask if they want you to continue or stop.
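The polling behavior can be sketched with the gh calls injected as a callable, so the control flow is testable (poll_pr and fetch_feedback are hypothetical names; fetch_feedback wraps the gh api commands shown above and returns True when a new comment or review appeared since the last post):

```python
import time
from typing import Callable

def poll_pr(fetch_feedback: Callable[[], bool],
            interval_s: int = 300,        # 5-minute cadence from above
            max_idle_polls: int = 24,     # ~2 h idle budget at defaults
            sleep=time.sleep) -> str:
    """Poll until feedback arrives or the idle budget runs out.

    Returns "feedback" when fetch_feedback reports new activity, or
    "idle_timeout" after max_idle_polls quiet polls, at which point the
    agent should check in with the user rather than stop silently.
    """
    for _ in range(max_idle_polls):
        if fetch_feedback():
            return "feedback"
        sleep(interval_s)
    return "idle_timeout"
```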
- Use torch.ops.auto_deploy.torch_* canonical ops whenever one exists for the operation. This is how AD knows what to optimize. Writing manual attention, MoE, RoPE, or normalization in plain PyTorch instead of using the canonical op will prevent AD transforms from working.
- No repeat_interleave: AD attention ops handle GQA natively. Never repeat K/V heads manually.
- Import config classes from transformers or load from checkpoint whenever possible. Only bundle a config class if it truly doesn't exist anywhere.
- position_ids: Always assert position_ids is not None — it is a required input, never optional.
- Every modeling_{name}.py is a standalone translation from HF source.
- Use the _ad_ prefix for RoPE buffers. RotaryEmbedding.forward(x, position_ids) MUST slice by position_ids once and return pre-sliced (cos, sin). Pass those tensors to all layers. NEVER pass position_ids through to each layer/attention forward to re-index — that is redundant compute that bloats the exported graph. See Phase 2 for the full pattern.
- MoE experts: keep nn.ModuleList per-expert for checkpoint compatibility. Write test-only state_dict converters for HF stacked format.
- noaux_tc routers (DeepSeek-V3 style): use vanilla PyTorch (sigmoid + bias + group topk + normalize + scale). AD transforms can replace with fused trtllm kernels at deployment time.
- Use torch_* prefixed reference ops in AutoDeploy — never triton_*, flashinfer_*, or trtllm_*.