Name: Cudaq Guide
Author: NVIDIA

References

| Section | Doc file |
| --- | --- |
| Install | `docs/sphinx/using/install/install.rst`, `docs/sphinx/using/quick_start.rst` |
| Test Program | `docs/sphinx/using/basics/kernel_intro.rst`, `docs/sphinx/using/basics/build_kernel.rst` |
| GPU Simulation | `docs/sphinx/using/backends/sims/svsims.rst`, `docs/sphinx/using/examples/multi_gpu_workflows.rst` |
| QPU | `docs/sphinx/using/backends/hardware.rst`, `docs/sphinx/using/backends/cloud.rst` |
| Applications | `docs/sphinx/using/applications.rst` |
| Parallelize | `docs/sphinx/using/examples/multi_gpu_workflows.rst` |

Routing by Argument

| Argument | Action |
| --- | --- |
| `install` | Walk through installation (see Install section) |
| `test-program` | Build and run a Bell state kernel to verify that CUDA-Q is working properly |
| `gpu-sim` | Explain GPU-accelerated simulation targets (see GPU Simulation section) |
| `qpu` | Explain how to run on real QPU hardware (see QPU section) |
| `applications` | Showcase what can be built with CUDA-Q (see Applications section) |
| `parallelize` | Show how to run circuits in parallel across multiple QPUs (see Parallelize section) |
| (none) | Print the full menu below and ask what they'd like to explore |
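
The routing rule above can be sketched as a small lookup. This is only an illustration: the `SECTIONS` mapping and the `route` helper are hypothetical names, and falling back to the full menu for an unrecognized argument is an assumption (the table only specifies the no-argument case).

```python
# Hypothetical mapping from skill argument to the section that handles it.
SECTIONS = {
    "install": "Install",
    "test-program": "Test Program",
    "gpu-sim": "GPU Simulation",
    "qpu": "QPU",
    "applications": "Applications",
    "parallelize": "Parallelize",
}


def route(argument=None):
    """Return the section name for an argument, or "Full Menu" otherwise."""
    key = (argument or "").strip().lower()
    # Assumption: unknown arguments are treated like a missing argument.
    return SECTIONS.get(key, "Full Menu")


print(route("gpu-sim"))
print(route())
```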

Full Menu (no argument)

CUDA-Q Getting Started

CUDA-Q is NVIDIA's unified quantum-classical programming model for CPUs, GPUs, and QPUs.
It supports Python and C++. Docs: https://nvidia.github.io/cuda-quantum/

Choose a topic
  /cudaq-guide install         Install CUDA-Q (Python pip or C++ binary)
  /cudaq-guide test-program    Write and run your quantum kernel
  /cudaq-guide gpu-sim         Accelerate simulation on NVIDIA GPUs
  /cudaq-guide qpu             Connect to real QPU hardware
  /cudaq-guide applications    Explore what you can build
  /cudaq-guide parallelize     Run circuits in parallel across multiple QPUs

Specialized skills
  /cudaq-qec          Quantum Error Correction memory experiments
  /cudaq-chemistry    Quantum chemistry (VQE, ADAPT-VQE)
  /cudaq-add-backend  Add a new hardware backend
  /cudaq-compiler     Work with the CUDA-Q compiler IR
  /cudaq-benchmark    Benchmark and optimize performance

Install

Linux (x86_64, ARM64): full GPU support. Run `pip install cudaq` and install the CUDA Toolkit.
macOS (ARM64/Apple Silicon): CPU simulation only. Run `pip install cudaq` (no CUDA Toolkit needed).
Windows: use WSL, then follow the Linux instructions.
C++ (no sudo): `bash install_cuda_quantum*.$(uname -m) --accept -- --installpath $HOME/.cudaq`
Brev (cloud, no local setup): log in at the NVIDIA Application Hub, open a CUDA-Q workspace, then SSH in with the Brev CLI:
```
brev open ${WORKSPACE_NAME}
```
CUDA-Q and the CUDA Toolkit are pre-installed.
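
For the `test-program` argument, a Bell state check might look like the sketch below. It uses the builder-style kernel API and the CPU-only `qpp-cpu` target so that it does not depend on a GPU. The helper names `looks_like_bell_state` and `run_bell_check`, the shot count, and the 10% tolerance are illustrative assumptions, not part of the guide.

```python
def looks_like_bell_state(counts: dict, tolerance: float = 0.1) -> bool:
    """True if only '00' and '11' were observed, each close to 50%."""
    total = sum(counts.values())
    if total == 0 or set(counts) - {"00", "11"}:
        return False
    return abs(counts.get("00", 0) / total - 0.5) <= tolerance


def run_bell_check(shots: int = 1000) -> bool:
    """Build, sample, and validate a two-qubit Bell state."""
    import cudaq  # imported here so the helper above works without CUDA-Q

    cudaq.set_target("qpp-cpu")  # CPU simulator; no GPU required

    # Builder-style kernel: H on qubit 0, then CNOT 0 -> 1, then measure.
    kernel = cudaq.make_kernel()
    qubits = kernel.qalloc(2)
    kernel.h(qubits[0])
    kernel.cx(qubits[0], qubits[1])
    kernel.mz(qubits)

    result = cudaq.sample(kernel, shots_count=shots)
    counts = {bits: count for bits, count in result.items()}
    return looks_like_bell_state(counts)
```

Calling `run_bell_check()` in an environment where `import cudaq` succeeds should return `True` for a working installation; any outcome other than `00` or `11` points to a problem.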

GPU Simulation

Available GPU Targets

| Target | Description | Use when |
| --- | --- | --- |
| `nvidia` (default) | Single-GPU state vector via cuStateVec (up to ~30 qubits) | Default choice for most simulations on a single GPU |
| `nvidia --target-option fp64` | Double-precision single GPU | Higher numerical precision needed (e.g. chemistry, sensitive observables) |
| `nvidia --target-option mgpu` | Multi-GPU, pools memory across GPUs (>30 qubits) | Circuit exceeds single-GPU memory; requires MPI |
| `nvidia --target-option mqpu` | Multi-QPU, one virtual QPU per GPU, parallel execution | Running many independent circuits in parallel (e.g. parameter sweeps, VQE gradients) |
| `tensornet` | Tensor network simulator | Shallow or low-entanglement circuits; qubit count exceeds state vector feasibility |
| `qpp-cpu` | CPU-only fallback (OpenMP) | No GPU available; macOS; small circuits for testing |
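
The qubit limits in the table follow from how state vector memory grows. The sketch below estimates that footprint, assuming 8 bytes per complex amplitude in single precision and 16 bytes with `fp64`. Those sizes are an assumption about the storage layout, and the estimate ignores any simulator workspace overhead. The function names are illustrative.

```python
def statevector_bytes(num_qubits: int, precision: str = "fp32") -> int:
    """Bytes needed to store 2**num_qubits complex amplitudes."""
    # Assumed sizes: complex64 for fp32, complex128 for fp64.
    bytes_per_amplitude = {"fp32": 8, "fp64": 16}[precision]
    return (2 ** num_qubits) * bytes_per_amplitude


def fits_on_one_gpu(num_qubits: int, gpu_memory_gib: float,
                    precision: str = "fp32") -> bool:
    """Rough check that the state vector alone fits in one GPU's memory."""
    return statevector_bytes(num_qubits, precision) <= gpu_memory_gib * 2**30


for n in (28, 30, 32, 34):
    gib = statevector_bytes(n) / 2**30
    print(f"{n} qubits: {gib:g} GiB (fp32), {2 * gib:g} GiB (fp64)")
```

Each added qubit doubles the requirement, which is why circuits beyond roughly 30 qubits are steered toward `mgpu` or `tensornet`.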

QPU

Which QPU technology are you targeting?
  1. Ion trap       (IonQ, Quantinuum)
  2. Superconducting (IQM, OQC, Anyon, TII, QCI)
  3. Neutral atom   (QuEra, Infleqtion, Pasqal)
  4. Cloud / multi-platform (AWS Braket, Scaleway)

| Technology | Provider | Doc file |
| --- | --- | --- |
| Ion trap | IonQ | `docs/sphinx/using/backends/hardware/iontrap.rst` (IonQ section) |
| Ion trap | Quantinuum | `docs/sphinx/using/backends/hardware/iontrap.rst` (Quantinuum section) |
| Superconducting | IQM | `docs/sphinx/using/backends/hardware/superconducting.rst` (IQM section) |
| Superconducting | OQC | `docs/sphinx/using/backends/hardware/superconducting.rst` (OQC section) |
| Superconducting | Anyon | `docs/sphinx/using/backends/hardware/superconducting.rst` (Anyon section) |
| Superconducting | TII | `docs/sphinx/using/backends/hardware/superconducting.rst` (TII section) |
| Superconducting | QCI | `docs/sphinx/using/backends/hardware/superconducting.rst` (QCI section) |
| Neutral atom | Infleqtion | `docs/sphinx/using/backends/hardware/neutralatom.rst` (Infleqtion section) |
| Neutral atom | QuEra | `docs/sphinx/using/backends/hardware/neutralatom.rst` (QuEra section) |
| Neutral atom | Pasqal | `docs/sphinx/using/backends/hardware/neutralatom.rst` (Pasqal section) |
| Cloud | AWS Braket | `docs/sphinx/using/backends/cloud/braket.rst` |
| Cloud | Scaleway | `docs/sphinx/using/backends/cloud/scaleway.rst` |

Applications

| Category | Examples |
| --- | --- |
| Optimization | QAOA, ADAPT-QAOA, MaxCut |
| Chemistry | VQE, UCCSD, ADAPT-VQE -> see `/cudaq-chemistry` |
| Error Correction | Surface codes, QEC memory -> see `/cudaq-qec` |
| Algorithms | Grover's, Shor's, QFT, Deutsch-Jozsa, HHL |
| ML | Quantum neural networks, kernel methods |
| Simulation | Hamiltonian dynamics, Trotter evolution |
| Finance | Portfolio optimization, Monte Carlo |

Parallelize

| Goal | Strategy | Target option |
| --- | --- | --- |
| Single circuit too large for one GPU | Pool GPU memory | `nvidia --target-option mgpu` |
| Many independent circuits at once | Run circuits in parallel | `nvidia --target-option mqpu` |
| Large Hamiltonian expectation value | Distribute terms across GPUs | `mqpu` + `execution=cudaq.parallel.thread` |

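Circuit batching with mqpu (`sample_async` / `observe_async`)

```python
import cudaq

# Expose each GPU as its own virtual QPU.
cudaq.set_target("nvidia", option="mqpu")
n_qpus = cudaq.get_platform().num_qpus()

# `kernel`, `hamiltonian`, and `param_sets` are assumed to be defined elsewhere.
# Launch one asynchronous observe call per parameter set, assigning
# virtual QPUs round-robin.
futures = [
    cudaq.observe_async(kernel, hamiltonian, params, qpu_id=i % n_qpus)
    for i, params in enumerate(param_sets)
]
# Block on each future and collect the expectation values.
results = [f.get().expectation() for f in futures]
```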
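
The same round-robin pattern applies to `sample_async`. The following is a hedged sketch rather than a reference implementation: the one-qubit rotation circuit, the `round_robin_qpu_ids` and `sample_sweep` helpers, and the shot count are illustrative choices. `sample_sweep` needs CUDA-Q and at least one NVIDIA GPU, since it selects the `mqpu` option.

```python
def round_robin_qpu_ids(num_jobs: int, num_qpus: int) -> list:
    """Assign job i to virtual QPU i % num_qpus."""
    if num_qpus < 1:
        raise ValueError("need at least one QPU")
    return [i % num_qpus for i in range(num_jobs)]


def sample_sweep(angles, shots=1000):
    """Sample a one-qubit rotation for each angle across the virtual QPUs."""
    import cudaq  # imported here so the helper above works without CUDA-Q

    cudaq.set_target("nvidia", option="mqpu")
    num_qpus = cudaq.get_platform().num_qpus()

    # Builder-style kernel with one float parameter (illustrative circuit).
    kernel, theta = cudaq.make_kernel(float)
    qubit = kernel.qalloc()
    kernel.rx(theta, qubit)
    kernel.mz(qubit)

    qpu_ids = round_robin_qpu_ids(len(angles), num_qpus)
    futures = [
        cudaq.sample_async(kernel, angle, shots_count=shots, qpu_id=qpu_id)
        for angle, qpu_id in zip(angles, qpu_ids)
    ]
    # Each future resolves to the sample counts for its angle.
    return [future.get() for future in futures]
```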

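Hamiltonian batching

```python
# `kernel`, `hamiltonian`, and `args` are assumed to be defined elsewhere.

# Single node, multiple GPUs
result = cudaq.observe(kernel, hamiltonian, *args,
                       execution=cudaq.parallel.thread)

# Multi-node via MPI
result = cudaq.observe(kernel, hamiltonian, *args,
                       execution=cudaq.parallel.mpi)
```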
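
Conceptually, distributing a Hamiltonian means splitting its terms into groups, evaluating each group on a separate worker, and summing the partial results. The pure-Python sketch below illustrates that idea only; it is not how CUDA-Q implements it internally. Each term is modeled as a hypothetical `(coefficient, expectation)` pair, with the expectation standing in for a value a virtual QPU would measure.

```python
from concurrent.futures import ThreadPoolExecutor


def split_terms(terms, num_workers):
    """Deal terms round-robin into num_workers groups."""
    return [terms[i::num_workers] for i in range(num_workers)]


def partial_expectation(group):
    """Weighted sum for one group of (coefficient, expectation) pairs."""
    # Stand-in for evaluating each term on one virtual QPU.
    return sum(coeff * value for coeff, value in group)


def distributed_expectation(terms, num_workers):
    """Evaluate the groups concurrently and add up the partial sums."""
    groups = split_terms(terms, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return sum(pool.map(partial_expectation, groups))


terms = [(0.5, 1.0), (-0.25, 0.0), (2.0, -1.0), (1.0, 0.5)]
print(distributed_expectation(terms, num_workers=2))
```

The total is independent of how the terms are grouped, which is what makes this kind of splitting safe.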
