Configure PersonaPlex environment for local or cloud deployment with GPU setup
Configure PersonaPlex environment for: $ARGUMENTS
You are a PersonaPlex deployment specialist with expertise in GPU sizing, real-time audio pipelines, and model serving. Size deployments against these hardware requirements:
┌─────────────────────────────────────────────────────────────┐
│ HARDWARE REQUIREMENTS │
├─────────────────────────────────────────────────────────────┤
│ │
│ GPU TIER VRAM SESSIONS CPU OFFLOAD │
│ ─────────────────────────────────────────────────────── │
│ Minimum (RTX 4090) 24GB 1 Required │
│ Recommended (A100) 40GB 2-3 Optional │
│ Production (H100) 80GB 4-6 Not needed │
│ │
│ CPU: 8+ cores (16+ recommended) │
│ RAM: 32GB minimum (64GB recommended) │
│ Storage: 20GB for model weights + cache │
│ │
└─────────────────────────────────────────────────────────────┘
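The table above can be turned into a quick capacity check. A sketch (the thresholds follow the table, set slightly below the nominal figures because nvidia-smi reports a little less than the marketing VRAM):

```shell
# Classify a GPU by the total VRAM (in MiB) that nvidia-smi reports.
vram_tier() {
  vram_mib=$1
  if   [ "$vram_mib" -ge 79000 ]; then echo "Production: 4-6 sessions"
  elif [ "$vram_mib" -ge 39000 ]; then echo "Recommended: 2-3 sessions"
  elif [ "$vram_mib" -ge 23000 ]; then echo "Minimum: 1 session, CPU offload required"
  else                                 echo "Below minimum: use CPU offload at reduced throughput"
  fi
}

# Usage (first GPU):
# vram_tier "$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | head -n1)"
```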
Local Setup:
# 1. Clone repository
git clone https://github.com/NVIDIA/personaplex.git
cd personaplex
# 2. Install audio dependencies
# Ubuntu/Debian
sudo apt-get install libopus-dev portaudio19-dev ffmpeg
# macOS
brew install opus portaudio ffmpeg
# 3. Create virtual environment
python -m venv venv
source venv/bin/activate
# 4. Install Python dependencies (the moshi package lives in the moshi/ subdirectory)
pip install moshi/.
# 5. For Blackwell GPUs (optional)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu130
# 6. Set HuggingFace token
export HF_TOKEN=your_token_here
# 7. Download model weights (first run)
huggingface-cli download nvidia/personaplex-7b-v1
# 8. Start server (--ssl generates a self-signed certificate in the given directory)
python -m moshi.server --ssl $(mktemp -d)
Docker Setup:
# Build image
docker build -t personaplex:latest .
# Run with GPU support
docker run --gpus all -p 8998:8998 \
-e HF_TOKEN=$HF_TOKEN \
  -v "$(pwd)/personas:/app/personas" \
personaplex:latest
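The same run command can be captured in a compose file. A sketch (the service name and GPU reservation block are assumptions; Compose resolves relative volume paths itself):

```yaml
services:
  personaplex:
    image: personaplex:latest
    ports:
      - "8998:8998"
    environment:
      - HF_TOKEN=${HF_TOKEN}
    volumes:
      - ./personas:/app/personas
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```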
interface PersonaPlexConfig {
server: {
host: string; // '0.0.0.0'
port: number; // 8998
ssl_dir: string; // SSL certificate directory
max_connections: number; // Max concurrent sessions
};
model: {
cpu_offload: boolean; // For memory-constrained systems
precision: 'fp16' | 'bf16' | 'fp32';
device: string; // 'cuda:0', 'cuda:1'
};
audio: {
sample_rate: number; // 24000
frame_size_ms: number; // 20
opus_bitrate: number; // 48000
};
personas: {
dir: string; // Persona storage directory
preload: string[]; // Personas to preload on startup
default: string; // Default persona ID
};
}
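A minimal config instance matching the interface above (values mirror the inline comments; the file name and how the server loads it are assumptions):

```json
{
  "server": { "host": "0.0.0.0", "port": 8998, "ssl_dir": "/etc/ssl/certs", "max_connections": 4 },
  "model": { "cpu_offload": false, "precision": "bf16", "device": "cuda:0" },
  "audio": { "sample_rate": 24000, "frame_size_ms": 20, "opus_bitrate": 48000 },
  "personas": { "dir": "/personas", "preload": ["default"], "default": "default" }
}
```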
# Required
HF_TOKEN=hf_xxxx # HuggingFace access token
PERSONAPLEX_HOST=0.0.0.0 # Server bind address
PERSONAPLEX_PORT=8998 # Server port
# Optional
PERSONAPLEX_SSL_DIR=/etc/ssl/certs # SSL certificates
PERSONAPLEX_CPU_OFFLOAD=false # CPU offloading
PERSONAPLEX_PRECISION=bf16 # Model precision
PERSONAPLEX_PERSONA_DIR=/personas # Persona storage
PERSONAPLEX_LOG_LEVEL=INFO # Logging level
CUDA_VISIBLE_DEVICES=0 # GPU selection
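The variables above can live in a `.env` file and be loaded before launching the server. A minimal loader sketch (`set -a` auto-exports everything the file assigns; the helper name is illustrative):

```shell
# Source a .env-style file and export every variable it assigns.
load_env() {
  set -a
  . "$1"
  set +a
}

# Usage:
# load_env ./.env && python -m moshi.server --ssl $(mktemp -d)
```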
# Check GPU availability
nvidia-smi
# Check CUDA version
nvcc --version
# Test server connection
curl -k https://localhost:8998/health
# Test WebSocket connection
websocat -k wss://localhost:8998/ws   # -k accepts the self-signed certificate
# Run benchmark
python -m moshi.benchmark --duration 60
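A small readiness loop built on the health check above, useful in startup scripts (the helper names are illustrative; `-k` tolerates the self-signed certificate):

```shell
# Returns success only for an HTTP 200 status string.
http_ok() { [ "$1" = "200" ]; }

# Poll the health endpoint until it answers 200 or attempts run out.
wait_for_server() {
  url=$1; attempts=${2:-30}; i=0
  while [ "$i" -lt "$attempts" ]; do
    code=$(curl -k -s -o /dev/null -w '%{http_code}' "$url") || code=000
    if http_ok "$code"; then echo "server ready"; return 0; fi
    i=$((i + 1)); sleep 1
  done
  echo "server not ready after $attempts attempts" >&2; return 1
}

# Usage:
# wait_for_server https://localhost:8998/health
```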
┌─────────────────────────────────────────────────────────────┐
│ COMMON ISSUES & SOLUTIONS │
├─────────────────────────────────────────────────────────────┤
│ │
│ Issue: CUDA out of memory │
│ Solution: Enable CPU offload or reduce batch size │
│ --cpu-offload flag │
│ │
│ Issue: Opus codec errors │
│ Solution: Install libopus-dev system package │
│ sudo apt-get install libopus-dev │
│ │
│ Issue: SSL certificate errors │
│ Solution: Generate self-signed cert for local dev │
│ openssl req -x509 -nodes -newkey rsa:2048 ... │
│ │
│ Issue: HuggingFace authentication failed │
│ Solution: Verify HF_TOKEN and model access │
│ huggingface-cli login │
│ │
│ Issue: High latency (>500ms) │
│ Solution: Check GPU utilization, network latency │
│ Use bf16 precision, enable TensorRT │
│ │
└─────────────────────────────────────────────────────────────┘
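For the SSL issue above, a self-signed certificate for local development can be generated as sketched below (the file names and the `/CN=localhost` subject are choices, not requirements; point the server's SSL directory at the output):

```shell
# Write a self-signed key/cert pair into the given directory,
# valid for one year, without prompting for subject fields.
gen_dev_cert() {
  dir=$1
  openssl req -x509 -nodes -newkey rsa:2048 \
    -keyout "$dir/key.pem" -out "$dir/cert.pem" \
    -days 365 -subj "/CN=localhost"
}

# Usage:
# d=$(mktemp -d) && gen_dev_cert "$d"
```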
For: $ARGUMENTS
Provide:
1. A hardware assessment against the requirements table
2. Step-by-step setup commands for the target environment (local or Docker)
3. A complete configuration: environment variables and config values
4. Verification steps and the results to expect
5. Troubleshooting guidance for any issues encountered