Use mmx to generate text, images, video, speech, and music via the MiniMax AI platform. Use when the user wants to create media content, chat with MiniMax models, perform web search, or manage MiniMax API resources from the terminal.
Use mmx to generate text, images, video, speech, music, and perform web search via the MiniMax AI platform.
# Install
npm install -g mmx-cli
# Auth (persisted to ~/.mmx/credentials.json)
mmx auth login --api-key sk-xxxxx
# Or pass per-call
mmx text chat --api-key sk-xxxxx --message "Hello"
Region is auto-detected. Override with --region global or --region cn.
Always use these flags in non-interactive (agent/CI) contexts:
| Flag | Purpose |
|---|---|
--non-interactive | Fail fast on missing args instead of prompting |
--quiet |
| Suppress spinners/progress; stdout is pure data |
--output json | Machine-readable JSON output |
--async | Return task ID immediately (video generation) |
--dry-run | Preview the API request without executing |
--yes | Skip confirmation prompts |
Chat completion. Default model: MiniMax-M2.7.
mmx text chat --message <text> [flags]
| Flag | Type | Description |
|---|---|---|
--message <text> | string, required, repeatable | Message text. Prefix with role: to set role (e.g. "system:You are helpful", "user:Hello") |
--messages-file <path> | string | JSON file with messages array. Use - for stdin |
--system <text> | string | System prompt |
--model <model> | string | Model ID (default: MiniMax-M2.7) |
--max-tokens <n> | number | Max tokens (default: 4096) |
--temperature <n> | number | Sampling temperature (0.0, 1.0] |
--top-p <n> | number | Nucleus sampling threshold |
--stream | boolean | Stream tokens (default: on in TTY) |
--tool <json-or-path> | string, repeatable | Tool definition JSON or file path |
# Single message
mmx text chat --message "user:What is MiniMax?" --output json --quiet
# Multi-turn
mmx text chat \
--system "You are a coding assistant." \
--message "user:Write fizzbuzz in Python" \
--output json
# From file
cat conversation.json | mmx text chat --messages-file - --output json
stdout: response text (text mode) or full response object (json mode).
Generate images. Model: image-01.
mmx image generate --prompt <text> [flags]
| Flag | Type | Description |
|---|---|---|
--prompt <text> | string, required | Image description |
--aspect-ratio <ratio> | string | e.g. 16:9, 1:1 |
--n <count> | number | Number of images (default: 1) |
--subject-ref <params> | string | Subject reference: type=character,image=path-or-url |
--out-dir <dir> | string | Download images to directory |
--out-prefix <prefix> | string | Filename prefix (default: image) |
mmx image generate --prompt "A cat in a spacesuit" --output json --quiet
# stdout: image URLs (one per line in quiet mode)
mmx image generate --prompt "Logo" --n 3 --out-dir ./gen/ --quiet
# stdout: saved file paths (one per line)
Generate video. Default model: MiniMax-Hailuo-2.3. This is an async task — by default it polls until completion.
mmx video generate --prompt <text> [flags]
| Flag | Type | Description |
|---|---|---|
--prompt <text> | string, required | Video description |
--model <model> | string | MiniMax-Hailuo-2.3 (default) or MiniMax-Hailuo-2.3-Fast |
--first-frame <path-or-url> | string | First frame image |
--callback-url <url> | string | Webhook URL for completion |
--download <path> | string | Save video to specific file |
--async | boolean | Return task ID immediately |
--no-wait | boolean | Same as --async |
--poll-interval <seconds> | number | Polling interval (default: 5) |
# Non-blocking: get task ID
mmx video generate --prompt "A robot." --async --quiet
# stdout: {"taskId":"..."}
# Blocking: wait and get file path
mmx video generate --prompt "Ocean waves." --download ocean.mp4 --quiet
# stdout: ocean.mp4
Query status of a video generation task.
mmx video task get --task-id <id> [--output json]
Download a completed video by task ID.
mmx video download --file-id <id> [--out <path>]
Text-to-speech. Default model: speech-2.8-hd. Max 10k chars.
mmx speech synthesize --text <text> [flags]
| Flag | Type | Description |
|---|---|---|
--text <text> | string | Text to synthesize |
--text-file <path> | string | Read text from file. Use - for stdin |
--model <model> | string | speech-2.8-hd (default), speech-2.6, speech-02 |
--voice <id> | string | Voice ID (default: English_expressive_narrator) |
--speed <n> | number | Speed multiplier |
--volume <n> | number | Volume level |
--pitch <n> | number | Pitch adjustment |
--format <fmt> | string | Audio format (default: mp3) |
--sample-rate <hz> | number | Sample rate (default: 32000) |
--bitrate <bps> | number | Bitrate (default: 128000) |
--channels <n> | number | Audio channels (default: 1) |
--language <code> | string | Language boost |
--subtitles | boolean | Include subtitle timing data |
--pronunciation <from/to> | string, repeatable | Custom pronunciation |
--sound-effect <effect> | string | Add sound effect |
--out <path> | string | Save audio to file |
--stream | boolean | Stream raw audio to stdout |
mmx speech synthesize --text "Hello world" --out hello.mp3 --quiet
# stdout: hello.mp3
echo "Breaking news." | mmx speech synthesize --text-file - --out news.mp3
Generate music. Model: music-2.5. Responds well to rich, structured descriptions.
mmx music generate --prompt <text> [--lyrics <text>] [flags]
| Flag | Type | Description |
|---|---|---|
--prompt <text> | string | Music style description (can be detailed) |
--lyrics <text> | string | Song lyrics with structure tags. Use "\u65e0\u6b4c\u8bcd" for instrumental. Cannot be used with --instrumental |
--lyrics-file <path> | string | Read lyrics from file. Use - for stdin |
--vocals <text> | string | Vocal style, e.g. "warm male baritone", "bright female soprano", "duet with harmonies" |
--genre <text> | string | Music genre, e.g. folk, pop, jazz |
--mood <text> | string | Mood or emotion, e.g. warm, melancholic, uplifting |
--instruments <text> | string | Instruments to feature, e.g. "acoustic guitar, piano" |
--tempo <text> | string | Tempo description, e.g. fast, slow, moderate |
--bpm <number> | number | Exact tempo in beats per minute |
--key <text> | string | Musical key, e.g. C major, A minor, G sharp |
--avoid <text> | string | Elements to avoid in the generated music |
--use-case <text> | string | Use case context, e.g. "background music for video", "theme song" |
--structure <text> | string | Song structure, e.g. "verse-chorus-verse-bridge-chorus" |
--references <text> | string | Reference tracks or artists, e.g. "similar to Ed Sheeran" |
--extra <text> | string | Additional fine-grained requirements |
--instrumental | boolean | Generate instrumental music (no vocals). Cannot be used with --lyrics or --lyrics-file |
--aigc-watermark | boolean | Embed AI-generated content watermark |
--format <fmt> | string | Audio format (default: mp3) |
--sample-rate <hz> | number | Sample rate (default: 44100) |
--bitrate <bps> | number | Bitrate (default: 256000) |
--out <path> | string | Save audio to file |
--stream | boolean | Stream raw audio to stdout |
At least one of --prompt or --lyrics is required.
# Simple usage
mmx music generate --prompt "Upbeat pop" --lyrics "La la la..." --out song.mp3 --quiet
# Detailed prompt with vocal characteristics
mmx music generate --prompt "Warm morning folk" \
--vocals "male and female duet, harmonies in chorus" \
--instruments "acoustic guitar, piano" \
--bpm 95 \
--lyrics-file song.txt \
--out duet.mp3
# Instrumental (use --instrumental flag)
mmx music generate --prompt "Cinematic orchestral, building tension" --instrumental --out bgm.mp3
Image understanding via VLM. Provide either --image or --file-id, not both.
mmx vision describe (--image <path-or-url> | --file-id <id>) [flags]
| Flag | Type | Description |
|---|---|---|
--image <path-or-url> | string | Local path or URL (auto base64-encoded) |
--file-id <id> | string | Pre-uploaded file ID (skips base64) |
--prompt <text> | string | Question about the image (default: "Describe the image.") |
mmx vision describe --image photo.jpg --prompt "What breed?" --output json
stdout: description text (text mode) or full response (json mode).
Web search via MiniMax.
mmx search query --q <query>
| Flag | Type | Description |
|---|---|---|
--q <query> | string, required | Search query |
mmx search query --q "MiniMax AI" --output json --quiet
Display Token Plan usage and remaining quotas.
mmx quota show [--output json]
Export all commands as Anthropic/OpenAI-compatible JSON tool schemas:
# All tool-worthy commands (excludes auth/config/update)
mmx config export-schema
# Single command
mmx config export-schema --command "video generate"
Use this to dynamically register mmx commands as tools in your agent framework.
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | General error |
| 2 | Usage error (bad flags, missing args) |
| 3 | Authentication error |
| 4 | Quota exceeded |
| 5 | Timeout |
| 10 | Content filter triggered |
# stdout is always clean data — safe to pipe
mmx text chat --message "Hi" --output json | jq '.content'
# stderr has progress/spinners — discard if needed
mmx video generate --prompt "Waves" 2>/dev/null
# Chain: generate image → describe it
URL=$(mmx image generate --prompt "A sunset" --quiet)
mmx vision describe --image "$URL" --quiet
# Async video workflow
TASK=$(mmx video generate --prompt "A robot" --async --quiet | jq -r '.taskId')
mmx video task get --task-id "$TASK" --output json
mmx video download --task-id "$TASK" --out robot.mp4
CLI flags → environment variables → ~/.mmx/config.json → defaults.
# Persistent config
mmx config set --key region --value cn
mmx config show
# Environment
export MINIMAX_API_KEY=sk-xxxxx
export MINIMAX_REGION=cn