Generate captions (descriptions) for images, videos, and documents using the ZhiPu GLM-V multimodal model series. Use this skill whenever the user wants to describe, caption, summarize, or interpret the content of images, videos, or files. Supports single or multiple inputs given as URLs, local paths, or base64 (local paths and base64 for images only).
Generate captions for images, videos, and documents using the ZhiPu GLM-V multimodal model.
| Type | Formats | Max Size | Max Count | Base64 |
|---|---|---|---|---|
| Image | jpg, png, jpeg | 5MB / 6000×6000px | 50 | ✅ |
| Video | mp4, mkv, mov | 200MB | — | ❌ |
| File | pdf, docx, txt, xlsx, pptx, jsonl | — | 50 | ❌ |
⚠️ file_url cannot be mixed with image_url or video_url in the same request.
⚠️ Videos and files must be supplied as URLs. Local paths and base64 are supported for images only.
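The limits above can be checked before any request is sent. The following is a minimal sketch, not the skill's actual implementation: `LIMITS`, `is_url`, `extension`, and `check_inputs` are hypothetical names, and the sketch covers only format, count, and the URL-only rule. Size checks and base64 inputs are left out.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Formats and counts copied from the table above; the dict layout is illustrative.
LIMITS = {
    "image": {"exts": {"jpg", "jpeg", "png"}, "max_count": 50},
    "video": {"exts": {"mp4", "mkv", "mov"}, "max_count": None},
    "file": {"exts": {"pdf", "docx", "txt", "xlsx", "pptx", "jsonl"}, "max_count": 50},
}


def is_url(source):
    """True for http(s) URLs, False for anything else (e.g. a local path)."""
    return urlparse(source).scheme in ("http", "https")


def extension(source):
    """Lower-case file extension without the dot, ignoring any URL query string."""
    path = urlparse(source).path if is_url(source) else source
    return PurePosixPath(path).suffix.lower().lstrip(".")


def check_inputs(kind, sources):
    """Return a list of problems for one input kind; an empty list means OK."""
    limits = LIMITS[kind]
    problems = []
    if limits["max_count"] is not None and len(sources) > limits["max_count"]:
        problems.append(f"too many {kind} inputs: {len(sources)} > {limits['max_count']}")
    for source in sources:
        # Only images may be local paths; videos and files must be URLs.
        if kind != "image" and not is_url(source):
            problems.append(f"{kind} inputs must be URLs: {source}")
        if extension(source) not in limits["exts"]:
            problems.append(f"unsupported {kind} format: {source}")
    return problems
```

Calling `check_inputs("video", ["clip.mp4"])` reports that the local path is not allowed, while `check_inputs("image", ["a.jpg"])` returns an empty list.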
| Resource | Link |
|---|---|
| Get API Key | https://bigmodel.cn/usercenter/proj-mgmt/apikeys |
| API Docs | Chat Completions |
This script reads the key from the ZHIPU_API_KEY environment variable, which is shared with other Zhipu skills.
Get a key: visit the Zhipu Open Platform API Keys page to create or copy your key.
Setup options (choose one):
OpenClaw config (recommended): set the key in openclaw.json under skills.entries.glmv-caption.env:
"glmv-caption": { "enabled": true, "env": { "ZHIPU_API_KEY": "your-api-key" } }
Shell environment variable: add to ~/.zshrc:
export ZHIPU_API_KEY="your-api-key"
.env file: create .env in this skill directory:
ZHIPU_API_KEY=your-api-key
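As an illustration of how a key lookup covering these options could work, here is a minimal sketch. It is not taken from glmv_caption.py: the function name `load_api_key` is hypothetical, and the precedence (environment variable first, then the .env file) is an assumption.

```python
import os
from pathlib import Path


def load_api_key(env_file=".env", environ=None):
    """Return ZHIPU_API_KEY from the environment, else from a .env file, else None."""
    environ = os.environ if environ is None else environ
    key = environ.get("ZHIPU_API_KEY", "").strip()
    if key:
        return key  # assumption: the environment variable wins over .env

    path = Path(env_file)
    if not path.is_file():
        return None
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blanks, comments, and lines that are not NAME=value pairs.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == "ZHIPU_API_KEY":
            value = value.strip().strip("\"'")  # tolerate quoted values
            if value:
                return value
    return None
```

A caller would treat a `None` result as "key not configured" and point the user to the API key page listed above.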
⛔ MANDATORY RESTRICTIONS - DO NOT VIOLATE ⛔
python scripts/glmv_caption.py
After running the script, you must show the full raw output to the user exactly as returned. Do not summarize, truncate, or only say "generated". Users need the original model output to evaluate quality.
python scripts/glmv_caption.py --images "https://example.com/photo.jpg"
python scripts/glmv_caption.py --images /path/to/photo.png
python scripts/glmv_caption.py --images img1.jpg img2.png "https://example.com/img3.jpg"
python scripts/glmv_caption.py --videos "https://example.com/clip.mp4"
python scripts/glmv_caption.py --files "https://example.com/report.pdf"
python scripts/glmv_caption.py --files "https://example.com/doc1.docx" "https://example.com/doc2.txt"
python scripts/glmv_caption.py --images photo.jpg --prompt "Describe the architecture style in detail"
python scripts/glmv_caption.py --images photo.jpg --output result.json
python scripts/glmv_caption.py --images photo.jpg --thinking
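When the script is driven from another Python program, the raw-output rule can be honoured by relaying stdout unchanged. The sketch below is one possible wrapper under stated assumptions: `build_command` and `run_and_show` are hypothetical helpers, and only the `--images`, `--videos`, `--files`, and `--prompt` flags shown in the examples are used.

```python
import subprocess
import sys


def build_command(kind, sources, prompt=None, script="scripts/glmv_caption.py"):
    """Assemble the argument list for one of the three input kinds."""
    if kind not in ("images", "videos", "files"):
        raise ValueError(f"unknown input kind: {kind}")
    cmd = [sys.executable, script, f"--{kind}", *sources]
    if prompt:
        cmd += ["--prompt", prompt]
    return cmd


def run_and_show(cmd):
    """Run the command and print its output verbatim, without summarizing."""
    completed = subprocess.run(cmd, capture_output=True, text=True)
    print(completed.stdout, end="")
    if completed.stderr:
        print(completed.stderr, end="", file=sys.stderr)
    return completed.returncode
```

For example, `run_and_show(build_command("images", ["photo.jpg"]))` would run the first local-image example and echo whatever the script prints.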
python {baseDir}/scripts/glmv_caption.py (--images IMG [IMG...] | --videos VID [VID...] | --files FILE [FILE...]) [OPTIONS]
| Parameter | Required | Description |
|---|---|---|
| --images, -i | One of | Image paths or URLs (supports multiple, base64 OK) |
| --videos, -v | One of | Video URLs (supports multiple; mp4/mkv/mov) |
| --files, -f | One of | Document URLs (supports multiple; pdf/docx/txt/xlsx/pptx/jsonl) |
| --prompt, -p | No | Custom prompt (default: "请详细描述这张图片的内容" / "Please describe this image in detail") |
| --model, -m | No | Model name (default: glm-4.6v) |
| --temperature, -t | No | Sampling temperature, 0-1 (default: 0.8) |
| --top-p | No | Nucleus sampling, 0.01-1.0 (default: 0.6) |
| --max-tokens | No | Max output tokens (default: 1024, max: 32768) |
| --thinking | No | Enable thinking/reasoning mode |
| --output, -o | No | Save result JSON to file |
| --pretty | No | Pretty-print JSON output |
| --stream | No | Enable streaming output |
Note: --images, --videos, and --files are mutually exclusive per API limits.
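The "exactly one of" rule maps naturally onto an argparse mutually exclusive group. The following sketch shows that pattern with the flags and defaults from the table. It is an assumption about how such a CLI could be declared, not a copy of the real script, and it omits the output-related options for brevity.

```python
import argparse


def build_parser():
    """Declare the CLI with --images / --videos / --files as exclusive choices."""
    parser = argparse.ArgumentParser(prog="glmv_caption.py")

    # required=True forces exactly one of the three input flags.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--images", "-i", nargs="+", metavar="IMG")
    source.add_argument("--videos", "-v", nargs="+", metavar="VID")
    source.add_argument("--files", "-f", nargs="+", metavar="FILE")

    parser.add_argument("--prompt", "-p")
    parser.add_argument("--model", "-m", default="glm-4.6v")
    parser.add_argument("--temperature", "-t", type=float, default=0.8)
    parser.add_argument("--top-p", type=float, default=0.6)
    parser.add_argument("--max-tokens", type=int, default=1024)
    parser.add_argument("--thinking", action="store_true")
    return parser
```

With this declaration, passing both `--images` and `--videos` makes argparse exit with a usage error before any request is made.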
{
"success": true,
"caption": "A landscape photo showing a mountain range at sunset...",
"usage": {
"prompt_tokens": 128,
"completion_tokens": 256,
"total_tokens": 384
}
}
Key fields:
- success: whether the request succeeded
- caption: the generated caption text
- usage: token usage statistics
- warning: present when content was blocked by safety review
- error: error details on failure

API key not configured:
ZHIPU_API_KEY not configured. Get your API key at: https://bigmodel.cn/usercenter/proj-mgmt/apikeys
→ Show exact error to user, guide them to configure
Authentication failed (401/403): API key invalid/expired → reconfigure
Rate limit (429): Quota exhausted → inform user to wait
File not found: Local file missing → check path
Content filtered: warning field present → content blocked by safety review
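The result fields and error cases above can be combined into a small post-processing step. This is a sketch under assumptions: `interpret` and `advice_for_status` are hypothetical helpers, the exact shape of the `error` and `warning` values is not specified in this document, and the advice strings simply restate the list above.

```python
import json

# Status-code advice restated from the error-handling list above.
STATUS_ADVICE = {
    401: "API key invalid or expired; reconfigure ZHIPU_API_KEY.",
    403: "API key invalid or expired; reconfigure ZHIPU_API_KEY.",
    429: "Quota exhausted; wait before retrying.",
}


def advice_for_status(status):
    """Map an HTTP status code to the suggested next step."""
    return STATUS_ADVICE.get(status, "Unexpected error; show the exact message to the user.")


def interpret(raw_json):
    """Parse the script's JSON output and return (caption, notes)."""
    result = json.loads(raw_json)
    notes = []
    if result.get("warning"):
        notes.append("content blocked by safety review: " + str(result["warning"]))
    if not result.get("success"):
        notes.append("request failed: " + str(result.get("error", "no error details")))
    return result.get("caption"), notes
```

The notes are meant to accompany the raw output shown to the user, not to replace it.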