
Glmv Caption

Name: Glmv Caption
Author: zai-org

Generate captions (descriptions) for images, videos, and documents using ZhiPu GLM-V multimodal model series. Use this skill whenever the user wants to describe, caption, summarize, or interpret the content of images, videos, or files. Supports single/multiple inputs, URLs, local paths, and base64 (images only).

zai-org · 2,274 stars · 2026-03-30

Category: Media


GLM-V Caption Skill

Generate captions for images, videos, and documents using the ZhiPu GLM-V multimodal model.

When to Use

- Describe, caption, summarize, or interpret image, video, or document content
- The user says things like "describe this image", "caption", "summarize this video", "图片描述" (image description), "视频摘要" (video summary), "文档解读" (document interpretation), or "看图说话" (describe what you see in a picture)
- Extract visual or textual information from media files
- Compare multiple images
- The user provides an image, video, or file and asks what it contains

Supported Input Types

| Type | Formats | Max Size | Max Count | Base64 |
|---|---|---|---|---|
| Image | jpg, jpeg, png | 5MB / 6000×6000px | 50 | ✅ |
| Video | mp4, mkv, mov | 200MB | — | ❌ |
| File | pdf, docx, txt, xlsx, pptx, jsonl | | | ❌ |
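A pre-flight check can catch oversized or unsupported local inputs before any request is made. The sketch below is a minimal illustration, not part of the skill: `classify` and `check_local_inputs` are hypothetical helpers, "5MB" and "200MB" are read as binary megabytes (an assumption), URLs are not checked, and the 6000×6000px pixel limit is skipped because checking it would need an image library such as Pillow. It also assumes one call accepts a single input type, as the CLI's `--images | --videos | --files` usage suggests.

```python
import os
from pathlib import Path
from typing import List

# Limits copied from the Supported Input Types table.
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}
VIDEO_EXTS = {".mp4", ".mkv", ".mov"}
FILE_EXTS = {".pdf", ".docx", ".txt", ".xlsx", ".pptx", ".jsonl"}
MAX_IMAGE_BYTES = 5 * 1024 * 1024    # "5MB", read as MiB (assumption)
MAX_VIDEO_BYTES = 200 * 1024 * 1024  # "200MB", read as MiB (assumption)
MAX_IMAGE_COUNT = 50


def classify(path: str) -> str:
    """Map a path to 'image', 'video', or 'file' by its extension."""
    ext = Path(path).suffix.lower()
    if ext in IMAGE_EXTS:
        return "image"
    if ext in VIDEO_EXTS:
        return "video"
    if ext in FILE_EXTS:
        return "file"
    raise ValueError(f"unsupported format: {path}")


def check_local_inputs(paths: List[str]) -> List[str]:
    """Return a list of problems; an empty list means the local files look acceptable."""
    problems: List[str] = []
    kinds = set()
    image_count = 0
    for p in paths:
        try:
            kind = classify(p)
        except ValueError as exc:
            problems.append(str(exc))
            continue
        kinds.add(kind)
        size = os.path.getsize(p)
        if kind == "image":
            image_count += 1
            if size > MAX_IMAGE_BYTES:
                problems.append(f"image over 5MB: {p}")
        elif kind == "video" and size > MAX_VIDEO_BYTES:
            problems.append(f"video over 200MB: {p}")
    if image_count > MAX_IMAGE_COUNT:
        problems.append(f"too many images: {image_count} > {MAX_IMAGE_COUNT}")
    if len(kinds) > 1:
        # The CLI takes exactly one of --images / --videos / --files per call.
        problems.append(f"mixed input types: {sorted(kinds)}")
    return problems
```

Run `check_local_inputs` on the paths you are about to pass and stop if the returned list is non-empty.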


Prerequisites

API Key Setup (Required)

OpenClaw config (recommended): in `openclaw.json`, set the key under `skills.entries.glmv-caption.env`:
```
"glmv-caption": { "enabled": true, "env": { "ZHIPU_API_KEY": "your-api-key" } }
```
Shell environment variable: add to `~/.zshrc` (or your shell's equivalent startup file):
```bash
export ZHIPU_API_KEY="your-api-key"
```
.env file: create a `.env` file in this skill's directory:
```
ZHIPU_API_KEY=your-api-key
```
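The skill does not say in which order these three sources are checked. The sketch below assumes the OpenClaw `env` block and a shell export both end up in the process environment, with the `.env` file only as a fallback. `load_api_key` is a hypothetical helper and only understands simple `NAME=value` lines.

```python
import os
from pathlib import Path
from typing import Optional


def load_api_key(skill_dir: Path) -> Optional[str]:
    """Hypothetical lookup order: process environment first, then <skill_dir>/.env."""
    # OpenClaw's "env" block and a shell export are both assumed to land here.
    key = os.environ.get("ZHIPU_API_KEY")
    if key:
        return key
    env_file = skill_dir / ".env"
    if not env_file.is_file():
        return None
    for line in env_file.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == "ZHIPU_API_KEY":
            # Drop optional surrounding quotes, e.g. ZHIPU_API_KEY="abc".
            return value.strip().strip('"').strip("'") or None
    return None
```

A `None` result means no key was found in either place.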

How to Use

Caption an Image

```bash
# From a URL
python scripts/glmv_caption.py --images "https://example.com/photo.jpg"
# From a local path
python scripts/glmv_caption.py --images /path/to/photo.png
```
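Images can also be passed as base64. The exact string format the script expects is not documented here. The sketch below assumes a `data:` URL (`data:image/png;base64,...`) is accepted; `image_to_data_url` is a hypothetical helper.

```python
import base64
from pathlib import Path

# Extensions from the Supported Input Types table, mapped to their MIME types.
MIME_BY_EXT = {".jpg": "image/jpeg", ".jpeg": "image/jpeg", ".png": "image/png"}


def image_to_data_url(path: str) -> str:
    """Encode a local jpg/jpeg/png file as a base64 data URL (assumed input format)."""
    mime = MIME_BY_EXT.get(Path(path).suffix.lower())
    if mime is None:
        raise ValueError(f"unsupported image type: {path}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

Base64 inflates data by about a third, so a 5MB image becomes roughly 6.7MB of text. That can exceed the operating system's limit on a single command-line argument (128 KiB per argument on Linux), so passing the local path is usually simpler for anything but small images.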

Caption Multiple Images

```bash
python scripts/glmv_caption.py --images img1.jpg img2.png "https://example.com/img3.jpg"
```
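The Supported Input Types table caps images at 50. Assuming that cap applies per call, a larger set can be split across several invocations. This sketch only builds the argument lists (`batched_commands` is a hypothetical helper). Each batch is captioned independently, so a comparison across batches is not possible in one response.

```python
import sys
from typing import Iterator, List, Optional

MAX_IMAGES_PER_CALL = 50  # Max Count for images in the Supported Input Types table


def batched_commands(images: List[str], prompt: Optional[str] = None,
                     script: str = "scripts/glmv_caption.py") -> Iterator[List[str]]:
    """Yield one argument list per batch of at most MAX_IMAGES_PER_CALL images."""
    for start in range(0, len(images), MAX_IMAGES_PER_CALL):
        argv = [sys.executable, script, "--images", *images[start:start + MAX_IMAGES_PER_CALL]]
        if prompt is not None:
            argv += ["--prompt", prompt]
        yield argv


for cmd in batched_commands([f"img{i}.jpg" for i in range(120)]):
    print(len(cmd) - 3, "images")  # 50, 50, then 20
```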

Caption a Video

```bash
python scripts/glmv_caption.py --videos "https://example.com/clip.mp4"
```

Caption a Document

```bash
python scripts/glmv_caption.py --files "https://example.com/report.pdf"
python scripts/glmv_caption.py --files "https://example.com/doc1.docx" "https://example.com/doc2.txt"
```

Custom Prompt

```bash
python scripts/glmv_caption.py --images photo.jpg --prompt "Describe the architecture style in detail"
```

Save Result

```bash
python scripts/glmv_caption.py --images photo.jpg --output result.json
```

Thinking Mode

```bash
python scripts/glmv_caption.py --images photo.jpg --thinking
```

CLI Reference

```
python {baseDir}/scripts/glmv_caption.py (--images IMG [IMG...] | --videos VID [VID...] | --files FILE [FILE...]) [OPTIONS]
```

Exactly one of `--images`, `--videos`, or `--files` must be given; all other options are optional.

| Parameter | Required | Description |
|---|---|---|
| `--images`, `-i` | One of | Image paths or URLs; multiple values allowed; base64 accepted |
| `--videos`, `-v` | One of | Video paths or URLs; multiple values allowed (mp4/mkv/mov) |
| `--files`, `-f` | One of | Document paths or URLs; multiple values allowed (pdf/docx/txt/xlsx/pptx/jsonl) |
| `--prompt`, `-p` | No | Custom prompt (default: "请详细描述这张图片的内容", i.e. "Please describe this image in detail") |
| `--model`, `-m` | No | Model name (default: `glm-4.6v`) |
| `--temperature`, `-t` | No | Sampling temperature, 0 to 1 (default: 0.8) |
| `--top-p` | No | Nucleus sampling threshold, 0.01 to 1.0 (default: 0.6) |
| `--max-tokens` | No | Maximum output tokens (default: 1024; max: 32768) |
| `--thinking` | No | Enable thinking (reasoning) mode |
| `--output`, `-o` | No | Save the result JSON to the given file |
| `--pretty` | No | Pretty-print the JSON output |
| `--stream` | No | Enable streaming output |
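When the script is driven from Python, the documented constraints can be enforced before launching it. `build_argv` is a hypothetical helper: the ranges come from the table above, and the lower bound of 1 for `--max-tokens` is an assumption.

```python
import sys
from typing import List, Optional, Sequence


def build_argv(images: Sequence[str] = (), videos: Sequence[str] = (), files: Sequence[str] = (),
               prompt: Optional[str] = None, temperature: Optional[float] = None,
               top_p: Optional[float] = None, max_tokens: Optional[int] = None,
               thinking: bool = False, output: Optional[str] = None,
               script: str = "scripts/glmv_caption.py") -> List[str]:
    """Build a glmv_caption.py command line, rejecting values outside the documented ranges."""
    groups = [("--images", images), ("--videos", videos), ("--files", files)]
    chosen = [(flag, vals) for flag, vals in groups if vals]
    if len(chosen) != 1:
        raise ValueError("pass exactly one of images, videos, or files")
    flag, vals = chosen[0]
    argv = [sys.executable, script, flag, *vals]
    if prompt is not None:
        argv += ["--prompt", prompt]
    if temperature is not None:
        if not 0 <= temperature <= 1:
            raise ValueError("temperature must be in [0, 1]")
        argv += ["--temperature", str(temperature)]
    if top_p is not None:
        if not 0.01 <= top_p <= 1.0:
            raise ValueError("top_p must be in [0.01, 1.0]")
        argv += ["--top-p", str(top_p)]
    if max_tokens is not None:
        if not 1 <= max_tokens <= 32768:  # lower bound of 1 is assumed
            raise ValueError("max_tokens must be in [1, 32768]")
        argv += ["--max-tokens", str(max_tokens)]
    if thinking:
        argv.append("--thinking")
    if output is not None:
        argv += ["--output", output]
    return argv
```

Passing the resulting list (rather than one shell string) to `subprocess.run` avoids quoting problems with prompts that contain spaces, quotes, or Chinese text.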

Response Format

A successful run returns JSON like:

```json
{
  "success": true,
  "caption": "A landscape photo showing a mountain range at sunset...",
  "usage": {
    "prompt_tokens": 128,
    "completion_tokens": 256,
    "total_tokens": 384
  }
}
```
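A caller can unpack this response in a few lines. Only the success shape is documented; the `error` field read on failure below is a guess, and `parse_caption_result` is a hypothetical helper.

```python
import json
from typing import Any, Dict, Tuple


def parse_caption_result(raw: str) -> Tuple[str, Dict[str, Any]]:
    """Return (caption, usage) from the script's JSON output, or raise on failure."""
    result = json.loads(raw)
    if not result.get("success"):
        # Failure shape is undocumented; "error" is an assumed field name.
        raise RuntimeError(str(result.get("error", "caption request failed")))
    return result["caption"], result.get("usage", {})


sample = """{"success": true, "caption": "A landscape photo showing a mountain range at sunset...",
 "usage": {"prompt_tokens": 128, "completion_tokens": 256, "total_tokens": 384}}"""
caption, usage = parse_caption_result(sample)
print(caption)
print(usage["total_tokens"])  # 384
```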

Error Handling

If `ZHIPU_API_KEY` is not set, the script reports: `ZHIPU_API_KEY not configured. Get your API key at: https://bigmodel.cn/usercenter/proj-mgmt/apikeys`
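When the script runs under another program, failures such as a missing key can be turned into exceptions. This sketch assumes the message may appear on either stdout or stderr, that a non-zero exit code signals failure, and that stdout holds a single JSON document (so it likely does not suit `--stream`). None of these behaviors is documented, and `run_caption` is a hypothetical helper.

```python
import json
import subprocess
from typing import Any, Dict, Sequence

KEY_URL = "https://bigmodel.cn/usercenter/proj-mgmt/apikeys"


def run_caption(argv: Sequence[str], timeout: float = 300.0) -> Dict[str, Any]:
    """Run a glmv_caption.py command line and return its parsed JSON output."""
    proc = subprocess.run(list(argv), capture_output=True, text=True, timeout=timeout)
    # Look for the documented missing-key message on both streams.
    if "ZHIPU_API_KEY not configured" in proc.stdout + proc.stderr:
        raise RuntimeError(f"ZHIPU_API_KEY is missing; create one at {KEY_URL}")
    if proc.returncode != 0:
        raise RuntimeError(f"exit code {proc.returncode}: {proc.stderr.strip()}")
    return json.loads(proc.stdout)


# Example (hypothetical path):
# result = run_caption([sys.executable, "scripts/glmv_caption.py", "--images", "photo.jpg"])
```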

Resource Links

| Resource | Link |
|---|---|
| Get API Key | https://bigmodel.cn/usercenter/proj-mgmt/apikeys |
| API Docs | Chat Completions |


Related Skills

- Songsee
- Video Frames
- Gifgrep
- Qqbot Media
- Camsnap
- Openai Whisper Api