Name: Clip Hand Skill
Author: RightNow-AI


```bash
# Best video up to 1080p + best audio, merged (merging needs ffmpeg)
# --merge-output-format mp4 keeps merged downloads at source.mp4 for the audio-extraction step
yt-dlp -f "bv[height<=1080]+ba/b[height<=1080]" --merge-output-format mp4 --restrict-filenames -o "source.%(ext)s" "URL"

# 720p max (smaller, faster)
yt-dlp -f "bv[height<=720]+ba/b[height<=720]" --merge-output-format mp4 --restrict-filenames -o "source.%(ext)s" "URL"

# Audio only (for transcription-only workflows)
yt-dlp -x --audio-format wav --restrict-filenames -o "audio.%(ext)s" "URL"
```
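The same download can be driven from Python through the yt-dlp package's embedding interface. This is a minimal sketch, assuming `yt_dlp` is installed and ffmpeg is on PATH for merging. `build_ydl_opts` and `download` are hypothetical helper names. The option keys (`format`, `merge_output_format`, `restrictfilenames`, `outtmpl`) are the library's parameter names for `-f`, `--merge-output-format`, `--restrict-filenames` and `-o`.

```python
from typing import Any


def build_ydl_opts(max_height: int = 1080) -> dict[str, Any]:
    """Map the CLI flags onto yt-dlp's embedding options (hypothetical helper)."""
    selector = f"bv[height<={max_height}]+ba/b[height<={max_height}]"
    return {
        "format": selector,            # same as -f
        "merge_output_format": "mp4",  # same as --merge-output-format mp4
        "restrictfilenames": True,     # same as --restrict-filenames
        "outtmpl": "source.%(ext)s",   # same as -o
    }


def download(url: str, max_height: int = 1080) -> None:
    """Download one URL. yt_dlp is imported lazily so the option builder works without it."""
    import yt_dlp  # assumption: installed via `pip install yt-dlp`

    with yt_dlp.YoutubeDL(build_ydl_opts(max_height)) as ydl:
        ydl.download([url])


if __name__ == "__main__":
    print(build_ydl_opts(720)["format"])  # bv[height<=720]+ba/b[height<=720]
```

Keeping the selector in one function makes the 1080p and 720p variants a single parameter instead of two hand-edited strings.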

```bash
# Get full metadata as JSON (duration, title, chapters, available subs)
yt-dlp --dump-json "URL"

# Key fields: duration, title, description, chapters, subtitles, automatic_captions
```
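For clipping, the `chapters` field is often the quickest source of candidate segments. This is a sketch that assumes the output of `yt-dlp --dump-json` has already been loaded into a dict. yt-dlp gives each chapter `start_time`, `end_time` and `title`. `chapter_clips` and its default 15 to 90 second window are hypothetical.

```python
from typing import Any


def chapter_clips(info: dict[str, Any], min_len: float = 15.0,
                  max_len: float = 90.0) -> list[dict[str, Any]]:
    """Return chapters whose length fits a clip window (hypothetical limits, in seconds)."""
    clips = []
    # "chapters" can be missing or None when the uploader defined no chapters
    for ch in info.get("chapters") or []:
        length = ch["end_time"] - ch["start_time"]
        if min_len <= length <= max_len:
            clips.append({"title": ch.get("title", ""),
                          "start": ch["start_time"],
                          "end": ch["end_time"]})
    return clips


if __name__ == "__main__":
    # In practice, json.loads() the stdout of `yt-dlp --dump-json URL`
    sample = {"duration": 300, "chapters": [
        {"start_time": 0, "end_time": 10, "title": "Intro"},
        {"start_time": 10, "end_time": 70, "title": "Setup"},
        {"start_time": 70, "end_time": 300, "title": "Deep dive"},
    ]}
    print(chapter_clips(sample))  # [{'title': 'Setup', 'start': 10, 'end': 70}]
```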

```bash
# Download auto-generated subtitles in json3 format (word-level timing)
yt-dlp --write-auto-subs --sub-langs en --sub-format json3 --skip-download --restrict-filenames -o "source" "URL"

# Download manual subtitles if available
# YouTube usually does not serve srt directly; "srt/best" falls back to the best available format
yt-dlp --write-subs --sub-langs en --sub-format "srt/best" --skip-download --restrict-filenames -o "source" "URL"

# List available subtitle languages
yt-dlp --list-subs "URL"
```
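Before downloading anything, the `subtitles` and `automatic_captions` fields from the metadata JSON show which of these commands can succeed. This sketch shows one possible decision rule. It assumes each field maps a language code to a list of format dicts with an `ext` key. The function name `pick_subtitle_source` and its preference order are hypothetical.

```python
from typing import Any, Optional


def pick_subtitle_source(info: dict[str, Any], lang: str = "en",
                         need_word_timing: bool = True) -> Optional[str]:
    """Return "auto", "manual", or None (meaning: transcribe with Whisper instead)."""
    # Both fields map a language code to a list of format dicts with an "ext" key
    manual = (info.get("subtitles") or {}).get(lang) or []
    auto = (info.get("automatic_captions") or {}).get(lang) or []

    if need_word_timing:
        # This rule relies on json3 auto-captions for per-word offsets; otherwise use Whisper
        return "auto" if any(f.get("ext") == "json3" for f in auto) else None
    if manual:
        return "manual"  # human-made subtitles: --write-subs
    return "auto" if auto else None  # --write-auto-subs, or None for Whisper
```

Returning `None` rather than raising keeps the Whisper fallback as an ordinary branch in the calling pipeline.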

```bash
# Extract mono 16 kHz WAV (the format Whisper converts every input to internally)
ffmpeg -i source.mp4 -vn -ar 16000 -ac 1 -y audio.wav
```
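Mono 16 kHz also keeps the intermediate file small. Assuming ffmpeg's default WAV codec, 16-bit PCM (`pcm_s16le`), the data rate is 16,000 samples/s × 2 bytes × 1 channel = 32,000 bytes/s, or about 1.9 MB per minute. `wav_size_bytes` is a hypothetical helper for that estimate, and it ignores the small WAV header.

```python
def wav_size_bytes(seconds: float, sample_rate: int = 16_000, channels: int = 1,
                   bytes_per_sample: int = 2) -> int:
    """Approximate PCM payload size in bytes; excludes the WAV header."""
    return int(seconds * sample_rate * channels * bytes_per_sample)


if __name__ == "__main__":
    print(wav_size_bytes(60))                                  # 1920000 (one minute, mono 16 kHz)
    print(wav_size_bytes(60, sample_rate=48_000, channels=2))  # 11520000 (same minute, 48 kHz stereo)
```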

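```bash
# Standard transcription with word-level timestamps
# (openai-whisper's boolean flags expect True/False, capitalized)
whisper audio.wav --model small --output_format json --word_timestamps True --language en

# Faster alternative: whisper-ctranslate2 (faster-whisper backend) takes largely the same flags
# and is reported to run up to ~4x faster
whisper-ctranslate2 audio.wav --model small --output_format json --word_timestamps True --language en
```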
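When scripting either CLI from Python, building the argument list in one place keeps the two commands in sync. This sketch assumes both executables are on PATH. `whisper_cmd` and `run_transcription` are hypothetical names, and the sketch emits only the flags shown in the commands.

```python
import subprocess


def whisper_cmd(audio: str, model: str = "small", language: str = "en",
                word_timestamps: bool = True, fast: bool = False) -> list[str]:
    """Build the argv for whisper, or whisper-ctranslate2 when fast=True."""
    exe = "whisper-ctranslate2" if fast else "whisper"
    return [
        exe, audio,
        "--model", model,
        "--output_format", "json",
        # str(True) == "True": openai-whisper's str2bool accepts only "True"/"False"
        "--word_timestamps", str(word_timestamps),
        "--language", language,
    ]


def run_transcription(audio: str, **kwargs) -> None:
    """Run the CLI; where the JSON lands is controlled by the CLI's --output_dir flag."""
    subprocess.run(whisper_cmd(audio, **kwargs), check=True)


if __name__ == "__main__":
    print(" ".join(whisper_cmd("audio.wav", fast=True)))
    # whisper-ctranslate2 audio.wav --model small --output_format json --word_timestamps True --language en
```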

| Model | VRAM (approx.) | Speed | Quality | Use when |
|---|---|---|---|---|
| tiny | ~1 GB | Fastest | Rough | Quick previews, testing the pipeline |
| base | ~1 GB | Fast | OK | Short clips, clear speech |
| small | ~2 GB | Good | Good | Default, best balance |
| medium | ~5 GB | Slow | Better | Important content, accented speech |
| large-v3 | ~10 GB | Slowest | Best | Final production, multiple languages |
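The model table can be turned into a simple selection rule: take the largest model whose approximate VRAM need fits the memory available, capped at a chosen ceiling. This is a sketch that assumes the table's rough figures are good enough for the decision, since real usage varies with backend and precision. `pick_model` is hypothetical, and its default ceiling of `small` mirrors the table's "default" row.

```python
# Approximate VRAM needs in GB from the model table, smallest to largest
MODEL_VRAM_GB = [
    ("tiny", 1.0),
    ("base", 1.0),
    ("small", 2.0),
    ("medium", 5.0),
    ("large-v3", 10.0),
]


def pick_model(free_vram_gb: float, ceiling: str = "small") -> str:
    """Largest model that fits free VRAM, capped at `ceiling`; falls back to tiny."""
    best = "tiny"
    for name, need in MODEL_VRAM_GB:
        if need <= free_vram_gb:
            best = name
        if name == ceiling:
            break  # never go above the requested ceiling
    return best
```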

```json
{
  "text": "full transcript text...",
  "segments": [
    {
      "id": 0,
      "start": 0.0,
      "end": 4.52,
      "text": " Hello everyone, welcome back.",
      "words": [
        {"word": " Hello", "start": 0.0, "end": 0.32, "probability": 0.95},
        {"word": " everyone,", "start": 0.32, "end": 0.78, "probability": 0.91},
        {"word": " welcome", "start": 0.78, "end": 1.14, "probability": 0.98},
        {"word": " back.", "start": 1.14, "end": 1.52, "probability": 0.97}
      ]
    }
  ]
}
```
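The `segments` list in Whisper's JSON output maps directly onto SRT cues: one numbered cue per segment, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, the text, and a blank line. This is a minimal sketch that assumes the JSON layout shown. `srt_timestamp` and `segments_to_srt` are hypothetical helpers, and a real pipeline may want to split long segments using the per-word timings.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(transcript: dict) -> str:
    """Turn Whisper JSON output into SRT text, one cue per non-empty segment."""
    cues = []
    for seg in transcript["segments"]:
        text = seg["text"].strip()  # Whisper segment text starts with a space
        if not text:
            continue  # skip empty segments; numbering stays contiguous
        start, end = srt_timestamp(seg["start"]), srt_timestamp(seg["end"])
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    # Each cue ends with "\n"; joining with "\n" leaves one blank line between cues
    return "\n".join(cues)


if __name__ == "__main__":
    sample = {"segments": [{"start": 0.0, "end": 4.52, "text": " Hello everyone, welcome back."}]}
    print(segments_to_srt(sample))
```

In practice, load the file Whisper wrote with `json.load` (for `audio.wav` it is typically `audio.json` in the output directory) and write the returned string to a `.srt` file.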

```json
{
  "events": [
    {
      "tStartMs": 1230,
      "dDurationMs": 5000,
      "segs": [
        {"utf8": "hello ", "tOffsetMs": 0},
        {"utf8": "world ", "tOffsetMs": 200},
        {"utf8": "how ", "tOffsetMs": 450},
        {"utf8": "are you", "tOffsetMs": 700}
      ]
    }
  ]
}
```
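In json3, each segment's absolute start time is the event's `tStartMs` plus the segment's `tOffsetMs`. This sketch assumes the layout shown. As defensive handling for files that differ from the example, it treats a missing `tOffsetMs` as 0 and skips events without text, such as bare newlines. It ends each segment where the next one in the same event begins, or at `tStartMs + dDurationMs` for the last one. `json3_words` is a hypothetical helper.

```python
from typing import Any


def json3_words(data: dict[str, Any]) -> list[tuple[str, float, float]]:
    """Flatten YouTube json3 captions into (text, start_s, end_s) tuples."""
    words: list[tuple[str, float, float]] = []
    for event in data.get("events", []):
        # Keep only segments with visible text (drops "\n"-only segments)
        segs = [s for s in event.get("segs") or [] if s.get("utf8", "").strip()]
        if not segs:
            continue
        base = event.get("tStartMs", 0)
        event_end = base + event.get("dDurationMs", 0)
        starts = [base + s.get("tOffsetMs", 0) for s in segs]
        for i, seg in enumerate(segs):
            # The last segment runs to the event end, which can overshoot into the next caption
            end = starts[i + 1] if i + 1 < len(segs) else event_end
            words.append((seg["utf8"].strip(), starts[i] / 1000, end / 1000))
    return words


if __name__ == "__main__":
    sample = {"events": [{"tStartMs": 1230, "dDurationMs": 5000, "segs": [
        {"utf8": "hello ", "tOffsetMs": 0}, {"utf8": "world ", "tOffsetMs": 200},
        {"utf8": "how ", "tOffsetMs": 450}, {"utf8": "are you", "tOffsetMs": 700}]}]}
    print(json3_words(sample))
    # [('hello', 1.23, 1.43), ('world', 1.43, 1.68), ('how', 1.68, 1.93), ('are you', 1.93, 6.23)]
```

Note that a segment can hold more than one word (`"are you"`), so split on whitespace if strictly per-word entries are needed.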

| Feature | macOS / Linux | Windows (cmd.exe) |
|---|---|---|
| Suppress stderr | `2>/dev/null` | `2>NUL` |
| Filter output | `\| grep pattern` | `\| findstr pattern` |
| Delete files | `rm file1 file2` | `del file1 file2` |
| Null output device | `-f null -` | `-f null -` (same) |
| ffmpeg subtitle paths | `subtitles=clip.srt` | `subtitles=clip.srt` (relative OK; absolute paths need `C\\:/path`) |
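The subtitle-path row is easy to get wrong when paths are built programmatically. This sketch assumes ffmpeg is invoked without a shell (an argument list, so no shell quoting applies). It follows the table's `C\\:/path` convention: backslashes become forward slashes, and a drive-letter colon gets the double-backslash escape. `subtitles_filter` is a hypothetical helper. It does not escape other filter-special characters such as commas, quotes or brackets.

```python
import re


def subtitles_filter(path: str) -> str:
    """Build an ffmpeg `subtitles=` filter argument for a relative or Windows absolute path."""
    p = path.replace("\\", "/")  # ffmpeg accepts forward slashes on Windows
    # Escape the drive colon as \\: (two literal backslashes), matching the C\\:/path form
    p = re.sub(r"^([A-Za-z]):", r"\1\\\\:", p)
    return f"subtitles={p}"


if __name__ == "__main__":
    print(subtitles_filter(r"C:\clips\clip.srt"))  # subtitles=C\\:/clips/clip.srt
    print(subtitles_filter("clip.srt"))            # subtitles=clip.srt
```

The result is meant to be passed as the value after `-vf` in a `subprocess.run([...])` argument list.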


Clip Hand Skill: Video Clipping Expert Knowledge

- Cross-Platform Notes


- yt-dlp Reference
  - Download with Format Selection
  - Metadata Inspection
  - YouTube Auto-Subtitles
  - Useful Flags
- Whisper Transcription Reference
  - Audio Extraction for Whisper
  - Basic Transcription
  - Model Sizes
  - JSON Output Structure
- YouTube json3 Subtitle Parsing
  - Format Structure
  - Extracting Word Timing
- SRT Generation from Transcript
  - SRT Format
