Main orchestration skill for automatic creation of short-form content (TikTok, YouTube Shorts, Instagram Reels) from long videos. Fully automated workflow: download video, transcribe, detect highlights (transcript + laughter + sentiment + scenes), trim segments, resize to 9:16 portrait, and add subtitles. Finds viral-worthy moments like OpusClip and Vizard.ai.
This is the main orchestration skill: it combines all the other skills to automatically create short-form content from long videos.

The workflow it automates:

1. Download the source video (if a URL is given)
2. Transcribe the audio
3. Detect highlights (transcript + laughter + sentiment + scenes)
4. Trim the selected segments
5. Resize to 9:16 portrait
6. Add subtitles
### scripts/autocut.py

Main autocut workflow script.
Usage:
python skills/autocut-shorts/scripts/autocut.py <video_or_url> [options]
Options:
- `--source`: Source type (file, youtube) - auto-detected
- `--num-clips`: Number of clips to generate (default: 5)
- `--min-duration`: Minimum clip duration in seconds (default: 15)
- `--max-duration`: Maximum clip duration in seconds (default: 60)
- `--platform`: Target platform (tiktok, shorts, reels, facebook) - default: tiktok
- `--output-dir`: Output directory (default: ./shorts/)
- `--transcription-model`: Transcription model (auto, whisper, gemini, openai, google) - default: auto
- `--whisper-model`: Whisper model size (tiny, base, small, medium, large-v3) - default: large-v3
- `--openai-model`: OpenAI Whisper model (default: whisper-1)
- `--google-model`: Google Speech model (default: latest_long)
- `--diarization-model`: Speaker diarization (auto, pyannote, gemini, none) - default: auto
- `--huggingface-token`: HuggingFace token for pyannote (or use env var)
- `--focus-speaker`: Extract clips only for a specific speaker (SPEAKER_00, etc.)
- `--gemini-api-key`: Gemini API key (or use env var)
- `--skip-transcribe`: Skip transcription if a transcript is already available
- `--skip-diarization`: Skip speaker diarization
- `--skip-scenes`: Skip scene detection
- `--skip-laughter`: Skip laughter detection
- `--skip-sentiment`: Skip sentiment analysis
- `--transcript-path`: Use an existing transcript file (SRT/VTT/JSON)
- `--word-timestamps-path`: Provide word-timestamp JSON for karaoke subtitles
- `--subtitle-mode`: Subtitle mode (auto, word, segment) - default: auto
- `--style`: Subtitle style (tiktok, shorts, reels, default) - default: tiktok

Examples:
Basic autocut from file:
python skills/autocut-shorts/scripts/autocut.py video.mp4
Autocut from YouTube URL:
python skills/autocut-shorts/scripts/autocut.py "https://www.youtube.com/watch?v=VIDEO_ID"
Generate 10 clips for Instagram Reels:
python skills/autocut-shorts/scripts/autocut.py video.mp4 --num-clips 10 --platform reels --style reels
Use Gemini for transcription:
python skills/autocut-shorts/scripts/autocut.py video.mp4 --transcription-model gemini
Quick local test with Whisper tiny:
python skills/autocut-shorts/scripts/autocut.py video.mp4 --transcription-model whisper --whisper-model tiny
Use OpenAI Whisper API for word-level captions:
python skills/autocut-shorts/scripts/autocut.py video.mp4 --transcription-model openai --subtitle-mode word
Use Google Speech-to-Text for word-level captions:
python skills/autocut-shorts/scripts/autocut.py video.mp4 --transcription-model google --subtitle-mode word
Custom duration range:
python skills/autocut-shorts/scripts/autocut.py video.mp4 --min-duration 20 --max-duration 45
Use existing transcript:
python skills/autocut-shorts/scripts/autocut.py video.mp4 --transcript-path video.srt --skip-transcribe
Use word timestamps JSON directly:
python skills/autocut-shorts/scripts/autocut.py video.mp4 --word-timestamps-path words.json --subtitle-mode word
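The file passed to `--word-timestamps-path` must contain per-word timings. Its exact schema isn't documented in this section, so the shape below (a flat list of `word`/`start`/`end` records, times in seconds) is an assumption for illustration:

```python
import json

# Hypothetical word-timestamp records (assumed schema, not the skill's
# documented format): ordered, non-overlapping spans in seconds.
words = [
    {"word": "This", "start": 45.20, "end": 45.41},
    {"word": "is",   "start": 45.41, "end": 45.55},
    {"word": "the",  "start": 45.55, "end": 45.68},
    {"word": "key",  "start": 45.68, "end": 46.02},
]

# Karaoke highlighting needs ordered, non-overlapping word spans.
assert all(a["end"] <= b["start"] for a, b in zip(words, words[1:]))

with open("words.json", "w", encoding="utf-8") as f:
    json.dump(words, f, indent=2)
```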
### scripts/quick_cut.py

Quick cut without full analysis (faster).
Usage:
python skills/autocut-shorts/scripts/quick_cut.py <video_path> [options]
Options:
- `--timestamps`: JSON file with timestamps to cut
- `--output-dir`: Output directory
- `--platform`: Target platform

Example:
python skills/autocut-shorts/scripts/quick_cut.py video.mp4 --timestamps cuts.json
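The schema of the `--timestamps` file isn't shown in this section; a plausible shape, assumed here for illustration, is a list of start/end pairs in seconds:

```python
import json

# Assumed cuts.json schema (hypothetical, not quick_cut.py's documented
# format): each entry is one segment to extract, times in seconds.
cuts = [
    {"start": 45.2,  "end": 72.5},
    {"start": 130.0, "end": 158.5},
]

# Sanity check: every segment must have positive length.
assert all(c["end"] > c["start"] for c in cuts)

with open("cuts.json", "w", encoding="utf-8") as f:
    json.dump(cuts, f, indent=2)
```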
Pipeline steps:

1. If a URL is provided, downloads the video first.
2. Extracts the audio and transcribes it.
3. Runs the detection modules (transcript, laughter, sentiment, scenes).
4. Combines all signals into a single score:

   Virality Score =
       35% Transcript (hooks, viral content) +
       25% Laughter (humor) +
       25% Sentiment (emotion) +
       15% Scenes (visual transitions)

5. Ranks all segments and selects the top N.
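The scoring-and-selection step can be sketched as follows (segment data and helper names are illustrative, not the skill's actual internals):

```python
# Hypothetical sketch of rank-and-select: score candidates with the
# weighted virality formula, filter by duration, keep the top N.
def virality(seg):
    return (seg["transcript"] * 0.35 + seg["laughter"] * 0.25 +
            seg["sentiment"] * 0.25 + seg["scene"] * 0.15)

def select_top(segments, n=5, min_dur=15, max_dur=60):
    eligible = [s for s in segments
                if min_dur <= s["end"] - s["start"] <= max_dur]
    return sorted(eligible, key=virality, reverse=True)[:n]

segments = [
    {"start": 45.2, "end": 72.5, "transcript": 0.9, "laughter": 0.8,
     "sentiment": 0.7, "scene": 0.6},
    {"start": 100.0, "end": 108.0, "transcript": 0.9, "laughter": 0.9,
     "sentiment": 0.9, "scene": 0.9},   # only 8 s -> filtered out
    {"start": 200.0, "end": 230.0, "transcript": 0.4, "laughter": 0.3,
     "sentiment": 0.5, "scene": 0.2},
]
top = select_top(segments, n=2)
```

Note how duration filtering happens before ranking: a very strong but too-short segment never competes for a slot.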
For each highlight:

- Trims the segment from the source video
- Converts it to 9:16 portrait
- Burns in captions
- Saves the final clip
Clips are named `{original}_short_{index}.mp4` and written under the output directory:

```
shorts/
  <video_slug>_<YYYYMMDD-HHMMSS>/
    clip_001/
      master.mp4
      data.json
    clip_002/
      master.mp4
      data.json
```
```json
{
  "success": true,
  "source": {
    "type": "youtube",
    "url": "https://youtube.com/watch?v=...",
    "title": "Video Title",
    "duration": 1200.5
  },
  "processing": {
    "transcription_model": "gemini-flash-lite-latest",
    "detection_methods": ["transcript", "laughter", "sentiment", "scenes"],
    "platform": "tiktok"
  },
  "results": {
    "total_clips": 5,
    "clips": [
      {
        "rank": 1,
        "filename": "video_short_001.mp4",
        "start_time": 45.2,
        "end_time": 72.5,
        "duration": 27.3,
        "virality_score": 0.92,
        "text": "This is the key moment...",
        "output_path": "shorts/video_short_001.mp4"
      }
    ],
    "total_duration": 135.5,
    "avg_virality_score": 0.78
  },
  "performance": {
    "total_time": 180.5,
    "transcription_time": 45.2,
    "analysis_time": 67.3,
    "processing_time": 68.0
  }
}
```
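A downstream script might consume this summary like so (a minimal sketch; only a subset of the fields above is shown):

```python
import json

# Parse the run summary and list clips in rank order.
summary = json.loads("""
{
  "success": true,
  "results": {
    "total_clips": 1,
    "clips": [
      {"rank": 1, "filename": "video_short_001.mp4",
       "virality_score": 0.92, "duration": 27.3}
    ],
    "avg_virality_score": 0.78
  }
}
""")

assert summary["success"]
for clip in sorted(summary["results"]["clips"], key=lambda c: c["rank"]):
    print(f"#{clip['rank']} {clip['filename']} "
          f"score={clip['virality_score']} dur={clip['duration']}s")
```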
Platform-specific output suffixes:

- `_tiktok_{index}.mp4`
- `_shorts_{index}.mp4`
- `_reels_{index}.mp4`
- `_facebook_{index}.mp4`

Scoring signals:

- Transcript (35% weight): hooks, viral content
- Laughter (25% weight): humor
- Sentiment (25% weight): emotion
- Scenes (15% weight): visual transitions
```python
virality_score = (
    transcript_score * 0.35 +
    laughter_score * 0.25 +
    sentiment_score * 0.25 +
    scene_score * 0.15
)
```
- Premium Clips (0.8-1.0): Must include
- Excellent Clips (0.6-0.8): High priority
- Good Clips (0.4-0.6): Consider including
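The tier thresholds above map to scores as in this small sketch (the `tier` helper is hypothetical, for illustration only):

```python
# Hypothetical helper mapping a virality_score to the tiers listed above.
def tier(score):
    if score >= 0.8:
        return "premium"
    if score >= 0.6:
        return "excellent"
    if score >= 0.4:
        return "good"
    return "below threshold"
```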
Default behavior (`--diarization-model auto`): the AI agent automatically selects based on context:
```python
def pick_diarization_model(user_request: str, likely_multi_speaker: bool) -> str:
    """Heuristics behind --diarization-model auto."""
    # Use pyannote when:
    if "podcast" in user_request or "interview" in user_request:
        return "pyannote"  # Multi-speaker, needs accuracy
    if "accurate" in user_request or "precise" in user_request:
        return "pyannote"  # User explicitly wants accuracy
    if "panel" in user_request or "debate" in user_request:
        return "pyannote"  # Complex multi-speaker scenarios
    if "overlapping" in user_request or "talk over" in user_request:
        return "pyannote"  # Overlapping speech detection
    if "privacy" in user_request or "offline" in user_request:
        return "pyannote"  # Local processing needed
    # Use Gemini when:
    if "quick" in user_request or "fast" in user_request:
        return "gemini"  # Speed priority
    if "single speaker" in user_request or "monologue" in user_request:
        return "gemini"  # Simple scenario
    if "no diarization" in user_request or "skip speakers" in user_request:
        return "none"  # User doesn't want speaker detection
    # Default for ambiguous cases:
    return "pyannote" if likely_multi_speaker else "gemini"
```
Decision Matrix:
| Scenario | Recommended | Reason |
|---|---|---|
| Podcast with 2-3 hosts | pyannote | High accuracy for multi-speaker |
| Interview (host + guest) | pyannote | Precise speaker separation |
| Panel discussion | pyannote | Handles 4+ speakers well |
| Single speaker vlog | gemini | Faster, good enough |
| Gaming commentary | gemini | Usually 1-2 speakers |
| Tutorial video | gemini | Single speaker, speed matters |
| Debate/competitive | pyannote | Overlapping speech detection |
| Privacy-sensitive | pyannote | Local processing |
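The decision matrix can also be expressed as a keyword lookup; this is a hypothetical helper, not the skill's actual implementation:

```python
# Keyword -> recommended model, mirroring the decision matrix above.
RECOMMENDED = {
    "podcast": "pyannote", "interview": "pyannote", "panel": "pyannote",
    "debate": "pyannote", "privacy": "pyannote",
    "vlog": "gemini", "gaming": "gemini", "tutorial": "gemini",
}

def recommend(description, default="gemini"):
    """Return the recommended diarization model for a content description."""
    text = description.lower()
    for keyword, model in RECOMMENDED.items():
        if keyword in text:
            return model
    return default  # ambiguous case: speed over accuracy
```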
Examples by Use Case:
# Podcast - use pyannote automatically
python skills/autocut-shorts/scripts/autocut.py podcast.mp4
# Interview - use pyannote for accuracy
python skills/autocut-shorts/scripts/autocut.py interview.mp4
# Vlog - use gemini (single speaker, faster)
python skills/autocut-shorts/scripts/autocut.py vlog.mp4
# Force pyannote explicitly
python skills/autocut-shorts/scripts/autocut.py video.mp4 --diarization-model pyannote
# Skip diarization for simple content
python skills/autocut-shorts/scripts/autocut.py tutorial.mp4 --diarization-model none
# Extract only host's segments
python skills/autocut-shorts/scripts/autocut.py podcast.mp4 --focus-speaker SPEAKER_00
The agent automatically detects the likely scenario from the request and video context (see the heuristics above). Users can always override with the `--diarization-model` flag.
This skill uses all other skills:
- `youtube-downloader`: Download from URL
- `video-transcriber`: Transcribe audio
- `scene-detector`: Find visual cut points
- `laughter-detector`: Find funny moments
- `sentiment-analyzer`: Find emotional peaks
- `highlight-scanner`: Combine all signals
- `video-trimmer`: Cut segments
- `portrait-resizer`: Convert to 9:16
- `subtitle-overlay`: Add captions

Example runs:

python skills/autocut-shorts/scripts/autocut.py podcast.mp4 --num-clips 10 --platform shorts
python skills/autocut-shorts/scripts/autocut.py vlog.mp4 --num-clips 5 --platform tiktok
python skills/autocut-shorts/scripts/autocut.py "https://youtube.com/watch?v=..." --platform tiktok
python skills/autocut-shorts/scripts/autocut.py tutorial.mp4 --min-duration 30 --max-duration 60
Processing Time (approximate):
Breakdown: