Extracts one readable markdown transcript from a YouTube URL or video ID using yt-dlp with Korean-first subtitle selection, then routes output through an LLM workflow for translation, summary writing, and humanized Korean style cleanup. Use when users ask to extract, structure, timestamp, or translate YouTube transcripts (especially into Korean).
Generate one readable markdown transcript file. Use Korean subtitle tracks first; if the extracted text is non-Korean, your LLM workflow translates the readable file into Korean and overwrites the same file. After translation/summarization, run a humanization pass to remove AI-style Korean writing patterns.
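To make the Korean-first rule concrete, here is a minimal sketch of how a track could be chosen from the available subtitle language codes. `pick_subtitle_track` is a hypothetical helper, not the script's actual internals. Its defaults copy the documented `--prefer-korean-languages` and `--source-languages` values, and a `True` second element corresponds to `needs_llm_translation: true`.

```python
from typing import Iterable, Tuple

# Defaults copied from the documented --prefer-korean-languages and --source-languages values.
PREFER_KOREAN = ("ko", "ko-KR")
SOURCE_LANGUAGES = ("en", "en-US", "ja", "es", "fr", "de")


def pick_subtitle_track(
    available: Iterable[str],
    prefer_korean: Tuple[str, ...] = PREFER_KOREAN,
    source_languages: Tuple[str, ...] = SOURCE_LANGUAGES,
) -> Tuple[str, bool]:
    """Return (language_code, needs_llm_translation) for the best available track."""
    codes = set(available)
    # Any Korean track wins, so the transcript needs no translation.
    for code in prefer_korean:
        if code in codes:
            return code, False
    # Otherwise take the first fallback language in the configured order.
    for code in source_languages:
        if code in codes:
            return code, True
    raise LookupError(f"no Korean or fallback subtitle track in {sorted(codes)}")
```

With yt-dlp, the candidate codes could come from the keys of the `subtitles` (uploaded) or `automatic_captions` (auto-generated) entries in the info dict returned by `YoutubeDL.extract_info`. How the real script weighs those two sources is not specified here.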
- Save output under `artifacts/` unless the user asks otherwise.
- In Gemini temp sessions (`~/.gemini/tmp/...`), relative output paths are mapped to the original project path read from `.project_root`.

Pick `<PYTHON_CMD>` by environment:
- macOS/Linux: `python3` (or `python` if `python --version` reports 3.x)
- Windows: `py -3` (or `python` if it points to Python 3)
- Otherwise: `python`

Install yt-dlp:

`PIP_DISABLE_PIP_VERSION_CHECK=1 <PYTHON_CMD> -m pip install yt-dlp`
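The output-path rule for Gemini temp sessions can be sketched as follows. This rests on hypothetical assumptions: a temp session is any working directory under `~/.gemini/tmp`, and `.project_root` is a plain-text file in that directory (or a parent inside the temp tree) holding the original project path. `resolve_output_dir` is an illustrative name, and the real script may locate or parse the marker differently.

```python
from pathlib import Path
from typing import Optional


def resolve_output_dir(output_dir: str, cwd: Path, home: Optional[Path] = None) -> Path:
    """Map a relative output dir to the original project when running in a Gemini temp session.

    Assumption: `.project_root` is a text file containing the original project path.
    """
    target = Path(output_dir).expanduser()
    if target.is_absolute():
        return target
    home = home or Path.home()
    gemini_tmp = home / ".gemini" / "tmp"
    if cwd == gemini_tmp or gemini_tmp in cwd.parents:
        # Walk upward from cwd, but never past ~/.gemini/tmp itself.
        for candidate in (cwd, *cwd.parents):
            marker = candidate / ".project_root"
            if marker.is_file():
                project_root = Path(marker.read_text(encoding="utf-8").strip())
                return project_root / target
            if candidate == gemini_tmp:
                break
    # Outside a temp session (or no marker found): relative to the working directory.
    return cwd / target
```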
Run from this skill directory:
<PYTHON_CMD> scripts/extract_youtube_ko_transcript.py \
--video "https://www.youtube.com/watch?v=VIDEO_ID" \
--output-dir artifacts
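A minimal sketch of how `--video` input could be normalized to a video ID and mapped to the `<video_id>.ko.readable.md` name this skill produces. `extract_video_id` and `readable_filename` are hypothetical helpers. Only the 11-character ID form and the output name come from this document. The URL shapes handled (`watch?v=`, `youtu.be/`, `/shorts/`, `/embed/`, `/live/`) are common YouTube forms, not a list taken from the script.

```python
import re
from urllib.parse import parse_qs, urlparse

# YouTube video IDs are 11 characters drawn from [A-Za-z0-9_-].
VIDEO_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(value: str) -> str:
    """Return the video ID from a bare ID or a common YouTube URL form."""
    value = value.strip()
    if VIDEO_ID_RE.fullmatch(value):
        return value
    parsed = urlparse(value)
    host = parsed.netloc.lower()
    parts = [p for p in parsed.path.split("/") if p]
    candidate = ""
    if host == "youtu.be" or host.endswith(".youtu.be"):
        # https://youtu.be/<id>
        candidate = parts[0] if parts else ""
    elif host == "youtube.com" or host.endswith(".youtube.com"):
        if parts == ["watch"]:
            # https://www.youtube.com/watch?v=<id>
            candidate = parse_qs(parsed.query).get("v", [""])[0]
        elif len(parts) >= 2 and parts[0] in ("shorts", "embed", "live"):
            candidate = parts[1]
    if not VIDEO_ID_RE.fullmatch(candidate):
        raise ValueError(f"not a YouTube video URL or ID: {value!r}")
    return candidate


def readable_filename(video_id: str) -> str:
    """Name of the single output file described in this skill."""
    return f"{video_id}.ko.readable.md"
```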
If YouTube blocks requests, retry with network/auth options:
<PYTHON_CMD> scripts/extract_youtube_ko_transcript.py \
--video "https://www.youtube.com/watch?v=VIDEO_ID" \
--output-dir artifacts \
--cookies /path/to/cookies.txt
or
<PYTHON_CMD> scripts/extract_youtube_ko_transcript.py \
--video "https://www.youtube.com/watch?v=VIDEO_ID" \
--output-dir artifacts \
--proxy-url "http://user:pass@host:port"
If your environment has TLS interception/certificate issues:
<PYTHON_CMD> scripts/extract_youtube_ko_transcript.py \
--video "https://www.youtube.com/watch?v=VIDEO_ID" \
--output-dir artifacts \
--no-check-certificate
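If you drive the extractor from Python, the retry variants above can be tried in order. This is a sketch under assumptions: `build_command` and `run_with_fallbacks` are hypothetical helpers that use only the flags shown in the commands above. It also assumes the script exits with a non-zero status when extraction fails and that it runs from the skill directory.

```python
import subprocess
import sys
from typing import Dict, List, Optional

SCRIPT = "scripts/extract_youtube_ko_transcript.py"


def build_command(
    video: str,
    output_dir: str = "artifacts",
    cookies: Optional[str] = None,
    proxy_url: Optional[str] = None,
    no_check_certificate: bool = False,
    python_cmd: str = sys.executable,
) -> List[str]:
    """Assemble argv using only flags documented for the extractor."""
    cmd = [python_cmd, SCRIPT, "--video", video, "--output-dir", output_dir]
    if cookies:
        cmd += ["--cookies", cookies]
    if proxy_url:
        cmd += ["--proxy-url", proxy_url]
    if no_check_certificate:
        cmd.append("--no-check-certificate")
    return cmd


def run_with_fallbacks(video: str, attempts: List[Dict[str, object]]) -> Optional[subprocess.CompletedProcess]:
    """Run each option set in order and stop at the first zero exit status."""
    result = None
    for options in attempts:
        result = subprocess.run(build_command(video, **options), capture_output=True, text=True)
        if result.returncode == 0:
            break
    return result
```

For example, `run_with_fallbacks(url, [{}, {"cookies": "/path/to/cookies.txt"}, {"proxy_url": "http://user:pass@host:port"}])` tries a plain run first, then cookies, then the proxy. `no_check_certificate=True` is best kept as a last, explicit attempt, since it disables TLS verification.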
Expect one file:
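- `<video_id>.ko.readable.md`

Sections follow the video's chapters when it has them; otherwise they are fixed ranges sized by `--section-minutes`.

Frontmatter/YAML properties to check: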
- `title: ...`
- `url: ...`
- `video_id: ...`
- `channel: ...`
- `published: YYYY-MM-DD`
- `created: YYYY-MM-DD`
- `needs_llm_translation: true|false`

Summary/template sections to check:
- `## 📎 Summary Source (Compact)`
- `## 📌 Executive Summary`
- `## 🔍 Detailed Summary`
- `## 💡 Key Insights & Action Items`
- `## 📝 Transcript`

Token-saving execution policy (MUST):
- Apply all changes in a single edit call (single replace or one-shot file edit). Avoid multi-turn read/replace/write loops.
- Use `## 📎 Summary Source (Compact)` as the primary input for summary generation.
- Keep summary lengths:
  - Executive Summary: 4-6 sentences
  - Detailed Summary: 4-6 bullets
  - Key Insights & Action Items: 3-5 bullets
- Remove `## 📎 Summary Source (Compact)` before final delivery unless the user explicitly asks to keep it.

Case A: `needs_llm_translation: true`
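- Translate the transcript body into Korean, keeping section headings and `[HH:MM:SS]` timestamps exactly unchanged.
- Overwrite the same readable file (`<video_id>.ko.readable.md`) in place.
- Set `needs_llm_translation: false`.

Case B: `needs_llm_translation: false`

- Leave the transcript body unchanged; only write the summary sections.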
After translation and summary writing, you MUST apply a humanized Korean style pass.
Core rewriting rules:
- Avoid hype words (`핵심적`, `획기적`, `중대한`, `새 지평`, `패러다임 전환`) unless objectively justified.
- Replace vague attributions (`전문가들은`, `업계 관계자에 따르면`) with concrete sources, or remove them.
- Cut repetitive connectors (`이를 통해`, `아울러`, `나아가`, `이러한 맥락에서`).
- Reduce overused `~적` adjectives and inflated Sino-Korean wording.
- Avoid `~에 있어서` and `~함에 있어`; rewrite them as direct Korean.

Use one of the templates below in your current environment (Gemini/OpenAI/Claude).
Template A (Korean default):
다음 Markdown 파일을 수정해 주세요.
파일 경로: <READABLE_FILE_PATH>
규칙:
1) 이 작업은 한 번의 편집 호출로 끝냅니다.
2) `## 📎 Summary Source (Compact)`만 사용해 요약을 작성합니다.
3) `needs_llm_translation: true`이면 `## 📝 Transcript` 본문을 자연스러운 한국어로 번역하고, `false`이면 Transcript는 수정하지 않습니다.
4) `## 📝 Transcript` 아래의 소제목과 `[HH:MM:SS]` 타임스탬프는 원문을 그대로 유지합니다.
5) `## 📌 Executive Summary`는 4~6문장, `## 🔍 Detailed Summary`는 4~6개 bullet, `## 💡 Key Insights & Action Items`는 3~5개 bullet으로 작성합니다.
6) 고유명사(인명, 제품명, 프로젝트명)는 의미가 어색해지면 원문을 유지합니다.
7) YAML frontmatter는 유지하되 `needs_llm_translation: true`는 `needs_llm_translation: false`로 바꿉니다.
8) 마지막으로 AI 글쓰기 패턴을 제거합니다: 과장어, 모호한 출처, `~적`/`~에 있어서` 남용, 연결어 남발을 줄이고 더 자연스러운 한국어 문장으로 다듬습니다.
9) 최종본 저장 전 `## 📎 Summary Source (Compact)` 섹션은 삭제합니다(사용자가 유지 요청한 경우 제외).
10) 설명 출력 없이 파일만 덮어써서 저장합니다.
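The pattern rules of the humanization pass can also be spot-checked after the LLM edit. The sketch below is a hypothetical, deliberately crude counter over the example phrases listed in the core rewriting rules. A hit only marks a candidate for rewriting, since some uses are justified. Overuse of `~적` needs judgment, so it is not counted here.

```python
import re
from typing import Dict, List

# Example phrases taken from the core rewriting rules; extend the lists as needed.
AI_PATTERNS: Dict[str, List[str]] = {
    "hype": ["핵심적", "획기적", "중대한", "새 지평", "패러다임 전환"],
    "vague_source": ["전문가들은", "업계 관계자에 따르면"],
    "connector": ["이를 통해", "아울러", "나아가", "이러한 맥락에서"],
    "stiff_phrasing": ["에 있어서", "함에 있어"],
}


def find_ai_patterns(text: str) -> Dict[str, int]:
    """Count literal occurrences per category; categories with no hits are omitted."""
    counts: Dict[str, int] = {}
    for category, phrases in AI_PATTERNS.items():
        total = sum(len(re.findall(re.escape(p), text)) for p in phrases)
        if total:
            counts[category] = total
    return counts
```

Because matching is literal, an overlapping form such as `~함에 있어서` is counted under both stiff-phrasing entries. That is acceptable for a flagging tool but not for exact statistics.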
Template B (target language variable):
Edit this markdown file in place.
File path: <READABLE_FILE_PATH>
Target language: <TARGET_LANGUAGE>
Requirements:
1) Complete this in one edit call.
2) Use only `Summary Source (Compact)` for summary writing.
3) If `needs_llm_translation: true`, translate transcript body into <TARGET_LANGUAGE>; otherwise, do not edit transcript body.
4) Preserve transcript section headings and `[HH:MM:SS]` timestamps exactly.
5) Keep summary lengths: Executive 4-6 sentences, Detailed 4-6 bullets, Action Items 3-5 bullets.
6) Keep proper nouns in original form when translation hurts clarity.
7) Keep YAML frontmatter unchanged except setting `needs_llm_translation: false` when translation is done.
8) Apply a humanization pass: remove hype-heavy wording, vague attribution, repetitive connectors, and overly formulaic AI phrasing.
9) Remove the `Summary Source (Compact)` section before final delivery unless the user asks to keep it.
10) Save changes to the same file with no extra commentary.
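Requirements 4, 7, and 9 of the templates lend themselves to a mechanical check after the edit. This sketch compares the original and edited file contents. It assumes the frontmatter is delimited by `---` lines and that `## 📝 Transcript` is the last section, so everything after that heading counts as the transcript body. The helper names are hypothetical.

```python
import re
from typing import List

TIMESTAMP_RE = re.compile(r"\[\d{2}:\d{2}:\d{2}\]")
FLAG_RE = re.compile(r"^needs_llm_translation:[ \t]*(true|false)[ \t]*$", re.MULTILINE)
SUMMARY_SOURCE = "## 📎 Summary Source (Compact)"
TRANSCRIPT = "## 📝 Transcript"


def frontmatter(text: str) -> str:
    """Return the YAML between the opening and closing '---' lines ('' if missing)."""
    if not text.startswith("---\n"):
        return ""
    end = text.find("\n---", 4)
    return text[4:end] if end != -1 else ""


def transcript_body(text: str) -> str:
    """Everything from the Transcript heading on (assumed to be the last section)."""
    start = text.find(TRANSCRIPT)
    return text[start:] if start != -1 else ""


def check_edited_file(before: str, after: str, keep_summary_source: bool = False) -> List[str]:
    """List rule violations; an empty list means every check passed."""
    problems: List[str] = []
    if TRANSCRIPT not in after:
        problems.append("Transcript section missing")
    if TIMESTAMP_RE.findall(transcript_body(before)) != TIMESTAMP_RE.findall(transcript_body(after)):
        problems.append("transcript timestamps changed")
    flag = FLAG_RE.search(frontmatter(after))
    if flag is None or flag.group(1) != "false":
        problems.append("needs_llm_translation is not false")
    if not keep_summary_source and SUMMARY_SOURCE in after:
        problems.append("Summary Source (Compact) section still present")
    return problems
```

Timestamps are compared only inside the transcript body, because the compact summary source may also carry timestamps and is deleted by default.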
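Quality checklist:
- Transcript timestamps stay in `[HH:MM:SS]` form, unchanged from the original.
- Summaries are written from `Summary Source (Compact)` content.
- `Summary Source (Compact)` is removed in the final file by default.

`extract_youtube_ko_transcript.py` supports: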
- `--video`: YouTube URL or 11-char video ID (required)
- `--output-dir`: destination folder (default: `artifacts`; relative paths map to the original project path in Gemini temp sessions, otherwise to the current working directory)
- `--prefer-korean-languages`: preferred Korean track codes (default: `ko,ko-KR`)
- `--source-languages`: fallback source track order (default: `en,en-US,ja,es,fr,de`)
- `--languages`: backward-compatible alias of `--source-languages`
- `--proxy-url`: proxy URL for yt-dlp
- `--proxy-http-url`: legacy alias for an HTTP proxy
- `--proxy-https-url`: legacy alias for an HTTPS proxy
- `--cookies`: Netscape-format cookies file path
- `--user-agent`: custom User-Agent string
- `--no-check-certificate`: disable TLS certificate verification (use only when needed)
- `--no-chapters`: disable chapter-based organization and always use fixed section ranges
- `--section-minutes`: readable markdown section size in minutes (default: `10`)
- `--summary-compact-sections`: max sections included in the compact summary source (default: `8`)
- `--summary-compact-points`: max representative points per section in the compact summary source (default: `2`)
- `--summary-compact-chars`: max characters per representative point in the compact summary source (default: `180`)

If YouTube blocks requests, use `--cookies` or `--proxy-url`. Use `--no-check-certificate` only in restricted environments.
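For the fixed-range organization (`--no-chapters`, or presumably videos without chapters), sectioning by `--section-minutes` amounts to bucketing each cue by its start time. The sketch assumes cues arrive as `(start_seconds, text)` pairs. `hhmmss` and `group_into_sections` are hypothetical names, and the section title format is only an illustration, not the script's actual heading text.

```python
from typing import Dict, List, Tuple


def hhmmss(seconds: float) -> str:
    """Format seconds as HH:MM:SS, truncating any fractional part."""
    total = int(seconds)
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


def group_into_sections(cues: List[Tuple[float, str]], section_minutes: int = 10) -> Dict[str, List[str]]:
    """Bucket (start_seconds, text) cues into fixed windows of section_minutes.

    Keys are range titles in chronological order; each line keeps a [HH:MM:SS] prefix.
    """
    width = section_minutes * 60
    sections: Dict[str, List[str]] = {}
    for start, text in sorted(cues):
        index = int(start // width)
        title = f"{hhmmss(index * width)} - {hhmmss((index + 1) * width)}"
        sections.setdefault(title, []).append(f"[{hhmmss(start)}] {text}")
    return sections
```

Empty windows produce no section, so a long silent stretch does not leave blank headings.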