Build an LLM-friendly workspace from long-form video or audio, including content from YouTube, Xiaoyuzhou, and local files. Use MM Harness when the user asks you to work on tasks involving long videos or audio recordings. MM Harness converts raw media into a structured workspace so you can understand, search, and process it more effectively.
Use MM Harness when the user gives you a long video or audio source and wants you to turn it into a form that is easier to read, search, summarize, and reuse.
MM Harness takes long-form media and converts it into a structured workspace on disk.
After it runs, you get an output directory that is much easier to work with than the original raw media. You can then use that workspace for follow-up tasks such as reading, searching, summarizing, and reusing the content.
MM Harness is a preprocessing and structuring tool. It does not directly finish every downstream task for you.
Use MM Harness when the task involves long-form video or audio. Do not use it when the task is unrelated to long-form media processing.
Supported inputs: YouTube and Xiaoyuzhou URLs, and local video, audio, and subtitle files.
Well-supported content types in the current release: speech-led content, where spoken language carries most of the meaning.
Do not rely on MM Harness for strongly visual content.
If the input mainly depends on visual storytelling rather than spoken language, the current release may fail or produce an unsatisfactory result. In that case, tell the user that this version of MM Harness is intended for speech-led content.
Check installation first if needed:
mm-harness info
mm-harness doctor
Create from a URL:
mm-harness create \
--url "https://www.youtube.com/watch?v=..." \
--output-dir ./outputs
Create from a local video file:
mm-harness create \
--video-file ./input/example.mp4 \
--output-dir ./outputs
Create from a local audio file:
mm-harness create \
--audio-file ./input/example.wav \
--output-dir ./outputs
Create from a local subtitle file:
mm-harness create \
--subtitle-file ./input/example.srt \
--output-dir ./outputs
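The four create commands above differ only in the input flag, so a small helper can choose the flag from the input itself. This is a minimal sketch: `pick_input_flag` is a hypothetical name, and the extension lists are assumptions for illustration, not a statement of which formats MM Harness accepts. Only the flag names come from the commands above.

```shell
# Map an input (URL or local path) to the matching `mm-harness create` flag.
# Flag names are taken from the commands above; the extension lists are
# illustrative assumptions and matching is case-sensitive.
pick_input_flag() {
  case "$1" in
    http://*|https://*) echo "--url" ;;
    *.srt|*.vtt)        echo "--subtitle-file" ;;
    *.wav|*.mp3|*.m4a)  echo "--audio-file" ;;
    *.mp4|*.mkv|*.mov)  echo "--video-file" ;;
    *) echo "unrecognized input: $1" >&2; return 1 ;;
  esac
}

# Usage (not executed here):
#   input=./input/example.mp4
#   mm-harness create "$(pick_input_flag "$input")" "$input" --output-dir ./outputs
```

URLs are matched first, so a link that happens to end in a media extension is still passed with --url.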
If the user gives you a specific output requirement, add it with --structure-request.
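One way to add the flag only when a request exists is a thin wrapper around the URL form of the command. This is a sketch under assumptions: `create_from_url` and `MM_HARNESS_BIN` are hypothetical names introduced here (the latter only so the wrapper can be dry-run), and the request is assumed to be free-form text, which this document does not confirm.

```shell
# Run `mm-harness create` for a URL, appending --structure-request only when
# a request string was supplied as the third argument.
# MM_HARNESS_BIN is a hypothetical override for dry runs; it is not an
# MM Harness setting.
create_from_url() {
  url=$1
  out_dir=$2
  request=${3:-}
  set -- create --url "$url" --output-dir "$out_dir"
  if [ -n "$request" ]; then
    set -- "$@" --structure-request "$request"
  fi
  ${MM_HARNESS_BIN:-mm-harness} "$@"
}

# Dry run that prints the arguments instead of executing the tool:
#   MM_HARNESS_BIN=echo
#   create_from_url "https://www.youtube.com/watch?v=..." ./outputs "chapter summaries"
```

Passing the request as a single quoted argument keeps spaces in the user's wording intact.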
MM Harness writes a structured output directory to the user-provided --output-dir.
Use that output directory as the basis for the next step of work.
The result is not just a plain transcript. It is a structured workspace intended to make the original media easier for you to inspect and process.
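Since the workspace layout is not described here, a reasonable first step is to list what was actually written before deciding what to read. The helper below is a sketch with a hypothetical name, `list_workspace`, and it assumes nothing about specific file names.

```shell
# List every file in a workspace directory, relative to its root, in a
# stable byte-wise order. Fails if the directory does not exist.
list_workspace() {
  ( cd "$1" && find . -type f | sed 's|^\./||' | LC_ALL=C sort )
}

# Usage (not executed here):
#   list_workspace ./outputs
```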
Required environment variables:
export LLM_API_BASE_URL=...
export LLM_API_KEY=...
export GROQ_API_KEY=...
Optional YouTube cookies:
export YOUTUBE_COOKIES_FILE=/path/to/cookies.txt
# or
export YOUTUBE_COOKIES_FROM_BROWSER=chrome
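Before a long run it can help to confirm that the API variables are set. The sketch below uses a hypothetical helper, `missing_env_vars`. Treating LLM_API_BASE_URL, LLM_API_KEY, and GROQ_API_KEY as required is an assumption drawn from the contrast with the optional cookie settings, so adjust the list if your setup differs.

```shell
# Print the name of each variable in the list that is unset or empty.
# The list of names is an assumption based on the exports shown above.
missing_env_vars() {
  for name in LLM_API_BASE_URL LLM_API_KEY GROQ_API_KEY; do
    eval "value=\${$name:-}"
    [ -n "$value" ] || echo "$name"
  done
}

# Usage (not executed here):
#   missing=$(missing_env_vars)
#   [ -z "$missing" ] || echo "Set these first: $missing" >&2
```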
If something fails:
Run mm-harness doctor first.
If the problem is a missing GROQ_API_KEY, ffmpeg, yt-dlp, or deno, explain that to the user before continuing.
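If the doctor output is unclear, a rough manual check for the external tools named above can narrow things down. This is a sketch, not a replacement for mm-harness doctor: `missing_tools` is a hypothetical helper, and it only tests whether each command is on PATH, not whether its version is suitable.

```shell
# Print each tool from the argument list that cannot be found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Usage (not executed here):
#   missing_tools ffmpeg yt-dlp deno
```

An empty result means all listed tools were found, so the failure is more likely a configuration or credentials issue.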