Single-stage speech enhancement pipeline — ffmpeg + ClearerVoice-Studio MossFormer2 GPU inference in one Modal container.

Pipeline code is bundled at ./denoise.py and ./src/. After npx skills add, runs from any directory.

Workflow

1. Prepare slug and identify files

Slug = task identifier (volume directory name). Use user-provided value, or generate denoise_YYYYMMDD_HHMMSS if none given.

Directory input? Scan for audio/video (.m4a, .mp3, .mp4, .wav, .flac, .ogg, .aac, .mov, .avi), list with index, ask user to confirm selection.

Specific files? Use directly, no listing needed.

2. Upload to volume

Ensure volume exists (idempotent):

Speech Denoise

Workflow

1. Prepare slug and identify files

2. Upload to volume

Speech Denoise

Workflow

1. Prepare slug and identify files

2. Upload to volume

3. Run pipeline

4. Download results

5. Clean up

6. Report

Setup

Error Handling

Songsee

Video Frames

Gifgrep

Qqbot Media

Camsnap

Openai Whisper Api