Transcribe the most recent audio file in Downloads and create an org-mode note from it. Use when the user says "transcribe", "transcribe note", "transcribe audio", "audio to note", "voice note", or wants to turn a recording into a written note.
Finds the most recent audio file in ~/Downloads, transcribes it with whisperx, then uses the transcript to create a well-organized org-mode note.
$ARGUMENTS may contain a language code (e.g. en, es) and/or a specific filename. If a filename is given, use that file instead of the most recent one. If a language is given, pass it to whisperx with --language. If no language is given, let whisperx auto-detect.
If a filename was given in the arguments, look for it in ~/Downloads/. Otherwise, find the most recent file by modification time matching these extensions: m4a, mp3, wav, ogg, flac, aac, opus, webm, mp4, wma.
find ~/Downloads -maxdepth 1 -type f \( -name '*.m4a' -o -name '*.mp3' -o -name '*.wav' -o -name '*.ogg' -o -name '*.flac' -o -name '*.aac' -o -name '*.opus' -o -name '*.webm' -o -name '*.mp4' -o -name '*.wma' \) -print0 | xargs -0 ls -t 2>/dev/null | head -1
If no audio file is found, tell the user and stop.
Print the filename and file size so the user can confirm the right file was found.
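The lookup and confirmation above could be sketched roughly as follows. The function names `find_audio` and `describe_audio` are hypothetical, and the loop with the shell's `-nt` test stands in for the find pipeline; it is an approximation that skips hidden files and assumes a bash-like shell.

```shell
# Hypothetical helper: print the audio file to transcribe, or return 1 if none.
# $1 = directory to search, $2 = optional filename taken from the arguments.
find_audio() {
  local dir=$1 name=${2:-} latest= f
  if [ -n "$name" ]; then
    # An explicit filename takes priority over the most-recent lookup.
    [ -f "$dir/$name" ] || return 1
    printf '%s\n' "$dir/$name"
    return 0
  fi
  for f in "$dir"/*; do
    case $f in
      *.m4a|*.mp3|*.wav|*.ogg|*.flac|*.aac|*.opus|*.webm|*.mp4|*.wma) ;;
      *) continue ;;
    esac
    [ -f "$f" ] || continue
    # Keep whichever candidate has the newest modification time.
    if [ -z "$latest" ] || [ "$f" -nt "$latest" ]; then
      latest=$f
    fi
  done
  [ -n "$latest" ] || return 1
  printf '%s\n' "$latest"
}

# Hypothetical helper: report the name and size in bytes for confirmation.
describe_audio() {
  printf '%s (%s bytes)\n' "$(basename "$1")" "$(wc -c < "$1" | tr -d ' ')"
}
```

For example, `audio=$(find_audio ~/Downloads) && describe_audio "$audio"` prints the candidate, and a non-zero status from `find_audio` is the cue to tell the user and stop.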
Run whisperx with the large-v3 model and int8 compute type (required on Apple Silicon CPU), outputting plain text to a temporary directory:
OUTDIR=$(mktemp -d)
whisperx --model large-v3 --compute_type int8 --output_dir "$OUTDIR" --output_format txt [--language LANG] "AUDIO_FILE"
Read the resulting .txt file from $OUTDIR. Clean up the temporary directory afterwards.
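One way to wrap the transcription step is sketched below, under the assumption that whisperx leaves exactly one .txt file in the output directory. The function name `transcribe` is hypothetical; the whisperx flags are the ones given above.

```shell
# Hypothetical helper: transcribe one audio file and print the transcript.
# $1 = audio file, $2 = optional language code (omit to let whisperx auto-detect).
transcribe() {
  local audio=$1 lang=${2:-} workdir status=0
  workdir=$(mktemp -d) || return 1
  # whisperx's own output goes to stderr so stdout carries only the transcript.
  if [ -n "$lang" ]; then
    whisperx --model large-v3 --compute_type int8 \
      --output_dir "$workdir" --output_format txt \
      --language "$lang" "$audio" >&2 || status=$?
  else
    whisperx --model large-v3 --compute_type int8 \
      --output_dir "$workdir" --output_format txt \
      "$audio" >&2 || status=$?
  fi
  if [ "$status" -eq 0 ]; then
    # Assumption: the only .txt file in the directory is the transcript.
    cat "$workdir"/*.txt || status=$?
  fi
  # Clean up the temporary directory whether or not whisperx succeeded.
  rm -rf "$workdir"
  return "$status"
}
```

A call such as `transcript=$(transcribe "$audio" es)` captures the text, and a non-zero exit status signals that whisperx failed and needs diagnosing.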
If whisperx fails, do not fall back to whisper (it runs on CPU without ctranslate2 and is extremely slow). Instead, diagnose the failure: spawn a subagent to investigate the error, identify the root cause (e.g. compute type incompatibility, broken ffmpeg linkage, missing libraries), and fix it. Then retry whisperx. Only continue to the next step once whisperx succeeds.
Use AskUserQuestion to ask the user what the note should be titled. Show them the first ~200 words of the transcript so they have context.
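The preview could be produced with something like the sketch below. The function name `preview_words` is hypothetical, and the word count defaults to 200 to match the guideline above.

```shell
# Hypothetical helper: print the first N whitespace-separated words of a file
# on a single line. $1 = transcript file, $2 = word count (default 200).
preview_words() {
  local file=$1 count=${2:-200}
  awk -v n="$count" '
    {
      # Emit words one by one, space-separated, until the limit is reached.
      for (i = 1; i <= NF && seen < n; i++) {
        printf "%s%s", (seen ? " " : ""), $i
        seen++
      }
    }
    seen >= n { exit }
    END { if (seen) print "" }
  ' "$file"
}
```

For instance, `preview_words transcript.txt` gives the excerpt to show alongside the title question.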
Using the full transcript as source material, write a well-organized org-mode note. This is not a verbatim copy of the transcript. Instead:
Organize the content under headings (starting at the ** level, since the top-level * heading is the title).
The note should follow the org-note format:
Slugify the title. Run emacsclient -e '(simple-extras-slugify "TITLE")' to get the slug. Use the result (minus surrounding quotes) as the filename, appending .org.
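A sketch of the slug step follows, assuming emacsclient prints the result as a double-quoted Lisp string. The helper names `elisp_string` and `note_filename` are hypothetical. The escaping is there so that a title containing double quotes or backslashes does not break the Lisp expression.

```shell
# Hypothetical helper: render a shell string as an Emacs Lisp string literal.
elisp_string() {
  printf '"%s"' "$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')"
}

# Hypothetical helper: ask Emacs for the slug and print the .org filename.
note_filename() {
  local raw
  raw=$(emacsclient -e "(simple-extras-slugify $(elisp_string "$1"))") || return 1
  # Drop the surrounding double quotes from the printed Lisp string.
  raw=${raw#\"}
  raw=${raw%\"}
  printf '%s.org\n' "$raw"
}
```

With this in place, `note_filename "$title"` yields the filename to use in the next step.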
Write the file to /Users/pablostafforini/My Drive/notes/SLUG.org with this structure:
#+title: TITLE
* TITLE
RESTRUCTURED CONTENT HERE
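Writing the file might look like the sketch below. The function name `write_note` and its parameters are hypothetical; the notes directory is passed in as an argument, and every path is quoted because the real one contains a space.

```shell
# Hypothetical helper: write the note and print its path.
# $1 = notes directory, $2 = slug, $3 = title, $4 = restructured content.
write_note() {
  local dir=$1 slug=$2 title=$3 body=$4
  local path="$dir/$slug.org"
  # The title appears twice: as the #+title keyword and as the top-level heading.
  printf '#+title: %s\n* %s\n%s\n' "$title" "$title" "$body" > "$path" || return 1
  printf '%s\n' "$path"
}
```

For example, `write_note "/Users/pablostafforini/My Drive/notes" "$slug" "$title" "$content"` prints the path that the org ID step then needs.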
Generate an org ID. Run:
emacsclient -e '(with-current-buffer (find-file-noselect "FILEPATH") (goto-char (point-min)) (org-next-visible-heading 1) (org-id-get-create))'
Tell the user the note has been saved, showing the file path and a brief summary of the sections created.