Autonomously remix an Instagram Reel for the Zero Proof sobriety app — download the video, analyze the original hook text, generate new hook copy and a conversion-focused CTA, get one user confirmation, then render the final video. Use when given an Instagram Reel URL and asked to remix it for Zero Proof. Produces a single H.264/AAC .mp4 saved to video-output/.
Turn any Instagram Reel meme into a Zero Proof branded video that drives downloads and ultimately paying customers. The hook adapts the meme's existing premise to the sobriety niche. The CTA drives conversion to the Zero Proof app.
Run steps 1–5 silently. At step 6, pause for one user confirmation. Then execute steps 7–13 without interruption. Only surface errors.
Detect OS and install missing deps accordingly:
_install() {
local pkg=$1
if [[ "$OSTYPE" == "darwin"* ]]; then
brew install "$pkg"
elif command -v apt-get &>/dev/null; then
sudo apt-get install -y "$pkg"
elif command -v dnf &>/dev/null; then
sudo dnf install -y "$pkg"
elif command -v pacman &>/dev/null; then
sudo pacman -S --noconfirm "$pkg"
else
echo "ERROR: Cannot auto-install $pkg — unsupported package manager. Install manually and retry." >&2
exit 1
fi
}
# yt-dlp (pip-based on Linux, brew on macOS)
if ! command -v yt-dlp &>/dev/null; then
if [[ "$OSTYPE" == "darwin"* ]]; then
brew install yt-dlp
else
pip3 install yt-dlp --user 2>/dev/null || pip install yt-dlp --user
fi
fi
command -v ffmpeg &>/dev/null || _install ffmpeg
command -v magick &>/dev/null || _install imagemagick
ffmpeg -filters 2>/dev/null | grep -q drawtext \
&& echo "drawtext=yes" || echo "drawtext=no (PNG overlay will be used)"
If any install fails, stop and surface the error.
WORK_DIR="$(pwd)/reel-remix-work"
mkdir -p "$WORK_DIR"
yt-dlp --no-playlist -o "$WORK_DIR/original.mp4" "<INSTAGRAM_URL>"
Verify it downloaded. If yt-dlp fails (geo-block, login-required, private), surface the error — do not continue.
ffprobe -v quiet \
-show_entries stream=codec_name,width,height,duration \
-show_entries format=duration \
-of default "$WORK_DIR/original.mp4"
Extract and store:
VIDEO_W — width (typically 720)VIDEO_H — height (typically 1280)AUDIO_DUR — format duration (the true total length; this is the master clock)Extract 3 probe frames and auto-detect where the video content starts (= height of the text zone):
# Extract frames at positions 0, 15, 30
ffmpeg -i "$WORK_DIR/original.mp4" \
-vf "select=eq(n\,0)+eq(n\,15)+eq(n\,30)" \
-vsync 0 "$WORK_DIR/probe_%d.png" -y 2>/dev/null
# For each probe frame, find the y-offset where non-background content begins
# Take the minimum across all frames (most conservative = safest full coverage)
TEXT_ZONE_H=9999
for f in "$WORK_DIR"/probe_*.png; do
OFFSET=$(magick "$f" -fuzz 8% -trim -format "%O" info: 2>/dev/null \
| grep -oE '\+[0-9]+$' | tr -d '+')
[ -n "$OFFSET" ] && [ "$OFFSET" -lt "$TEXT_ZONE_H" ] && TEXT_ZONE_H=$OFFSET
done
# Sanity check — if detection failed or video fills full frame, warn and stop
if [ "$TEXT_ZONE_H" -eq 9999 ] || [ "$TEXT_ZONE_H" -lt 50 ]; then
echo "ERROR: Could not detect text zone (TEXT_ZONE_H=$TEXT_ZONE_H). Inspect probe frames manually." >&2
exit 1
fi
# The trim offset is where the letterbox ENDS (below the caption text).
# The original caption text typically ends ~150px before this point.
# Add 150px buffer so the overlay fully covers the caption including descenders.
TEXT_ZONE_H=$(( TEXT_ZONE_H + 150 ))
echo "TEXT_ZONE_H=$TEXT_ZONE_H (with 150px buffer)"
Read $WORK_DIR/probe_1.png to visually confirm the text zone and identify:
HOOK_BG — background color of the text zone (e.g. black or white) — match exactlyHOOK_FG — foreground/text color (e.g. white or black) — match exactlyUsing the original hook text and meme context, generate 3 hook options for the user to choose from.
Rules for each option:
: if the video reveals the punchline visually)Example angles (use variety, don't repeat the same angle across 3 options):
Generate 3 CTA options (2–3 lines each). Each should:
Examples:
Generate:
FOLDER — {MM-DD-YY}-{topic-slug} (today's date + 2–3 word slug from meme topic)FILENAME — 3–4 keyword-rich SEO words, no generics. Think: what would someone search on Instagram?
sober-dance-freedom.mp4, alcohol-free-confidence.mp4reel.mp4, video1.mp4, final.mp4Present to the user in this exact format — do not render until confirmed:
PROPOSED REMIX — Zero Proof Reel
━━━ HOOK OPTIONS ━━━
[1] Line 1 / Line 2
[2] Line 1 / Line 2
[3] Line 1 / Line 2
━━━ CTA OPTIONS ━━━
[A] Full CTA text (2-3 lines)
[B] Full CTA text (2-3 lines)
[C] Full CTA text (2-3 lines)
━━━ OUTPUT ━━━
video-output/{FOLDER}/{FILENAME}
CTA duration: 3 seconds
Reply with hook # and CTA letter (e.g. "2A") or request changes.
Wait for the user's reply. Accept changes if requested, then confirm once more if the changes are substantial.
Use the chosen hook text. drawtext is unavailable — always use ImageMagick PNG overlay.
Resolve a font name that works with this ImageMagick build (use logical font names, NOT file paths):
# Use ImageMagick logical font names — file paths do NOT work with all builds
if [[ "$OSTYPE" == "darwin"* ]]; then
FONT="Helvetica" # always available on macOS via ImageMagick
else
# Try common logical names; fallback to empty (ImageMagick default)
magick -list font 2>/dev/null | grep -qi "DejaVu-Sans" && FONT="DejaVu-Sans" || FONT=""
fi
FONT_ARG=${FONT:+-font "$FONT"}
Split the hook into Line 1 and Line 2, then produce an overlay that fully covers the original text zone with the matching background color and text color:
Text is centered both horizontally and vertically within the detected text zone. Offsets are computed from TEXT_ZONE_H — no hardcoded values:
# Center lines within TEXT_ZONE_H (line spacing ~80px for pointsize 48)
LINE1_Y=$(( TEXT_ZONE_H / 2 - 40 ))
LINE2_Y=$(( TEXT_ZONE_H / 2 + 40 ))
magick -size ${VIDEO_W}x${TEXT_ZONE_H} xc:${HOOK_BG} \
$FONT_ARG \
-pointsize 48 \
-fill ${HOOK_FG} \
-draw "gravity North text 0,${LINE1_Y} '{{LINE_1}}'" \
-draw "gravity North text 0,${LINE2_Y} '{{LINE_2}}'" \
"$WORK_DIR/hook_overlay.png"
Read the PNG to verify text is centered within the zone and fully covers the original text before continuing.
Use black background with white text (matches the style of the hook zone). Place each line with -draw text at explicit vertical offsets centered in the frame:
# LINE_SPACING ~80px for pointsize 54; adjust if more/fewer lines
magick -size ${VIDEO_W}x${VIDEO_H} xc:black \
$FONT_ARG \
-pointsize 54 \
-fill white \
-draw "gravity Center text 0,-50 '{{CTA_LINE_1}}'" \
-draw "gravity Center text 0,50 '{{CTA_LINE_2}}'" \
"$WORK_DIR/cta_frame.png"
For 3-line CTAs add a third -draw at text 0,150. Read the PNG to verify text is centered before continuing.
ffmpeg -i "$WORK_DIR/original.mp4" \
-vn -c:a copy \
"$WORK_DIR/audio.aac" -y
Overlay the hook PNG on the full video — do not trim any content:
ffmpeg -i "$WORK_DIR/original.mp4" \
-i "$WORK_DIR/hook_overlay.png" \
-filter_complex "[0:v][1:v]overlay=0:0" \
-an -c:v libx264 -preset fast \
"$WORK_DIR/remixed_full.mp4" -y
CTA_DUR=3
ffmpeg -loop 1 -i "$WORK_DIR/cta_frame.png" \
-t ${CTA_DUR} -r 30 -an \
-c:v libx264 -preset fast -pix_fmt yuv420p \
"$WORK_DIR/cta_silent.mp4" -y
Full video plays completely, then CTA is appended. Audio is looped to cover the full duration (original + CTA), so the reel sounds continuous throughout.
cat > "$WORK_DIR/concat.txt" << EOF
file '$WORK_DIR/remixed_full.mp4'
file '$WORK_DIR/cta_silent.mp4'
EOF
ffmpeg -f concat -safe 0 -i "$WORK_DIR/concat.txt" \
-c:v libx264 -preset fast \
"$WORK_DIR/video_only.mp4" -y
# Loop the original audio to cover the full output duration (video + CTA)
TOTAL_DUR=$(ffprobe -v quiet -show_entries format=duration -of csv=p=0 "$WORK_DIR/video_only.mp4")
ffmpeg -stream_loop -1 -i "$WORK_DIR/audio.aac" \
-t "$TOTAL_DUR" \
-c:a aac \
"$WORK_DIR/audio_looped.m4a" -y
OUTPUT_DIR="$(pwd)/video-output/${FOLDER}"
mkdir -p "$OUTPUT_DIR"
ffmpeg -i "$WORK_DIR/video_only.mp4" \
-i "$WORK_DIR/audio_looped.m4a" \
-c:v copy -c:a copy -shortest \
"$OUTPUT_DIR/${FILENAME}" -y
Verify:
ffprobe -v quiet \
-show_entries format=duration,size \
-show_entries stream=codec_name,width,height,duration \
-of default "$OUTPUT_DIR/${FILENAME}"
Confirm: duration ≈ AUDIO_DUR + 3s (CTA appended), both video and audio streams match, h264 + aac.
Report the final output path to the user.
$(pwd)/reel-remix-work/; never use /tmp$(pwd)/video-output/{MM-DD-YY}-{topic}/{3-4-word-seo-keyword-filename}.mp4