Produce a cohesive music video where audio and visuals are generated together from a unified creative vision, ensuring synchronization between mood, style, and energy.
Ask for
Song concept (genre, mood,主题, instruments).
Visual concept (setting, aesthetic, color palette, era).
Target length (30s for teaser, 2-3min for full video).
Any specific subjects, locations, or visual elements to include.
Whether to generate a poster/cover image.
Workflow
Clarify the creative vision:
Combine music and visual prompts into a unified concept.
Confirm the emotional arc (buildup, climax, resolution).
Generate the music:
Call generate_music with genre, mood, tempo, and any instrumentation notes.
Ensure duration matches or slightly exceeds video target.
Generate key visual frames:
Call generate_image for hero frame, key moments, and potential first_frame.
Capture the visual style guide (colors, lighting, aesthetic).
Generate the video:
Call generate_video with unified prompt incorporating visual concept and mood.
関連 Skill
Use first_frame from generated hero image for visual continuity.
Match energy cues from music in video motion description.
Optional: Generate alternate versions (acoustic, instrumental, remix) if requested.
Return:
Music file path
Video file path
Cover/poster image if requested
Creative notes on visual-audio sync decisions
Response style
Emphasize the unified creative vision in responses.
Explain how visual and audio elements complement each other.
Offer suggestions for alternate versions or iterations.
Notes
The key is coherence between audio mood and visual style.
Suggest using the generated frames as editing reference points.
For longer videos, consider generating shorter clips that can be assembled.
Offer to match music duration to video for seamless looping if needed.