Name: Ad Video Create
Author: inclusionAI


Phase 3: prepare the input images:
- Product image (the original, or a compressed copy if the file exceeds 50KB)
- Character image (generated in Phase 2 or provided by the user)

Call image_generator with a composition directive. Keep guidance_scale within 4.5-5.0 and num_inference_steps within 30-35; the values below use the lower bound of each range:

{
  "content": "Compose [character description] with [product description] in [environment setting]. 
              Requirements:
              - Only ONE character in the scene
              - Realistic home environment (floor, walls, natural lighting, plants, furniture)
              - Natural interaction between character and product
              - Professional product photography style",
  "info": {
    "image_urls": ["product.jpg", "character.jpg"],
    "size": "1328x1328",
    "guidance_scale": 4.5,
    "num_inference_steps": 30,
    "watermark": false,
    "output_path": "./composed_ad_image.png"
  }
}
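
The directive can also be assembled in code, which keeps the bracketed placeholders and the numeric ranges in check. The following is a minimal sketch, not the tool's own API: `build_composition_payload` is a hypothetical helper, and how the resulting dictionary is handed to image_generator is left open because the source does not specify it.

```python
import json

def build_composition_payload(character, product, environment, image_urls,
                              guidance_scale=4.5, steps=30,
                              output_path="./composed_ad_image.png"):
    """Hypothetical helper: build the composition directive as a dict."""
    # Ranges taken from the directive above (4.5-5.0 and 30-35).
    if not 4.5 <= guidance_scale <= 5.0:
        raise ValueError("guidance_scale should be within 4.5-5.0")
    if not 30 <= steps <= 35:
        raise ValueError("num_inference_steps should be within 30-35")
    requirements = [
        "Only ONE character in the scene",
        "Realistic home environment (floor, walls, natural lighting, plants, furniture)",
        "Natural interaction between character and product",
        "Professional product photography style",
    ]
    content = (f"Compose {character} with {product} in {environment}.\n"
               "Requirements:\n" + "\n".join(f"- {r}" for r in requirements))
    return {
        "content": content,
        "info": {
            "image_urls": list(image_urls),
            "size": "1328x1328",
            "guidance_scale": guidance_scale,
            "num_inference_steps": steps,
            "watermark": False,
            "output_path": output_path,
        },
    }

payload = build_composition_payload(
    "a calico cat", "a cat tower", "a cozy living room",
    ["product.jpg", "character.jpg"])
print(json.dumps(payload, indent=2))  # serializes to valid JSON
```

Serializing with `json.dumps` escapes the line breaks inside the prompt, so the output is valid JSON even though the prompt spans several lines.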

Element	Meaning
Visual hooks (视觉因子)	Strong focal points, contrast, color, light, or composition that hold attention
Product presence (产品出现)	Clear establishment of the product in frame—viewer knows what is being advertised
Product / hero shots (产品镜头)	Dedicated beats where the product is the clear subject (center framing, readable silhouette)
Detail showcase (细节展示)	Close-ups or slow emphasis on materials, texture, craftsmanship, or key parts
Function / benefit expression (功能表达)	Motion that implies use, outcome, or core selling point (interaction, before/after feel, problem–solution rhythm)
Dynamic visuals (动态视觉)	Varied motion: camera (push, pan, subtle orbit), parallax, light shifts, or subject micro-movement—avoid one flat move for the whole clip

Case A (a user-provided MP3 exists in the directory): generate the video WITHOUT audio first via video_diffusion:

{
  "content": "Create dynamic advertisement video (mini-commercial pacing, ~10s):
              - Visual hooks: strong focal points, light/color contrast where fitting
              - Product presence: early establishment of the product in frame
              - Product hero shots: beats where the product is clearly the subject
              - Detail showcase: close-up or emphasis on texture/material/key parts
              - Function expression: motion suggesting use, benefit, or core value
              - Dynamic visuals: varied motion (camera push/pan/subtle orbit, parallax, light shifts, optional character micro-movements)
              - Professional commercial quality",
  "info": {
    "image_url": "./composed_ad_image.png",
    "resolution": "720p",
    "duration": 10,
    "fps": 24,
    "output_dir": "./",
    "sound": "off"
  }
}
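
The prompt above is a fixed header followed by one bullet per element, so it can be generated from a list. This is a sketch under assumptions: `build_video_payload` and `ELEMENTS` are hypothetical names, and the dictionary only mirrors the fields shown above; it does not call video_diffusion.

```python
# Element names and hints copied from the prompt above.
ELEMENTS = [
    ("Visual hooks", "strong focal points, light/color contrast where fitting"),
    ("Product presence", "early establishment of the product in frame"),
    ("Product hero shots", "beats where the product is clearly the subject"),
    ("Detail showcase", "close-up or emphasis on texture/material/key parts"),
    ("Function expression", "motion suggesting use, benefit, or core value"),
    ("Dynamic visuals", "varied motion (camera push/pan/subtle orbit, parallax, "
                        "light shifts, optional character micro-movements)"),
]

def build_video_payload(image_url, sound="off", duration=10):
    """Hypothetical helper: return the video request as a dict."""
    if sound not in ("on", "off"):
        raise ValueError("sound must be 'on' or 'off'")
    lines = [f"Create dynamic advertisement video (mini-commercial pacing, ~{duration}s):"]
    lines += [f"- {name}: {hint}" for name, hint in ELEMENTS]
    lines.append("- Professional commercial quality")
    return {
        "content": "\n".join(lines),
        "info": {
            "image_url": image_url,
            "resolution": "720p",
            "duration": duration,
            "fps": 24,
            "output_dir": "./",
            "sound": sound,
        },
    }

payload = build_video_payload("./composed_ad_image.png")
print(payload["content"])
```

Keeping the elements in one list means a product category that needs a different emphasis can reorder or reword a single entry without touching the rest of the request.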

Merge video with user's MP3 using FFmpeg:

ffmpeg -y -i generated_video.mp4 -i user_audio.mp3 -t 10 \
       -c:v copy -c:a aac -b:a 192k \
       -map 0:v:0 -map 1:a:0 -shortest \
       final_ad_video.mp4
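
When the merge is driven from Python, passing the arguments as a list avoids shell quoting problems with file names that contain spaces, such as "Cat Republic.mp3". A minimal sketch, assuming `ffmpeg` is on the PATH; `build_merge_command` and `merge` are hypothetical helper names.

```python
import shlex
import subprocess

def build_merge_command(video, audio, output, duration=10):
    """Return the FFmpeg argument list for the merge step shown above."""
    return ["ffmpeg", "-y", "-i", video, "-i", audio, "-t", str(duration),
            "-c:v", "copy", "-c:a", "aac", "-b:a", "192k",
            "-map", "0:v:0", "-map", "1:a:0", "-shortest", output]

def merge(video, audio, output, duration=10):
    """Run FFmpeg; raises subprocess.CalledProcessError on a non-zero exit."""
    subprocess.run(build_merge_command(video, audio, output, duration), check=True)

cmd = build_merge_command("generated_video.mp4", "Cat Republic.mp3",
                          "final_ad_video.mp4")
print(shlex.join(cmd))  # shell-safe rendering, useful for logs
```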

Case B (no user audio): call video_diffusion with audio generation enabled:

{
  "content": "Create dynamic advertisement video with suitable background music (mini-commercial pacing, ~10s):
              - Visual hooks; product presence; hero product shots; detail showcase; function/benefit expression; dynamic visuals (varied camera and motion)
              - AI-generated background music matching product mood
              - Professional commercial quality",
  "info": {
    "image_url": "./composed_ad_image.png",
    "resolution": "720p",
    "duration": 10,
    "fps": 24,
    "output_dir": "./",
    "sound": "on"
  }
}
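
Choosing between the silent request and the audio-enabled request comes down to whether an MP3 is present in the working directory. The sketch below shows one way to make that decision; `choose_audio_case` and `sound_flag` are hypothetical names, and picking the alphabetically first file when several MP3s exist is an assumption the source does not address.

```python
from pathlib import Path

def choose_audio_case(directory):
    """Return ("A", mp3_path) if an MP3 exists in directory, else ("B", None)."""
    mp3s = sorted(p for p in Path(directory).iterdir()
                  if p.is_file() and p.suffix.lower() == ".mp3")
    if mp3s:
        return "A", mp3s[0]  # assumption: first match wins
    return "B", None

def sound_flag(case):
    """Case A merges the user's audio later, so the video is generated silent."""
    return "off" if case == "A" else "on"

case, mp3_path = choose_audio_case(".")
print(case, mp3_path, sound_flag(case))
```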

Always check the file size before reading an image with media_comprehension.

If the file exceeds 50KB, compress it first using PIL/Pillow:

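from PIL import Image

# path and output_path are the source and destination file paths
img = Image.open(path)
# JPEG cannot store alpha or palette data, so convert those modes to RGB
if img.mode in ('RGBA', 'LA', 'P'):
    img = img.convert('RGB')
img.save(output_path, 'JPEG', quality=85, optimize=True)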
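
The size check and the compression step can be combined so that small files pass through untouched. This is a sketch under assumptions: `needs_compression` and `prepare_image` are hypothetical names, and 50KB is taken as 50 * 1024 bytes. Pillow is imported only when compression is actually required.

```python
import os

SIZE_LIMIT_BYTES = 50 * 1024  # assumption: "50KB" read as 51,200 bytes

def needs_compression(path, limit=SIZE_LIMIT_BYTES):
    """True when the file on disk is larger than the limit."""
    return os.path.getsize(path) > limit

def prepare_image(path, output_path, limit=SIZE_LIMIT_BYTES):
    """Return a path safe to analyze: the original if small, else a JPEG copy."""
    if not needs_compression(path, limit):
        return path
    from PIL import Image  # lazy import; requires Pillow
    img = Image.open(path)
    if img.mode in ('RGBA', 'LA', 'P'):
        img = img.convert('RGB')  # JPEG has no alpha or palette support
    img.save(output_path, 'JPEG', quality=85, optimize=True)
    return output_path
```

A single pass at quality 85 does not guarantee the result lands under the limit, so calling `needs_compression` again on the output is a reasonable follow-up.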

Issue	Solution
Multiple characters appear in composition	Add explicit constraint in prompt: "ONLY ONE [character], no other characters"
Plain white background	Specify environment details: "in a modern living room with wooden floor, beige walls, natural window light"
Image file too large	Compress the image with the Pillow snippet before analysis
Audio sync issues	Pass the `-shortest` flag to FFmpeg so the output ends with the shorter stream
Video generation timeout	Run long operations as background tasks instead of blocking on the call
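
For the timeout row, the source does not say how background tasks are spawned in the agent runtime. As a generic Python illustration only, the sketch below runs a long call in a worker thread and polls it against a deadline; `run_in_background` is a hypothetical name and `fake_generation` stands in for a real generation call.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

def run_in_background(task, *args, poll_seconds=1.0, max_wait_seconds=600.0):
    """Run task in a worker thread; poll until done or the deadline passes."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(task, *args)
    deadline = time.monotonic() + max_wait_seconds
    try:
        while True:
            try:
                return future.result(timeout=poll_seconds)
            except TimeoutError:
                # Re-raise if the task itself failed or the deadline passed;
                # otherwise keep polling.
                if future.done() or time.monotonic() >= deadline:
                    raise
    finally:
        executor.shutdown(wait=False)  # do not block on a still-running task

def fake_generation(seconds):
    """Placeholder for a slow generation call."""
    time.sleep(seconds)
    return "generated_video.mp4"

print(run_in_background(fake_generation, 0.05, poll_seconds=0.01))
```

A thread cannot be cancelled once it has started, so after a timeout the underlying call keeps running until it finishes on its own.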

Input: cat_tower.jpg, calico_cat.jpg
→ Compose: Cat on tower in cozy living room
→ Video: 10s with gentle camera pan + user's "Cat Republic.mp3"
Output: final_ad_video.mp4

Input: modern_sofa.jpg
→ Generate: Lifestyle character reading on sofa
→ Compose: Character + sofa in bright apartment
→ Video: 10s with AI-generated ambient music
Output: final_ad_video.mp4

Input: wireless_earbuds.jpg
→ Generate: Hands holding earbuds
→ Compose: Hands + earbuds on minimalist desk
→ Video: 10s with AI-generated tech music
Output: final_ad_video.mp4
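
The three examples above differ only in whether a character image and a user MP3 are supplied. The sketch below makes that branching explicit; `plan_workflow` is a hypothetical helper and the step descriptions paraphrase the phase names used in this document.

```python
def plan_workflow(has_character_image, has_user_audio):
    """Return the ordered list of steps for the given inputs."""
    steps = ["Phase 1: analyze product image (compress first if >50KB)"]
    if not has_character_image:
        # Phase 2 is conditional: only needed when no character is supplied.
        steps.append("Phase 2: generate character image")
    steps.append("Phase 3: compose character + product in environment")
    if has_user_audio:
        steps.append("Phase 4, Case A: generate silent video, then merge MP3 with FFmpeg")
    else:
        steps.append("Phase 4, Case B: generate video with AI audio")
    return steps

# Pet product example: character image and MP3 both provided.
for step in plan_workflow(has_character_image=True, has_user_audio=True):
    print(step)
```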

Ad Video Create

Workflow Architecture

Phase 1: Asset Preparation & Analysis

Phase 2: Character Generation (Conditional)

Phase 3: Image Composition with Environment

Phase 4: Video Generation

Case A: User-Provided Audio (MP3 exists in directory)

Case B: No User Audio (Generate with AI audio)

Best Practices

Image Compression

Prompt Engineering for Composition

Video Motion Guidelines

Audio Integration

Error Handling

Common Issues & Solutions

Generalization Notes

Adaptability Across Product Categories

Scalability Considerations

Example Use Cases

Use Case 1: Pet Product (With Character Image)

Use Case 2: Furniture (No Character Image)

Use Case 3: Tech Gadget (No Character, No Audio)

Technical Requirements

Dependencies

File Naming Conventions

Conclusion
