Veo 3 Prompting Guide: The First AI Video Model That Hears

Film Fun Academy

February 22, 2026

Veo 3 generates video AND synchronized audio from one prompt — dialogue, ambient sound, music, and foley. Here's how to direct both picture and sound.

Veo 3 from Google DeepMind generates synchronized audio-visual content from a single prompt. Dialogue, ambient sound, foley, and music are all part of the output — not layered on afterward. This changes how you write prompts.

Developer: Google DeepMind
Available on Splice: Yes, at splice.film.fun (as Google Veo 3 Fast)
Resolution: 1080p with audio
Aspect ratios: 16:9 (landscape), 9:16 (vertical)
Duration: 8 seconds
Features: Reference Images, Advanced settings, native audio generation (dialogue, sound, music)


Prompt Structure

Veo 3 prompts are production instructions covering both picture and sound. Include visual direction AND audio direction in every prompt.

The 7-Element Framework

| Element | What to Include | Example |
| --- | --- | --- |
| 1. Subject | Who/what, with physical details | "A woman in her 30s with short dark hair in a linen shirt" |
| 2. Context | Where, when, conditions | "Cobblestone café terrace, late afternoon" |
| 3. Action | What happens | "She sets her coffee down and leans forward" |
| 4. Style | Visual aesthetic | "Warm indie film tone, shallow depth of field" |
| 5. Camera | Shot type, movement, composition | "Medium shot, slow push-in" |
| 6. Ambiance | Mood, lighting | "Golden hour backlight, muted earth tones" |
| 7. Audio | Sound, dialogue, music | Ambient sound, dialogue after a colon, music description |
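
One way to keep all seven elements in view is to fill them in as named fields and join them into a single prompt. The sketch below is a minimal Python illustration, not part of Veo 3 or Splice: ShotSpec and build_prompt are hypothetical names, the camera-first ordering is an assumption borrowed from the examples in this guide, and the audio line is an invented sample.

```python
from dataclasses import dataclass


@dataclass
class ShotSpec:
    """One field per element of the 7-element framework (hypothetical helper)."""
    subject: str
    context: str
    action: str
    style: str
    camera: str
    ambiance: str
    audio: str


def build_prompt(spec: ShotSpec) -> str:
    """Join the seven elements into one prompt, refusing to skip audio."""
    if not spec.audio.strip():
        raise ValueError("audio direction is empty; describe the soundscape")
    # Assumed ordering: camera first, style last, as in this guide's examples.
    parts = [spec.camera, spec.subject, spec.context, spec.action,
             spec.ambiance, spec.audio, spec.style]
    # Normalize each part so it ends with exactly one period.
    return " ".join(p.strip().rstrip(".") + "." for p in parts if p.strip())


spec = ShotSpec(
    subject="A woman in her 30s with short dark hair in a linen shirt",
    context="Cobblestone café terrace, late afternoon",
    action="She sets her coffee down and leans forward",
    style="Warm indie film tone, shallow depth of field",
    camera="Medium shot, slow push-in",
    ambiance="Golden hour backlight, muted earth tones",
    audio="Clinking cups, quiet street chatter. No music.",  # invented sample
)
print(build_prompt(spec))
```

Raising an error on an empty audio field is a design choice that mirrors the guide's first rule: a prompt without audio direction is treated as incomplete.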

The Golden Rules

  • Always prompt audio — If you skip it, Veo guesses (and often guesses wrong — unwanted studio audience laughter is common)
  • Put dialogue after a colon, not in quotes: He says: My name is Ben. This reduces subtitle generation
  • Keep dialogue short — Under 10 words per line. More = unnaturally fast speech
  • "No music" is valid — Pure environmental sound is often more powerful
  • Be specific about style — "Documentary realism" and "commercial" produce very different results
  • Change prompts for variety — Unlike other models, same prompt = very similar result across seeds

Prompt Examples

Example 1: Dialogue Scene

Medium shot, cozy kitchen. A mother and daughter sit at a breakfast 
table. Morning sunlight streams through gauze curtains. The mother 
pours coffee and says: You're going to be great today. The daughter 
smiles and replies: Thanks, mom. Clinking dishes, birds outside. 
Warm indie film tone. (no subtitles)

Example 2: Action with Sound Design

Low angle tracking shot. A motorcycle roars down a rain-soaked 
highway at night. Tires hiss on wet asphalt. Engine growl builds 
as it accelerates. Red taillights blur in the rain. Thunder rumbles 
in the distance. Dark, moody, cinematic.

Example 3: Atmospheric Nature

Wide aerial shot slowly descending over a misty forest at dawn. 
Fog threads between redwood trees. A river catches the first 
golden light. Wind through canopy, distant waterfall, single bird 
call echoing. No music. Documentary realism.

Example 4: Selfie Video

A selfie video of a travel blogger exploring a bustling Tokyo 
street market. She's wearing a vintage denim jacket, excitement 
in her eyes. Afternoon sun creates shadows between vendor stalls. 
She samples street food while talking, occasionally glancing at 
camera then turning to point at stalls. Slightly grainy, film-like. 
She says: Okay, you have to try this place when you visit Tokyo. 
The takoyaki here is absolutely incredible. (no subtitles)

Selfie tip: Start with "A selfie video of..." and make the arm visible for authenticity.

Example 5: Musical Performance

Close-up of a street musician's fingers on guitar strings. 
Flamenco style, fast rhythmic strumming. Camera slowly pulls back 
to reveal him on a stone step in a Spanish courtyard. Afternoon 
light, long shadows. Guitar music fills the space, echoing off 
stone walls. Passersby pause to listen.

Example 6: Commercial Product Shot

Slow motion close-up of coffee being poured into a white cup. 
Steam rises in golden morning light. Rich dark liquid swirls. 
Sound of pouring, soft ceramic clink as cup settles on saucer. 
Warm, premium. Shallow depth of field, macro quality.

Voice and Dialogue

Writing Dialogue

  • Use colon format: He says: My name is Ben (not quotes — reduces subtitle generation)
  • Keep lines short — What can be said in 8 seconds. Too many words = unnaturally fast
  • Too few words = AI gibberish — Give enough for the model to fill the time naturally
  • Implicit works too: "A guy introduces himself" — Veo decides the words
  • Spell names phonetically: "foh-fur" not "fofr" for correct pronunciation
  • Specify who speaks: "The woman in pink says: ..." / "The man with glasses replies: ..."
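
The length rule is easy to check before you generate. The sketch below is a minimal Python illustration that counts words per dialogue line and flags anything at or over the guide's 10-word rule of thumb. The name long_lines is hypothetical, and whitespace splitting is only a rough proxy for how long a line takes to speak.

```python
MAX_WORDS = 10  # the guide's rule of thumb: keep each line under 10 words


def long_lines(dialogue: list[str], limit: int = MAX_WORDS) -> list[tuple[str, int]]:
    """Return (line, word_count) for every line at or over the limit."""
    counts = [(line, len(line.split())) for line in dialogue]
    return [(line, n) for line, n in counts if n >= limit]


lines = [
    "You're going to be great today.",
    "Thanks, mom.",
    "Okay, you have to try this place when you visit Tokyo.",
]
for line, n in long_lines(lines):
    print(f"{n} words, consider trimming: {line}")
```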

Avoiding Subtitles

Veo often bakes in subtitles. Three fixes:

  1. Use colon format for speech (not quotes)
  2. Add (no subtitles) to the prompt
  3. Repeat if persistent: No subtitles. No subtitles!
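
The first two fixes can be applied mechanically to an existing prompt. The sketch below is a minimal Python illustration: to_colon_dialogue is a hypothetical helper, and it only handles straight double quotes after the verbs says, replies, and asks, which is an assumption about how the prompt was written.

```python
import re

# Matches a speech verb followed by a double-quoted line, e.g. says "Hi there"
_QUOTED = re.compile(r'\b(says|replies|asks)[,:]?\s*"([^"]+)"')


def to_colon_dialogue(prompt: str) -> str:
    """Rewrite `says "..."` as `says: ...` and append the (no subtitles) tag."""
    converted = _QUOTED.sub(lambda m: f"{m.group(1)}: {m.group(2)}", prompt)
    # Only add the tag if the prompt does not already carry it.
    if "(no subtitles)" not in converted.lower():
        converted = converted.rstrip() + " (no subtitles)"
    return converted


print(to_colon_dialogue('He says "My name is Ben"'))
# He says: My name is Ben (no subtitles)
```

Running it on a prompt that already follows both rules leaves the prompt unchanged.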

The Unwanted Studio Audience

Veo hallucinates live studio audience laughter if you don't specify ambient audio. Always describe the soundscape you want:

❌ "A standup comic tells a joke at a festival"
✅ "A standup comic tells a joke at a festival. Sounds of 
   distant bands, noisy crowd, ambient background of a busy 
   festival field. (no studio audience)"

Audio Prompting

✅ Do

  • Tie sounds to visible actions: "She sets the glass down with a clink"
  • Use spatial cues: "Distant thunder," "footsteps from behind camera"
  • Specify absence: "No music, only natural sound"
  • Name instruments: "Solo cello" beats "music plays"
  • Describe mood: "Ominous low drone," "playful piano melody"

❌ Don't

  • Describe a full soundtrack — sounds will compete
  • Layer more than 3-4 audio elements — they muddy
  • Use song titles or artist names — won't work
  • Skip audio direction — you'll get random ambient noise
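
If you plan audio as a short list before writing the prompt, two of these rules can be checked automatically. The sketch below is a minimal Python illustration with a hypothetical helper, audio_warnings. It treats the upper end of the guide's "3-4 elements" as the cap, which is an assumption about where to draw the line.

```python
MAX_AUDIO_ELEMENTS = 4  # the guide warns that more than 3-4 elements muddy


def audio_warnings(elements: list[str]) -> list[str]:
    """Return human-readable warnings for a planned list of audio elements."""
    warnings = []
    if not elements:
        warnings.append("No audio direction: expect random ambient noise.")
    if len(elements) > MAX_AUDIO_ELEMENTS:
        warnings.append(
            f"{len(elements)} audio elements: trim to {MAX_AUDIO_ELEMENTS} or fewer."
        )
    return warnings


plan = ["tires hissing on wet asphalt", "engine growl", "distant thunder"]
print(audio_warnings(plan))  # an empty list means no warnings
```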

Audio Techniques

Silence as a tool:

A crowded restaurant full of chatter. Everything goes quiet. 
A single glass falls and shatters.

Off-screen audio:

Footsteps approaching from behind the camera.

Sync points:

A blacksmith hammers red-hot metal. Each strike sends sparks. 
Clang of metal rings with each impact.

Reference Images

Veo 3 on Splice supports Reference Images — upload images to guide the generation.

Style Preservation

Feed any image (cartoon, painting, photograph) and Veo 3 maintains the visual style:

Keep the style the same

That's often enough. For more control:

The man runs through wild shrubbery. He says to his microphone: 
This is Echo 1, I'm being pursued. Camera swivels to reveal 
jungle terrain. Maintain the animation style of the original 
image. (no subtitles)

Image-to-Video Strategy

Generate your perfect still with an image model, then animate with Veo 3. This offloads style decisions to the image step:

Make him run!

Simple motion prompts work when the reference image carries the style.

Selective Animation

Animate only part of the image:

Rotate the shoe, keep everything else still.

This creates a cinemagraph effect: one element moves while the rest stays frozen.


Character Consistency (Without Reference Images)

Veo 3 is unusually consistent across seeds — same prompt often gives identical clothing, earrings, even room layout. Leverage this:

  • Create character description sheets with exact wording
  • Reuse the description verbatim across prompts
  • The more distinctive the description, the better the consistency

John, a man in his 40s with short brown hair, wearing a blue 
jacket and glasses, looking thoughtful

Use this exact string in every prompt featuring John.
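
Storing the description once and reusing it guarantees the wording never drifts between prompts. The sketch below is a minimal Python illustration: the scene helper is hypothetical, and the two scenes are invented examples rather than tested prompts.

```python
# Character sheet: define the description once and never retype it.
JOHN = ("John, a man in his 40s with short brown hair, wearing a blue "
        "jacket and glasses, looking thoughtful")


def scene(character: str, action: str, audio: str) -> str:
    """Build a prompt that embeds the character description verbatim."""
    return f"{character}. {action} {audio}"


# Invented scenes for illustration only.
library = scene(JOHN, "He sits at a desk in a quiet library.",
                "Pages turning, soft footsteps. No music.")
street = scene(JOHN, "He waits at a crosswalk in light rain.",
               "Rain on pavement, distant traffic. No music.")

for prompt in (library, street):
    print(prompt)
```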

Note: Different seeds with the same prompt give similar (not varied) results. Change the prompt for variety.


Style Transfer

Veo 3 knows many visual styles. To apply one, prefix the prompt with "In the style of [style]:".

Proven styles: LEGO, Claymation, South Park, Pixar animation, 8-bit retro, Graphic novel, Origami, Simpsons, Blueprint, Anime, Marble

Style affects motion too — claymation characters move jerkily, Pixar characters move smoothly.
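
The prefix is simple enough to apply with a small helper. The sketch below is a minimal Python illustration: with_style and is_proven are hypothetical names, and the style set is copied from the list above. A style missing from that set is not necessarily unsupported, it is just not listed in this guide.

```python
PROVEN_STYLES = {
    "LEGO", "Claymation", "South Park", "Pixar animation", "8-bit retro",
    "Graphic novel", "Origami", "Simpsons", "Blueprint", "Anime", "Marble",
}


def is_proven(style: str) -> bool:
    """Case-insensitive check against the styles listed in this guide."""
    return style.casefold() in {s.casefold() for s in PROVEN_STYLES}


def with_style(style: str, prompt: str) -> str:
    """Prefix a prompt with the 'In the style of [style]:' pattern."""
    return f"In the style of {style}: {prompt}"


print(with_style("Claymation", "A dog runs through flowers."))
# In the style of Claymation: A dog runs through flowers.
```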


What Veo 3 Excels At

| Strength | Details |
| --- | --- |
| Native audio-visual sync | Dialogue, foley, ambient sound, and music, all synchronized to the visual |
| Style transfer | 12+ visual styles that transform motion as well as look |
| Character consistency | Same prompt = remarkably consistent character across seeds |
| Selfie videos | Surprisingly realistic first-person handheld footage |
| Reference image preservation | Maintains artistic style, color grading, and visual identity from input images |

What to Avoid

| Avoid | Why | Do This Instead |
| --- | --- | --- |
| Skipping audio direction | Random ambient noise, unwanted laughter | Always describe the soundscape |
| Long dialogue | More than ~10 words = too fast | Keep lines short, under 10 words |
| Same prompt for variety | Veo 3 gives very similar results per prompt | Change the prompt itself |
| Dialogue in quotes | Triggers subtitle generation | Use colon format: says: |
| Monologues | Can't fit in 8 seconds | 1-2 short exchanges maximum |
| No style specified | Defaults to generic "well-produced live action" | Name the style explicitly |

Using Veo 3 on Splice

On Splice, Veo 3 is available as Google Veo 3 Fast with these settings:

| Setting | Options |
| --- | --- |
| Resolution | 1080p with audio |
| Aspect ratio | 16:9 (landscape), 9:16 (vertical) |
| Duration | 8 seconds |
| Reference Images | Toggle on to upload reference images |
| Advanced | Additional generation settings |

Choosing Your Aspect Ratio

| Ratio | Use Case |
| --- | --- |
| 16:9 | Cinematic widescreen: films, YouTube, presentations, most content |
| 9:16 | Vertical: TikTok, Instagram Reels, Stories, selfie videos |

Working with 8 Seconds

8 seconds is your canvas. Plan for it:

  • One scene, one moment — Don't try to fit a whole story
  • 1-2 dialogue exchanges maximum — More gets rushed
  • One camera movement — Dolly in OR pan, not both
  • Audio fills the time — Even when visual action is minimal, ambient sound keeps it alive
  • Build longer sequences by generating multiple 8s clips and editing them together in Splice
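
For longer sequences, the clip count is simple arithmetic. The sketch below is a minimal Python illustration with a hypothetical helper, clips_needed. It assumes every clip is used at its full 8 seconds with no trimming or overlap in the edit, so treat the result as a lower bound.

```python
import math

CLIP_SECONDS = 8  # fixed clip duration on Splice, per this guide


def clips_needed(total_seconds: float) -> int:
    """Minimum number of 8-second clips needed to cover a target runtime."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    return math.ceil(total_seconds / CLIP_SECONDS)


for target in (8, 30, 60):
    print(f"{target}s sequence -> {clips_needed(target)} clips")
```

A 30-second sequence needs 4 clips (32 seconds of footage), which leaves about 2 seconds to trim in the edit.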

Common Mistakes

❌ Ignoring audio entirely

Bad: "A dog runs through flowers."
Good: "A golden retriever bounds through wildflowers. Panting, 
paws rustling grass. Distant birdsong. Gentle breeze. Joyful."

❌ Running the same prompt for variety

Unlike other models, Veo 3 produces very similar results across seeds. Change the prompt itself for different outputs.

❌ Dialogue in quotation marks

Bad: He says "My name is Ben"
Good: He says: My name is Ben

Colon format significantly reduces unwanted subtitle generation.

❌ No ambient audio specified

Bad: "A comedian tells jokes on stage"
Good: "A comedian tells jokes on a festival stage. Distant 
music from other stages, crowd murmur, outdoor breeze. 
(no studio audience)"

Pro Tips

  1. Write for picture AND sound — Every prompt needs audio direction
  2. Colon format for dialogue: write says: not says "..." to cut down on baked-in subtitles
  3. "No music" is powerful — Pure environmental sound often beats a score
  4. Specify ambient audio — Or risk hallucinated studio audience laughter
  5. Reference images for style control — Generate perfect stills, then animate
  6. Selective animation creates cinemagraphs — "Rotate the shoe, keep everything else still"
  7. Character sheets for consistency — Same exact description string across prompts
  8. Change prompts for variety — Rerolling same prompt won't give different results
  9. Spell names phonetically — For correct pronunciation in dialogue
  10. Selfie videos work — "A selfie video of..." with visible arm unlocks the format

Ready to put these techniques into practice? Try Splice, film.fun's AI Creator Studio. Generate video, edit in the browser, and bring your stories to life.
