Veo 3.1 — Google's AI for generating video with audio

Cinematic clips with dialogue, sound effects, and music — from text or photos. No monthly subscription, from ~$0.25 per video.

No subscription: pay per use · Card (Stripe) or crypto · Audio & dialogue · Up to 4K · From ~$0.25 per clip

Veo 3.1 is Google DeepMind's flagship video generator and Sora's main rival. Its signature feature is full audio: the model doesn't produce silent video — it generates a scene with dialogue, ambient sound, and music synchronized with the picture. Characters speak with lip sync, doors creak, rain patters — all from a single prompt.

Veo's visuals are the most film-like among video generators: staged lighting, deliberate camera work, realistic faces. It supports text-to-video, image-to-video, first/last-frame control, and — in Fast and Quality modes — generation from reference images (drop your character or object into a new scene).

Officially Veo 3.1 is available through the Gemini API at $0.15 per second (Fast) up to $0.40 per second (Quality). On NeuralBox the same Veo costs noticeably less than the official rate — an 8-second Fast clip runs about $0.50 versus $1.20 on the Gemini API. The price is fixed per clip and doesn't depend on duration (4, 6, or 8 seconds). No subscription: pay per use by card (Stripe) or crypto, tokens never expire, and one balance covers 300+ AI models.

Veo 3.1 modes on NeuralBox

Lite

7,500 tokens ≈ $0.25 (720p)

The cheapest way to try Veo: good quality for social media and drafts. 1080p costs slightly more.

Fast

15,000 tokens ≈ $0.50 (720p)

The sweet spot of price and quality: fast generation, audio, reference support. Officially this is $0.15/sec.

Quality

62,500 tokens ≈ $2.08 (720p)

Maximum picture and audio quality for production. Officially this is $0.40/sec.

4K

from 37,500 tokens ≈ $1.25

All three modes are available in 4K: Lite ≈ $1.25, Fast ≈ $1.50, Quality ≈ $3.08 per clip.

What Veo 3.1 can do

🔊

Audio & dialogue

Lip-synced speech, ambient sound, and music are generated together with the video — write the lines right into the prompt.

🎬

Cinematic look

Staged lighting, smooth camera moves, realistic faces — the most film-like picture among video generators.

🖼️

Photo to video

Bring an image to life or set the first and last frame of the clip — Veo builds the transition between them.

👤

References

In Fast and Quality modes you can pass reference images — your character or object lands in a new scene.

📺

Up to 4K

720p, 1080p, or 4K. Duration of 4, 6, or 8 seconds — same price either way.

⚙️

API

Veo 3.1 generation is available through the NeuralBox API for your apps and bots.

Prompt examples

Write the sounds and lines right into the prompt — Veo syncs them with the picture.

Scene with dialogue

Seaside café, a woman speaks to the camera: "Good morning! Today I'll show you three places you have to see", sound of waves, seagulls crying

Product ad

A barista pouring latte art into a white cup in close-up, the sound of pouring milk and soft background jazz, warm coffee-shop light

Nature

A drone flying over a misty autumn forest at dawn, forest sounds and birdsong, cinematic light

Animate a photo

Upload a photo: "The camera slowly pulls back, the person in the photo smiles and waves, city noise in the background"

Action

A racer drifting a sports car on a night track, screeching tires and roaring engine, neon lighting, side camera angle

First and last frame

Upload two images — Veo generates a smooth transition from the first to the second
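The text examples above share one pattern: the scene first, then any spoken line in quotes, then the sounds, then the look. A minimal sketch of that pattern, assuming you build prompts in Python; the `build_prompt` helper and its argument names are illustrative only, and this ordering is a convention drawn from the examples, not something Veo requires.

```python
def build_prompt(scene, dialogue=None, sounds=(), style=None):
    """Join scene, spoken line, sounds, and style into one prompt string.

    dialogue is an optional (speaker, line) pair; the line is wrapped in
    double quotes, as in the examples on this page.
    """
    parts = [scene.strip()]
    if dialogue:
        speaker, line = dialogue
        parts.append(f'{speaker.strip()}: "{line.strip()}"')
    parts.extend(sound.strip() for sound in sounds)
    if style:
        parts.append(style.strip())
    # Drop empty pieces so stray blanks do not leave double commas.
    return ", ".join(part for part in parts if part)


if __name__ == "__main__":
    print(build_prompt(
        "Seaside cafe",
        dialogue=("a woman speaks to the camera", "Good morning!"),
        sounds=["sound of waves", "seagulls crying"],
    ))
```

Keeping the spoken line in quotes and the sounds as separate short phrases mirrors how the sample prompts are written.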

Parameters and limits

Duration	4, 6, or 8 seconds (price does not depend on duration)
Resolution	720p / 1080p / 4K
Audio	yes — lip-synced dialogue, sound effects, music
Aspect ratios	16:9, 9:16
Modes	text → video, photo → video, first/last frame, references (Fast/Quality)
Quality	Lite / Fast / Quality
API access	yes — NeuralBox API
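
If you drive Veo 3.1 from your own app or bot, it can help to check a request against the limits in this table before spending tokens. The sketch below is one way to do that. The `VeoRequest` class and its field names are hypothetical and are not the NeuralBox API schema; only the allowed values come from the table above.

```python
from dataclasses import dataclass, field

# Allowed values copied from the "Parameters and limits" table.
ALLOWED_DURATIONS = (4, 6, 8)
ALLOWED_RESOLUTIONS = ("720p", "1080p", "4k")
ALLOWED_ASPECT_RATIOS = ("16:9", "9:16")
ALLOWED_QUALITIES = ("lite", "fast", "quality")
REFERENCE_QUALITIES = ("fast", "quality")  # references are not offered in Lite


@dataclass
class VeoRequest:
    """Hypothetical request shape; field names are assumptions, not the real API."""
    prompt: str = ""
    quality: str = "fast"
    duration: int = 8
    resolution: str = "720p"
    aspect_ratio: str = "16:9"
    reference_images: list = field(default_factory=list)

    def validate(self):
        """Return a list of problems; an empty list means the request fits the table."""
        problems = []
        if self.quality not in ALLOWED_QUALITIES:
            problems.append(f"unknown quality: {self.quality}")
        if self.duration not in ALLOWED_DURATIONS:
            problems.append(f"duration must be one of {ALLOWED_DURATIONS}")
        if self.resolution not in ALLOWED_RESOLUTIONS:
            problems.append(f"resolution must be one of {ALLOWED_RESOLUTIONS}")
        if self.aspect_ratio not in ALLOWED_ASPECT_RATIOS:
            problems.append(f"aspect ratio must be one of {ALLOWED_ASPECT_RATIOS}")
        if self.reference_images and self.quality not in REFERENCE_QUALITIES:
            problems.append("reference images need Fast or Quality mode")
        return problems


if __name__ == "__main__":
    request = VeoRequest(prompt="A barista pouring latte art", quality="lite",
                         reference_images=["cup.png"])
    print(request.validate())
```

Returning a list of problems instead of raising on the first one lets a bot report everything wrong with a request in a single message.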

Veo 3.1 pricing

Mode	Our price	≈ USD	Gemini API (8 sec)
Lite 720p	7,500 tokens	≈ $0.25	≈ $0.24–0.40
Fast 720p	15,000 tokens	≈ $0.50	$1.20 ($0.15/sec)
Quality 720p	62,500 tokens	≈ $2.08	$3.20 ($0.40/sec)
Fast 4K	45,000 tokens	≈ $1.50	higher, premium tier
Quality 4K	92,500 tokens	≈ $3.08	higher, premium tier

Prices are per clip at the listed resolution, for any duration (4, 6, or 8 sec). Token rate shown for the Basic plan ($5 = 150,000 tokens); larger plans cut token cost by up to 40%. The official Gemini API bills per second.

An 8-second Veo Fast clip costs about $0.50 here, roughly 2.4× less than on the official Gemini API ($1.20), with no subscription and no developer setup.
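
The arithmetic behind these figures can be checked in a few lines. This is a rough sketch using only the numbers quoted on this page: the Basic token rate, the per-clip token prices, and the per-second Gemini API rates. The function names and the `plan_discount` parameter are illustrative, not part of any NeuralBox or Google tool.

```python
# Figures come from the pricing table on this page; all names are illustrative.
TOKENS_PER_USD = 150_000 / 5  # Basic plan: $5 buys 150,000 tokens

# Fixed token price per clip, keyed by (mode, resolution).
CLIP_TOKENS = {
    ("lite", "720p"): 7_500,
    ("fast", "720p"): 15_000,
    ("quality", "720p"): 62_500,
    ("fast", "4k"): 45_000,
    ("quality", "4k"): 92_500,
}

# Official per-second Gemini API rates as quoted on this page.
GEMINI_USD_PER_SECOND = {"fast": 0.15, "quality": 0.40}


def clip_price_usd(mode, resolution="720p", plan_discount=0.0):
    """Fixed per-clip price in USD, independent of clip duration.

    plan_discount is the fraction taken off the token cost by a larger
    plan (the page states up to 40%, so 0.0 to 0.40).
    """
    if not 0.0 <= plan_discount <= 0.40:
        raise ValueError("plan_discount must be between 0.0 and 0.40")
    tokens = CLIP_TOKENS[(mode, resolution)]
    return tokens / TOKENS_PER_USD * (1.0 - plan_discount)


def gemini_price_usd(mode, seconds):
    """Per-second billing: rate times clip length (4, 6, or 8 seconds)."""
    if seconds not in (4, 6, 8):
        raise ValueError("seconds must be 4, 6, or 8")
    return GEMINI_USD_PER_SECOND[mode] * seconds


if __name__ == "__main__":
    ours = clip_price_usd("fast")
    for seconds in (4, 6, 8):
        official = gemini_price_usd("fast", seconds)
        print(f"Fast 720p, {seconds}s: ${ours:.2f} per clip vs "
              f"${official:.2f} per-second ({official / ours:.1f}x)")
```

Because the per-clip price is flat while the official rate scales with length, the gap is widest on 8-second clips: at the Basic rate the Fast ratio works out to 1.2× at 4 seconds, 1.8× at 6, and 2.4× at 8.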

How to start

Top up your balance

Pay by card (Stripe) or crypto (USDT, BTC, ETH). No subscription, tokens never expire.

Generate

Video tab → Veo 3.1 → describe the scene with its sounds or upload a photo.

Try Veo 3.1

Frequently asked questions

What is Veo 3.1?

Google DeepMind's flagship video generation model. Its key difference from competitors is full audio: lip-synced dialogue, sound effects, and music are generated together with the picture from a single prompt.

Why is it cheaper than the Gemini API?

NeuralBox bills a fixed token price per clip instead of per-second API rates: a Veo Fast 8-second clip is about $0.50 here versus $1.20 on the official Gemini API. No subscription or API key setup needed — it works right in the browser.

How much does a video cost?

A 720p clip: Lite ≈ $0.25, Fast ≈ $0.50, Quality ≈ $2.08 — regardless of duration (4–8 sec). That's below official Gemini API prices: Fast 8 sec there costs $1.20.

Do the characters really speak?

Yes. Write the line in your prompt and the character delivers it with lip sync. Multiple languages are supported.

How is Veo different from Sora, Kling, and Grok Imagine?

Veo has the best audio and the most cinematic picture. Sora is strong in complex scenes, Kling in motion realism, and Grok Imagine is the cheapest and goes up to 30 seconds. All of them are on NeuralBox.

Can I animate a photo?

Yes: upload an image and Veo turns it into a living scene with sound. You can also set the first and last frame of the clip.

How do I pay, and do tokens expire?

By card via Stripe or with crypto (USDT, BTC, ETH and more). Tokens never expire — no subscription, no monthly credit burn, one balance for 300+ AI models.

Is there an API?

Yes, Veo 3.1 is available through the NeuralBox API — docs at neuralbox.io/api.

Create a video with sound

Get started