New AI videos: now with sound and smoother motion

Turn product, food, fashion, and home photos into videos with real sound and smoother, more natural motion. In the app, or at catalog scale through the API.
Until now, videos made in the Claid app came out silent. Motion, yes. Sound, no. From today, image to video generates real, synced sound together with the picture: effects, ambience, even spoken lines, from a single photo. The motion looks more natural and holds together better, too. And it works the same whether you animate one photo in the app or thousands of listings through the API.
What changed
- Clips now carry real, synced sound. A steak sears, a can cracks open, a zipper zips, a person says your line. You decide; quiet stays the default.
- Smoother, more natural motion. Movement has more real-world weight and clips hold together better frame to frame.
- The Prompt Assistant you already use now writes sound direction, not just motion (more on that below).
- Everything else stays put: same credits per video, same flow in the app, same API endpoint. API users: here's the updated documentation.
The rest of this post is mostly examples, because sound is the kind of thing you show.
You can hear the quality
Nobody can pick up your product online. Sound is the closest thing shoppers get to touch: it carries weight, texture, freshness. A zipper that sounds solid reads as a jacket that's made well. A pour that crackles over ice reads as cold. Your customers scroll with the sound on; now your clips can meet them there.
[Example clips by category: Cosmetics, Food & drinks, Fashion, Home & furniture, Any product]
💡 Each prompt above is the simple input you type; the Prompt Assistant expands it into full sound and motion direction before the clip renders.
It talks, too
Type a line and the person in your photo says it, lips synced, from a single still.
💡 Speech is the newest and most variable piece here. Spoken lines vary between takes, so plan on a few generations to land the one you'll use.
The assistant learned sound
If you've made videos in Claid before, you know the flow: type a plain description and the Prompt Assistant turns it into a proper video prompt. What's new is that it now directs audio too. Name a sound and it's in the clip, synced.
Recipes, with the exact words to type
Here are some templates to start with. Each line is the whole input: type it, get the clip.
1. The menu tile (restaurants, delivery listings)
Start from your best dish photo.
Type: "Slow close-up push-in. Steam rises off the plate, steady sizzle, like it just left the kitchen."
A static menu photo becomes a loop that makes people hungry. One render per listing.
2. The pour (drinks, CPG)
Type: "Cola pours over ice into a glass. Fizz, ice crackle, condensation building."
Three seconds of satisfying, made for sound-on feeds.
3. The click (beauty, skincare)
Type: "Macro shot. A finger presses the pump twice, two soft clicks, serum lands on a glass tray."
Product ASMR: the click is the proof of touch.
4. The fabric (fashion)
Type: "Macro on the jacket zipper. One smooth pull down, crisp zipper sound, fabric shifts and settles."
Or on-model: "She turns once toward the camera, the coat swings and swishes, soft fabric sound."
Pairs with AI Fashion Models: generate the on-model still, then bring it to life.
5. The soft close (furniture, home goods)
Type: "Slow close-up on the dresser. The drawer glides shut and lands with a soft, cushioned click. Quiet room."
That click is what good hardware sounds like. Works for any listing where build quality is the sell.
6. The line (testimonial-style, founder clips)
Type: "She looks into the camera, casual and a little hushed, and says: okay, I did not think this would do anything, but my skin actually calmed down in like a week."
Make two or three takes and pick the best read.
A few habits that pay off
- Start from a clean, sharp photo. The clip inherits everything from the source, including its flaws. Enhance the photo first if it needs it.
- Your clip follows the photo's aspect ratio: vertical in, vertical out.
- Want music? Add your own track in the edit. Ask the clip for the real sounds only: the engine is reliable on effects, ambience, and speech, not soundtracks.
- For a hero clip, generate a couple of takes. Variation between takes is normal.
Quiet by default
How sound actually works, because it matters if you publish at volume: the engine generates an audio track with every clip, and that's what keeps picture and sound in sync. When you don't ask for sound, the Prompt Assistant directs that track to near-silence, so quiet clips stay quiet. When you describe a sound, you get it.
Need guaranteed silence rather than near-silence? Strip the audio track after download; there's a one-line recipe in the API docs. For pipelines that must deliver silent video at scale, guaranteed-silent output is part of enterprise contracts.
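If you'd rather script the mute step yourself, here is a minimal sketch of one common way to do it. It assumes ffmpeg is installed and on your PATH, and it is not necessarily the same recipe the API docs give. The function names `build_mute_command` and `mute_clip` are made up for this example.

```python
import subprocess
from pathlib import Path


def build_mute_command(src: Path, dst: Path) -> list[str]:
    """Build an ffmpeg command that keeps the picture and drops the audio."""
    return [
        "ffmpeg",
        "-y",            # overwrite dst if it already exists
        "-i", str(src),  # the clip as downloaded
        "-c:v", "copy",  # copy the video stream as-is, no re-encode
        "-an",           # write no audio streams to the output
        str(dst),
    ]


def mute_clip(src: Path, dst: Path) -> None:
    """Run ffmpeg (assumed to be installed); raises if it exits non-zero."""
    subprocess.run(build_mute_command(src, dst), check=True)
```

Calling `mute_clip(Path("clip.mp4"), Path("clip_silent.mp4"))` writes a copy with no audio track at all. Because the video stream is copied rather than re-encoded, the picture is untouched and the step is fast.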
Need this at scale?
If you call the Claid API, your integration already does all of this: same endpoint, image in, MP4 out. Two things to know: clips now include an audio track (near-silent unless the prompt asks for sound), and if your pipeline expects fully silent files, use the mute recipe in the docs.
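If your pipeline needs to know which files carry an audio track, a check along these lines may help. This is a sketch that assumes ffprobe (shipped with ffmpeg) is installed; the helper names `probe` and `has_audio_stream` are made up for this example and are not part of the Claid API. Note that it detects whether a track exists, not how loud it is, so a near-silent clip still counts as having audio.

```python
import json
import subprocess
from pathlib import Path


def has_audio_stream(ffprobe_json: str) -> bool:
    """Return True if ffprobe's JSON output lists at least one audio stream."""
    streams = json.loads(ffprobe_json).get("streams", [])
    return any(s.get("codec_type") == "audio" for s in streams)


def probe(path: Path) -> str:
    """Ask ffprobe (assumed to be installed) for the file's streams as JSON."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_streams", "-of", "json", str(path)],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

In a batch job, `has_audio_stream(probe(Path("clip.mp4")))` tells you whether a downloaded file still needs the mute step before it goes to a destination that requires silent video.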
For marketplaces, aggregators, and large retailers, the math is the point: a static listing photo becomes a sounded loop, one render per listing, thousands of listings per run. And it isn't only small goods. The same one-render-per-listing approach works across a diverse catalog: a sofa with a soft-close drawer, an appliance, a vehicle. Custom pipelines go further: localized talking variants of one asset for different markets, custom motion types, guaranteed-silent delivery where a destination requires it. Talk to us about custom solutions.
What it's not great at (yet)
So you spend credits on the right things:
- Voices are convincing, not studio-grade. The read, accent, and occasionally a word shift between takes. No voice selection or cloning.
- Keep motion simple: one continuous move per clip. Complex hand actions and fast choreography are where AI video still trips.
- Music on demand isn't reliable. Ask for real sounds; add music in post-production.
- Clips render at 720p, which covers feeds, product pages, and menu tiles. Need delivery above that? Upscaled output is something we handle in custom pipelines.
- Takes vary. For anything hero, generate a few and pick.
Turn the sound on
Open a product photo, describe what it should sound like, and see what comes back.
Make a video with sound · Updated docs for API users

Claid.ai
June 19, 2026