
No animation software. No rigging. No timeline. You upload one flat cartoon drawing, give it a line, and DomoAI handles the rest — Image to Video or Character to Video drives the walk, Talking Avatar syncs the mouth to your voice or script. Two generations later, a static character is moving and speaking — no keyframes required.
Most people who want a 2D character to walk and talk hit the same wall: traditional animation. Rigging means building a skeleton, weighting joints, and posing a walk cycle frame by frame in After Effects or Toon Boom. That's days of work before the mouth even opens. Then lip sync is a whole second project, timing visemes to every syllable by hand.
So the drawing just sits there. The gap between "I have a character" and "my character is alive" feels huge, and the usual advice is to go learn an animation suite first. Skip that. You bring the art and the words; the model generates the motion and the lip sync. The real skill is taste — picking the right pose, the right line, the right take — not software.
It's three moves: get the character walking, give it a voice, then assemble. You can run the first two in either order depending on the shot.
Use one clean, full-body image — a single character, simple background, visible limbs. A side or three-quarter pose walks better than a flat front-on shot, because it gives the model a clear silhouette to drive. Flat 2D art, thick-line cartoons, and your own drawings all work. No character yet? Make one with GEN Image / Text to Image: describe it, pick an anime or Fusion model, and upscale the result before you animate it.
Two ways to move your character — pick by whether you have a walk reference.
Image to Video (Animate): no reference needed. Send the still into Image to Video and describe the walk in a motion prompt. It generates the walk cycle straight from your art, so it's the fastest path when you just want the character moving and have no footage to copy. Best for simple walk cycles and cartoon bounce.
Character to Video: copy a real walk. Send the character image into Character to Video with a walking reference clip; it copies that clip's gait onto your art. Turn on Subject Only so it isolates your character and drives it cleanly against the background. Each generation runs up to 30 seconds. Best when you want a specific, natural performance transferred.
Pick Image to Video when you have no reference and want a quick walk from the prompt; pick Character to Video when you have a reference clip and want its exact gait.
Now send the same character into Talking Avatar to make it speak. Type a script for text-to-speech, paste a Suno link, or upload your own MP3/WAV — it lip-syncs the mouth to whatever audio you feed it, hitting 90%+ accuracy on clear input. Add a separate action prompt (smile, nod, wave) so the delivery reads as a performance, not a floating talking head. Standard durations are 5/10/20s; 30s and 60s fast mode are on the Pro plan.
Render, review, and regenerate if the gait or mouth timing drifts. Need a wide walking shot and a close-up talking shot? Render the walk with Character to Video and the talk with Talking Avatar, then cut them together. Clips run short, so stitch your takes in CapCut, Premiere Pro, or DaVinci Resolve for a longer scene.
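If you'd rather join takes from a script than open an editor, here is a minimal sketch that prepares an ffmpeg concat job in Python. It is an alternative to the editors named above, not a DomoAI feature. The file names (`pip_walk.mp4`, `pip_talk.mp4`, `clips.txt`, `pip_scene.mp4`) and both helper functions are hypothetical, and it assumes ffmpeg is installed and that your clips share the same codec, resolution, and frame rate.

```python
import shlex
from pathlib import Path


def concat_list_text(clips):
    """Return the body of an ffmpeg concat-demuxer list file for the given clips."""
    lines = []
    for clip in clips:
        # The list format wraps each path in single quotes; a literal single
        # quote inside a path is written as '\''.
        escaped = str(clip).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def concat_command(list_path, output_path):
    """Build the ffmpeg arguments that join the listed clips without re-encoding."""
    return [
        "ffmpeg",
        "-f", "concat",  # read the input as a concat list file
        "-safe", "0",    # accept paths outside the working directory
        "-i", str(list_path),
        "-c", "copy",    # stream copy: assumes matching codec, size, and frame rate
        str(output_path),
    ]


if __name__ == "__main__":
    # Hypothetical file names: the walk take first, then the talking close-up.
    clips = [Path("pip_walk.mp4"), Path("pip_talk.mp4")]
    Path("clips.txt").write_text(concat_list_text(clips), encoding="utf-8")
    # Print the command so you can review it before running it yourself.
    print(shlex.join(concat_command("clips.txt", "pip_scene.mp4")))
```

Stream copy is fast because nothing is re-encoded, but it only works cleanly when the clips match. If the walk and the talk were rendered at different sizes or frame rates, re-encode instead or stitch in one of the editors above.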
Say your character is Pip, a round-faced courier in a yellow raincoat, drawn full-body in three-quarter view on a plain grey background. Here's a clean end-to-end pass:
No reference clip? Animate Pip's walk straight from the still with Image to Video:
Image to Video (Animate) — 9:16, 10s. Pip, the round-faced courier in a yellow raincoat, walking forward at a steady, bouncy cartoon pace, full body in frame, three-quarter view. Arms and legs swing in a clean loop, raincoat hem and shoulder bag bouncing with each step, feet landing in rhythm. Camera tracks alongside at a steady distance, no shake. Flat 2D cartoon style, plain grey background, even lighting. Keep Pip's design, colors, and proportions consistent — no limb warping, no extra limbs, no background drift, no text or watermark.
Two generations, one cut, and Pip walks up and delivers his line. Swap the script and you've got a whole conversation.
Viggle is the tool most people reach for to move a character from a reference — it's well known for character animation and motion transfer. If raw motion is all you care about, it's a fair comparison, and a strong one.
DomoAI takes a different approach: it's all-in-one. Character to Video handles the walk, Talking Avatar handles lip sync, and Text to Speech generates the voice — all in the same workspace, no second tool for the mouth and no third for the audio. For a character that needs to walk and talk, keeping the lip sync and voice in one pipeline saves a couple of round-trips between apps. Pick Viggle if you only need motion; pick DomoAI if you need motion, a voice, and a synced mouth in one place.
For atmosphere and wide establishing shots — rain on the street, a slow push down the hallway — render them with Seedance 2.0 and cut them in around your character beats. That's the difference between a talking clip and a scene.
Do I need any animation experience?
No. The model generates the walk cycle and the lip sync. You upload art, pick a movement, add a line, and generate — no rigging, keyframes, or timeline.
What image works best?
One clean, full-body character on a simple background, in a side or three-quarter pose with limbs visible. Flat 2D art, thick-line cartoons, and your own drawings all work.
Walk and talk in one clip, or stitch?
Stitch for anything longer than a single generation. Character to Video renders up to 30 seconds per generation, so for a longer walk-and-talk scene, render the walk and the talk separately and join the clips in CapCut, Premiere Pro, or DaVinci Resolve.
Why is the lip sync slightly off?
Long lines and noisy audio cause drift. Break dialogue into short beats, use clean source audio, and regenerate — clear input lands 90%+ lip-sync accuracy.
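One way to break dialogue into short beats before you generate is a small script like the sketch below. It splits on sentence endings, then wraps anything longer than a word limit. The function name and the 12-word default are illustrative assumptions, not DomoAI settings; tune the limit to whatever stays in sync for your character.

```python
import re


def split_into_beats(script, max_words=12):
    """Split a script into short beats: one per sentence, wrapped at max_words."""
    # Break after sentence-ending punctuation that is followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", script.strip())
    sentences = [part.strip() for part in parts if part.strip()]
    beats = []
    for sentence in sentences:
        words = sentence.split()
        # Cut long sentences into chunks of at most max_words words.
        for start in range(0, len(words), max_words):
            beats.append(" ".join(words[start:start + max_words]))
    return beats


if __name__ == "__main__":
    for beat in split_into_beats("Hi there. Your parcel is here! Sign please?"):
        print(beat)
```

Each beat then becomes its own generation, which keeps every line short enough to review and regenerate on its own.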
Can I use my own voice?
Yes. Upload an MP3 or WAV and the mouth syncs to it. You can also clone a voice or type a line for text-to-speech.
How much does it cost?
Paid plans start at $6.99/month for Basic, with Standard at $19.59 and Pro at $48.99 (billed yearly). Standard and Pro add Relax Mode for credit-free generation. See pricing for current rates.
Upload your character and generate a walk-and-talk clip with DomoAI today. For more scene builds, see a three-character animated sitcom scene, a style-locked animated show intro, and a sakura romance music video.