We are excited to introduce SekoTalk, an audio-driven digital human generation model. Built in close partnership with LightX2V, SekoTalk requires only 4 NFEs (network function evaluations) per generation, leveraging the proven method behind Qwen-Image-Lightning and Wan2.2-Lightning. Beyond its strong generalization across diverse visuals and audio, SekoTalk uses LightX2V inference on 8 H100 GPUs to generate 5 seconds of 480P video in about 5 seconds.
A free online generation trial is available, supporting audio clips up to 1 minute long. Enjoy making your character talk, quickly and effortlessly! 🚀
Lip-Sync
SekoTalk accurately synchronizes lip movements with the input audio, handling speaking rates from normal speech up to rap, and can drive characters at various framings, including portrait, half-body, and full-body shots.
Singing
SekoTalk excels across diverse vocal styles, accommodating genres like Peking Opera, rap, bel canto, lyrical, folk, and K-pop.
Long Video
SekoTalk explores best practices for reference-image injection and temporal continuation, ensuring excellent ID consistency and stable video generation for up to 15 minutes.
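As a rough illustration of how reference-image injection and temporal continuation can be combined for long videos, the sketch below generates the clip chunk by chunk, re-injecting the original reference image for every chunk (to anchor identity) and conditioning each chunk on the tail frames of the previous one (to keep motion continuous). The interface and the overlap scheme are assumptions for illustration, not the actual SekoTalk pipeline.

def generate_long_video(generate_chunk, ref_image, audio_chunks, overlap=4):
    """Chunked long-video generation sketch (hypothetical interface)."""
    frames, context = [], None
    for audio in audio_chunks:
        chunk = generate_chunk(
            ref_image=ref_image,        # re-injected for every chunk to anchor identity
            audio=audio,                # the audio segment driving this chunk
            motion_context=context,     # tail frames of the previous chunk, for continuity
        )
        # Skip frames already covered by the previous chunk's tail.
        frames.extend(chunk if context is None else chunk[overlap:])
        context = chunk[-overlap:]      # carry the tail forward as the next chunk's context
    return frames

# Toy usage with a stand-in generator that emits 16 "frames" per chunk.
fake_chunk = lambda ref_image, audio, motion_context: [f"{audio}_frame{i}" for i in range(16)]
video = generate_long_video(fake_chunk, "ref.png", ["seg0", "seg1", "seg2"])
print(len(video))  # 16 + 12 + 12 = 40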
Multi-Style
SekoTalk demonstrates strong generalization across different image styles, supporting realistic photos, anime, animals, and even sketches.
Multi-Lingual
SekoTalk offers comprehensive language support, including English, French, Italian, Portuguese, Japanese, Korean, Mandarin, Cantonese, Hokkien, and other Chinese dialects.
Multi-Person
SekoTalk can handle multiple speakers in a scene, supporting sequential speaking (e.g., podcasts, mini-series) and simultaneous speaking (e.g., discussions, debates).
Prompt Control
SekoTalk allows for character motion control via prompts.
Potential Applications
SekoTalk can be applied to e-commerce live streaming, online education, virtual tourism, news broadcasting, virtual customer service, and more.