SekoTalk-1.0

SekoTalk Team

SekoTalk (Free Trial) LightX2V (Free Trial)

We are excited to introduce SekoTalk, an audio-driven digital human generation model.
Built in close partnership with LightX2V, SekoTalk requires only 4 NFEs (function evaluations) per generation, using the proven distillation recipe behind Qwen-Image-Lightning and Wan2.2-Lightning. Beyond its strong generalization across diverse visuals and audio, SekoTalk pairs with LightX2V inference on 8 H100 GPUs to generate 5 seconds of 480P video in roughly 5 seconds.
A free online generation trial is available, supporting audio clips of up to 1 minute. Enjoy making your character talk, fast and effortlessly! 🚀
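To put the "4 NFEs" figure in perspective: each NFE is one forward pass of the denoising network, so a clip is produced with just four network calls instead of the dozens a standard sampler needs. The sketch below is purely illustrative; the dummy denoiser, timestep schedule, and Euler update are placeholder assumptions, not the actual SekoTalk or LightX2V implementation.

```python
# Illustrative sketch of what "4 NFEs" means for a step-distilled model:
# exactly four forward passes of the denoiser produce the final sample.
# The dummy network, schedule, and update rule are assumptions for
# illustration only, not the SekoTalk / LightX2V implementation.
import torch
import torch.nn as nn

class DummyDenoiser(nn.Module):
    """Stand-in for the distilled video denoiser (hypothetical)."""
    def __init__(self, dim: int = 16):
        super().__init__()
        self.net = nn.Linear(dim + 1, dim)

    def forward(self, x, t):
        t_feat = t.expand(x.shape[0], 1)             # broadcast scalar timestep
        return self.net(torch.cat([x, t_feat], dim=-1))

@torch.no_grad()
def sample_few_step(model, shape, steps: int = 4):
    """Few-step sampling: `steps` network evaluations (NFEs) in total."""
    x = torch.randn(shape)                           # start from pure noise
    ts = torch.linspace(1.0, 0.0, steps + 1)         # coarse 4-step schedule
    for i in range(steps):
        t, t_next = ts[i], ts[i + 1]
        v = model(x, t.view(1, 1))                   # one NFE
        x = x + (t_next - t) * v                     # Euler update toward data
    return x

latents = sample_few_step(DummyDenoiser(), shape=(2, 16))
print(latents.shape)  # torch.Size([2, 16])
```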

Lip-Sync

SekoTalk accurately synchronizes lip movements with the audio input, handling speech rates from normal conversation up to rap, and supports a range of framings, including portrait, half-body, and full-body.

Singing

SekoTalk excels across diverse vocal styles, accommodating genres like Peking Opera, rap, bel canto, lyrical, folk, and K-pop.

Long Video

SekoTalk explores best practices for reference-image injection and temporal continuation, maintaining strong identity consistency in stable video generation of up to 15 minutes.
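For readers curious how such a continuation scheme typically fits together, here is a conceptual sketch of chunked long-video generation: the reference image is re-injected into every chunk to anchor identity, while the last frames of the previous chunk carry temporal context forward. All function names and the overlap scheme are hypothetical; SekoTalk's actual pipeline may differ.

```python
# Conceptual sketch of chunk-by-chunk long-video continuation with a fixed
# reference image re-injected into every chunk. All names and the overlap
# scheme are hypothetical illustrations, not SekoTalk's implementation.
from typing import Callable, List

def generate_long_video(
    generate_chunk: Callable[..., List],   # hypothetical per-chunk generator
    reference_image,
    audio_chunks: List,
    overlap_frames: int = 4,
) -> List:
    video: List = []
    context_frames: List = []
    for audio in audio_chunks:
        frames = generate_chunk(
            reference=reference_image,   # identity anchor, injected every chunk
            context=context_frames,      # temporal continuation from prior chunk
            audio=audio,
        )
        video.extend(frames[len(context_frames):])   # keep only the new frames
        context_frames = frames[-overlap_frames:]    # carry context forward
    return video

# Toy usage with a stand-in generator that just labels frames.
def fake_chunk(reference, context, audio):
    return list(context) + [f"{audio}-frame{i}" for i in range(8)]

clip = generate_long_video(fake_chunk, reference_image="ref.png",
                           audio_chunks=["a0", "a1", "a2"])
print(len(clip))  # 24
```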

Multi-Style

SekoTalk demonstrates strong generalization across different image styles, supporting realistic photos, anime, animals, and even sketches.

Multi-Lingual

SekoTalk offers comprehensive language support, including English, French, Italian, Portuguese, Japanese, Korean, Mandarin, Cantonese, Hokkien, and other Chinese dialects.

Multi-Person

SekoTalk can handle multiple speakers in a scene, supporting sequential speaking (e.g., podcasts, mini-series) and simultaneous speaking (e.g., discussions, debates).


Prompt Control

SekoTalk allows for character motion control via prompts.

Potential Applications

SekoTalk can be applied to e-commerce live streaming, online education, virtual tourism, news broadcasting, virtual customer service, and more.

Contributors

SekoTalk is a collaborative effort of the following contributors:

Core Contributors
Models Lead · Xiangyu Fan
  • Audio Zhiqian Lin
  • Reference-to-Long-Video Tianxiang Ren
  • Distillation Zesong Qiu, Zhuguanyu Wu, Fanzhou Wang
Data Lead · Zhongang Cai
  • Acquisition & Management Chen Wei, Chenyang Gu, Shuang Yang, Zhiqian Lin, Wenjie Ye
  • Automated Annotation Wanqi Yin, Ruisi Wang, Yubo Wang, Chen Wei, Chenyang Gu, Zesong Qiu, Fanzhou Wang, Zhitao Yang, Tianxiang Ren
  • Visual Analytics Tools Zhengyu Lin, Zhitao Yang
Infrastructure (LightX2V) Lead · Ruihao Gong
  • Overall Architecture Yang Yong, Shiqiao Gu, Liang Liu
  • Quantization & Sparse Attention & Compile Yang Yong, Shiqiao Gu, Shankun Wang
  • Parallel Yingrui Wang, Shiqiao Gu, Dingyu Chen
  • Offload & Light VAE Shiqiao Gu
  • ComfyUI Peng Gao
  • Prompt Enhancement Jiaqi Li
  • MetaX & Ascend & Cambricon & DCU Deployment Qian Wang, Haiwen Fu, Yang Yong
  • Realtime System Liang Liu, Yusong Wang, Jian Hu, Xinyi Qin, Huiwen Yi, Zhiqian Lin, Yang Gao, Xuetong Xiang
Online Demo & Project Page Lead · Zhiqian Lin
  • Designer Huimuk Jang
  • Developers Peng Gao, Yang Gao, Xuetong Xiang, Zhitao Yang, Jiarui Xu, Shenyuan Luo, Xiuyuan Fu
  • Demo Zhiqian Lin, Zhengyu Lin
Corresponding Authors
  • Ruihao Gong
  • Quan Wang
  • Dahua Lin
  • Lei Yang