We are excited to introduce SekoTalk, an audio-driven digital human generation model.
Partnering deeply with LightX2V, SekoTalk requires only 4 NFEs (four forward passes of the network) per generation, leveraging the proven step-distillation approach behind Qwen-Image-Lightning and Wan2.2-Lightning.
In addition to demonstrating impressive generalization across diverse visuals and sounds, SekoTalk leverages LightX2V inference on 8 H100 GPUs to generate 5 seconds of 480p video in about 5 seconds, approaching real-time generation.
A free online generation trial is available, supporting audio clips of up to 1 minute.
Enjoy making your characters talk, fast and effortlessly! 🚀
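For readers new to the term, NFE counts how many times the denoising network must be run per clip. Below is a minimal, hypothetical sketch of a 4-NFE sampler in the flow-matching style these Lightning distillations target; the model, time schedule, and latent shape are placeholders, not the actual SekoTalk or LightX2V API.

```python
import torch

# Minimal sketch of 4-NFE sampling, assuming a model that predicts a
# velocity field (flow matching). Every name and shape here is
# illustrative; the real SekoTalk network and schedule are not shown.
def sample_4_nfe(model, audio_feats, latent_shape, steps=4):
    x = torch.randn(latent_shape)            # start from pure noise
    for i in range(steps):
        t = 1.0 - i / steps                  # toy linear time schedule
        velocity = model(x, t, audio_feats)  # one NFE: one forward pass
        x = x - velocity / steps             # Euler step toward the sample
    return x                                 # clean latent after 4 passes

# Stand-in model so the sketch runs end to end.
dummy_model = lambda x, t, audio: x * t
latent = sample_4_nfe(dummy_model, audio_feats=None,
                      latent_shape=(1, 16, 13, 60, 104))
```

Step distillation in the Lightning family trains the student so that these few large steps approximate the trajectory of a many-step teacher, which is what makes 4 evaluations sufficient.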
Lip-Sync
SekoTalk accurately synchronizes lip movements to the audio input, handles speech rates from normal conversation up to rap, and drives characters at a range of framings, including portrait, half-body, and full-body.
Singing
SekoTalk excels across diverse vocal styles, accommodating genres like Peking Opera, rap, bel canto, lyrical, folk, and K-pop.
Long Video
SekoTalk adopts best practices for reference-image injection and temporal continuation (see the sketch below), ensuring excellent identity consistency for stable video generation of up to 15 minutes.
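As a rough illustration of how such long clips can be assembled, the sketch below generates chunk by chunk: every chunk re-injects the same reference image to anchor identity, and the tail frames of the previous chunk seed the next one for motion continuity. The function names, overlap size, and chunking are assumptions for illustration, not SekoTalk's actual pipeline.

```python
# Hypothetical sketch of temporal continuation with reference re-injection.
# `generate_chunk` stands in for a short-clip generator; every name here
# is illustrative rather than the real SekoTalk interface.
def generate_long_video(generate_chunk, ref_image, audio_chunks, overlap=8):
    frames, context = [], None
    for audio in audio_chunks:
        # The same reference image anchors identity in every chunk; the
        # tail of the previous chunk carries motion across the boundary.
        clip = generate_chunk(ref_image, context, audio)
        frames.extend(clip if context is None else clip[overlap:])
        context = clip[-overlap:]  # seed for the next chunk
    return frames

# Stand-in generator: returns placeholder frames per chunk.
dummy = lambda ref, ctx, audio: [f"frame_{audio}_{i}" for i in range(24)]
video = generate_long_video(dummy, ref_image="ref.png",
                            audio_chunks=["a", "b", "c"])
```

Re-injecting the reference in every chunk is one common way to keep identity drift from accumulating over many continuations.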
Multi-Style
SekoTalk demonstrates strong generalization across different image styles, supporting realistic photos, anime, animals, and even sketches.
Multi-Lingual
SekoTalk offers comprehensive language support, including English, French, Italian, Portuguese, Japanese, Korean, Mandarin, Cantonese, Hokkien, and other Chinese dialects.
Multi-Person
SekoTalk can handle multiple speakers in a scene, supporting sequential speaking (e.g., podcasts, mini-series) and simultaneous speaking (e.g., discussions, debates).
Prompt Control
SekoTalk allows for character motion control via prompts.
Potential Applications
SekoTalk can be applied to e-commerce live streaming, online education, virtual tourism, news broadcasting, virtual customer service, and more.
Contributors
SekoTalk is a collaborative effort of the following contributors:
- Audio: Zhiqian Lin
- Reference-to-Long-Video: Tianxiang Ren
- Distillation: Zesong Qiu, Zhuguanyu Wu, Fanzhou Wang
- Acquisition & Management: Chen Wei, Chenyang Gu, Shuang Yang, Zhiqian Lin, Wenjie Ye
- Automated Annotation: Wanqi Yin, Ruisi Wang, Yubo Wang, Chen Wei, Chenyang Gu, Zesong Qiu, Fanzhou Wang, Zhitao Yang, Tianxiang Ren
- Visual Analytics Tools: Zhengyu Lin, Zhitao Yang
- Overall Architecture: Yang Yong, Shiqiao Gu, Liang Liu
- Quantization & Sparse Attention & Compile: Yang Yong, Shiqiao Gu, Shankun Wang
- Parallel: Yingrui Wang, Shiqiao Gu, Dingyu Chen
- Offload & Light VAE: Shiqiao Gu
- ComfyUI: Peng Gao
- Prompt Enhancement: Jiaqi Li
- MetaX & Ascend & Cambricon & DCU Deployment: Qian Wang, Haiwen Fu, Yang Yong
- Realtime System: Liang Liu, Yusong Wang, Jian Hu, Xinyi Qin, Huiwen Yi, Zhiqian Lin, Yang Gao, Xuetong Xiang
- Designer: Huimuk Jang
- Developers: Peng Gao, Yang Gao, Xuetong Xiang, Zhitao Yang, Jiarui Xu, Shenyuan Luo, Xiuyuan Fu
- Demo: Zhiqian Lin, Zhengyu Lin
- Ruihao Gong
- Quan Wang
- Dahua Lin
- Lei Yang