SekoTalk-1.0

SekoTalk Team

SekoTalk (Free Trial) LightX2V (Free Trial)

We are excited to introduce SekoTalk, an audio-driven digital human generation model.
Built in close partnership with LightX2V, SekoTalk requires only 4 NFEs (function evaluations) per generation, using the proven distillation recipe behind Qwen-Image-Lightning and Wan2.2-Lightning. Beyond its strong generalization across diverse visuals and audio, SekoTalk pairs with LightX2V inference on 8 H100 GPUs to generate 5 seconds of 480P video in roughly 5 seconds.
A free online generation trial is available, supporting audio clips of up to 1 minute. Enjoy making your character talk, fast and effortlessly! 🚀
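To put the "4 NFEs" figure in perspective: each NFE is one forward pass of the denoising network, so a clip is produced with just four network calls instead of the dozens a standard sampler needs. The sketch below is purely illustrative; the dummy denoiser, timestep schedule, and Euler update are placeholder assumptions, not the actual SekoTalk or LightX2V implementation.

```python
# Illustrative sketch of what "4 NFEs" means for a step-distilled model:
# exactly four forward passes of the denoiser produce the final sample.
# The dummy network, schedule, and update rule are assumptions for
# illustration only, not the SekoTalk / LightX2V implementation.
import torch
import torch.nn as nn

class DummyDenoiser(nn.Module):
    """Stand-in for the distilled video denoiser (hypothetical)."""
    def __init__(self, dim: int = 16):
        super().__init__()
        self.net = nn.Linear(dim + 1, dim)

    def forward(self, x, t):
        t_feat = t.expand(x.shape[0], 1)             # broadcast scalar timestep
        return self.net(torch.cat([x, t_feat], dim=-1))

@torch.no_grad()
def sample_few_step(model, shape, steps: int = 4):
    """Few-step sampling: `steps` network evaluations (NFEs) in total."""
    x = torch.randn(shape)                           # start from pure noise
    ts = torch.linspace(1.0, 0.0, steps + 1)         # coarse 4-step schedule
    for i in range(steps):
        t, t_next = ts[i], ts[i + 1]
        v = model(x, t.view(1, 1))                   # one NFE
        x = x + (t_next - t) * v                     # Euler update toward data
    return x

latents = sample_few_step(DummyDenoiser(), shape=(2, 16))
print(latents.shape)  # torch.Size([2, 16])
```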

Lip-Sync

SekoTalk accurately synchronizes lip movements with the audio input, handling speech rates from normal conversation up to rap, and supports a range of framings, including portrait, half-body, and full-body.

Singing

SekoTalk excels across diverse vocal styles, accommodating genres like Peking Opera, rap, bel canto, lyrical, folk, and K-pop.

Long Video

SekoTalk explores best practices for reference-image injection and temporal continuation, maintaining strong identity consistency in stable video generation of up to 15 minutes.
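For readers curious how such a continuation scheme typically fits together, here is a conceptual sketch of chunked long-video generation: the reference image is re-injected into every chunk to anchor identity, while the last frames of the previous chunk carry temporal context forward. All function names and the overlap scheme are hypothetical; SekoTalk's actual pipeline may differ.

```python
# Conceptual sketch of chunk-by-chunk long-video continuation with a fixed
# reference image re-injected into every chunk. All names and the overlap
# scheme are hypothetical illustrations, not SekoTalk's implementation.
from typing import Callable, List

def generate_long_video(
    generate_chunk: Callable[..., List],   # hypothetical per-chunk generator
    reference_image,
    audio_chunks: List,
    overlap_frames: int = 4,
) -> List:
    video: List = []
    context_frames: List = []
    for audio in audio_chunks:
        frames = generate_chunk(
            reference=reference_image,   # identity anchor, injected every chunk
            context=context_frames,      # temporal continuation from prior chunk
            audio=audio,
        )
        video.extend(frames[len(context_frames):])   # keep only the new frames
        context_frames = frames[-overlap_frames:]    # carry context forward
    return video

# Toy usage with a stand-in generator that just labels frames.
def fake_chunk(reference, context, audio):
    return list(context) + [f"{audio}-frame{i}" for i in range(8)]

clip = generate_long_video(fake_chunk, reference_image="ref.png",
                           audio_chunks=["a0", "a1", "a2"])
print(len(clip))  # 24
```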

Multi-Style

SekoTalk demonstrates strong generalization across different image styles, supporting realistic photos, anime, animals, and even sketches.

Multi-Lingual

SekoTalk offers comprehensive language support, including English, French, Italian, Portuguese, Japanese, Korean, Mandarin, Cantonese, Hokkien, and other Chinese dialects.

Multi-Person

SekoTalk can handle multiple speakers in a scene, supporting sequential speaking (e.g., podcasts, mini-series) and simultaneous speaking (e.g., discussions, debates).


Prompt Control

SekoTalk allows for character motion control via prompts.

Potential Applications

SekoTalk can be applied to e-commerce live streaming, online education, virtual tourism, news broadcasting, virtual customer service, and more.

Contributors

SekoTalk is a collaborative effort of the following contributors:

Core Contributors
Models Lead · Xiangyu Fan
  • Audio Zhiqian Lin
  • Reference-to-Long-Video Tianxiang Ren
  • Distillation Zesong Qiu, Zhuguanyu Wu, Fanzhou Wang
Data Lead · Zhongang Cai
  • Acquisition & Management Chen Wei, Chenyang Gu, Shuang Yang, Zhiqian Lin, Wenjie Ye
  • Automated Annotation Wanqi Yin, Ruisi Wang, Yubo Wang, Chen Wei, Chenyang Gu, Zesong Qiu, Fanzhou Wang, Zhitao Yang, Tianxiang Ren
  • Visual Analytics Tools Zhengyu Lin, Zhitao Yang
Infrastructure (LightX2V) Lead · Ruihao Gong
  • Overall Architecture Yang Yong, Shiqiao Gu, Liang Liu
  • Quantization & Sparse Attention & Compile Yang Yong, Shiqiao Gu, Shankun Wang
  • Parallel Yingrui Wang, Shiqiao Gu, Dingyu Chen
  • Offload & Light VAE Shiqiao Gu
  • ComfyUI Peng Gao
  • Prompt Enhancement Jiaqi Li
  • MetaX & Ascend & Cambricon & DCU Deployment Qian Wang, Haiwen Fu, Yang Yong
  • Realtime System Liang Liu, Yusong Wang, Jian Hu, Xinyi Qin, Huiwen Yi, Zhiqian Lin, Yang Gao, Xuetong Xiang
Online Demo & Project Page Lead · Zhiqian Lin
  • Designer Huimuk Jang
  • Developers Peng Gao, Yang Gao, Xuetong Xiang, Zhitao Yang, Jiarui Xu, Shenyuan Luo, Xiuyuan Fu
  • Demo Zhiqian Lin, Zhengyu Lin
Corresponding Authors
  • Ruihao Gong
  • Quan Wang
  • Dahua Lin
  • Lei Yang