1.LatentSync: Taming Audio-Conditioned Latent Diffusion Models for Lip Sync with SyncNet Supervision 字节 2024
文章地址:https://arxiv.org/pdf/2412.09262
代码地址:https://github.com/bytedance/LatentSync 训练推理都有
2.wan2.2-s2v 阿里通义 20250826
文章:[2508.18621] Wan-S2V: Audio-Driven Cinematic Video Generation
代码:https://github.com/Wan-Video/Wan2.2 只有推理
3.Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation 中山大学an美团20250528
文章:https://arxiv.org/pdf/2505.22647
代码:https://github.com/MeiGen-AI/MultiTalk 只有推理
4.Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency
字节and浙大 支持Singing
文章:https://arxiv.org/pdf/2409.02634 ICLR2025
代码:只有demo Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency
5.EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning
AAAI 2025 20240711 蚂蚁 Pretrained models with better sing performance to be released
项目EchoMimic: Lifelike Audio-Driven Portrait Animations
代码https://github.com/antgroup/echomimic 只有推理
EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation. GitHub
20250227 CVPR 2025
EchoMimicV3: 1.3B Parameters are All You Need for Unified Multi-Modal and Multi-Task Human Animation. GitHub 20250708
6.EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions 20250227 阿里 ECCV2024
项目EMO
EMO2: End-Effector Guided Audio-Driven Avatar Video Generation 20250118阿里
项目:EMO2。支持Singing
文章:[2501.10687] EMO2: End-Effector Guided Audio-Driven Avatar Video Generation
7.VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time
Microsoft Research Asia 20240416 NeurIPS 2024 (Oral)
项目:https://www.microsoft.com/en-us/research/project/vasa-1/
文章:[2404.10667] VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time
8.FaceFormer: Speech-Driven 3D Facial Animation with Transformers, CVPR 2022.
文章:https://arxiv.org/pdf/2112.05329
代码:https://github.com/EvelynFan/FaceFormer?tab=readme-ov-file 有训练代码
9.SkyReels-Audio: Omni Audio-Conditioned Talking Portraits in Video Diffusion Transformers MimicMotion SkyReels Team, Skywork AI 20250601 支持唱歌
文章https://arxiv.org/pdf/2506.00830
代码https://skyworkai.github.io/skyreels-audio.github.io/ 仅推理
SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers
SkyReels-A2: Compose Anything in Video Diffusion Transformers
SkyReels-A3:Towards Ultra-Long Audio-Conditioned Video Generation
10.InfiniteTalk: Audio-driven Video Generation for Sparse-Frame Video Dubbing
20250819 多家单位
文章[2508.14033] InfiniteTalk: Audio-driven Video Generation for Sparse-Frame Video Dubbing
11.MusicFace: Music-driven expressive singing face synthesis
20240201 厦大 没开源
https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10897677&tag=1[2508.14033] InfiniteTalk: Audio-driven Video Generation for Sparse-Frame Video Dubbing
12.FantasyTalking: Realistic Talking Portrait Generation via Coherent Motion Synthesis
阿里 MM2025
文章https://arxiv.org/pdf/2504.04842
代码 https://github.com/Fantasy-AMAP/fantasy-talking只有推理
13.HHunyuanVideo-Avatar: High-Fidelity Audio-Driven Human Animation for Multiple Characters腾讯混元 20250603
文章https://arxiv.org/pdf/2505.20156
代码https://github.com/Tencent-Hunyuan/HunyuanVideo-Avatar 只有推理
14.DiffSynth-Studio
开源项目GitHub - modelscope/DiffSynth-Studio: Enjoy the magic of Diffusion models!
15.SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation
CVPR2023
16.Speech2Vid
17.Wav2Lip
18.DeepFaceLive
19.Easy-Wav2
20.VideoReTalking
21.UniTalker: Conversational Speech-Visual Synthesis
20250807 MM2025
文章
代码https://github.com/AI-S2-Lab/UniTalker 没内容
数据集
1.VOCASET VOCA
2.BIWI dataset Biwi 3D Audiovisual Corpus of Affective Communication
3.Flow-guided One-shot Talking Face Generation with a High-resolution Audio-visual Dataset 2021CVPR 网易伏羲
4.MMhead MM2025 https://openreview.net/pdf?id=L99kOQk12i
专门唱歌
1.SingAvatar: High-fidelity Audio-driven Singing Avatar Synthesis
ICME2024
文章https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10687925
说是会开源,实际没开源
2.MusicFace: Music-driven Expressive Singing Face Synthesis 上面有 没开源
数据集
1.SingingHead: A Large-scale 4D Dataset for Singing Head Animation
20240714 上海交大 https://openreview.net/profile?id=~Sijing_Wu1
文章https://arxiv.org/pdf/2312.04369