📝 Publications

🎙 Audio Generation

IJCAI 2024

BATON: Aligning Text-to-Audio Model with Human Preference Feedback
Huan Liao, Haonan Han, Kai Yang, Tianjiao Du, Rui Yang, Zunnan Xu, Qinmei Xu, Jingquan Liu, Jiasheng Lu, Xiu Li

[Project] [Paper] [Dataset&Code]

  • The first text-to-audio (TTA) system fine-tuned with human preference feedback.
  • Curated a dataset containing both prompts and the corresponding generated audio, annotated based on human feedback.
  • Addressed audio-event semantic omission and temporal disarray with a weighted preference strategy.

ICME 2024

Controllable Text-to-Audio Generation with Training-Free Temporal Guidance Diffusion
Tianjiao Du, Jun Chen, Jiasheng Lu, Qinmei Xu, Huan Liao, Yupeng Chen, Zhiyong Wu

[Paper]

  • A training-free approach for controllable TTA generation that conditions on the location and duration of specified sound events.

ICASSP 2025

Rhythmic Foley: A Framework for Seamless Audio-Visual Alignment in Video-to-Audio Synthesis
Zhiqi Huang, Dan Luo, Jun Wang, Huan Liao, Zhiheng Li, Zhiyong Wu

[Project] [Paper]

  • An innovative video-to-audio synthesis framework that preserves semantic integrity and achieves precise beat-point synchronization.

🧙 3D/Motion Generation

ICCV 2025

REPARO: Compositional 3D Assets Generation with Differentiable 3D Layout Alignment
Haonan Han, Rui Yang, Huan Liao, Jiankai Xing, Zunnan Xu, Xiaoming Yu, Junwei Zha, Xiu Li, Wanhua Li

[Project] [Paper] [Code]

  • A novel approach for compositional 3D asset generation from a single image via differentiable 3D layout alignment.

CVPR 2025

AToM: Aligning Text-to-Motion Model at Event-Level with GPT-4Vision Reward
Haonan Han, Xiangzuo Wu, Huan Liao, Zunnan Xu, Ronghui Li, Yachao Zhang, Xiu Li

[Project] [Paper] [Code]

  • Enhances event-level alignment between generated motion and text prompts by leveraging reward signals from GPT-4Vision.