👋 About me

I am currently a second-year master’s student at Tsinghua University, based in Shenzhen.

I am currently working on text-to-audio and video-to-audio generation. If you would like to discuss research or collaborate, please feel free to email me at liaoh22@mails.tsinghua.edu.cn.

My research interests include:

Applications: Audio Generation
Technologies: Generative Models, Multimodal Understanding and Learning, RLHF

🔥 News

2025.06: One paper on two-stage compositional 3D generation is accepted by ICCV 2025.
2025.02: One paper on aligning text-to-motion models with RLAIF is accepted by CVPR 2025.
2024.12: One paper on video-to-audio generation is accepted by ICASSP 2025.
2024.05: I joined Tencent AI Lab as a research intern.
2024.04: One paper on a text-to-audio system fine-tuned with human preference feedback is accepted by IJCAI 2024.
2024.03: One paper on controllable text-to-audio generation is accepted by ICME 2024.
2023.05: I joined Huawei 2012 Lab as a research intern.

📝 Publications

🎙 Audio Generation

IJCAI 2024

BATON: Aligning Text-to-Audio Model with Human Preference Feedback
Huan Liao^★, Haonan Han^★, Kai Yang, Tianjiao Du, Rui Yang, Zunnan Xu, Qinmei Xu, Jingquan Liu, Jiasheng Lu, Xiu Li^†

[Project] [Paper] [Dataset&Code]

The first text-to-audio (TTA) system fine-tuned with human preference feedback.
Curated a dataset of prompts and their corresponding generated audio, annotated with human feedback.
Addressed semantic omission of audio events and their temporal disarray with a weighted preference strategy.

ICME 2024

Controllable Text-to-Audio Generation with Training-Free Temporal Guidance Diffusion
Tianjiao Du, Jun Chen, Jiasheng Lu, Qinmei Xu, Huan Liao, Yupeng Chen, Zhiyong Wu^†

[Paper]

A training-free approach to controllable TTA generation that follows the specified temporal location and duration of each sound event.

ICASSP 2025

Rhythmic Foley: A Framework for Seamless Audio-Visual Alignment in Video-to-Audio Synthesis
Zhiqi Huang^★, Dan Luo^★, Jun Wang^†, Huan Liao, Zhiheng Li^†, Zhiyong Wu^†

[Project] [Paper]

An innovative framework for video-to-audio synthesis, characterized by semantic integrity and precise beat-point synchronization.

🧙 3D/Motion Generation

ICCV 2025

REPARO: Compositional 3D Assets Generation with Differentiable 3D Layout Alignment
Haonan Han^★, Rui Yang^★, Huan Liao^★, Jiankai Xing, Zunnan Xu, Xiaoming Yu, Junwei Zha, Xiu Li^†, Wanhua Li^†

[Project] [Paper] [Code]

A novel approach for compositional 3D asset generation from single images.

CVPR 2025

AToM: Aligning Text-to-Motion Model at Event-Level with GPT-4Vision Reward
Haonan Han^★, Xiangzuo Wu^★, Huan Liao^★, Zunnan Xu, Ronghui Li, Yachao Zhang^†, Xiu Li^†

[Project] [Paper] [Code]

Enhances the event-level alignment between generated motion and text prompts by leveraging rewards from GPT-4Vision.

🎖 Honors and Awards

2024.12 Second Class Scholarship at Tsinghua University
2023.10 Second Class Scholarship at Tsinghua Shenzhen International Graduate School
2023.08 Tsinghua & Huawei - Information and Media Technology Outstanding Practice Project
2022.06 Outstanding Graduate and Outstanding College Student Party Member of Hunan Province
2021.10 National Scholarship (Top 1%)

📖 Education

2022.09 - 2025.06, Master's, Tsinghua University, Beijing.

💻 Internships

2024.05 - 2024.07, Tencent AI Lab, Shenzhen.
2023.05 - 2024.03, Huawei 2012 Lab, Shenzhen.

🎵 Music Background

Guzheng (Grade 9, Chinese instrumental grading exam)
Vice president of the 100-member music club

📚 Courses

Digital Processing of Speech Signals (A)
Introduction to Statistical Learning Theory (A-)

Many thanks to RayeRen for the open-source AcadHomepage template.

Huan Liao (廖欢)