Biography

Yifan Yang is a Ph.D. student at Shanghai Jiao Tong University (SJTU) and a member of the Cross Media (X-)Language Intelligence Lab (X-LANCE) in the Department of Computer Science and Engineering, under the supervision of Prof. Xie Chen and the leadership of Prof. Kai Yu.

His research focuses on spoken language processing, spanning speech synthesis, speech recognition, speech representation learning, and speech interaction. He has published 10+ first-author papers at top-tier conferences (ACL, ACMMM, ICASSP, Interspeech, and ICME), and has received the Hunyuan Fellowship.

He was a core contributor to the Next-gen Kaldi project led by Dr. Daniel Povey, contributing to the open-source toolkits Icefall and Lhotse. He led the curation and release of the open-source speech dataset GigaSpeech 2, and was a core contributor to Libriheavy and LibriheavyMix.

Interests

  • Text-to-Speech Synthesis and Evaluation

  • Speech Representation Learning

  • Multilingual Speech Recognition

  • Speech Interaction: Dialogue Systems and Proactive Interaction

Education

  • Ph.D., Computer Science and Technology, Shanghai Jiao Tong University, 2023.09-now

  • B.E., Computer Science and Technology, Tianjin University, 2019.09-2023.07

    GPA: 3.91/4.0, Rank: 1/139. [Transcript]

Experiences

  • Research Intern, Qwen Omni Team, Alibaba, 2026.03.09-now

    Investigate advanced speech interaction.

    Advised by Dr. Jin Xu.

  • Research Intern, Hunyuan Speech Team, Tencent TEG, 2025.08.20-2026.03.06

    Investigate speech understanding for speaking style modeling and style-controllable text-to-speech.

    Co-advised by Dr. Long Zhou and Dr. Xu Tan.

  • Research Intern, VALL-E Team & CoreAI Speech, Microsoft, 2024.03.05-2025.08.10

    Investigate advanced language modeling for text-to-speech synthesis and streaming text-to-speech synthesis.

    Co-advised by Dr. Shujie Liu and Dr. Jinyu Li.

  • Machine Learning Engineer Intern, Next-gen Kaldi Team, Xiaomi AI Lab, 2022.11.01-2023.08.28

    Investigate advanced and efficient open-source end-to-end automatic speech recognition.

    Develop Next-gen Kaldi, including Icefall, Lhotse, and k2.

    Advised by Dr. Daniel Povey.

News

  • [2026.01] 1 paper is accepted by ICASSP 2026.

  • [2026.01] 1 paper is accepted by IEEE JSTSP (IF=13.6).

  • [2025.08] I join the Hunyuan Speech Team at Tencent.

  • [2025.08] 1 paper is accepted by IEEE SPL.

  • [2025.07] I am honored to be funded by the CIE-Tencent Doctoral Research Incentive Project.

  • [2025.07] 3 papers are accepted by ACMMM 2025.

  • [2025.05] 2 papers are accepted by Interspeech 2025.

  • [2025.05] 3 papers are accepted by ACL 2025 (2 Main, 1 Findings).

  • [2025.03] 1 paper is accepted by ICME 2025.

  • [2024.12] 1 paper is accepted by ICASSP 2025.

  • [2024.12] 1 paper is accepted by AAAI 2025.

  • [2024.06] 3 papers are accepted by Interspeech 2024.

  • [2024.03] I join the speech team at Microsoft.

  • [2024.01] Zipformer is accepted for oral presentation at ICLR 2024. Congratulations!

  • [2023.12] 3 papers are accepted by ICASSP 2024.

  • [2023.09] I start to pursue my Ph.D. at Shanghai Jiao Tong University.

  • [2023.06] I earn my Bachelor's degree in engineering with the Excellent Student title.

  • [2023.05] 2 papers are accepted by Interspeech 2023.

  • [2022.11] I join the Next-gen Kaldi team at Xiaomi.

  • [2022.06] I join the X-LANCE lab at Shanghai Jiao Tong University.

Research

Selected Publications

Check out full publications on Google Scholar.

Zero-Shot Text-to-Speech Synthesis and Evaluation

Speech Representation Learning

Speech Recognition

Open-Source Projects

Awards

Academic Service

Conference Reviewer

  • International Conference on Machine Learning (ICML 2026)

  • International Conference on Learning Representations (ICLR 2026; Notable Reviewer at ICLR 2025)

  • ACM International Conference on Multimedia (ACM MM 2026, 2025)

  • AAAI Conference on Artificial Intelligence (AAAI 2026)

  • ACL Rolling Review (ACL ARR 2026 January, 2025 October, 2025 May, 2025 February, 2024 December, 2024 October, 2024 June, 2023 October)

  • Conference on Neural Information Processing Systems (NeurIPS 2025)

  • International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2026, 2025, 2024)

  • Conference of the International Speech Communication Association (Interspeech 2026)

  • IEEE International Conference on Multimedia & Expo (ICME 2026, 2025)

  • IEEE Spoken Language Technology Workshop (SLT 2026, 2024)

  • IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2025)

  • International Conference on Computational Linguistics (COLING 2025, LREC-COLING 2024)

  • Conference on Empirical Methods in Natural Language Processing (EMNLP 2022)

Journal Reviewer

  • IEEE Transactions on Audio, Speech and Language Processing (IEEE TASLP)

  • IEEE Open Journal of Signal Processing (IEEE OJSP)

Activities

  • [Invited Talk] Open-source Sharing of F5-TTS and GigaSpeech 2, ModelScope DevCon 2025, 2025.06

  • [Invited Talk] GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement, Nanyang Technological University (NTU), 2024.06

  • Owner of CS-BAOYAN, the largest nonprofit CS postgraduate recommendation exchange platform in China, 2022.09-2023.09

Teaching Assistance

  • SJTU CS1501 Programming