Biography

I am a Ph.D. student at Shanghai Jiao Tong University (SJTU) and a member of the Cross Media (X-)Language Intelligence Lab (X-LANCE) in the Department of Computer Science and Engineering, advised by Prof. Xie Chen under the leadership of Prof. Kai Yu. As the second Ph.D. student supervised by Prof. Chen, I am dedicating these five years to contributing to the field of spoken language processing.

I worked at Xiaomi AI Lab as an algorithm engineer intern during my senior undergraduate year, developing Next-gen Kaldi under the leadership of Daniel Povey.

My recent work focuses on the following research topics. If you would like to discuss anything, please feel free to contact me.

  • Text-to-speech synthesis

  • Speech representation learning from continuous to discrete / Speech tokenization

  • Multilingual speech recognition

Education

  • Ph.D., Computer Science and Technology, Shanghai Jiao Tong University, 2023.09-Now

  • B.E., Computer Science and Technology, Tianjin University, 2019.09-2023.07

    GPA: 3.91/4.0, Rank: 1/139.

Experiences

  • Research Intern, Speech Team, Microsoft Research, 2024.03.05-Now

    Co-advised by Shujie Liu and Jinyu Li.

    Investigate advanced zero-shot text-to-speech synthesis and streaming text-to-speech synthesis.

  • Machine Learning Engineer Intern, Next-gen Kaldi Team, Xiaomi AI Lab, 2022.11.01-2023.08.28

    Investigate advanced and efficient open-source E2E Automatic Speech Recognition.

    Develop Next-gen Kaldi, including Icefall, Lhotse, and k2.

    Advised by Daniel Povey.

News

  • [2025.03] 1 paper is accepted by ICME 2025.

  • [2024.12] 1 paper is accepted by ICASSP 2025.

  • [2024.12] 1 paper is accepted by AAAI 2025.

  • [2024.06] 3 papers are accepted by Interspeech 2024.

  • [2024.03] I join the Speech Team at Microsoft Research.

  • [2024.01] Zipformer is accepted for oral presentation at ICLR 2024. Congratulations!

  • [2023.12] 3 papers are accepted by ICASSP 2024.

  • [2023.09] I start to pursue my Ph.D. at Shanghai Jiao Tong University.

  • [2023.06] I earn my Bachelor's degree in engineering with an excellent student title.

  • [2023.05] 2 papers are accepted by Interspeech 2023.

  • [2022.11] I join the Next-gen Kaldi team at Xiaomi.

  • [2022.06] I join the X-LANCE Lab at Shanghai Jiao Tong University.

Research

Selected Publications

See the full list of publications on Google Scholar.

Efficient End-to-end Speech Recognition

Speech Representation Learning

Zero-Shot Text to Speech Synthesis

Speech Dataset

Open-Source Projects

Awards

  • Chu Xin Scholarship, Tianjin University, 2022

  • Baosteel Scholarship, Baosteel Education Foundation, 2021

  • "Bingchang Zhuang" Scholarship, Tianjin University, 2020

Academic Service

  • [Conference Reviewer] The Thirteenth International Conference on Learning Representations (ICLR 2025)

  • [Conference Reviewer] IEEE International Conference on Multimedia & Expo (ICME 2025)

  • [Conference Reviewer] International Conference on Computational Linguistics (COLING 2025, LREC-COLING 2024)

  • [Conference Reviewer] 2024 IEEE Spoken Language Technology Workshop (SLT 2024)

  • [Conference Reviewer] International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2025, 2024)

  • [Conference Reviewer] ACL Rolling Review (ACL ARR 2025 February, 2024 December, 2024 October, 2024 June, 2023 October)

  • [Conference Reviewer] The 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022)

Activities

  • [Invited Talk] GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement, Nanyang Technological University (NTU), 2024.06

  • Owner of CS-BAOYAN, the largest nonprofit CS postgraduate recommendation exchange platform in China, 2022.09-2023.09

Teaching Assistantships

  • SJTU CS1501 Programming