Biography

I am a Ph.D. student at Shanghai Jiao Tong University (SJTU) and a member of the Cross Media (X-)Language Intelligence Lab (X-LANCE) in the Department of Computer Science and Engineering, supervised by Prof. Xie Chen and under the leadership of Prof. Kai Yu. As Prof. Chen's second Ph.D. student, I am dedicating these five years to contributing to the speech processing field.

I worked at Xiaomi AI Lab as an algorithm engineer intern during my senior undergraduate year, developing the Next-gen Kaldi under the leadership of Daniel Povey.

My recent work focuses on the following research topics. If you would like to discuss anything, please feel free to contact me.

  • Speech representation learning from continuous to discrete

  • Speech recognition for low-resource languages with in-the-wild data

  • Addressing key issues in end-to-end speech recognition

Education

  • Ph.D., Computer Science and Technology, Shanghai Jiao Tong University, 2023.09-

  • B.E., Computer Science and Technology, Tianjin University, 2019.09-2023.07

    GPA: 3.91/4.0, Rank: 1/139. [Transcript]

Experiences

  • Machine Learning Engineer Intern, The Next-gen Kaldi Team, Xiaomi AI Lab, 2022.11.01-2023.08.28

    Investigated advanced and efficient open-source end-to-end (E2E) automatic speech recognition.

    Developed the Next-gen Kaldi, including Icefall, Lhotse, and k2.

    Supervised by Daniel Povey.

News

  • [2024.06] 3 papers were accepted by INTERSPEECH 2024.

  • [2024.03] I joined the CS SPEECH & TRANSLATION group at Microsoft Research Asia (MSRA).

  • [2024.01] Zipformer was accepted for an oral presentation at ICLR 2024. Congratulations!

  • [2023.12] 3 papers were accepted by ICASSP 2024.

  • [2023.09] I started my Ph.D. at Shanghai Jiao Tong University.

  • [2023.06] I earned my Bachelor's degree in engineering and received the Excellent Student title.

  • [2023.05] 2 papers were accepted by INTERSPEECH 2023.

  • [2022.11] I joined the Next-gen Kaldi team at Xiaomi.

  • [2022.06] I joined X-LANCE.

Research

Selected Publications

See the full list of publications on Google Scholar.

Efficient End-to-end Speech Recognition

Speech Recognition Dataset

Discretized Speech Representation

Open-Source Projects

Competitions

Awards

  • Chu Xin Scholarship, Tianjin University, 2022

  • Baosteel Scholarship, Baosteel Education Foundation, 2021

  • "Bingchang Zhuang" Scholarship, Tianjin University, 2020

Academic Service

  • [Conference Reviewer] The Thirteenth International Conference on Learning Representations (ICLR 2025)

  • [Conference Reviewer] International Conference on Computational Linguistics (COLING 2025, LREC-COLING 2024)

  • [Conference Reviewer] 2024 IEEE Spoken Language Technology Workshop (SLT 2024)

  • [Conference Reviewer] International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2025, 2024)

  • [Conference Reviewer] ACL Rolling Review (ACL ARR 2024 October, 2024 June, 2023 October)

  • [Conference Reviewer] The 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022)

Teaching Assistant

  • SJTU CS1501 Programming