Yifan Yang
Biography
I am a Ph.D. student at Shanghai Jiao Tong University (SJTU) and a member of the Cross Media (X-)Language Intelligence Lab (X-LANCE) in the Department of Computer Science and Engineering, supervised by Prof. Xie Chen under the leadership of Prof. Kai Yu. As the second Ph.D. student supervised by Prof. Chen, I am dedicating these five years to contributing to the speech processing field.
I worked at Xiaomi AI Lab as an algorithm engineer intern during my senior undergraduate year, developing Next-gen Kaldi under the leadership of Daniel Povey.
My recent work focuses on the following research topics. If you would like to discuss anything, please feel free to contact me.
-
Speech representation learning from continuous to discrete
-
Low-resource language speech recognition with in-the-wild data
-
Optimizing key issues in end-to-end speech recognition
Education
-
Ph.D., Computer Science and Technology, Shanghai Jiao Tong University, 2023.09-
-
B.E., Computer Science and Technology, Tianjin University, 2019.09-2023.07
GPA: 3.91/4.0, Rank: 1/139. [Transcript]
Experiences
-
Research Intern, CS SPEECH & TRANSLATION Group, Microsoft Research Asia (MSRA), 2024.03-
Co-supervised by Shujie Liu and Jinyu Li.
-
Machine Learning Engineer Intern, The Next-gen Kaldi Team, Xiaomi AI Lab, 2022.11.01-2023.08.28
Investigated advanced and efficient open-source end-to-end (E2E) automatic speech recognition.
Developed Next-gen Kaldi, including Icefall, Lhotse, and k2.
Supervised by Daniel Povey.
News
-
[2024.06] 3 papers were accepted to INTERSPEECH 2024.
-
[2024.03] I joined the CS SPEECH & TRANSLATION group at Microsoft Research Asia (MSRA).
-
[2024.01] Zipformer was accepted for oral presentation at ICLR 2024. Congratulations!
-
[2023.12] 3 papers were accepted to ICASSP 2024.
-
[2023.09] I started my Ph.D. at Shanghai Jiao Tong University.
-
[2023.06] I earned my bachelor's degree in engineering with an Excellent Student title.
-
[2023.05] 2 papers were accepted to INTERSPEECH 2023.
-
[2022.11] I joined the Next-gen Kaldi team at Xiaomi.
-
[2022.06] I joined X-LANCE.
Research
Selected Publications
See the full list of publications on Google Scholar.
Efficient End-to-end Speech Recognition
-
Zipformer: A faster and better encoder for automatic speech recognition
Zengwei Yao, Liyong Guo, Xiaoyu Yang, Wei Kang, Fangjun Kuang, Yifan Yang, Zengrui Jin, Long Lin, Daniel Povey
Oral in Proc. ICLR, 2024
-
Blank-regularized CTC for Frame Skipping in Neural Transducer
Yifan Yang, Xiaoyu Yang, Liyong Guo, Zengwei Yao, Wei Kang, Fangjun Kuang, Long Lin, Xie Chen, Daniel Povey
Proc. Interspeech, 2023
-
PromptASR for contextualized ASR with controllable style
Xiaoyu Yang, Wei Kang, Zengwei Yao, Yifan Yang, Liyong Guo, Fangjun Kuang, Long Lin, Daniel Povey
Oral in Proc. ICASSP, 2024
Speech Recognition Dataset
-
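GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement
Yifan Yang, Zheshu Song, Jianheng Zhuo, Mingyu Cui, Jinpeng Li, Bo Yang, Yexing Du, Ziyang Ma, Xunying Liu, Ziyuan Wang, Ke Li, Shuai Fan, Kai Yu, Wei-Qiang Zhang, Guoguo Chen, Xie Chen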
Preprint on arXiv, 2024
GigaSpeech 2 powers Typhoon-Audio, a state-of-the-art open-source audio language model for Thai tasks.
-
Zengrui Jin*, Yifan Yang*, Mohan Shi*, Wei Kang, Xiaoyu Yang, Zengwei Yao, Fangjun Kuang, Liyong Guo, Lingwei Meng, Long Lin, Yong Xu, Shi-Xiong Zhang, Daniel Povey
Oral in Proc. INTERSPEECH, 2024
[Dataset]
-
Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context
Wei Kang, Xiaoyu Yang, Zengwei Yao, Fangjun Kuang, Yifan Yang, Liyong Guo, Long Lin, Daniel Povey
Oral in Proc. ICASSP, 2024
Discretized Speech Representation
-
Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS
Yifan Yang, Feiyu Shen, Chenpeng Du, Ziyang Ma, Kai Yu, Daniel Povey, Xie Chen
Oral in Proc. ICASSP, 2024
Open-Source Projects
Competitions
-
Ranked 7/36 in the ICASSP 2024 ICMC-ASR Grand Challenge Track I, 2023.12
Awards
-
Chu Xin Scholarship, Tianjin University, 2022
-
Baosteel Scholarship, Baosteel Education Foundation, 2021
-
"Bingchang Zhuang" Scholarship, Tianjin University, 2020
Academic Service
-
[Conference Reviewer] The Thirteenth International Conference on Learning Representations (ICLR 2025)
-
[Conference Reviewer] International Conference on Computational Linguistics (COLING 2025, LREC-COLING 2024)
-
[Conference Reviewer] 2024 IEEE Spoken Language Technology Workshop (SLT 2024)
-
[Conference Reviewer] International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2025, 2024)
-
[Conference Reviewer] ACL Rolling Review (ACL ARR 2024 October, 2024 June, 2023 October)
-
[Conference Reviewer] The 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022)
Teaching Assistance
-
SJTU CS1501 Programming