Yifan Yang
Biography
I am a Ph.D. student at Shanghai Jiao Tong University (SJTU) and a member of the Cross Media (X-)Language Intelligence Lab (X-LANCE) in the Department of Computer Science and Engineering, supervised by Prof. Xie Chen and working under the leadership of Prof. Kai Yu. As the second Ph.D. student supervised by Prof. Chen, I am dedicating these five years to contributing to the field of speech recognition.
I worked at Xiaomi AI Lab as an algorithm engineer intern during my senior undergraduate year, developing Next-gen Kaldi under the leadership of Daniel Povey.
My recent work focuses on addressing key issues in end-to-end speech recognition. If you share this interest, please feel free to contact me.
Education
-
Ph.D., Computer Science and Technology, Shanghai Jiao Tong University, 2023.09-
-
B.E., Computer Science and Technology, Tianjin University, 2019.09-2023.07
GPA: 3.9/4.0, Rank: 1/139. [Transcript]
Experiences
-
Research Intern, Speech to Text Group, Microsoft Research Asia (MSRA), 2024.03-
Co-supervised by Shujie Liu and Jinyu Li.
-
Machine Learning Engineer Intern, The Next-gen Kaldi Team, Xiaomi AI Lab, 2022.11.01-2023.08.28
Investigated advanced and efficient open-source end-to-end (E2E) automatic speech recognition.
Developed Next-gen Kaldi, including Icefall, Lhotse, and k2.
Supervised by Daniel Povey.
News
-
[2024.03] I joined the Speech to Text group at Microsoft Research Asia (MSRA).
-
[2024.01] Zipformer was accepted for oral presentation at ICLR 2024. Congratulations!
-
[2023.12] 3 papers were accepted by ICASSP 2024.
-
[2023.09] I started my Ph.D. at Shanghai Jiao Tong University.
-
[2023.06] I earned my Bachelor's degree in engineering with an Excellent Student title.
-
[2023.05] 2 papers were accepted by INTERSPEECH 2023.
-
[2022.11] I joined the Next-gen Kaldi team at Xiaomi.
-
[2022.06] I joined X-LANCE.
Research
Publications
-
Zipformer: A faster and better encoder for automatic speech recognition
Zengwei Yao, Liyong Guo, Xiaoyu Yang, Wei Kang, Fangjun Kuang, Yifan Yang, Zengrui Jin, Long Lin, Daniel Povey
Proc. ICLR, 2024
-
Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS
Yifan Yang, Feiyu Shen, Chenpeng Du, Ziyang Ma, Kai Yu, Daniel Povey, Xie Chen
Proc. ICASSP, 2024
-
Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context
Wei Kang, Xiaoyu Yang, Zengwei Yao, Fangjun Kuang, Yifan Yang, Liyong Guo, Long Lin, Daniel Povey
Proc. ICASSP, 2024
-
PromptASR for contextualized ASR with controllable style
Xiaoyu Yang, Wei Kang, Zengwei Yao, Yifan Yang, Liyong Guo, Fangjun Kuang, Long Lin, Daniel Povey
Proc. ICASSP, 2024
-
Blank-regularized CTC for Frame Skipping in Neural Transducer
Yifan Yang, Xiaoyu Yang, Liyong Guo, Zengwei Yao, Wei Kang, Fangjun Kuang, Long Lin, Xie Chen, Daniel Povey
Proc. Interspeech, 2023
-
Delay-penalized CTC implemented based on Finite State Transducer
Zengwei Yao, Wei Kang, Fangjun Kuang, Liyong Guo, Xiaoyu Yang, Yifan Yang, Long Lin, Daniel Povey
Proc. Interspeech, 2023
Open-Source Projects
Competitions
-
7th place in Track I of the ICASSP 2024 ICMC-ASR Grand Challenge, 2023.12
Awards
-
Chu Xin Scholarship, Tianjin University, 2022
-
Baosteel Scholarship, Baosteel Education Foundation, 2021
-
"Bingchang Zhuang" Scholarship, Tianjin University, 2020
Academic Service
-
[Reviewer] The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
-
[Reviewer] International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2024)
-
[Reviewer] ACL Rolling Review (ACL ARR 2023 October)
-
[Reviewer] The 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022)