Yifan Yang
Biography
I am a Ph.D. student at Shanghai Jiao Tong University (SJTU) and a member of the Cross Media (X-)Language Intelligence Lab (X-LANCE) in the Department of Computer Science and Engineering, advised by Prof. Xie Chen under the leadership of Prof. Kai Yu. As Prof. Chen's second Ph.D. student, I am dedicating these five years to contributing to the field of spoken language processing.
During my senior undergraduate year, I worked as an algorithm engineer intern at Xiaomi AI Lab, developing Next-gen Kaldi under the leadership of Daniel Povey.
My recent work focuses on the following research topics. If you would like to discuss anything, please feel free to contact me.
-
Text-to-speech synthesis
-
Speech representation learning from continuous to discrete / Speech tokenization
-
Multilingual speech recognition
Education
-
Ph.D., Computer Science and Technology, Shanghai Jiao Tong University, 2023.09-Now
-
B.E., Computer Science and Technology, Tianjin University, 2019.09-2023.07
GPA: 3.91/4.0, Rank: 1/139. [Transcript]
Experiences
-
Research Intern, Speech Team, Microsoft Research, 2024.03.05-Now
Co-advised by Shujie Liu and Jinyu Li.
Investigating zero-shot text-to-speech synthesis and streaming text-to-speech synthesis.
-
Machine Learning Engineer Intern, Next-gen Kaldi Team, Xiaomi AI Lab, 2022.11.01-2023.08.28
Investigated efficient open-source end-to-end (E2E) automatic speech recognition.
Developed Next-gen Kaldi, including Icefall, Lhotse, and k2.
Advised by Daniel Povey.
News
-
[2025.03] 1 paper is accepted by ICME 2025.
-
[2024.12] 1 paper is accepted by ICASSP 2025.
-
[2024.12] 1 paper is accepted by AAAI 2025.
-
[2024.06] 3 papers are accepted by Interspeech 2024.
-
[2024.03] I join the Speech Team at Microsoft Research.
-
[2024.01] Zipformer is accepted for oral presentation by ICLR 2024. Congratulations!
-
[2023.12] 3 papers are accepted by ICASSP 2024.
-
[2023.09] I begin my Ph.D. studies at Shanghai Jiao Tong University.
-
[2023.06] I earn my Bachelor of Engineering degree with the title of Excellent Student.
-
[2023.05] 2 papers are accepted by Interspeech 2023.
-
[2022.11] I join the Next-gen Kaldi team at Xiaomi.
-
[2022.06] I join the X-LANCE Lab at Shanghai Jiao Tong University.
Research
Selected Publications
See the full list of publications on Google Scholar.
Efficient End-to-end Speech Recognition
-
Zipformer: A Faster and Better Encoder for Automatic Speech Recognition
Zengwei Yao, Liyong Guo, Xiaoyu Yang, Wei Kang, Fangjun Kuang, Yifan Yang, Zengrui Jin, Long Lin, Daniel Povey
Oral in Proc. ICLR, 2024
-
Blank-regularized CTC for Frame Skipping in Neural Transducer
Yifan Yang, Xiaoyu Yang, Liyong Guo, Zengwei Yao, Wei Kang, Fangjun Kuang, Long Lin, Xie Chen, Daniel Povey
Proc. Interspeech, 2023
-
Speech Recognition Meets Large Language Model: Benchmarking, Models, and Exploration
Ziyang Ma, Guanrou Yang, Yifan Yang, Zhifu Gao, Jiaming Wang, Zhihao Du, Fan Yu, Qian Chen, Siqi Zheng, Shiliang Zhang, Xie Chen
Oral in Proc. AAAI, 2025
Speech Representation Learning
-
k2SSL: A Faster and Better Framework for Self-Supervised Speech Representation Learning
Yifan Yang, Jianheng Zhuo, Zengrui Jin, Ziyang Ma, Xiaoyu Yang, Zengwei Yao, Liyong Guo, Wei Kang, Fangjun Kuang, Long Lin, Daniel Povey, Xie Chen
Proc. ICME, 2025
-
Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS
Yifan Yang, Feiyu Shen, Chenpeng Du, Ziyang Ma, Kai Yu, Daniel Povey, Xie Chen
Oral in Proc. ICASSP, 2024
Zero-Shot Text to Speech Synthesis
-
Interleaved Speech-Text Language Models are Simple Streaming Text to Speech Synthesizers
Yifan Yang, Ziyang Ma, Shujie Liu, Jinyu Li, Hui Wang, Lingwei Meng, Haiyang Sun, Yuzhe Liang, Ruiyang Xu, Yuxuan Hu, Yan Lu, Rui Zhao, Xie Chen
Preprint on arXiv, 2024
-
VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech
Chenpeng Du, Yiwei Guo, Hankun Wang, Yifan Yang, Zhikang Niu, Shuai Wang, Hui Zhang, Xie Chen, Kai Yu
Proc. ICASSP, 2025
Speech Dataset
-
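GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement
Yifan Yang, Zheshu Song, Jianheng Zhuo, Mingyu Cui, Jinpeng Li, Bo Yang, Yexing Du, Ziyang Ma, Xunying Liu, Ziyuan Wang, Ke Li, Shuai Fan, Kai Yu, Wei-Qiang Zhang, Guoguo Chen, Xie Chen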
Preprint on arXiv, 2024
GigaSpeech 2 powers Dolphin, a state-of-the-art multilingual, multitask ASR model for Eastern languages.
GigaSpeech 2 powers Typhoon-Audio, a state-of-the-art open-source audio language model for Thai tasks.
-
Zengrui Jin*, Yifan Yang*, Mohan Shi*, Wei Kang, Xiaoyu Yang, Zengwei Yao, Fangjun Kuang, Liyong Guo, Lingwei Meng, Long Lin, Yong Xu, Shi-Xiong Zhang, Daniel Povey
Oral in Proc. Interspeech, 2024
[Dataset]
-
Libriheavy: A 50,000 hours ASR Corpus with Punctuation Casing and Context
Wei Kang, Xiaoyu Yang, Zengwei Yao, Fangjun Kuang, Yifan Yang, Liyong Guo, Long Lin, Daniel Povey
Oral in Proc. ICASSP, 2024
Open-Source Projects
Awards
-
Chu Xin Scholarship, Tianjin University, 2022
-
Baosteel Scholarship, Baosteel Education Foundation, 2021
-
"Bingchang Zhuang" Scholarship, Tianjin University, 2020
Academic Service
-
[Conference Reviewer] The Thirteenth International Conference on Learning Representations (ICLR 2025)
-
[Conference Reviewer] IEEE International Conference on Multimedia & Expo (ICME 2025)
-
[Conference Reviewer] International Conference on Computational Linguistics (COLING 2025, LREC-COLING 2024)
-
[Conference Reviewer] 2024 IEEE Spoken Language Technology Workshop (SLT 2024)
-
[Conference Reviewer] IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2025, 2024)
-
[Conference Reviewer] ACL Rolling Review (ACL ARR 2025 February, 2024 December, 2024 October, 2024 June, 2023 October)
-
[Conference Reviewer] The 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022)
Activities
-
[Invited Talk] GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement, Nanyang Technological University (NTU), 2024.06
-
Owner of CS-BAOYAN, the largest nonprofit exchange platform in China for CS postgraduate recommendation, 2022.09-2023.09
Teaching Assistant
-
SJTU CS1501 Programming