Ziyang Ma (马子阳)

Ph.D. student,
Shanghai Jiao Tong University.
800 Dongchuan RD. Minhang District,
Shanghai, China.
E-mail: zym.22@sjtu.edu.cn

Biography

Hi👋 nice to meet you!

Currently I am a Ph.D. student at Shanghai Jiao Tong University (SJTU) and the SJTU Artificial Intelligence Institute, and a member of the Cross Media (X-) Language Intelligence Lab (X-LANCE) in the Department of Computer Science and Engineering, co-supervised by Prof. Xie Chen and Prof. Yanmin Qian, and working closely with Prof. Kai Yu. As the first Ph.D. student supervised by Prof. Chen, I will try my best in the next five exciting years! 💪

During my undergraduate years, I was a research assistant at the InteLligent media research center (iLearn), working closely with Prof. Xuemeng Song and Prof. Liqiang Nie.

My research usually follows the KISS philosophy. My recent work focuses on speech, language, audio, and music processing with Self-Supervised Learning (SSL) and Large Language Models (LLMs). If you are also interested, please feel free to contact me.

Education

  • Ph.D., Computer Science and Engineering, Shanghai Jiao Tong University, 2022.09-Now

  • B.E., Computer Science and Technology, Shandong University, 2018.09-2022.06

Interests

  • Self-Supervised Learning

  • Speech and Audio Processing

  • Natural Language Processing

  • Multimedia and Multimodal Learning

NEWS

  • [2024.12] 🎉 4 papers, including 2 first-author papers, were accepted by AAAI 2025.

  • [2024.10] Check out our SLAM-AAC, a new member of the SLAM-LLM family with SOTA audio captioning performance.

  • [2024.10] 🔥 Check out our F5-TTS, a bilingual DiT-based TTS model with flow-matching!

  • [2024.8] 2 papers were accepted by IEEE SLT 2024.

  • [2024.7] Chinese Tiny LLM was accepted by the 1st Conference on Language Modeling (COLM).

  • [2024.7] The MER24 Baseline Paper was accepted by the MRAC24 Workshop@ACM Multimedia.

  • [2024.7] Check out FunAudioLLM family, including a speech understanding model SenseVoice and a speech generation model CosyVoice.

  • [2024.6] We are organizing the Speech Processing in LLM Era special session @ISCSLP 2024, which is now open for submission.

  • [2024.6] 4 papers were accepted by ISCA INTERSPEECH 2024.

  • [2024.5] SLAM-LLM, a toolkit focusing on speech, language, audio, music processing with LLM, has been released!

  • [2024.5] emotion2vec and ChatMusician were accepted by ACL 2024 Findings.

  • [2024.5] BAT was accepted by ICML 2024.

  • [2024.4] MER24 Challenge@IJCAI and MRAC24 Workshop@ACM Multimedia are coming! [Baseline Paper][Baseline Code][Challenge Homepage]

  • [2024.4] EAT was accepted by IJCAI 2024.

  • [2024.3] We won 1st place in Categorical Emotion Recognition at the Odyssey 2024 Emotion Recognition Challenge. [Technical Report]

  • [2024.1] Check out our Repo for EAT, an audio representation model that is both effective and efficient.

  • [2023.12] Check out our Repo for emotion2vec, the first universal speech emotion representation model.

  • [2023.12] 4 papers were accepted by IEEE ICASSP 2024.

  • [2023.9] Check out our Repo for Fast-HuBERT, which accelerates HuBERT pre-training with a 5.2× speedup and no performance drop.

  • [2023.9] 2 papers were accepted by IEEE ASRU 2023.

  • [2023.8] MT4SSL was shortlisted for the ISCA Interspeech Best Student Paper Award. Congrats!

Research

Selected Publications

Thanks to all the collaborators for their great work!

Check out Google Scholar for more information.

Speech, Language, Audio, Music Processing with SSL

Speech, Language, Audio, Music Processing with LLM

Speech Generation and Dialog System

Experiences

Research Intern, Speech Lab, Alibaba DAMO Academy, 2023.06-2024.02

Research Intern, NLC Group, Microsoft Research Asia (MSRA), 2022.02-2022.08

  • Investigated joint pre-training of speech and text to improve the accuracy of ASR and other downstream tasks.

  • Led by Furu Wei, supervised by Shujie Liu, and worked closely with Yu Wu and Long Zhou.

Research Intern, Video Group, MEGVII Research, 2021.04-2021.06

  • Investigated vehicle re-identification with Transformer architectures.

  • Supervised by Chi Zhang.

Research Assistant, InteLligent media research center (iLearn), Shandong University, 2020.09-2021.09

Academic Service

Organizing Committee

  • Speech Processing in LLM Era @ISCSLP 2024 Special Session

  • Multimodal Emotion Recognition Challenge (MER24) @ACM Multimedia MRAC24 Workshop

Conference Reviewer / TPC Member

  • International Conference on Learning Representations (ICLR) 2025

  • IEEE International Conference on Acoustics, Speech and Signal Processing (IEEE ICASSP) 2023, 2024, 2025

  • IEEE Spoken Language Technology Workshop (IEEE SLT) 2024

  • ACL Rolling Review (ACL ARR) 2024

  • AAAI Conference on Artificial Intelligence 2022

  • ACM International Conference on Multimedia (ACM MM) 2022

Journal Reviewer

  • IEEE Signal Processing Letters (IEEE SPL)

  • IEEE Transactions on Multimedia (IEEE TMM)

  • IEEE Transactions on Circuits and Systems for Video Technology (IEEE TCSVT)

Open-Source Projects

Projects

SLAM-LLM[GitHub]

  • SLAM-LLM is a deep learning toolkit that allows researchers and developers to train custom multimodal large language models (MLLMs), focusing on speech, language, audio, and music processing.

FunAudioLLM[GitHub][Technical Report][HuggingFace][Demo]

  • SenseVoice is a speech foundation model with multiple speech understanding capabilities.[GitHub][ModelScope]

  • CosyVoice is a multi-lingual large voice generation model.[GitHub][ModelScope]

emotion2vec series[GitHub][emotion2vec(ACL2024)][HuggingFace][ModelScope]

  • emotion2vec is the first universal speech emotion representation model.

  • emotion2vec+ is a series of foundational models for speech emotion recognition (SER).

MAP-Neo series[GitHub][Technical Report][HuggingFace]

  • MAP-Neo is a series of fully open-source large language models.

  • Matrix is the pretraining data and data processing pipeline for MAP-Neo.[Dataset]

Dataset & Benchmark

EmoBox[GitHub][Benchmark][EmoBox(INTERSPEECH2024 Oral)]

  • EmoBox is an out-of-the-box multilingual multi-corpus speech emotion recognition toolkit, along with a benchmark for both intra-corpus and cross-corpus settings.

GigaSpeech 2[Dataset][GitHub][arXiv]

  • GigaSpeech 2 is a large-scale, multi-domain, multilingual speech recognition corpus.

Accomplishments

Awards

  • SPS Travel Grant, IEEE, 2024.02

  • Best Presentation Award in Student Forum, the 18th National Conference on Man-Machine Speech Communication (NCMMSC), 2023.12

  • Interspeech Best Student Paper Shortlist, ISCA, 2023.08

  • Excellent Graduate, Department of Education, Shandong Province, China, 2022.06

  • "Intelligent Pedestal" Scholarship, Huawei, 2021.12

  • SIGMM Student Travel Grant, ACM, 2021.11

  • National Scholarship, Ministry of Education, China, 2021.10

Competitions

Activities

  • Invited Talk: Towards Interactive Speech Language Model, Nvidia, 2024.10

  • Invited Talk: Towards Interactive Speech Language Model, The Hong Kong University of Science and Technology (HKUST), 2024.8

  • Invited Talk: Speech & Audio Understanding Based on SSL and LLM, Nvidia, 2024.6

  • Invited Talk: INTERSPEECH 2023 Pre-presentation, SpeechHome, 2023.07

  • Invited Talk: Towards More Realistic, Powerful, and Accurate Speech-based Self-Supervised Learning, Renmin University of China (RUC), 2023.5

  • Ph.D. Debate: Towards AIGC, AI TIME, 2023.1

  • Invited Talk: How to Build an Audio-Driven Talking Head? An Introduction and Solution Sharing, Datawhale, 2022.11

  • Member of Datawhale, 2022.09-Now

  • Teaching Assistant, Computer Science and Technology, Shandong University, 2021.03-2021.06

  • Member of Elite Class, Computer Science and Technology, Shandong University, 2020.09-2022.06