Ziyang Ma (马子阳)

I am currently in the Joint Ph.D. Programme of Shanghai Jiao Tong University (SJTU) and Nanyang Technological University (NTU), co-supervised by Prof. Xie Chen from SJTU and Prof. Chng Eng Siong from NTU. I am also a member of the Cross Media (X-) Language Intelligence Lab (X-LANCE), working closely with Prof. Kai Yu. As the first Ph.D. student supervised by Prof. Chen, I will try my best in the next five exciting years! 💪

My research usually follows the KISS philosophy. I have published 10+ first-author papers at top-tier conferences (NeurIPS, ICLR, ACL, AAAI, ICASSP, Interspeech, ASRU, etc.), was nominated for the Best Student Paper Shortlist at Interspeech 2023, and received the Best Student Paper Award at Nanyang Speech Technology Forum (NYSF) 2025.

I am seeking full-time positions. If you are interested, please feel free to contact me!

Education

Interests

NEWS

[2026.4] 1 paper was accepted by IEEE TASLP.
[2026.4] 4 papers were accepted by ACL 2026.
[2026.3] 🔥 We released Qwen3.5-Omni, try the audio-visual captioning/coding/interaction! [Blog][HuggingFace Offline Mode][HuggingFace Online Mode][ModelScope Offline Mode][ModelScope Online Mode]
[2026.3] We released Omni-Cloze, a new fine-grained audio-visual captioning benchmark.[Omni-Cloze Dataset][Omni-Captioner Paper][GitHub]
[2026.2] Final results of the Interspeech 2026 Audio Reasoning Challenge are now available on the leaderboard page. Check out our challenge report and MMAR-Rubrics data & code.
[2026.1] 4 papers were accepted by ICLR 2026.
[2026.1] 3 papers were accepted by ICASSP 2026.
[2026.1] SLAM-LLM was accepted by IEEE JSTSP (IF=13.6).
[2025.12] Interspeech 2026 Audio Reasoning Challenge is open for registration now!
[2025.11] emotion2vec+ large has hit the 50 million downloads milestone on ModelScope!
[2025.10] We released the Omni-Captioner Technical Report, covering a key technique in Qwen3-Omni-Captioner.[GitHub][Technical Report][HuggingFace][ModelScope]
[2025.9] We released the Qwen3-Omni series, including -Instruct, -Thinking, and -Captioner.[GitHub][Technical Report][HuggingFace][ModelScope]
[2025.9] 2 papers were accepted by NeurIPS 2025.
[2025.8] 2 papers were accepted by EMNLP 2025.
[2025.8] MuQ was accepted by IEEE TASLP.
[2025.8] Audio-CoT was accepted by IEEE ASRU 2025.
[2025.7] 1 paper was accepted by INTERSPEECH 2025 MLC-SLM Workshop.
[2025.7] EmoVoice was accepted by ACM Multimedia 2025.
[2025.5] Check out our MMAR, a new benchmark designed to evaluate the deep reasoning capabilities of Audio-Language Models (ALMs).[arXiv][Demo][GitHub][Benchmark]
[2025.5] 1 paper was accepted by ISCA INTERSPEECH 2025.
[2025.5] 5 papers were accepted by ACL 2025.
[2025.4] 1 paper was accepted by IEEE TASLP.
[2025.3] 2 papers were accepted by ICME 2025.
[2025.3] Check out our Spark-TTS (along with BiCodec and VoxBox dataset), an LLM-based controllable TTS with both voice cloning and generation abilities.
[2025.1] Check out our Audio-CoT, the first work to explore chain-of-thought reasoning in large audio language models (LALMs).
[2025.1] Full reproduction (including all data preparation, model training, inference and checkpoints) for SLAM-Omni has been supported!
[2025.1] MUPT was accepted by ICLR 2025.
[2025.1] LSLM, SLAM-ASR and ELLA-V have been selected for Oral presentation at AAAI 2025.
[2024.12] 3 papers were accepted by ICASSP 2025.
[2024.12] 4 papers were accepted by AAAI 2025.
[2024.10] Check out our SLAM-AAC, a new member of SLAM-LLM family with SOTA audio captioning performance.
[2024.10] 1 paper was accepted by IEEE TASLP.
[2024.10] Check out our F5-TTS, a bilingual DiT-based TTS model with flow-matching!
[2024.8] 1 paper was accepted by IEEE TMM.
[2024.8] 2 papers were accepted by IEEE SLT 2024.
[2024.7] Chinese Tiny LLM was accepted by the 1st Conference on Language Modeling (COLM).
[2024.7] MER24 Baseline Paper was accepted by MRAC24 Workshop@ACM Multimedia.
[2024.7] Check out FunAudioLLM family, including a speech understanding model SenseVoice and a speech generation model CosyVoice.
[2024.6] We are organizing the Speech Processing in LLM Era Special Session @ISCSLP 2024, which is now open for submission.
[2024.6] 4 papers were accepted by ISCA INTERSPEECH 2024.
[2024.5] SLAM-LLM, a toolkit focusing on speech, language, audio, music processing with LLM, has been released!
[2024.5] emotion2vec and ChatMusician were accepted by ACL 2024 Findings.
[2024.5] BAT was accepted by ICML 2024.
[2024.4] MER24 Challenge@IJCAI and MRAC24 Workshop@ACM Multimedia are coming! [Baseline Paper][Baseline Code][Challenge Homepage]
[2024.4] EAT was accepted by IJCAI 2024.
[2024.3] We won 1st place in Categorical Emotion Recognition at Odyssey 2024 Emotion Recognition Challenge.[Technical Report]
[2024.1] Check out our Repo for EAT, a new audio representation model with both effectiveness and efficiency.
[2023.12] Check out our Repo for emotion2vec, the first universal speech emotion representation model.
[2023.12] 4 papers were accepted by IEEE ICASSP 2024.
[2023.9] Check out our Repo for Fast-HuBERT. We accelerate HuBERT pre-training with a 5.2X speedup without performance drop.
[2023.9] 2 papers were accepted by IEEE ASRU 2023.
[2023.8] MT4SSL was nominated in ISCA Interspeech Best Student Paper Shortlist.
[2023.5] 4 papers were accepted by ISCA INTERSPEECH 2023.
[2023.2] 2 papers were accepted by IEEE ICASSP 2023.
[2022.11] Check out our Repo for MT4SSL, a multi-task learning framework for self-supervised learning.
[2022.09] We won 3rd place in Avatar Track of AIWIN, held by WAIC2022.[Report][Invited Talk]
