AaronZ345/README.md

Hi there 👋

I am Yu Zhang (张彧), currently a Research Scientist at ByteDance. If you are interested in any form of academic collaboration, please feel free to email me at aaron9834@icloud.com.

I earned my PhD from the College of Computer Science and Technology, Zhejiang University (浙江大学计算机科学与技术学院), under the supervision of Prof. Zhou Zhao (赵洲). Before that, I graduated from Chu Kochen Honors College, Zhejiang University (浙江大学竺可桢学院), with dual bachelor's degrees in Computer Science and Automation. I have also been a visiting scholar at the University of Rochester with Prof. Zhiyao Duan and at the University of Massachusetts Amherst with Prof. Przemyslaw Grabowicz.

My research focuses primarily on Multi-Modal Generative AI, specifically Spatial Audio, Music, Singing Voice, and Speech. I have published 10+ first-author papers at top international AI conferences, such as NeurIPS, ACL, and AAAI.

📎 Homepages

📝 First-Author Publications

\* denotes co-first authors

🔊 Spatial Audio

🎼 Music

🎙️ Singing Voice

💬 Speech

Pinned

  1. ISDrama

    Dataset and evaluation code for ISDrama (ACM-MM 2025): Immersive Spatial Drama Generation through Multimodal Prompting

    Python 237

  2. GTSinger

    Dataset and code for GTSinger (NeurIPS 2024 Spotlight): A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks

    Python 357 16

  3. VersBand

    PyTorch implementation of VersBand (EMNLP 2025): Versatile Framework for Song Generation with Prompt-based Control

    Python 224 41

  4. TCSinger2

    PyTorch implementation of TCSinger 2 (ACL 2025): Customizable Multilingual Zero-shot Singing Voice Synthesis

    Python 175 31

  5. TCSinger

    PyTorch implementation of TCSinger (EMNLP 2024): Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control

    Python 377 46

  6. StyleSinger

    PyTorch implementation of StyleSinger (AAAI 2024): Style Transfer for Out-of-Domain Singing Voice Synthesis

    Python 419 27