- Cambridge, USA
- kwchang.org
Starred repositories
Slap your MacBook and it yells back. Uses the Apple Silicon accelerometer via IOKit HID.
Official code for the CVPR 2024 Paper: Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language
Simple Zero-Resource Syllable Tokenization for Spoken Language Modeling
Official implementation of "The Mind's Transformer" (ICLR 2026).
A real-time and multilingual speech translation model
Qwen3-TTS is an open-source series of TTS models developed by the Qwen team at Alibaba Cloud, supporting stable, expressive, and streaming speech generation, free-form voice design, and vivid voice…
Train transformer language models with reinforcement learning.
Pixio: a capable vision encoder dedicated to dense prediction, trained simply by pixel reconstruction
State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!
The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…
Code and resources from Seeing is Hearing: Benchmarking Vision Language Models at Interpreting Spectrograms (IJCNLP-AACL, 2025)
[ASRU 2025] Omni-R1: Do You Really Need Audio to Fine-Tune Your Audio LLM?
Use PEFT or full-parameter training for CPT/SFT/DPO/GRPO on 600+ LLMs (Qwen3.5, DeepSeek-R1, GLM-5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, Phi4, ...)…
kyutai-labs / nanoGPTaudio (forked from karpathy/nanoGPT)
Code for the blog "Neural audio codecs: how to get audio into LLMs"
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal domains, for both inference and training (see the usage sketch after this list).
Fixes AI pixel art or sprite web uploads
A method that directly addresses the modality gap by aligning speech tokens with their corresponding text transcriptions during the tokenization stage.
The official repo of "WhiStress: Enriching Transcriptions with Sentence Stress Detection" (Interspeech 2025)
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation.
Text-audio foundation model from Boson AI
Kimi K2 is the large language model series developed by the Moonshot AI team
Code for DeSTA2.5-Audio, a general-purpose large audio-language model (LALM)
Foundation Models and Data for Human-Human and Human-AI interactions.
Collection of works for evaluating (and analyzing) large audio-language models (LALMs)
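
As referenced in the 🤗 Transformers entry above, here is a minimal usage sketch of its `pipeline` API, in the audio spirit of this list. It is an illustration under stated assumptions, not a recipe from any starred repository: the checkpoint `openai/whisper-tiny` and the file name `sample.wav` are examples only.

```python
# Minimal sketch, assuming `transformers` and `torch` are installed and the
# checkpoint can be fetched from the Hugging Face Hub (decoding a local audio
# file also requires ffmpeg on the PATH).
from transformers import pipeline

# "openai/whisper-tiny" is an example checkpoint; any compatible
# automatic-speech-recognition model can be substituted.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-tiny")

# "sample.wav" is a hypothetical local file; the pipeline also accepts URLs
# and raw numpy arrays of audio samples.
result = asr("sample.wav")
print(result["text"])
```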