Starred repositories

SWivid / Habibi-TTS

Official code for "Habibi: Laying the Open-Source Foundation of Unified-Dialectal Arabic Speech Synthesis"

Python 292 31 Updated Mar 7, 2026

csteinmetz1 / pyloudnorm

Flexible audio loudness meter in Python with an implementation of the ITU-R BS.1770-4 loudness algorithm

Python 763 60 Updated Jan 4, 2026

ysharma3501 / LuxTTS

A high-quality rapid TTS voice cloning model that reaches speeds of 150x realtime.

Python 2,948 351 Updated Mar 12, 2026

FunAudioLLM / Fun-ASR

Fun-ASR is an end-to-end speech recognition large model launched by Tongyi Lab.

Python 952 81 Updated Feb 25, 2026

penberg / awesome-low-latency

Patterns and resources for low-latency programming.

1,201 64 Updated Jul 30, 2025

wenet-e2e / opencpop

Opencpop: A High-Quality Open Source Chinese Popular Song Database for Singing Voice Synthesis

232 11 Updated Dec 10, 2025

facebookresearch / omnilingual-asr

Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages

Python 2,721 243 Updated Dec 30, 2025

wenet-e2e / west

We Speech Toolkit, an LLM-based Speech Toolkit for Speech Understanding, Generation, and Interaction

Python 196 17 Updated Mar 19, 2026

Soul-AILab / SoulX-Podcast

SoulX-Podcast is an inference codebase by the Soul AI team for generating high-fidelity podcasts from text.

Python 3,243 422 Updated Dec 11, 2025

stepfun-ai / Step-Audio2

Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation.

Python 1,368 101 Updated Mar 16, 2026

xingchensong / CosyVoice-ttsfrd

Python 25 3 Updated Jun 19, 2025

abjadai / catt

The official implementation of CATT Arabic diacritization models.

Python 67 9 Updated Jul 18, 2025

index-tts / index-tts

An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System

Python 19,505 2,400 Updated Mar 16, 2026

atcelen / IDesign

Python 90 20 Updated Jul 21, 2025

frankyoujian / Edge-Punct-Casing

Python 29 7 Updated Feb 4, 2025

DataoceanAI / Dolphin

Dolphin is a multilingual, multitask ASR model jointly trained by DataoceanAI and Tsinghua University.

Python 702 64 Updated Mar 19, 2026

datalab-to / marker

Convert PDF to markdown + JSON quickly with high accuracy

Python 32,897 2,276 Updated Mar 10, 2026

jishengpeng / WavChat

A Survey of Spoken Dialogue Models (60 pages)

315 18 Updated Nov 28, 2024

zai-org / GLM-4-Voice

GLM-4-Voice | End-to-end Chinese-English spoken dialogue model

Python 3,153 277 Updated Dec 5, 2024

resemble-ai / Resemblyzer

A Python package to analyze and compare voices with deep learning

Python 3,232 478 Updated Oct 12, 2023

voilet1996 / practice-demo

Front-end practice project

Vue 1 Updated Jan 18, 2024

MahmoudAshraf97 / whisper-diarization

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper

Jupyter Notebook 5,443 500 Updated Feb 23, 2026

2noise / ChatTTS

A generative speech model for daily dialogue.

Python 38,959 4,227 Updated Jan 18, 2026

Tele-AI / TeleSpeech-ASR

Python 839 75 Updated Jun 7, 2024

ga642381 / speech-trident

Awesome speech/audio LLMs, representation learning, and codec models

1,212 73 Updated Aug 13, 2025

tincans-ai / gazelle

Joint speech-language model - respond directly to audio!

Python 373 33 Updated Jul 1, 2024

minzwon / musicfm

Python 252 12 Updated Feb 14, 2024

TencentGameMate / chinese_speech_pretrain

Chinese speech pretrained models

Shell 1,194 89 Updated Aug 23, 2024

microsoft / NeuralSpeech

Python 1,459 187 Updated Feb 11, 2024

attapol / tltk

Thai Language Toolkit

Python 29 5 Updated Dec 20, 2025

speaker-diarization

spell-check

audio-alignment

conformer

vad

rnn-transducer

audio-processing

ctc

crnn-tensorflow

Python


MXuer