Starred repositories
Official code for "Habibi: Laying the Open-Source Foundation of Unified-Dialectal Arabic Speech Synthesis"
Flexible audio loudness meter in Python with an implementation of the ITU-R BS.1770-4 loudness algorithm
A high-quality rapid TTS voice cloning model that reaches speeds of 150x realtime.
Fun-ASR is an end-to-end speech recognition large model launched by Tongyi Lab.
Patterns and resources for low-latency programming.
Opencpop: A High-Quality Open Source Chinese Popular Song Database for Singing Voice Synthesis
Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
We Speech Toolkit, an LLM-based speech toolkit for speech understanding, generation, and interaction
SoulX-Podcast is an inference codebase by the Soul AI team for generating high-fidelity podcasts from text.
Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation.
The official implementation of CATT Arabic diacritization models.
An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
Dolphin is a multilingual, multitask ASR model jointly trained by DataoceanAI and Tsinghua University.
Convert PDF to Markdown + JSON quickly with high accuracy
A Python package to analyze and compare voices with deep learning
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
A generative speech model for daily dialogue.
Awesome speech/audio LLMs, representation learning, and codec models
Joint speech-language model - respond directly to audio!
Chinese speech pretrained models