Starred repositories

Official code for "Habibi: Laying the Open-Source Foundation of Unified-Dialectal Arabic Speech Synthesis"

Python 292 31 Updated Mar 7, 2026

Flexible audio loudness meter in Python with implementation of ITU-R BS.1770-4 loudness algorithm

Python 763 60 Updated Jan 4, 2026

A high-quality rapid TTS voice cloning model that reaches speeds of 150x realtime.

Python 2,948 351 Updated Mar 12, 2026

Fun-ASR is an end-to-end speech recognition large model launched by Tongyi Lab.

Python 952 81 Updated Feb 25, 2026

Patterns and resources of low latency programming.

1,201 64 Updated Jul 30, 2025

Opencpop: A High-Quality Open Source Chinese Popular Song Database for Singing Voice Synthesis

232 11 Updated Dec 10, 2025

Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages

Python 2,721 243 Updated Dec 30, 2025

We Speech Toolkit, LLM-based Speech Toolkit for Speech Understanding, Generation, and Interaction

Python 196 17 Updated Mar 19, 2026

SoulX-Podcast is an inference codebase by the Soul AI team for generating high-fidelity podcasts from text.

Python 3,243 422 Updated Dec 11, 2025

Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation.

Python 1,368 101 Updated Mar 16, 2026

The official implementation of CATT Arabic diacritization models.

Python 67 9 Updated Jul 18, 2025

An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System

Python 19,505 2,400 Updated Mar 16, 2026
Python 90 20 Updated Jul 21, 2025

Dolphin is a multilingual, multitask ASR model jointly trained by DataoceanAI and Tsinghua University.

Python 702 64 Updated Mar 19, 2026

Convert PDF to markdown + JSON quickly with high accuracy

Python 32,897 2,276 Updated Mar 10, 2026

A Survey of Spoken Dialogue Models (60 pages)

315 18 Updated Nov 28, 2024

GLM-4-Voice | End-to-end Chinese-English spoken dialogue model

Python 3,153 277 Updated Dec 5, 2024

A Python package to analyze and compare voices with deep learning

Python 3,232 478 Updated Oct 12, 2023

Front-end practice project

Vue 1 Updated Jan 18, 2024

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper

Jupyter Notebook 5,443 500 Updated Feb 23, 2026

A generative speech model for daily dialogue.

Python 38,959 4,227 Updated Jan 18, 2026
Python 839 75 Updated Jun 7, 2024

Awesome speech/audio LLMs, representation learning, and codec models

1,212 73 Updated Aug 13, 2025

Joint speech-language model - respond directly to audio!

Python 373 33 Updated Jul 1, 2024
Python 252 12 Updated Feb 14, 2024

Chinese speech pretrained models

Shell 1,194 89 Updated Aug 23, 2024
Python 1,459 187 Updated Feb 11, 2024

Thai Language Toolkit

Python 29 5 Updated Dec 20, 2025