Stars
A feature-rich command-line audio/video downloader
Command-line program to download videos from YouTube.com and other video sites
User-friendly AI Interface (Supports Ollama, OpenAI API, ...)
Robust Speech Recognition via Large-Scale Weak Supervision
🌐 Make websites accessible for AI agents. Automate tasks online with ease.
💫 Toolkit to help you get started with Spec-Driven Development
One minute of voice data can also be used to train a good TTS model! (few-shot voice cloning)
The original local LLM interface. Text, vision, tool-calling, training, and more. 100% offline.
Portable file server with accelerated resumable uploads, dedup, WebDAV, SFTP, FTP, TFTP, zeroconf, media indexer, thumbnails++ all in one file
aider is AI pair programming in your terminal
Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, loadbalancing and logging. [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthr…
Easily train a good VC model with voice data <= 10 mins!
⚡ A Fast, Extensible Progress Bar for Python and CLI
Google Chromium, sans integration with Google
Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by OpenAI Solution team.
A TTS model capable of generating ultra-realistic dialogue in one pass.
Accelerate your web app development | Build fast. Run fast.
Buzz transcribes and translates audio offline on your personal computer. Powered by OpenAI's Whisper.
Let's make video diffusion practical!
A Conversational Speech Generation Model
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
ComfyUI-Manager is an extension designed to enhance the usability of ComfyUI. It offers management functions to install, remove, disable, and enable various custom nodes of ComfyUI. Furthermore, th…
High-Resolution 3D Assets Generation with Large Scale Hunyuan3D Diffusion Models.
Generate 3D objects conditioned on text or images
Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key
🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
Qwen3-TTS is an open-source series of TTS models developed by the Qwen team at Alibaba Cloud, supporting stable, expressive, and streaming speech generation, free-form voice design, and vivid voice…
Streamlined interface for generating images with AI in Krita. Inpaint and outpaint with optional text prompt, no tweaking required.

