Stars
The simplest, fastest repository for training/finetuning medium-sized GPTs.
TensorFlow code and pre-trained models for BERT
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.
Code and documentation to train Stanford's Alpaca models, and generate the data.
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
ImageBind: One Embedding Space to Bind Them All
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
vits2 backbone with multilingual-bert
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
Code for the paper "Jukebox: A Generative Model for Music"
An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
PyTorch implementation of Google AI's 2018 BERT
Official repo for consistency models.
Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed.
[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters
A lightweight Python library for the Spotify Web API
Muzic: Music Understanding and Generation with Artificial Intelligence
DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022; Official code
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
