Fish Speech

Documentation is under construction; English is not fully supported yet.

This codebase is released under the BSD-3-Clause License, and all models are released under the CC-BY-NC-SA-4.0 License. Please refer to LICENSE for more details.

Disclaimer

We accept no responsibility for any illegal use of the codebase. Please refer to your local laws regarding the DMCA and other related legislation.

Requirements

GPU memory: 2GB (for inference), 24GB (for fine-tuning)
System: Linux (full functionality), Windows (inference only; flash-attn and torch.compile are not supported)

We therefore strongly recommend that Windows users run the codebase under WSL2 or Docker.
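
The requirements above can be restated as a small pre-flight check. This is only a sketch: `supported_modes` is a hypothetical helper, not part of the codebase, and its thresholds simply repeat the figures listed above. Systems other than Linux and Windows are not covered by this document, so the sketch treats them like Windows.

```python
import platform

# Thresholds restated from the requirements above (in GB).
INFERENCE_MIN_GB = 2
FINETUNE_MIN_GB = 24


def supported_modes(system: str, gpu_memory_gb: float) -> dict:
    """Return which features the stated requirements allow on a machine.

    `system` is the value of platform.system(), e.g. "Linux" or "Windows".
    """
    is_linux = system == "Linux"
    return {
        "inference": gpu_memory_gb >= INFERENCE_MIN_GB,
        # Windows is listed as inference only, so fine-tuning needs Linux.
        "finetuning": is_linux and gpu_memory_gb >= FINETUNE_MIN_GB,
        # flash-attn and torch.compile are listed as unsupported on Windows.
        "flash_attn": is_linux,
        "torch_compile": is_linux,
    }


if __name__ == "__main__":
    # The GPU size is passed in by hand; replace 8 with your card's memory in GB.
    print(supported_modes(platform.system(), 8))
```

Inside WSL2 or a Linux Docker container, `platform.system()` reports `Linux`, which matches the recommendation above.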

Setup

# Basic environment setup
conda create -n fish-speech python=3.10
conda activate fish-speech
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia

# Install flash-attn (for Linux)
pip3 install ninja && MAX_JOBS=4 pip3 install flash-attn --no-build-isolation

# Install fish-speech
pip3 install -e .
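
After installation, it can help to confirm that the packages are visible to the active environment. The sketch below is an illustration rather than part of the project: `check_modules` is a hypothetical helper, and the import names in `EXPECTED` are assumptions (`flash_attn` for the flash-attn package, `fish_speech` for this repository's package directory). It only looks the modules up and does not import them, so it runs without a GPU.

```python
import importlib.util


def check_modules(names):
    """Map each top-level module name to True if it can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Assumed import names for the packages installed above.
EXPECTED = ["torch", "torchvision", "torchaudio", "flash_attn", "fish_speech"]

if __name__ == "__main__":
    for name, found in check_modules(EXPECTED).items():
        print(f"{name:12s} {'ok' if found else 'MISSING'}")
```

On Windows, `flash_attn` is expected to be reported as missing, since flash-attn is not supported there.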

Inference (CLI)

Download the required vqgan and text2semantic models from our Hugging Face repo.

# Create the target directory, then fetch the files via direct-download (/resolve/) URLs
mkdir -p checkpoints
wget https://huggingface.co/fishaudio/speech-lm-v1/resolve/main/vqgan-v1.pth -O checkpoints/vqgan-v1.pth
wget https://huggingface.co/fishaudio/speech-lm-v1/resolve/main/text2semantic-400m-v0.1-4k.pth -O checkpoints/text2semantic-400m-v0.1-4k.pth
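
A download that goes wrong often still produces a file, just not a checkpoint. On Hugging Face, a `/resolve/` URL returns the file itself, a `/blob/` URL returns the HTML viewer page, and a `/raw/` URL may return a Git LFS pointer for large files. The sketch below builds `/resolve/` URLs and guesses what a downloaded file contains from its first bytes. Both helpers are hypothetical, and the zip check assumes the checkpoints were written in the zip-based format that recent versions of `torch.save` use, which this document does not confirm.

```python
def resolve_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build a direct-download URL for a file in a Hugging Face model repo."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"


def classify_download(head: bytes) -> str:
    """Guess what was downloaded from the first bytes of the file."""
    stripped = head.lstrip().lower()
    if stripped.startswith(b"version https://git-lfs"):
        return "lfs-pointer"   # small text stub instead of the real file
    if stripped.startswith((b"<!doctype", b"<html")):
        return "html-page"     # web page instead of the real file
    if head.startswith(b"PK"):
        return "zip-archive"   # plausible torch.save checkpoint (assumption)
    return "unknown"           # e.g. an older, non-zip checkpoint format


def classify_file(path: str) -> str:
    """Read the first bytes of a file on disk and classify them."""
    with open(path, "rb") as f:
        return classify_download(f.read(64))
```

For example, `classify_file("checkpoints/vqgan-v1.pth")` returning `html-page` or `lfs-pointer` would suggest the file should be downloaded again from a `/resolve/` URL.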

Generate semantic tokens from text:

python tools/llama/generate.py \
    --text "Hello" \
    --num-samples 2 \
    --compile

You may want to use --compile to fuse CUDA kernels for faster inference (~25 tokens/sec -> ~300 tokens/sec). Because torch.compile is not supported on Windows, omit the flag there.
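
To put the throughput figures in perspective, here is a short worked calculation. The two rates are the approximate figures quoted above; the sequence length of 1500 tokens is a hypothetical value chosen only for illustration. The estimate ignores the one-time compilation delay that `torch.compile` adds before the first generation.

```python
EAGER_TOKENS_PER_SEC = 25.0      # approximate rate without --compile
COMPILED_TOKENS_PER_SEC = 300.0  # approximate rate with --compile


def generation_seconds(num_tokens: int, tokens_per_sec: float) -> float:
    """Time to generate num_tokens at a steady rate, ignoring warm-up."""
    return num_tokens / tokens_per_sec


speedup = COMPILED_TOKENS_PER_SEC / EAGER_TOKENS_PER_SEC

if __name__ == "__main__":
    n = 1500  # hypothetical sequence length
    print(f"speedup: {speedup:.0f}x")
    print(f"eager:    {generation_seconds(n, EAGER_TOKENS_PER_SEC):.0f} s")
    print(f"compiled: {generation_seconds(n, COMPILED_TOKENS_PER_SEC):.0f} s")
```

At these rates the speedup is about 12x: the hypothetical 1500 tokens take roughly 60 seconds without compilation and roughly 5 seconds with it.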

Generate vocals from semantic tokens:

python tools/vqgan/inference.py -i codes_0.npy
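
When several samples are generated, each one has to be decoded separately. The sketch below builds one decoding command per token file. It assumes that the generation step writes files named `codes_0.npy`, `codes_1.npy`, and so on into one directory, a pattern inferred from the single filename shown above and not confirmed by this document. `vqgan_commands` is a hypothetical helper.

```python
import glob
import os
import sys


def vqgan_commands(directory: str = "."):
    """Build one tools/vqgan/inference.py command per codes_*.npy file."""
    # Lexicographic order: codes_10.npy sorts before codes_2.npy.
    paths = sorted(glob.glob(os.path.join(directory, "codes_*.npy")))
    return [[sys.executable, "tools/vqgan/inference.py", "-i", p] for p in paths]


if __name__ == "__main__":
    # Print the commands only; pass each list to subprocess.run to execute it.
    for cmd in vqgan_commands():
        print(" ".join(cmd))
```

Before running the commands in a loop, check where the inference script writes its output, since this document does not say whether successive runs overwrite the same file.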

Rust Data Server

Because loading and shuffling the dataset is very slow and memory-intensive, we use a Rust server to do both. The server is based on gRPC and can be built with:

cd data_server
cargo build --release
