LuxTTS

LuxTTS is a lightweight, ZipVoice-based text-to-speech model designed for high-quality voice cloning and realistic speech generation at speeds exceeding 150x realtime.


The main features are:

Voice cloning: SOTA voice cloning on par with models 10x larger.
Clarity: Clear 48 kHz speech generation, unlike most TTS models, which are limited to 24 kHz.
Speed: Reaches 150x realtime on a single GPU and runs faster than realtime on CPUs as well.
Efficiency: Fits within 1 GB of VRAM, so it runs on most local GPUs.
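"150x realtime" means one second of wall-clock time produces 150 seconds of audio, so a 10 s clip takes roughly 0.067 s. The sketch below measures that ratio for any generation call; the `generate` callable is a hypothetical wrapper you supply that returns the number of samples produced, and the 48 kHz default comes from the Clarity line above.

```python
import time
from typing import Callable


def speedup_over_realtime(generate: Callable[[], int], sample_rate: int = 48_000) -> float:
    """Return how many times faster than realtime `generate` ran.

    `generate` is a zero-argument callable (hypothetical wrapper around the
    model call) that synthesizes audio and returns the number of samples.
    """
    start = time.perf_counter()
    num_samples = generate()
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero interval
    audio_seconds = num_samples / sample_rate
    return audio_seconds / elapsed
```

With the model loaded as shown under Usage, something like `speedup_over_realtime(lambda: lux_tts.generate_speech(text, encoded_prompt).numpy().size)` would give a rough figure; run one generation first so warm-up cost is not counted.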

Usage

You can try it locally, on Google Colab, or on Hugging Face Spaces.

Simple installation:

    git clone https://github.com/ysharma3501/LuxTTS.git
    cd LuxTTS
    pip install -r requirements.txt

Load model:

    from zipvoice.luxvoice import LuxTTS

    # load model on GPU
    lux_tts = LuxTTS('YatharthS/LuxTTS', device='cuda')

    # load model on CPU
    # lux_tts = LuxTTS('YatharthS/LuxTTS', device='cpu', threads=2)

    # load model on MPS for Macs
    # lux_tts = LuxTTS('YatharthS/LuxTTS', device='mps')
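To avoid editing commented-out lines by hand, the device choice can be made automatically. This is a minimal sketch: the helper name is hypothetical, the preference order (CUDA, then MPS, then CPU) is an assumption, and the `threads=2` default for CPU is copied from the example above. The availability flags are passed in as booleans so the function itself has no PyTorch dependency.

```python
def lux_tts_device_kwargs(cuda_available: bool, mps_available: bool, cpu_threads: int = 2) -> dict:
    """Build keyword arguments for LuxTTS(...) (hypothetical helper).

    Preference order is CUDA, then Apple MPS, then CPU; only the CPU
    case passes `threads`, mirroring the loading examples above.
    """
    if cuda_available:
        return {"device": "cuda"}
    if mps_available:
        return {"device": "mps"}
    return {"device": "cpu", "threads": cpu_threads}


# Typical use with PyTorch installed:
# import torch
# kwargs = lux_tts_device_kwargs(torch.cuda.is_available(), torch.backends.mps.is_available())
# lux_tts = LuxTTS('YatharthS/LuxTTS', **kwargs)
```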

Simple inference

    import soundfile as sf
    from IPython.display import Audio, display

    text = "Hey, what's up? I'm feeling really great if you ask me honestly!"

    ## change this to your reference file path, can be wav/mp3
    prompt_audio = 'audio_file.wav'

    ## encode the reference audio (the first call takes ~10 s while librosa initializes)
    encoded_prompt = lux_tts.encode_prompt(prompt_audio, rms=0.01)

    ## generate speech
    final_wav = lux_tts.generate_speech(text, encoded_prompt, num_steps=4)

    ## save audio (output is 48 kHz)
    final_wav = final_wav.numpy().squeeze()
    sf.write('output.wav', final_wav, 48000)

    ## play the result inline (renders as a player only inside Jupyter/Colab)
    display(Audio(final_wav, rate=48000))
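If `soundfile` is not installed, the 48 kHz output can also be saved with Python's standard library alone. This is a dependency-free sketch, not part of LuxTTS: it assumes mono float samples in [-1, 1] (what `final_wav.numpy().squeeze()` above appears to produce) and writes them as 16-bit PCM, clipping anything out of range.

```python
import array
import sys
import wave
from typing import Iterable


def write_wav_16bit(path: str, samples: Iterable[float], sample_rate: int = 48_000) -> None:
    """Write mono float samples in [-1, 1] to a 16-bit PCM WAV file."""
    # Clip to [-1, 1], then scale to the signed 16-bit range.
    pcm = array.array("h", (int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(pcm.tobytes())
```

Usage after the snippet above: `write_wav_16bit('output.wav', final_wav.tolist())`.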

Inference with sampling params:

    import soundfile as sf
    from IPython.display import Audio, display

    text = "Hey, what's up? I'm feeling really great if you ask me honestly!"

    ## change this to your reference file path, can be wav/mp3
    prompt_audio = 'audio_file.wav'

    rms = 0.01             ## loudness; higher sounds louder (around 0.01 recommended)
    t_shift = 0.9          ## sampling param; higher can sound better but raises WER
    num_steps = 4          ## sampling param; higher sounds better but is slower (3-4 is most efficient)
    speed = 1.0            ## sampling param; controls speaking rate (lower = slower)
    return_smooth = False  ## sampling param; may sound smoother but less crisp
    ref_duration = 5       ## lower can speed up inference; set to 1000 if you hear artifacts

    ## encode the reference audio (the first call takes ~10 s while librosa initializes)
    encoded_prompt = lux_tts.encode_prompt(prompt_audio, duration=ref_duration, rms=rms)

    ## generate speech
    final_wav = lux_tts.generate_speech(
        text,
        encoded_prompt,
        num_steps=num_steps,
        t_shift=t_shift,
        speed=speed,
        return_smooth=return_smooth,
    )

    ## save audio (output is 48 kHz)
    final_wav = final_wav.numpy().squeeze()
    sf.write('output.wav', final_wav, 48000)

    ## play the result inline (renders as a player only inside Jupyter/Colab)
    display(Audio(final_wav, rate=48000))
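When experimenting with these knobs it can help to keep them together as named presets. The sketch below is a hypothetical convenience wrapper, not part of LuxTTS: the field names and defaults match the `generate_speech` keyword arguments above, while the validation rules (at least one step, positive speed) and the preset names are assumptions.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SamplingParams:
    """Hypothetical bundle of the generate_speech sampling arguments."""

    num_steps: int = 4
    t_shift: float = 0.9
    speed: float = 1.0
    return_smooth: bool = False

    def __post_init__(self) -> None:
        # Assumed sanity checks; LuxTTS itself may accept other ranges.
        if self.num_steps < 1:
            raise ValueError("num_steps must be at least 1")
        if self.speed <= 0:
            raise ValueError("speed must be positive")

    def as_kwargs(self) -> dict:
        return asdict(self)


# Hypothetical presets based on the comments in the example above.
FAST = SamplingParams(num_steps=3)
SMOOTH = SamplingParams(return_smooth=True)
```

Usage: `lux_tts.generate_speech(text, encoded_prompt, **FAST.as_kwargs())`.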

Tips

Use a reference audio clip of at least 3 seconds for voice cloning.
Set return_smooth=True if you hear metallic sounds.
Lower t_shift to reduce pronunciation errors at some cost in quality; raise it for better quality at the risk of more errors.
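The 3-second minimum from the first tip can be checked before encoding the prompt. This sketch uses only the standard `wave` module, so it handles WAV references only (MP3 would need a decoder such as librosa); the helper names are hypothetical.

```python
import wave

MIN_REFERENCE_SECONDS = 3.0  # minimum from the tip above


def reference_duration_seconds(path: str) -> float:
    """Duration of a WAV file in seconds (stdlib only, no MP3 support)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_reference(path: str, minimum: float = MIN_REFERENCE_SECONDS) -> float:
    """Return the clip duration, or raise ValueError if it is too short."""
    duration = reference_duration_seconds(path)
    if duration < minimum:
        raise ValueError(f"reference clip is {duration:.2f}s; use at least {minimum:.0f}s")
    return duration
```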

Community

Lux-TTS-Gradio: A Gradio app to use LuxTTS.
OptiSpeech: Clean UI app to use LuxTTS.
LuxTTS-Comfyui: Nodes to use LuxTTS in ComfyUI.

Thanks to all community contributions!

Info

Q: How is this different from ZipVoice?

A: LuxTTS uses the same architecture, but distilled to 4 sampling steps with an improved sampling technique. It also uses a custom 48 kHz vocoder instead of the default 24 kHz one.
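To make "4 steps" and the `t_shift` parameter concrete, here is a toy flow-matching sampler. It is an illustration under stated assumptions, not LuxTTS's improved technique: it uses a plain Euler integrator, treats t = 0 as noise and t = 1 as data, and assumes a time warp of the form t' = s·t / (1 + (s − 1)·t), modeled on shifted schedules used in ZipVoice-style samplers. With s < 1 the grid points move toward t = 0; with s = 1 the grid stays uniform.

```python
from typing import Callable, List


def shifted_time_grid(num_steps: int, t_shift: float) -> List[float]:
    """Uniform grid on [0, 1] warped by t' = s*t / (1 + (s - 1)*t) (assumed form)."""
    grid = [i / num_steps for i in range(num_steps + 1)]
    return [t_shift * t / (1 + (t_shift - 1) * t) for t in grid]


def euler_sample(
    velocity: Callable[[List[float], float], List[float]],
    x0: List[float],
    num_steps: int = 4,
    t_shift: float = 0.9,
) -> List[float]:
    """Integrate dx/dt = velocity(x, t) from t=0 (noise) to t=1 (data) with Euler steps."""
    ts = shifted_time_grid(num_steps, t_shift)
    x = list(x0)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        v = velocity(x, t_cur)  # one model evaluation per step
        x = [xi + (t_next - t_cur) * vi for xi, vi in zip(x, v)]
    return x
```

Each step costs one network evaluation, which is why cutting the step count through distillation translates almost directly into speed. In this toy, a straight-line velocity field (target minus starting point) is recovered exactly in any number of steps; a real distilled model only approximates that.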

Q: Can it be even faster?

A: Yes. It currently runs in float32; float16 inference should be significantly faster (almost 2x).
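One reason half precision helps is simple arithmetic: float16 stores each value in 2 bytes instead of 4, halving the memory that holds and moves the weights. The sketch below computes weight memory only (activations and framework overhead are excluded); the parameter count in the usage note is hypothetical, not LuxTTS's actual size.

```python
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_megabytes(num_params: int, dtype: str = "float32") -> float:
    """Memory needed just to store the weights, in MiB (activations excluded)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**20
```

For a hypothetical 100M-parameter model this gives about 381 MiB in float32 and about 191 MiB in float16.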

Roadmap

Release model and code
Hugging Face Spaces demo
Release MPS support (thanks to @builtbybasit)
Release LuxTTS v1.5
Release code for float16 inference

Acknowledgments

ZipVoice for their excellent code and model.
Vocos for their great vocoder.

Final Notes

The model and code are licensed under the Apache-2.0 license. See LICENSE for details.

Stars and likes are appreciated. Thank you!

Email: yatharthsharma350@gmail.com
