
✨feat: WebAPI & Docker #40

Open
breakstring wants to merge 5 commits into SparkAudio:main from breakstring:main

Conversation

@breakstring

1. Add Spark-TTS Web API with FastAPI implementation
2. Add Docker support for Spark-TTS deployment
- Implement comprehensive FastAPI-based TTS API service
- Add API endpoints for text-to-speech with voice cloning and creation
- Create example client script for API interaction
- Include environment configuration and startup script
- Add README with detailed API usage and configuration instructions
- Configure .env.example for flexible service setup
- Implement file cleanup and output management
- Support multiple audio input and output methods
- Create Dockerfile for building Spark-TTS images with flexible model inclusion
- Add docker_builder.sh script for easy image building
- Implement docker-compose.yml with multiple service configurations
- Add .dockerignore to optimize Docker build context
- Update README and run_api.sh to support Docker deployment
- Configure environment variables and service types for containerized deployment
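For readers following along, a docker-compose.yml of the kind described above might look roughly like the sketch below. This is an illustrative guess based on this thread (the `spark-tts:latest-full` tag, `SERVICE_TYPE` variable, and port 7860 appear later in the conversation); the service names, the `api` service's port, and the exact structure are assumptions, not the PR's actual file.

```yaml
# Hypothetical sketch of a multi-service compose file; names and ports
# other than 7860/SERVICE_TYPE=webui are assumptions.
services:
  webui:
    image: spark-tts:latest-full
    environment:
      - SERVICE_TYPE=webui
    ports:
      - "7860:7860"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
  api:
    image: spark-tts:latest-full
    environment:
      - SERVICE_TYPE=api
    ports:
      - "8000:8000"
```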
@breakstring breakstring mentioned this pull request Mar 8, 2025
@D34DC3N73R

Tested this out but I get the following error in startup logs:

ERROR:api.main:Model initialization failed: requires the protobuf library but it was not found in your environment. Checkout the instructions on the installation page of its repo: https://github.com/protocolbuffers/protobuf/tree/master/python#installation and follow the ones that match your environment. Please note that you may need to restart your runtime after installation. 

Adding protobuf==4.21.12 to requirements.txt and building again solves the issue.
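One lightweight way to guard against this class of missing-dependency problem is to check that the pin is actually present in requirements.txt before rebuilding. A minimal sketch (the file contents below are illustrative, mirroring the pin suggested above):

```python
# Minimal sketch: verify a protobuf pin exists in a requirements file
# before rebuilding the image. The contents here are illustrative.
requirements = """\
transformers==4.46.2
fastapi==0.115.11
protobuf==4.21.12
"""

# Build a {package: version} map from the pinned lines.
pins = dict(
    line.split("==", 1) for line in requirements.splitlines() if "==" in line
)
assert pins.get("protobuf") == "4.21.12", "protobuf pin missing"
print(pins["protobuf"])  # prints 4.21.12
```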

@breakstring
Author


It's very strange: I checked my own environment and there is no protobuf package installed, yet there is no such error at runtime (neither in the docker logs nor in the local run logs).

```
(sparktts) azureuser@t4-westus2:~/Spark-TTS$ pip list
Package                  Version
------------------------ ------------
accelerate               0.26.0
aiofiles                 23.2.1
annotated-types          0.7.0
antlr4-python3-runtime   4.9.3
anyio                    4.8.0
audioread                3.0.1
certifi                  2025.1.31
cffi                     1.17.1
charset-normalizer       3.4.1
click                    8.1.8
decorator                5.2.1
einops                   0.8.1
einx                     0.3.0
fastapi                  0.115.11
ffmpy                    0.5.0
filelock                 3.17.0
frozendict               2.4.6
fsspec                   2025.2.0
gradio                   5.18.0
gradio_client            1.7.2
h11                      0.14.0
httpcore                 1.0.7
httpx                    0.28.1
huggingface-hub          0.29.2
idna                     3.10
Jinja2                   3.1.6
joblib                   1.4.2
lazy_loader              0.4
librosa                  0.10.2.post1
llvmlite                 0.44.0
markdown-it-py           3.0.0
MarkupSafe               2.1.5
mdurl                    0.1.2
mpmath                   1.3.0
msgpack                  1.1.0
networkx                 3.4.2
numba                    0.61.0
numpy                    2.1.3
nvidia-cublas-cu12       12.4.5.8
nvidia-cuda-cupti-cu12   12.4.127
nvidia-cuda-nvrtc-cu12   12.4.127
nvidia-cuda-runtime-cu12 12.4.127
nvidia-cudnn-cu12        9.1.0.70
nvidia-cufft-cu12        11.2.1.3
nvidia-curand-cu12       10.3.5.147
nvidia-cusolver-cu12     11.6.1.9
nvidia-cusparse-cu12     12.3.1.170
nvidia-nccl-cu12         2.21.5
nvidia-nvjitlink-cu12    12.4.127
nvidia-nvtx-cu12         12.4.127
omegaconf                2.3.0
orjson                   3.10.15
packaging                24.2
pandas                   2.2.3
pillow                   11.1.0
pip                      25.0
platformdirs             4.3.6
pooch                    1.8.2
psutil                   7.0.0
pycparser                2.22
pydantic                 2.10.6
pydantic_core            2.27.2
pydub                    0.25.1
Pygments                 2.19.1
python-dateutil          2.9.0.post0
python-dotenv            1.0.1
python-multipart         0.0.20
pytz                     2025.1
PyYAML                   6.0.2
regex                    2024.11.6
requests                 2.32.3
rich                     13.9.4
ruff                     0.9.9
safehttpx                0.1.6
safetensors              0.5.2
scikit-learn             1.6.1
scipy                    1.15.2
semantic-version         2.10.0
setuptools               75.8.0
shellingham              1.5.4
six                      1.17.0
sniffio                  1.3.1
soundfile                0.12.1
soxr                     0.5.0.post1
starlette                0.46.0
sympy                    1.13.1
threadpoolctl            3.5.0
tokenizers               0.20.3
tomlkit                  0.13.2
torch                    2.5.1
torchaudio               2.5.1
tqdm                     4.66.5
transformers             4.46.2
triton                   3.1.0
typer                    0.15.2
typing_extensions        4.12.2
tzdata                   2025.1
urllib3                  2.3.0
uvicorn                  0.34.0
websockets               15.0.1
wheel                    0.45.1
```

At the same time, I also used some other methods to check for the protobuf package, and it does not exist either.
[screenshot]
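For anyone who wants to run the same kind of check, here is one portable way to test whether protobuf is importable. This is a sketch of roughly the kind of availability check transformers performs before raising the ImportError quoted above, not the library's exact code:

```python
# Check whether google.protobuf can be imported in this environment,
# roughly what transformers does before raising its protobuf ImportError.
import importlib.util


def protobuf_available() -> bool:
    """Return True if google.protobuf is importable here."""
    try:
        return importlib.util.find_spec("google.protobuf") is not None
    except ModuleNotFoundError:
        # The parent "google" namespace package is missing entirely.
        return False


print("protobuf installed:", protobuf_available())
```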

@D34DC3N73R

This is the full error

```
:~/test-sparktts$ docker run -p 7860:7860 --name test-sparktts --gpus all -e SERVICE_TYPE=webui spark-tts:latest-full
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 2447, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/usr/local/lib/python3.12/site-packages/transformers/models/qwen2/tokenization_qwen2_fast.py", line 120, in __init__
    super().__init__(
  File "/usr/local/lib/python3.12/site-packages/transformers/tokenization_utils_fast.py", line 116, in __init__
    fast_tokenizer = TokenizerFast.from_file(fast_tokenizer_file)
Exception: expected value at line 1 column 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/webui.py", line 260, in <module>
    demo = build_ui(
  File "/app/webui.py", line 97, in build_ui
    model = initialize_model(model_dir, device=device)
  File "/app/webui.py", line 47, in initialize_model
    model = SparkTTS(model_dir, device)
  File "/app/cli/SparkTTS.py", line 44, in __init__
    self._initialize_inference()
  File "/app/cli/SparkTTS.py", line 48, in _initialize_inference
    self.tokenizer = AutoTokenizer.from_pretrained(f"{self.model_dir}/LLM")
  File "/usr/local/lib/python3.12/site-packages/transformers/models/auto/tokenization_auto.py", line 920, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/usr/local/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 2213, in from_pretrained
    return cls._from_pretrained(
  File "/usr/local/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 2448, in _from_pretrained
    except import_protobuf_decode_error():
  File "/usr/local/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 87, in import_protobuf_decode_error
    raise ImportError(PROTOBUF_IMPORT_ERROR.format(error_message))
ImportError: requires the protobuf library but it was not found in your environment. Checkout the instructions on the installation page of its repo: https://github.com/protocolbuffers/protobuf/tree/master/python#installation and follow the ones that match your environment. Please note that you may need to restart your runtime after installation.
```

Adding this allows me to run the container after a rebuild

```
$ cat requirements.txt
einops==0.8.1
einx==0.3.0
numpy==2.2.3
omegaconf==2.3.0
packaging==24.2
safetensors==0.5.2
soundfile==0.12.1
soxr==0.5.0.post1
torch==2.5.1
torchaudio==2.5.1
tqdm==4.66.5
transformers==4.46.2
gradio==5.18.0
fastapi==0.115.11
uvicorn==0.34.0
python-dotenv==1.0.1
protobuf==4.21.12
```

From within the container

```
root@d0dad5f76940:/app# pip show protobuf
Name: protobuf
Version: 4.21.12
Summary:
Home-page: https://developers.google.com/protocol-buffers/
Author: protobuf@googlegroups.com
Author-email: protobuf@googlegroups.com
License: 3-Clause BSD License
Location: /usr/local/lib/python3.12/site-packages
Requires:
Required-by:
```
@breakstring
Author


Oops, that's the webui part.


I'm very sorry. I packaged the webui part into Docker but didn't test that part of the code, because the webui is existing code and I assumed it would work fine. I will take some time today to verify it.
Thank you very much for the clarification.

@breakstring
Author

[screenshot]
I just found a clean VM, set up the environment, completely rebuilt the image, and ran your command without encountering the protobuf error you mentioned. The warning on the first line is one I had seen before.

After starting, the WebUI also opens correctly. That said, there are some strange issues in the WebUI that mean I can sometimes generate audio but most of the time cannot, which is exactly why I wrapped this FastAPI-based WebAPI interface. Gradio is too difficult to work with...

@D34DC3N73R

You are correct on that. I completely wiped my build cache and downloaded the model fresh from HF and did not receive the error on startup. Sorry for the false report!

@phong-phuong

While your intent was to have separate images (one that includes the pretrained models and a lite one that doesn't), the commands here copy and delete files in separate layers, which only adds to the image size.

As a result, the lite image actually contains the pretrained models in its earlier layers twice: once in the /tmp folder and a second time in the final destination.

For reference, the pretrained models are around 3.67 GB.
Personally, I would avoid including the models in the image entirely and let the user mount them, to avoid this complexity and to avoid redundant copies of the models in both the Docker image store and on disk.

Lite image is 17 GB:

[screenshot]

Lite image should be 10 GB:

[screenshot]

```dockerfile
# Copy context
COPY . /tmp/context/                  # 1st copy (+3.67GB)

# Check if model directory exists
RUN if [ -d "/tmp/context/pretrained_models" ]; then \
        echo "Found pretrained_models directory"; \
    else \
        echo "pretrained_models directory not found"; \
    fi

# Decide whether to copy model files based on INCLUDE_MODELS parameter
RUN if [ "${INCLUDE_MODELS}" = "true" ]; then \
        echo "Including models in the image"; \
        if [ -d "/tmp/context/pretrained_models" ]; then \
            # 2nd copy (+3.67GB)
            cp -r /tmp/context/pretrained_models/* /app/pretrained_models/ || echo "No model files to copy"; \
        else \
            echo "Warning: pretrained_models directory not found in build context"; \
        fi; \
    else \
        echo "Models will need to be mounted at runtime"; \
    fi

# Clean up temporary directory
# Comment: this runs in a separate layer, so it does not reduce the image size
RUN rm -rf /tmp/context
```
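One way to fix the layering problem described above, sketched under the assumption that BuildKit is available: bind-mount the build context inside the `RUN` instruction instead of `COPY`ing it into a layer. The mount leaves no layer behind, so the models land in the image at most once (and not at all for the lite build). The `COPY api/` line and paths are illustrative, not the PR's actual Dockerfile:

```dockerfile
# syntax=docker/dockerfile:1
ARG INCLUDE_MODELS=false
FROM python:3.12-slim
ARG INCLUDE_MODELS
WORKDIR /app

# Copy only the application code into a layer; never COPY the whole context.
# (Directory names here are illustrative.)
COPY api/ /app/api/

# Bind-mount the build context for the duration of this RUN: nothing from
# /tmp/context persists as a layer, so the models are written exactly once,
# or skipped entirely for the lite image.
RUN --mount=type=bind,source=.,target=/tmp/context \
    mkdir -p /app/pretrained_models && \
    if [ "${INCLUDE_MODELS}" = "true" ]; then \
        cp -r /tmp/context/pretrained_models/. /app/pretrained_models/; \
    else \
        echo "Models will be mounted at runtime"; \
    fi
```

Alternatively, as suggested above, omit the models from the image entirely and mount them at runtime (`-v ./pretrained_models:/app/pretrained_models`), which also keeps a single copy on disk shared by all containers.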
@breakstring
Author


Thanks for your feedback. I'm traveling these days and will check it next week once I have time. @phong-phuong
