Qwen3-TTS, Apache-2.0, production-ready TTS

Qwen3-TTS: Create Human-Like Voices in Seconds

Turn a 3-second clip into a voice that sounds real: natural, emotional, and human-like.

In streaming mode, Qwen3-TTS reports first-packet latency as low as 97 ms in public benchmarks.

Qwen3-TTS Live Demo

Try the Qwen3-TTS Hugging Face Space demo directly in your browser.

What is Qwen3-TTS?

Qwen3-TTS combines a 12Hz multi-codebook tokenizer with a dual-track architecture to balance speed, control, and fidelity.

12Hz multi-codebook tokenizer

Compresses audio efficiently into discrete tokens at 12 frames per second while keeping high-fidelity reconstruction and paralinguistic detail.
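To make the 12Hz figure concrete: a 12 Hz tokenizer emits 12 acoustic frames per second of audio, so a 3-second clip becomes 36 frames, and a multi-codebook design stores one token per codebook in each frame. The sketch below does that arithmetic. The codebook count is a hypothetical parameter (this page does not state it), rounding a partial frame up is an assumption, and the 50 Hz comparison is only an illustrative higher frame rate, not a specific competing codec.

```python
import math


def acoustic_token_count(duration_s: float, num_codebooks: int,
                         frame_rate_hz: float = 12.0) -> tuple[int, int]:
    """Return (frames, total_tokens) for a clip of the given duration.

    Assumes one token per codebook per frame and rounds a partial frame up.
    num_codebooks is a hypothetical value, not a documented Qwen3-TTS setting.
    """
    if duration_s < 0 or num_codebooks < 1:
        raise ValueError("duration must be >= 0 and num_codebooks >= 1")
    frames = math.ceil(duration_s * frame_rate_hz)
    return frames, frames * num_codebooks


if __name__ == "__main__":
    # A 3-second reference clip at 12 Hz vs. an illustrative 50 Hz tokenizer.
    print(acoustic_token_count(3.0, num_codebooks=16))
    print(acoustic_token_count(3.0, num_codebooks=16, frame_rate_hz=50.0))
```

Fewer frames per second means shorter token sequences for the language model to generate, which is one reason a low frame rate helps latency.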

Dual-track discrete LM

Reduces the information bottlenecks and error accumulation of conventional two-stage LM + DiT (diffusion transformer) pipelines.

Natural-language voice control

Steer timbre, emotion, and prosody with natural-language instructions for expressive output.

Multilingual coverage

Public docs list 10 major languages, including Chinese, English, and Japanese.

Why developers choose Qwen3-TTS

Clear model roles, transparent benchmarks, and flexible Qwen3-TTS deployment paths.

The 0.6B and 1.7B variants span three roles: Base (general TTS and voice cloning), CustomVoice (preset voices with instruction control), and VoiceDesign (creating new voices).

How to integrate Qwen3-TTS

A practical path from local tests to real-time Qwen3-TTS deployments.

1. Set up Python 3.12

Create a virtual environment and install the qwen-tts package for local inference.
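Before loading any model, it can help to confirm the interpreter version and that the package is importable. This is a minimal sketch: the import name qwen_tts for the qwen-tts package is an assumption, and treating 3.12 as a minimum (rather than an exact requirement) is also an assumption, since the step above only names Python 3.12.

```python
import importlib.util
import sys


def check_environment(module_name: str = "qwen_tts",
                      minimum: tuple[int, int] = (3, 12)) -> dict[str, bool]:
    """Report whether Python meets the version target and the module is importable.

    module_name "qwen_tts" is an assumed import name for the qwen-tts package.
    """
    return {
        "python_ok": sys.version_info[:2] >= minimum,
        # find_spec returns None when no top-level module of that name exists.
        "package_found": importlib.util.find_spec(module_name) is not None,
    }


if __name__ == "__main__":
    env = check_environment()
    if not env["python_ok"]:
        print(f"Found Python {sys.version_info.major}.{sys.version_info.minor}; "
              "this guide targets 3.12.")
    if not env["package_found"]:
        print("qwen_tts is not importable; try: pip install -U qwen-tts")
```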

2. Choose model + mode

Pick Base, CustomVoice, or VoiceDesign for the task, and the 0.6B or 1.7B size depending on your compute budget.
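The choice in this step can be written down as a small lookup: the task picks the role, the budget picks the size, and the role determines which generation method to call. In this sketch the task labels ("clone", "preset", "design") and the repository ID pattern are hypothetical; verify model IDs against the official model list, which may not publish every role/size combination.

```python
from dataclasses import dataclass

# Role -> generation method, following the roles described on this page.
ROLE_METHODS = {
    "Base": "generate_voice_clone",
    "CustomVoice": "generate_custom_voice",
    "VoiceDesign": "generate_voice_design",
}

# Hypothetical task labels mapped to model roles.
TASK_TO_ROLE = {"clone": "Base", "preset": "CustomVoice", "design": "VoiceDesign"}


@dataclass(frozen=True)
class ModelChoice:
    role: str
    size: str
    repo_id: str  # assumed naming pattern; check the official model list
    method: str


def choose_model(task: str, low_budget: bool) -> ModelChoice:
    """Pick role and size for a task; the smaller 0.6B size when budget is tight."""
    if task not in TASK_TO_ROLE:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(TASK_TO_ROLE)}")
    role = TASK_TO_ROLE[task]
    size = "0.6B" if low_budget else "1.7B"
    repo_id = f"Qwen/Qwen3-TTS-12Hz-{size}-{role}"  # hypothetical ID format
    return ModelChoice(role, size, repo_id, ROLE_METHODS[role])


if __name__ == "__main__":
    print(choose_model("design", low_budget=False))
```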

3. Generate with Qwen3TTSModel

Call generate_custom_voice, generate_voice_design, or generate_voice_clone, matching the model you loaded, to synthesize audio.
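One way to keep application code independent of the loaded role is to dispatch to the matching method by name. This sketch forwards keyword arguments unchanged because their names (text, speaker, instruction, reference audio, and so on) are not given on this page and should come from the qwen-tts documentation. FakeModel is a hypothetical stand-in so the example runs without weights; a real model object loaded through Qwen3TTSModel would take its place.

```python
from typing import Any

METHOD_BY_ROLE = {
    "Base": "generate_voice_clone",
    "CustomVoice": "generate_custom_voice",
    "VoiceDesign": "generate_voice_design",
}


def synthesize(model: Any, role: str, **kwargs: Any) -> Any:
    """Call the generation method that matches the loaded model's role.

    Keyword arguments are passed through as-is; check their names in the docs.
    """
    try:
        method_name = METHOD_BY_ROLE[role]
    except KeyError:
        raise ValueError(f"unknown role: {role!r}") from None
    method = getattr(model, method_name, None)
    if not callable(method):
        raise AttributeError(f"model has no callable {method_name}()")
    return method(**kwargs)


class FakeModel:
    """Hypothetical stand-in that echoes which method was called and with what."""

    def generate_custom_voice(self, **kwargs: Any):
        return ("generate_custom_voice", kwargs)

    def generate_voice_design(self, **kwargs: Any):
        return ("generate_voice_design", kwargs)

    def generate_voice_clone(self, **kwargs: Any):
        return ("generate_voice_clone", kwargs)


if __name__ == "__main__":
    print(synthesize(FakeModel(), "CustomVoice", text="Hello"))
```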

4. Deploy via demo or API

Run the web demo locally or connect to DashScope real-time APIs for production.

Qwen3-TTS capabilities at a glance

Qwen3-TTS covers voice design, cloning, controllable speech, and streaming.

VoiceDesign

Create new voices from natural-language descriptions.

Voice clone (3s)

The Base model clones a voice from as little as 3 seconds of reference audio.

CustomVoice control

Preset voices with instruction-based control of timbre and emotion.

Streaming + non-streaming

A single model handles both batch (non-streaming) synthesis and real-time streaming.
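On the client side, streaming and non-streaming output can share one code path: write audio chunks as they arrive, and treat a non-streaming result as a single chunk. This sketch uses Python's standard wave module and assumes raw PCM chunks (16-bit little-endian mono by default). The chunk format and the sample rate are assumptions; take the actual rate from the model's output rather than the placeholder used below.

```python
import io
import wave
from typing import BinaryIO, Iterable


def write_pcm_chunks(chunks: Iterable[bytes], out: BinaryIO, sample_rate: int,
                     channels: int = 1, sample_width: int = 2) -> int:
    """Write PCM chunks to a WAV stream as they arrive; return frames written.

    Many small chunks (streaming) and one large chunk (non-streaming) both work.
    """
    frame_bytes = channels * sample_width
    total = 0
    with wave.open(out, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        for chunk in chunks:
            if len(chunk) % frame_bytes:
                raise ValueError("chunk is not a whole number of frames")
            wav.writeframes(chunk)  # header is patched with the final length on close
            total += len(chunk) // frame_bytes
    return total


if __name__ == "__main__":
    # Two silent 16-bit mono chunks standing in for streamed model output.
    buf = io.BytesIO()
    frames = write_pcm_chunks([b"\x00\x00" * 480, b"\x00\x00" * 480], buf,
                              sample_rate=24000)  # placeholder rate
    print(frames, len(buf.getvalue()))
```

To write to disk, open the file yourself (for example `with open("out.wav", "wb") as f:`) and pass `f`; wave does not close file objects it did not open.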

Multilingual and dialect-ready

Public docs report 10 languages with multiple dialect voices.

Apache-2.0 open source

Open weights under the permissive Apache-2.0 license, which allows commercial use.

Qwen3-TTS performance highlights

Benchmark points summarized from public Qwen3-TTS evaluations.

97ms

First-packet latency in streaming mode
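First-packet latency is the time from issuing a request to receiving the first audio chunk. A minimal way to measure it on your own setup is sketched below. It takes a callable that starts the stream, so the timer begins before the request is made even if the stream is a lazy generator. The clock is injectable for testing. Client-side numbers include network and decoding overhead, so they need not match the published benchmark figure.

```python
import time
from typing import Callable, Iterable, Optional


def first_packet_latency_ms(start_stream: Callable[[], Iterable[bytes]],
                            clock: Callable[[], float] = time.perf_counter
                            ) -> Optional[float]:
    """Milliseconds until the first non-empty audio chunk, or None if none arrives."""
    t0 = clock()
    for chunk in start_stream():
        if chunk:  # skip empty keep-alive or header chunks
            return (clock() - t0) * 1000.0
    return None


if __name__ == "__main__":
    # Hypothetical stream: replace with a call to your streaming client.
    def fake_stream():
        time.sleep(0.05)
        yield b"\x00\x00" * 240

    print(first_packet_latency_ms(fake_stream))
```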

10 languages

Publicly reported multilingual coverage

WER 1.835%

Word error rate on the multilingual voice-clone evaluation, with 0.789 speaker similarity
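WER counts the word substitutions, deletions, and insertions needed to turn a transcript of the synthesized audio into the reference text, divided by the number of reference words. A 1.835% WER is roughly 1.8 errors per 100 reference words. Speaker similarity in TTS evaluations is commonly the cosine similarity between speaker embeddings of the generated and reference audio. The sketch below computes both from plain inputs. It does not reproduce the official evaluation: the ASR model, text normalization, and embedding model are not specified here, and whitespace tokenization does not suit unsegmented scripts such as Chinese.

```python
import math
from typing import Sequence


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / reference words, via word-level Levenshtein."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))  # distance from empty reference to hyp[:j]
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,             # deletion
                          curr[j - 1] + 1,         # insertion
                          prev[j - 1] + (r != h))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("zero vector has no direction")
    return dot / norm


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
    print(cosine_similarity([0.2, 0.9, 0.1], [0.25, 0.85, 0.0]))
```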

User testimonials

Experience notes shared by teams across industries.

Voice drafts land fast, so our reviews move forward without delay.

Emma Johnson, Product Manager

Switching languages is effortless, making A/B voice tests for campaigns easy.

Michael Carter, Growth Marketing

The API is clean and latency stays low, even for real-time flows.

Sophia Davis, Frontend Engineer

Tone and pacing are easy to tune, so demos land with clients faster.

Daniel Thompson, Video Producer

Consistent voice style keeps lessons cohesive across long courses.

Olivia Martinez, Instructional Design Lead

Lightweight setup and fast onboarding make it ideal for small teams.

James Wilson, Indie Developer

We can tweak copy and hear the results immediately, so iteration is twice as fast.

Ava Brooks, Editorial Director

Long-form coherence holds up well, which makes it ideal for quick voice-data screening.

Ethan Parker, AI Researcher

The demo is ready for internal alignment, saving hours in decision cycles.

Liam Anderson, Brand Marketing

Frequently asked questions

Key facts summarized from public docs and benchmarks.








Build with Qwen3-TTS

Qwen3-TTS delivers open-source, controllable speech for real-time and batch use cases.
