khariha/dia2-easy-tts: An easy way to deploy Dia2 TTS without a GPU

Dia2 Easy TTS

Deploy a production-ready Text-to-Speech API using Dia2 in minutes, with no powerful GPU hardware of your own. Clone voices from short audio samples and generate natural-sounding speech with just a few API calls.

This project wraps the powerful Dia2 TTS model in a simple REST API deployed on Modal’s serverless infrastructure.

Why use Modal?

Zero hardware setup - Runs entirely on Modal’s cloud GPUs
Simple REST API - Standard HTTP endpoints for easy integration
Voice cloning - Clone any voice from a short WAV sample
Multi-speaker dialogues - Support for conversations between two speakers
Fast deployment - Get your API running in ~5 minutes
Free tier friendly - $30/month in free Modal credits

Demo

Coming soon!

Requirements

Python 3.8 or higher
A Modal account (sign up for free at modal.com)

Quick Setup

1. Install Modal

pip install modal

2. Authenticate with Modal

modal setup

This opens your browser to log in and authenticates your machine with Modal.

3. Clone and Deploy

# Clone the repository
git clone https://github.com/khariha/dia2-easy-tts.git
cd dia2-easy-tts

# Deploy to Modal (first time takes ~5 minutes to download model weights)
modal deploy main.py

A successful deployment will show:

✓ App deployed! 🎉
└── 🔨 Created web function fastapi_app => https://your-url.modal.run

4. Test Your Deployment

# Health check
curl https://your-url.modal.run/health

# Generate speech with default voices
curl -X POST https://your-url.modal.run/synthesize \
-F 'script=[S1]Hello, this is a test of the Dia2 TTS system.' \
-o output.wav

Congrats! You now have a working text-to-speech server without needing any powerful hardware of your own.

Basic Synthesis (Using Default Voices)

curl -X POST https://your-url.modal.run/synthesize \
-F 'script=[S1]Your text here with [S2]multiple speakers.' \
-o output.wav

Custom Voice Cloning

curl -X POST https://your-url.modal.run/synthesize \
-F 'script=[S1]Hello world' \
-F 'speaker1_audio=@/absolute/path/to/my_voice.wav' \
-o output.wav

Multi-Speaker Dialogue

curl -X POST https://your-url.modal.run/synthesize \
-F 'script=[S1]How are you? [S2]I am doing great, thanks!' \
-F 'speaker1_audio=@/absolute/path/to/voice1.wav' \
-F 'speaker2_audio=@/absolute/path/to/voice2.wav' \
-o conversation.wav

Python Client

import requests

url = "https://your-url.modal.run/synthesize"

script = """[S1]Welcome to our podcast.
[S2]Thanks for having me!
[S1]Let's dive right in."""

# With default voices
response = requests.post(url, data={"script": script}, timeout=600)

if response.ok:
    with open("output.wav", "wb") as f:
        f.write(response.content)
    print("✓ Audio saved!")
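The same endpoint also accepts reference audio for voice cloning. The sketch below mirrors the curl `-F` fields from Python. `build_form` and `synthesize` are hypothetical helper names that are not part of this project, and the `audio/wav` content type is an assumption about what the server expects.

```python
from pathlib import Path
from typing import Optional


def build_form(script: str,
               speaker1_audio: Optional[str] = None,
               speaker2_audio: Optional[str] = None):
    """Build the form fields and file parts for POST /synthesize."""
    data = {"script": script}
    files = {}
    for field, path in (("speaker1_audio", speaker1_audio),
                        ("speaker2_audio", speaker2_audio)):
        if path is not None:
            p = Path(path)
            # (filename, content, content type); the content type is an assumption
            files[field] = (p.name, p.read_bytes(), "audio/wav")
    return data, files


def synthesize(base_url: str, script: str, out_path: str = "output.wav",
               speaker1_audio: Optional[str] = None,
               speaker2_audio: Optional[str] = None,
               timeout: float = 600) -> str:
    """Send the script (and optional voice samples) and save the returned WAV."""
    import requests  # third-party; imported here so build_form works without it

    data, files = build_form(script, speaker1_audio, speaker2_audio)
    response = requests.post(f"{base_url}/synthesize", data=data,
                             files=files or None, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of saving an error body
    Path(out_path).write_bytes(response.content)
    return out_path
```

For example, `synthesize("https://your-url.modal.run", "[S1]Hello world", speaker1_audio="/absolute/path/to/my_voice.wav")` would write the result to `output.wav`.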

JavaScript/TypeScript

const formData = new FormData();
formData.append('script', '[S1]Hello from JavaScript!');

const response = await fetch('https://your-url.modal.run/synthesize', {
  method: 'POST',
  body: formData
});

if (response.ok) {
  const blob = await response.blob();
  const url = URL.createObjectURL(blob);
  // Use url for audio playback or download
}

Script Format

Scripts use speaker tags to indicate who’s speaking:

[S1] - Speaker 1 (uses speaker1_audio or default voice 1)
[S2] - Speaker 2 (uses speaker2_audio or default voice 2)

Example:

[S1]It was a dark and stormy night.
[S2]The wind howled through the trees.
[S1]Sarah pulled her coat tighter as she walked.
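When scripts are assembled programmatically, a small helper can keep the tags consistent. This is a minimal sketch: `build_script` and `find_unknown_tags` are hypothetical names, and whether the server rejects tags other than [S1] and [S2] is not stated here, so the check is only a client-side precaution.

```python
import re
from typing import Iterable, List, Tuple

# Only [S1] and [S2] are described in this README.
VALID_SPEAKERS = (1, 2)
TAG_PATTERN = re.compile(r"\[S(\d+)\]")


def build_script(turns: Iterable[Tuple[int, str]]) -> str:
    """Join (speaker, text) turns into a tagged script, one turn per line."""
    lines = []
    for speaker, text in turns:
        if speaker not in VALID_SPEAKERS:
            raise ValueError(f"unsupported speaker: {speaker}")
        lines.append(f"[S{speaker}]{text.strip()}")
    return "\n".join(lines)


def find_unknown_tags(script: str) -> List[str]:
    """Return tags such as [S3] that fall outside the documented [S1]/[S2]."""
    return [m.group(0) for m in TAG_PATTERN.finditer(script)
            if int(m.group(1)) not in VALID_SPEAKERS]
```

Calling `build_script([(1, "It was a dark and stormy night."), (2, "The wind howled through the trees.")])` produces the first two lines of the example above.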

API Endpoints

`POST /synthesize`

Generate speech from a script.

Parameters:

script (required): Text with [S1] and [S2] speaker tags
speaker1_audio (optional): WAV file for speaker 1’s voice
speaker2_audio (optional): WAV file for speaker 2’s voice

Returns: WAV audio file

Example:

curl -X POST https://your-url.modal.run/synthesize \
-F 'script=[S1]Hello world' \
-o output.wav

`GET /health`

Check service status and GPU availability.

Returns:

{
  "status": "healthy",
  "service": "Dia2 TTS",
  "gpu_info": {
    "torch_info": {
      "cuda_available": true,
      "device_name": "NVIDIA A100-SXM4-40GB"
    }
  }
}
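A client can inspect this payload before sending work. The sketch below assumes the response has exactly the shape shown above; `gpu_ready` is a hypothetical helper name.

```python
import json


def gpu_ready(health: dict) -> bool:
    """True when the service reports healthy and CUDA is available."""
    torch_info = health.get("gpu_info", {}).get("torch_info", {})
    return health.get("status") == "healthy" and bool(torch_info.get("cuda_available"))


# The sample payload from this README, parsed as a client would receive it.
sample = json.loads("""
{
  "status": "healthy",
  "service": "Dia2 TTS",
  "gpu_info": {"torch_info": {"cuda_available": true,
                              "device_name": "NVIDIA A100-SXM4-40GB"}}
}
""")
print(gpu_ready(sample))  # True
```

With the `requests` library, the same check could be fed from `requests.get("https://your-url.modal.run/health", timeout=30).json()`.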

🔧 Troubleshooting

"modal: command not found"

pip install modal

"No Modal token found"

modal setup

Follow the browser prompts to authenticate.

First deployment is slow

The first deployment downloads ~2GB of model weights. The weights are cached, so subsequent deployments take ~30 seconds.

Request timeout

For long scripts, increase the client timeout:

requests.post(url, data=data, timeout=600)  # 10 minutes

Synthesis takes too long

Long scripts (>30 seconds of audio) may take 1-2 minutes to synthesize
Consider breaking long scripts into smaller chunks
Set keep_warm=1 on the Modal function in main.py to keep one container warm, which speeds up subsequent requests (costs more)
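One way to break a long script into smaller chunks is to split at speaker turns and stitch the resulting audio back together. The sketch below is an illustration under several assumptions: `split_script` and `join_wavs` are hypothetical helpers, the 400-character default is an arbitrary budget rather than a documented limit, and `join_wavs` assumes the server returns PCM WAV files with identical channel count, sample width, and sample rate.

```python
import io
import re
import wave
from typing import List

TURN_PATTERN = re.compile(r"(?=\[S[12]\])")  # zero-width split before each tag


def split_script(script: str, max_chars: int = 400) -> List[str]:
    """Group whole speaker turns into chunks of at most max_chars characters.

    A single turn longer than max_chars is kept whole rather than cut mid-sentence.
    """
    turns = [t.strip() for t in TURN_PATTERN.split(script) if t.strip()]
    chunks, current = [], ""
    for turn in turns:
        candidate = f"{current}\n{turn}" if current else turn
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = turn
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def join_wavs(wav_blobs: List[bytes]) -> bytes:
    """Concatenate WAV files that share channels, sample width, and sample rate."""
    if not wav_blobs:
        raise ValueError("no audio to join")
    out = io.BytesIO()
    params = None
    with wave.open(out, "wb") as writer:
        for blob in wav_blobs:
            with wave.open(io.BytesIO(blob), "rb") as reader:
                fmt = (reader.getnchannels(), reader.getsampwidth(),
                       reader.getframerate())
                if params is None:
                    # The first file defines the output format.
                    params = fmt
                    writer.setnchannels(fmt[0])
                    writer.setsampwidth(fmt[1])
                    writer.setframerate(fmt[2])
                elif fmt != params:
                    raise ValueError("WAV formats differ; cannot concatenate")
                writer.writeframes(reader.readframes(reader.getnframes()))
    return out.getvalue()
```

Each chunk would be sent as its own `/synthesize` request, with the same speaker audio files if voices are cloned, and the responses passed to `join_wavs` in order. Voice consistency across separately synthesized chunks is not guaranteed by anything in this README.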

Stopping a Running Modal App

If you want to stop a deployed app (for example, to avoid GPU usage or shut down a test deployment), you can do so using the Modal CLI.

1. List Your Active Apps

modal app list

This will display all apps deployed under your Modal account, including their IDs and names.

Example output:

APP ID          NAME
ap-abc123xyz    dia2-easy-tts

2. Stop an App

Use the app ID from the list command:

modal app stop ap-abc123xyz

This immediately stops the app and releases any allocated GPU resources.

Notes

Stopping an app prevents further requests to its URL
You can redeploy the app later using modal deploy main.py
Stopped apps do not incur GPU charges
Useful for saving credits when you’re done testing

GPU quota exceeded

Modal's free tier has GPU usage limits
Check your usage at modal.com/settings/billing
Upgrade to a paid tier for higher limits

Acknowledgments

Dia2 - The powerful TTS model powering this API
Modal - Serverless infrastructure platform
FastAPI - Modern web framework
