Dia2 Easy TTS
Deploy a production-ready Text-to-Speech API using Dia2 in minutes—no powerful GPU hardware required. Clone voices from short audio samples and generate natural-sounding speech with just a few API calls.
This project wraps the powerful Dia2 TTS model in a simple REST API deployed on Modal’s serverless infrastructure.
Why use Modal?
- Zero hardware setup - Runs entirely on Modal’s cloud GPUs
- Simple REST API - Standard HTTP endpoints for easy integration
- Voice cloning - Clone any voice from a short WAV sample
- Multi-speaker dialogues - Support for conversations between two speakers
- Fast deployment - Get your API running in ~5 minutes
- Free tier friendly - $30/month in free Modal credits
Demo
Coming soon!
Requirements
- Python 3.8 or higher
- A Modal account (free sign-up at modal.com)
Quick Setup
1. Install Modal
pip install modal
2. Authenticate with Modal
modal setup
This opens your browser to log in and authenticates your machine with Modal.
3. Clone and Deploy
# Clone the repository
git clone https://github.com/khariha/dia2-easy-tts.git
cd dia2-easy-tts
# Deploy to Modal (first time takes ~5 minutes to download model weights)
modal deploy main.py
A successful deployment will show:
✓ App deployed! 🎉
└── 🔨 Created web function fastapi_app => https://your-url.modal.run
4. Test Your Deployment
# Health check
curl https://your-url.modal.run/health
# Generate speech with default voices
curl -X POST https://your-url.modal.run/synthesize \
-F 'script=[S1]Hello, this is a test of the Dia2 TTS system.' \
-o output.wav
Congrats! You now have a working text-to-speech server, with no powerful hardware of your own required.
Basic Synthesis (Using Default Voices)
curl -X POST https://your-url.modal.run/synthesize \
-F 'script=[S1]Your text here with [S2]multiple speakers.' \
-o output.wav
Custom Voice Cloning
curl -X POST https://your-url.modal.run/synthesize \
-F 'script=[S1]Hello world' \
-F 'speaker1_audio=@/absolute/path/to/my_voice.wav' \
-o output.wav
Multi-Speaker Dialogue
curl -X POST https://your-url.modal.run/synthesize \
-F 'script=[S1]How are you? [S2]I am doing great, thanks!' \
-F 'speaker1_audio=@/absolute/path/to/voice1.wav' \
-F 'speaker2_audio=@/absolute/path/to/voice2.wav' \
-o conversation.wav
Python Client
import requests
url = "https://your-url.modal.run/synthesize"
script = """[S1]Welcome to our podcast.
[S2]Thanks for having me!
[S1]Let's dive right in."""
# With default voices
response = requests.post(url, data={"script": script}, timeout=600)
if response.ok:
    with open("output.wav", "wb") as f:
        f.write(response.content)
    print("✓ Audio saved!")
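The client above only uses the default voices. A minimal sketch of the voice-cloning variant follows, assuming the endpoint accepts the same `speaker1_audio` and `speaker2_audio` multipart fields shown in the curl examples. The function name `synthesize_with_voices` and its parameters are hypothetical and not part of this project.

```python
from contextlib import ExitStack
from pathlib import Path
from typing import Optional


def synthesize_with_voices(
    url: str,
    script: str,
    out_path: str,
    speaker1_wav: Optional[str] = None,
    speaker2_wav: Optional[str] = None,
    timeout: float = 600,
) -> Path:
    """POST a script plus optional reference WAVs and save the returned audio."""
    import requests  # third-party dependency, imported on first use

    # Field names mirror the curl examples; both samples are optional.
    voice_paths = {"speaker1_audio": speaker1_wav, "speaker2_audio": speaker2_wav}
    with ExitStack() as stack:
        # Open only the samples that were provided; ExitStack closes them afterwards.
        files = {
            field: (Path(path).name, stack.enter_context(open(path, "rb")), "audio/wav")
            for field, path in voice_paths.items()
            if path is not None
        }
        response = requests.post(
            url, data={"script": script}, files=files or None, timeout=timeout
        )
    response.raise_for_status()  # surface HTTP errors instead of saving an error body
    out = Path(out_path)
    out.write_bytes(response.content)
    return out
```

A call might look like `synthesize_with_voices("https://your-url.modal.run/synthesize", "[S1]Hello world", "output.wav", speaker1_wav="/absolute/path/to/my_voice.wav")`.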
JavaScript/TypeScript
const formData = new FormData();
formData.append('script', '[S1]Hello from JavaScript!');
const response = await fetch('https://your-url.modal.run/synthesize', {
  method: 'POST',
  body: formData
});
if (response.ok) {
  const blob = await response.blob();
  const url = URL.createObjectURL(blob);
  // Use url for audio playback or download
}
Script Format
Scripts use speaker tags to indicate who’s speaking:
- [S1] - Speaker 1 (uses speaker1_audio or default voice 1)
- [S2] - Speaker 2 (uses speaker2_audio or default voice 2)
Example:
[S1]It was a dark and stormy night.
[S2]The wind howled through the trees.
[S1]Sarah pulled her coat tighter as she walked.
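When a script is assembled from structured data, a small helper can keep the tags consistent. The sketch below is an illustration only: `build_script` is a hypothetical name, and it assumes one turn per line with the tag directly before the text, as in the example above.

```python
from typing import Iterable, Tuple

VALID_SPEAKERS = (1, 2)  # the script format defines only [S1] and [S2]


def build_script(turns: Iterable[Tuple[int, str]]) -> str:
    """Turn (speaker_number, text) pairs into a tagged script, one turn per line."""
    lines = []
    for speaker, text in turns:
        if speaker not in VALID_SPEAKERS:
            raise ValueError(f"speaker must be 1 or 2, got {speaker!r}")
        text = text.strip()
        if not text:
            raise ValueError("each turn needs non-empty text")
        lines.append(f"[S{speaker}]{text}")
    return "\n".join(lines)


script = build_script([
    (1, "It was a dark and stormy night."),
    (2, "The wind howled through the trees."),
])
```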
API Endpoints
POST /synthesize
Generate speech from a script.
Parameters:
- script (required): Text with [S1] and [S2] speaker tags
- speaker1_audio (optional): WAV file for speaker 1’s voice
- speaker2_audio (optional): WAV file for speaker 2’s voice
Returns: WAV audio file
Example:
curl -X POST https://your-url.modal.run/synthesize \
-F 'script=[S1]Hello world' \
-o output.wav
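Since the endpoint returns a WAV file, a quick sanity check on the response is to read its duration. This is a minimal sketch using Python's standard `wave` module, which only reads PCM-encoded WAV data; whether the API returns PCM is an assumption here, and `wav_duration_seconds` is a hypothetical helper name.

```python
import io
import wave


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Return the playing time of an in-memory PCM WAV file."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        # Duration is the number of audio frames divided by frames per second.
        return wav.getnframes() / wav.getframerate()
```

With the Python client, this could be called as `wav_duration_seconds(response.content)` before writing the file to disk.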
GET /health
Check service status and GPU availability.
Returns:
{
  "status": "healthy",
  "service": "Dia2 TTS",
  "gpu_info": {
    "torch_info": {
      "cuda_available": true,
      "device_name": "NVIDIA A100-SXM4-40GB"
    }
  }
}
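A client can use this response to decide whether the service is ready before sending a long script. The sketch below assumes the JSON has the shape shown above; `gpu_ready` is a hypothetical helper, and missing keys are treated as "not ready" rather than raising.

```python
from typing import Any, Mapping


def gpu_ready(health: Mapping[str, Any]) -> bool:
    """Return True when the service reports healthy and CUDA is available."""
    torch_info = health.get("gpu_info", {}).get("torch_info", {})
    return health.get("status") == "healthy" and bool(torch_info.get("cuda_available"))
```

The argument would typically be the parsed body of the health check, for example `requests.get("https://your-url.modal.run/health", timeout=60).json()`.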
🔧 Troubleshooting
"modal: command not found"
pip install modal
"No Modal token found"
modal setup
Follow the browser prompts to authenticate.
First deployment is slow
The first deployment downloads ~2GB of model weights. The weights are then cached, so subsequent deployments take about 30 seconds.
Request timeout
For long scripts, increase the client timeout:
requests.post(url, data=data, timeout=600) # 10 minutes
Synthesis takes too long
- Long scripts (>30 seconds of audio) may take 1-2 minutes
- Consider breaking into smaller chunks
- Use keep_warm=1 for faster subsequent requests (costs more)
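One way to break a long script into smaller chunks is to split it at speaker tags and group whole turns under a size limit. The sketch below is an illustration under assumptions: `split_script` and `max_chars` are hypothetical names, and character count is only a rough proxy for audio length. Each chunk would be sent as its own request, and joining the resulting WAV files is left to the caller.

```python
import re
from typing import List

# Zero-width split point in front of every [S1] or [S2] tag.
TURN_BOUNDARY = re.compile(r"(?=\[S[12]\])")


def split_script(script: str, max_chars: int = 400) -> List[str]:
    """Group whole speaker turns into chunks of at most max_chars where possible."""
    turns = [t.strip() for t in TURN_BOUNDARY.split(script) if t.strip()]
    chunks: List[str] = []
    current = ""
    for turn in turns:
        candidate = f"{current}\n{turn}" if current else turn
        if current and len(candidate) > max_chars:
            # Adding this turn would exceed the limit, so start a new chunk.
            chunks.append(current)
            current = turn
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

A single turn longer than `max_chars` is kept whole rather than cut mid-sentence, so the limit is a target, not a guarantee.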
Stopping a Running Modal App
If you want to stop a deployed app (for example, to avoid GPU usage or shut down a test deployment), you can do so using the Modal CLI.
1. List Your Active Apps
modal app list
This will display all apps deployed under your Modal account, including their IDs and names.
Example output:
APP ID NAME
ap-abc123xyz dia2-easy-tts
2. Stop an App
Use the app ID from the list command:
modal app stop ap-abc123xyz
This immediately stops the app and releases any allocated GPU resources.
Notes
- Stopping an app prevents further requests to its URL
- You can redeploy the app later using modal deploy main.py
- Stopped apps do not incur GPU charges
- Useful for saving credits when you’re done testing
GPU quota exceeded
- Modal free tier has GPU usage limits
- Check your usage at modal.com/settings/billing
- Upgrade to paid tier for higher limits