Epub to Audiobook Converter - Pro Modular Edition
A fully autonomous, local, AI-powered suite that turns your EPUB library into high-quality audiobooks. The project has evolved from a simple converter into a professional, modular ecosystem in which every component is built from source and optimized for high-end GPUs such as the RTX 4080.
Key Features
- 100% Local & Editable: No external pre-built images. Every service (XTTS, GPT-SoVITS, Maya1, Kokoro) is built locally from source, so you own and can modify all of the code.
- Smart VRAM Management: When you switch engines, the required TTS engine is started automatically and the others are shut down, so only the active engine occupies GPU memory.
- AI Chapter Naming: Integrated with Ollama (dolphin-llama3) to automatically generate catchy, context-aware chapter titles for English books.
- Ultra-Fast Engines:
  - Kokoro v1.0: Real-time speed for Italian and English.
  - Maya1: State-of-the-art Italian TTS.
  - GPT-SoVITS: Best-in-class quality for complex narratives.
  - XTTS v2: Reliable voice cloning with improved stability.
- Robust Networking: A fixed, static-IP network layout prevents the "Connection Refused" and "Hostname not found" errors that are common in Docker Desktop.
- Persistent Models: Models are saved on your physical drive, so the multi-gigabyte downloads are not repeated every time you restart the containers.
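Because every engine sits at a fixed address and the adapters in wrappers/ expose an OpenAI-compatible API, the UI (or your own script) can reach an engine with a plain HTTP request. The sketch below assumes the adapters implement OpenAI's `POST /v1/audio/speech` route; the IP addresses, ports, model name, and voice name are hypothetical placeholders, not values taken from this project's compose file.

```python
import json
import urllib.request

# Hypothetical static addresses; the real IPs and ports are defined in docker-compose.yml.
ENGINE_URLS = {
    "kokoro": "http://172.28.0.10:8880",
    "xtts": "http://172.28.0.11:8020",
}


def build_speech_request(base_url: str, text: str, voice: str, model: str = "tts-1",
                         response_format: str = "wav") -> urllib.request.Request:
    """Build a POST to an OpenAI-style /v1/audio/speech endpoint (assumed route)."""
    payload = {"model": model, "input": text, "voice": voice,
               "response_format": response_format}
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(engine: str, text: str, voice: str, timeout: float = 300.0) -> bytes:
    """Send the request to the chosen engine and return the raw audio bytes."""
    req = build_speech_request(ENGINE_URLS[engine], text, voice)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()
```

With a fixed IP per service, the address never changes between restarts, which is what sidesteps the hostname-resolution errors mentioned above.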
Tech Stack
- Core: Python 3.11 (Debian Trixie slim base image)
- UI: Gradio with real-time log streaming.
- Orchestration: Docker Compose with bridge networking and static IPAM.
- Intelligence: Ollama (Local LLM).
- Audio: Pydub, FFmpeg, and specialized TTS wrappers.
Directory Structure
.
├── epub_to_audiobook_src/   # Core logic and Web UI
├── Kokoro/                  # Ultra-fast TTS source
├── GPT-SoVITS/              # High-fidelity TTS source
├── Maya1/                   # SOTA Italian TTS source
├── xtts_api_server_src/     # XTTS v2 backend
├── wrappers/                # OpenAI-compatible API adapters
├── refs/                    # Voice reference files (.wav + .txt)
├── models_fast_cache/       # Persistent storage for XTTS
├── models_gpt_sovits/       # Persistent storage for SoVITS
├── models_maya1/            # Persistent storage for Maya1
└── Output/                  # Your generated audiobooks
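The refs/ folder stores each voice sample together with its transcript. A minimal sketch of loading those pairs, assuming each .wav has a .txt with the same base name (for example refs/narrator.wav and refs/narrator.txt); both the naming convention and the `load_voice_refs` helper are assumptions, not the project's confirmed layout.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def load_voice_refs(refs_dir: Path) -> Tuple[Dict[str, Tuple[Path, str]], List[str]]:
    """Pair each .wav in refs_dir with the .txt transcript that shares its stem.

    Returns (voices, missing): voices maps a voice name to (wav_path, transcript),
    and missing lists .wav files that have no matching .txt.
    """
    voices: Dict[str, Tuple[Path, str]] = {}
    missing: List[str] = []
    for wav in sorted(refs_dir.glob("*.wav")):
        txt = wav.with_suffix(".txt")
        if txt.is_file():
            voices[wav.stem] = (wav, txt.read_text(encoding="utf-8").strip())
        else:
            missing.append(wav.name)
    return voices, missing
```

Reporting unpaired samples up front is cheaper than discovering a missing transcript halfway through a long conversion.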
Getting Started
1. Prerequisites
- Docker Desktop with CUDA support enabled.
- An NVIDIA GPU (16GB+ VRAM recommended for Maya1).
- (Optional) Ollama installed locally or running via the provided container.
2. Deployment
docker compose up -d --build
Note: The first start takes about 10 minutes while the images are built and the model weights are downloaded.
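Since the first start can take around ten minutes, a script that drives the converter may want to wait for the Web UI before continuing. This hypothetical helper (not part of the project) polls http://localhost:9900 until it answers with HTTP 2xx or a timeout expires; the 15-minute default is an arbitrary margin over the expected build time.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str) -> bool:
    """Return True if `url` answers with an HTTP 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts and HTTP errors all mean "not ready yet".
        return False


def wait_until_ready(url: str = "http://localhost:9900", timeout_s: float = 900.0,
                     interval_s: float = 10.0,
                     probe: Callable[[str], bool] = http_ok,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `url` until `probe` succeeds; return False if `timeout_s` elapses first."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

`probe` and `sleep` are parameters only so the loop can be exercised without a running container.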
3. Usage
Access the Web UI at: http://localhost:9900
- Select Book: Upload your .epub file.
- Choose Engine: Switch between Kokoro, XTTS, Maya1, or GPT-SoVITS.
- Smart VRAM: Click "⚡ Activate Engine" to start the selected engine and stop the others, freeing GPU memory.
- Start Conversion: Watch the logs stream in real time.
Advanced Configuration
Smart VRAM Logic
The UI talks to Docker directly through the Docker socket (/var/run/docker.sock). When you select a new engine, it stops the competing TTS containers, so the active engine does not have to share VRAM with the others.
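A minimal sketch of that switch, assuming the UI uses the Docker SDK for Python (the `docker` package) over the mounted socket. The engine-to-container mapping and the function names are hypothetical; the real container names come from the project's compose file. The decision logic is kept in a pure function so it can be checked without Docker.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from UI engine names to container names.
ENGINE_CONTAINERS: Dict[str, str] = {
    "Kokoro": "kokoro",
    "XTTS": "xtts",
    "Maya1": "maya1",
    "GPT-SoVITS": "gpt-sovits",
}


def plan_switch(target: str, running: List[str]) -> Tuple[List[str], List[str]]:
    """Return (containers_to_stop, containers_to_start) for switching to `target`."""
    if target not in ENGINE_CONTAINERS:
        raise ValueError(f"unknown engine: {target}")
    wanted = ENGINE_CONTAINERS[target]
    # Stop every other TTS engine that is currently running; non-TTS containers are untouched.
    to_stop = [c for c in ENGINE_CONTAINERS.values() if c != wanted and c in running]
    # Start the target only if it is not already up.
    to_start = [] if wanted in running else [wanted]
    return to_stop, to_start


def switch_engine(target: str) -> None:
    """Apply the plan through the Docker socket (requires `pip install docker`)."""
    import docker  # imported lazily so plan_switch works without the SDK

    client = docker.from_env()  # defaults to unix:///var/run/docker.sock on Linux
    running = [c.name for c in client.containers.list()]  # running containers only
    to_stop, to_start = plan_switch(target, running)
    for name in to_stop:  # stop first so the VRAM is released before the new engine loads
        client.containers.get(name).stop()
    for name in to_start:
        client.containers.get(name).start()
```

Stopping before starting matters on a single GPU: loading a second large model while the first still holds its memory is exactly the out-of-memory case the feature is meant to avoid.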
AI Title Generation
If a book is in English and its chapters have generic titles (such as "Chapter 1"), the system sends the first 3000 characters of each such chapter to Ollama to generate a proper title. The request uses an uncensored prompt so that titling also works with adult or sensitive content.
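A minimal sketch of this flow against Ollama's REST API (`POST /api/generate` with `"stream": false`, which returns a single JSON object whose `response` field holds the text). The "generic title" regex, the prompt wording, and the helper names are hypothetical; the project's actual prompt is not reproduced in this README.

```python
import json
import re
import urllib.request

# Ollama's default local endpoint; inside Docker this would be the Ollama container's address.
OLLAMA_URL = "http://localhost:11434/api/generate"
MAX_CHARS = 3000  # the "first 3000 characters" described above

# Hypothetical heuristic: "Chapter 1", "CHAPTER XII", "Part 2", a bare "12", or an empty title.
_GENERIC = re.compile(r"\s*(?:(?:chapter|part)\s+(?:\d+|[ivxlcdm]+)|\d+|chapter)?\s*",
                      re.IGNORECASE)


def is_generic_title(title: str) -> bool:
    return _GENERIC.fullmatch(title) is not None


def needs_ai_title(title: str, language: str) -> bool:
    """Only English books with generic chapter titles are sent to the LLM."""
    return language.lower().startswith("en") and is_generic_title(title)


def build_prompt(chapter_text: str) -> str:
    # Hypothetical prompt wording; only the first MAX_CHARS characters are included.
    return ("Suggest a short, catchy title for the book chapter below. "
            "Reply with the title only.\n\n" + chapter_text[:MAX_CHARS])


def request_title(chapter_text: str, model: str = "dolphin-llama3",
                  url: str = OLLAMA_URL) -> str:
    payload = {"model": model, "prompt": build_prompt(chapter_text), "stream": False}
    req = urllib.request.Request(url, data=json.dumps(payload).encode("utf-8"),
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=120) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body["response"].strip().strip('"')  # drop stray quotes around the title
```

Capping the excerpt keeps the request within a small model's context window while still giving it enough of the chapter to pick up names and setting.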
Maintenance
- Clear VRAM: If something gets stuck, use the "🧹 Panic Clean VRAM" button in the UI.
- Update Code: Since everything is built locally, just edit the files in the _src folders (for example epub_to_audiobook_src/) and restart the corresponding container.
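To confirm from the host that a Panic Clean actually released GPU memory, you can query `nvidia-smi` in its CSV mode, where `nounits` reports memory in MiB. This helper is a hypothetical convenience, not part of the project.

```python
import subprocess
from typing import List, Tuple


def parse_gpu_memory(csv_text: str) -> List[Tuple[int, int]]:
    """Parse lines like '1234, 16376' into one (used_mib, total_mib) tuple per GPU."""
    rows: List[Tuple[int, int]] = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def gpu_memory() -> List[Tuple[int, int]]:
    """Query every visible GPU through nvidia-smi (must be on PATH)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_memory(out)
```

If used memory stays high after the clean, some process outside the TTS containers is likely still holding the GPU.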
License
This project is for personal and educational use. Please respect the licenses of the individual TTS engines (RVC, Maya Research, Coqui, etc.).