🎙️ Introducing Epub to Audiobook Modular: A Fully Local, AI-Powered Audiobook Suite for your EPUBs!

🎙️ Epub to Audiobook Converter - Pro Modular Edition

A fully autonomous, localized, and AI-powered suite to transform your EPUB library into high-quality audiobooks. This project has evolved from a simple converter into a professional modular ecosystem where every component is built from source and optimized for high-end GPUs (like the RTX 4080).

🚀 Key Features

100% Local & Editable: No more external pre-built images. Every service (XTTS, GPT-SoVITS, Maya1, Kokoro) is built locally from source. You own the code.
🧠 Smart VRAM Management: Automatically starts the required TTS engine and shuts down the others when switching, ensuring your GPU memory is always optimized.
✨ AI Chapter Naming: Integrated with Ollama (dolphin-llama3) to automatically generate catchy, context-aware chapter titles for English books.
⚡ Multiple TTS Engines:
Kokoro v1.0: Real-time speed for Italian and English.
Maya1: State-of-the-art Italian TTS.
GPT-SoVITS: Best-in-class quality for complex narratives.
XTTS v2: Reliable voice cloning with improved stability.
🛡️ Robust Networking: Every container gets a fixed static IP on a dedicated bridge network, which avoids the "Connection Refused" and "Hostname not found" errors common in Docker Desktop.
💾 Persistent Models: Model weights are stored in folders on your physical drive, so the multi-gigabyte downloads are not repeated every time you restart the containers.

🛠️ Tech Stack

Core: Python 3.11 (Slim Trixie)
UI: Gradio with real-time log streaming.
Orchestration: Docker Compose with bridge networking and static IPAM.
Intelligence: Ollama (Local LLM).
Audio: Pydub, FFmpeg, and specialized TTS wrappers.
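The static IPAM setup means each service is reached at a fixed address rather than through Docker's internal DNS. The sketch below shows one way such a plan could be derived with Python's standard ipaddress module and expressed in the key structure Docker Compose uses (networks.<name>.ipam.config and services.<name>.networks.<name>.ipv4_address). The subnet 172.28.0.0/24, the network name tts_net, the service names, and the starting offset are all assumptions for illustration; the project's real values live in its own compose file.

```python
import ipaddress
import json

# Hypothetical subnet and service names; the project's actual values are
# defined in its docker-compose.yml and are not shown in this post.
SUBNET = ipaddress.ip_network("172.28.0.0/24")
SERVICES = ["epub_to_audiobook", "kokoro", "gpt_sovits", "maya1", "xtts", "ollama"]


def plan_static_ips(subnet, services, first_host_offset=10):
    """Assign each service a fixed IPv4 address inside `subnet`.

    Addresses start `first_host_offset` hosts into the subnet, leaving the
    low addresses (including the bridge gateway, usually .1) untouched.
    """
    hosts = list(subnet.hosts())
    if first_host_offset + len(services) > len(hosts):
        raise ValueError("subnet too small for the requested services")
    return {name: str(hosts[first_host_offset + i]) for i, name in enumerate(services)}


def compose_networks_fragment(network_name, subnet, plan):
    """Return (top-level networks, per-service networks) as plain dicts
    mirroring the Compose YAML keys."""
    top_level = {
        network_name: {
            "driver": "bridge",
            "ipam": {"config": [{"subnet": str(subnet)}]},
        }
    }
    per_service = {
        name: {"networks": {network_name: {"ipv4_address": ip}}}
        for name, ip in plan.items()
    }
    return top_level, per_service


if __name__ == "__main__":
    plan = plan_static_ips(SUBNET, SERVICES)
    networks, services = compose_networks_fragment("tts_net", SUBNET, plan)
    # Print the fragment as JSON to show its shape.
    print(json.dumps({"networks": networks, "services": services}, indent=2))
```

With fixed addresses, a wrapper can call an engine at, say, http://172.28.0.12:<port> even when Docker Desktop's name resolution misbehaves.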

📦 Directory Structure

.
├── epub_to_audiobook_src/   # Core logic and Web UI
├── Kokoro/                  # Ultra-fast TTS source
├── GPT-SoVITS/              # High-fidelity TTS source
├── Maya1/                   # SOTA Italian TTS source
├── xtts_api_server_src/     # XTTS v2 Backend
├── wrappers/                # OpenAI-compatible API adapters
├── refs/                    # Voice reference files (.wav + .txt)
├── models_fast_cache/       # Persistent storage for XTTS
├── models_gpt_sovits/       # Persistent storage for SoVITS
├── models_maya1/            # Persistent storage for Maya1
└── Output/                  # Your generated audiobooks
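The wrappers/ folder holds OpenAI-compatible adapters, so each engine can presumably be called the way OpenAI's text-to-speech endpoint is: a POST to /v1/audio/speech with a JSON body containing model, input, and voice. The sketch below builds and sends such a request with the standard library only. The base URL, the model name "tts-1", and the voice name are assumptions; check the wrapper sources for the values they actually expect.

```python
import json
import urllib.request

# Hypothetical wrapper address; the real host and port come from the compose file.
WRAPPER_BASE_URL = "http://172.28.0.12:8880"


def build_speech_request(base_url, text, voice, model="tts-1", response_format="wav"):
    """Build (but do not send) an OpenAI-style text-to-speech request.

    `model` and `voice` are assumptions: an OpenAI-compatible wrapper may map
    or ignore them.
    """
    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(base_url, text, voice, out_path, timeout=300):
    """Send the request and write the returned audio bytes to `out_path`."""
    req = build_speech_request(base_url, text, voice)
    with urllib.request.urlopen(req, timeout=timeout) as resp, open(out_path, "wb") as f:
        f.write(resp.read())
```

Against a running wrapper, a call such as synthesize(WRAPPER_BASE_URL, "Chapter one.", "narrator", "chapter_01.wav") would write one audio file; "narrator" is a hypothetical voice name.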

🚦 Getting Started

1. Prerequisites

Docker Desktop with CUDA support enabled.
An NVIDIA GPU (16GB+ VRAM recommended for Maya1).
(Optional) Ollama installed locally or running via the provided container.

2. Deployment

docker compose up -d --build

Note: The first start takes about 10 minutes while the images are built and the model weights are downloaded.
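Because the first start is slow, it can help to wait for the Web UI programmatically instead of refreshing the browser. The sketch below polls a URL with the standard library until the server answers with any non-5xx status, or gives up after a timeout. The function name and the default timeout and polling interval are assumptions, not part of the project.

```python
import http.client
import time
import urllib.error
import urllib.request


def wait_for_http(url, timeout=900.0, interval=5.0):
    """Poll `url` until the server answers with a status below 500.

    Returns True once the service responds, False if `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status < 500:
                    return True
        except urllib.error.HTTPError as err:
            # The server is listening but returned an error page; 4xx still counts as up.
            if err.code < 500:
                return True
        except (OSError, http.client.HTTPException):
            pass  # Not ready yet: connection refused, reset, or timed out.
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, wait_for_http("http://localhost:9900") returns True once the Gradio UI is reachable.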

3. Usage

Access the Web UI at: http://localhost:9900

Select Book: Upload your .epub.
Choose Engine: Switch between Kokoro, XTTS, Maya1, or GPT-SoVITS.
Smart VRAM: Click "⚡ Activate Engine" to start the selected engine and stop the others, freeing GPU memory.
Start Conversion: Watch the logs stream in real-time.

⚙️ Advanced Configuration

Smart VRAM Logic

The system has direct access to the Docker socket (/var/run/docker.sock), which lets the UI control the other containers. When you select a new engine, the UI stops the competing TTS containers so that the VRAM they held is released for the engine you are about to use.
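The switching step described above can be sketched with the Docker SDK for Python (docker-py), whose DockerClient exposes containers.get(name) and Container objects with status, reload(), stop(), and start(). The container names and the 30-second stop timeout below are assumptions; the function only relies on that small interface, so it can also be exercised with stand-in objects.

```python
# Hypothetical container names; the real ones come from the project's compose file.
ENGINE_CONTAINERS = {
    "kokoro": "kokoro",
    "xtts": "xtts",
    "maya1": "maya1",
    "gpt-sovits": "gpt-sovits",
}


def activate_engine(client, engine, stop_timeout=30):
    """Stop every other TTS container, then make sure `engine` is running.

    `client` is expected to behave like docker-py's DockerClient
    (e.g. docker.from_env()). Returns the names of the containers it stopped.
    """
    if engine not in ENGINE_CONTAINERS:
        raise ValueError(f"unknown engine: {engine!r}")
    target = ENGINE_CONTAINERS[engine]
    stopped = []
    # Stop competitors first so their VRAM is freed before the new model loads.
    for name in ENGINE_CONTAINERS.values():
        if name == target:
            continue
        container = client.containers.get(name)
        container.reload()  # refresh .status from the Docker daemon
        if container.status == "running":
            container.stop(timeout=stop_timeout)
            stopped.append(name)
    active = client.containers.get(target)
    active.reload()
    if active.status != "running":
        active.start()
    return stopped
```

Inside a container with the socket mounted and the docker package installed, a call such as activate_engine(docker.from_env(), "kokoro") would perform the switch.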

AI Title Generation

If your book is in English and has generic titles (like "Chapter 1"), the system sends the first 3000 characters of the chapter to Ollama to generate a proper title. The prompt is deliberately uncensored so that titling also works for books with adult or sensitive content.
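A minimal sketch of this flow, assuming Ollama listens on its default port 11434 and using its /api/generate endpoint with "stream": false, which returns the generated text in the "response" field. The regular expression that decides what counts as a generic title, the English check on the EPUB language code, and the prompt wording are all assumptions; the project's actual (uncensored) prompt is not shown in this post.

```python
import json
import re
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default address
MODEL = "dolphin-llama3"
MAX_CHARS = 3000

# Hypothetical heuristic: "Chapter 1", "CHAPTER XII", or a bare number.
GENERIC_TITLE = re.compile(r"^\s*(?:chapter\s+(?:\d+|[ivxlcdm]+)|\d+)\s*$", re.IGNORECASE)


def is_generic_title(title):
    return bool(GENERIC_TITLE.match(title or ""))


def should_rename(title, language):
    """Only English books (e.g. language "en" or "en-US") with generic titles."""
    return (language or "").lower().startswith("en") and is_generic_title(title)


def build_title_prompt(chapter_text):
    # Placeholder prompt wording; only the first MAX_CHARS characters are sent.
    return (
        "Read the following chapter excerpt and reply with a short, descriptive "
        "chapter title only, without quotes or explanations.\n\n" + chapter_text[:MAX_CHARS]
    )


def generate_title(chapter_text, url=OLLAMA_URL, model=MODEL, timeout=120):
    payload = {"model": model, "prompt": build_title_prompt(chapter_text), "stream": False}
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        answer = json.loads(resp.read().decode("utf-8"))["response"]
    # Keep the first line and drop surrounding quotes; None if nothing usable came back.
    cleaned = answer.strip().strip('"').strip()
    return cleaned.splitlines()[0] if cleaned else None
```

Restricting the check to bare numbers or a "Chapter" prefix avoids misreading real titles made of Roman-numeral letters (such as "Mild") as generic.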

🔧 Maintenance

Clear VRAM: If something gets stuck, use the "🧹 Panic Clean VRAM" button in the UI.
Update Code: Since everything is local, just edit the files in the _src folders and restart the container.

📜 License

This project is for personal and educational use. Please respect the licenses of the individual TTS engines (RVC, Maya Research, Coqui, etc.).
