subtitlesdog/Subtitles.Translate.Agent: A disruptive AI subtitle engine using Multi-Agent collaboration to deliver near-human level translation with unified style and terminology. 基于多智能体协同的颠覆性 AI 字幕引擎，实现风格统一、术语一致的准人工级翻译交付。

📖 Project Introduction

Disruptive Intelligent Subtitle Engine. Based on Multi-Agent Collaboration technology, it automates the entire process, delivering near-human level translation works.

✨ Highlights

Unified Style: Generates a full-film style guide (tone/style/addressing strategy) first, avoiding the "two different translators" feeling.
Consistent Terminology: Automatically generates and enforces a glossary, unifying names/places/proper nouns and pronoun gender (he/she).
Context Disambiguation: Uses a sliding window to reference previous translations + future previews, reducing ambiguous references and sentence segmentation errors.
Semantic Audit: Built-in review loop specifically checking for mistranslations/omissions/hall…

📖 Project Introduction

Disruptive Intelligent Subtitle Engine. Based on Multi-Agent Collaboration technology, it automates the entire process, delivering near-human level translation works.

✨ Highlights

Unified Style: Generates a full-film style guide (tone/style/addressing strategy) first, avoiding the "two different translators" feeling.
Consistent Terminology: Automatically generates and enforces a glossary, unifying names/places/proper nouns and pronoun gender (he/she).
Context Disambiguation: Uses a sliding window to reference previous translations + future previews, reducing ambiguous references and sentence segmentation errors.
Semantic Audit: Built-in review loop specifically checking for mistranslations/omissions/hallucinations; also adheres to enjambment protocols, not randomly completing half-sentences.
More Human-Like: Enjambment + multi-line sliding translation, making the tone more coherent and reducing "machine translation flavor".
Multi-Format Subtitles: Supports multiple common subtitle format inputs (auto-detection).
Token Saving: Compact subtitle format, reducing Token consumption and improving processing efficiency.

Quick Start

Environment Requirements

.NET SDK 10.0 (This project TargetFramework is net10.0)
Available LLM API Key and Endpoint (customizable in the startup entry)

Run

Execute in the project root directory:

cd src
dotnet restore
dotnet run --project .\Subtitles.Translate.Agent\Subtitles.Translate.Agent.csproj

Interactive Use

After startup, the program will prompt sequentially:

Enter local subtitle file path (supports drag and drop to terminal)
Enter target language (Enter defaults to: Simplified Chinese)
Enter API Key (prompts if not configured)

Output Files

After translation, it will generate in the same directory as the original subtitle: OriginalFileName.<targetLanguage>.srt
Defaults to writing monolingual translation (GenerateTranslatedSrt()); for bilingual subtitles, change the entry point to GenerateBilingualSrt()

Custom Models and Endpoints

Entry configuration is located at AgentSystemConfig initialization in Program.cs:L90-L105, where ModelId, Endpoint, ApiKey etc. can be modified.

// src/Subtitles.Translate.Agent/Program.cs
var systemConfig = new AgentSystemConfig();
systemConfig.AddDefaultConfig(new AgentConfig
{
ModelId = "gpt-oss-120b",
ApiKey = “apiKey”,
Endpoint = "https://<your-endpoint>/v1"
});

Modification example (using long-context models for Agent 1/2, lower cost model for translation):

var systemConfig = new AgentSystemConfig();
systemConfig.AddConfig(nameof(Step1_DirectorAgent), new AgentConfig
{
ModelId = "gemini-3-flash",
ApiKey = “apiKey”,
Endpoint = "https://<your-endpoint>/v1"
});
systemConfig.AddConfig(nameof(Step2_GlossaryAgent), new AgentConfig
{
ModelId = "gemini-3-flash",
ApiKey = “apiKey”,
Endpoint = "https://<your-endpoint>/v1"
});
systemConfig.AddConfig(nameof(Step3_TranslatorAgent), new AgentConfig
{
ModelId = "gpt-oss-120b",
ApiKey = “apiKey”,
Endpoint = "https://<your-endpoint>/v1"
});

Recommended Models (Cost/Context Trade-off)

Agent 1 (Director) / Agent 2 (Glossary): Recommend gemini-3-flash (longer context, lower cost, better for global scanning and terminology extraction)
Translation Stage (Translator): Can use lower cost models, e.g., gpt-oss-120b

1. Step1_DirectorAgent (Global Understanding / Style Guide)

What it does: Reads the full subtitle first, generating a "full-film guidebook" for subsequent translations to strictly follow, solving style drift and addressing confusion.
Input: Full subtitles (compact formatted), target language, and other request parameters.

2. Step2_GlossaryAgent (Terminology Extraction / Consistency Constraint)

What it does: Based on Step 1’s strategy and full subtitles, extracts key entities and builds a controlled glossary, ensuring "one noun, one translation".
Input: Step 1 Style Guide + Full Subtitles.
Output: Character table (including aliases and gender inference), location table, terminology table (including domain and definition), and provides a Markdown version directly embeddable in prompts.

3. Step3_TranslatorAgent (Sliding Window Translation / Strong Format Validation)

What it does: Translates subtitles in batches using a sliding window, referencing previous context and future previews to reduce segmentation, reference, and ambiguity errors.
Input: Step 1 Style Guide + Step 2 Glossary + Previous Translated Context + Current Batch + Future Preview.
Output: Line-by-line initial translation (forced to keep original IDs and quantity consistent); optionally triggers Step 4 for semantic audit before writing final draft.

4. Step4_ReviewerAgent [To Be Open Sourced] (Semantic Audit / Back-Translation Protocol)

What it does: Only performs "audit-level" corrections for semantic accuracy, specifically checking for mistranslations, omissions, and hallucinations; no polishing, no terminology beautification.
Input: Step 3 Initial Translation Batch.
Output: Line-by-line PASS/FIXED, error reason (critique), and final adopted translation (final_translation), strictly aligned with Step 3 quantity.

5. Step5_PolisherAgent [To Be Open Sourced] (Terminology Compliance + Flow Polishing)

What it does: Without breaking timeline cut points, first enforces terminology and pronoun corrections, then performs more native expression polishing and rhythm optimization.
Input: Step 2 Glossary + Step 1 Style Guide + Current Batch Translation + Previous Polished Result (for coherent transition).
Output: polished_text, optional note (slang/terminology explanation), optimization_tag (terminology correction/context polishing/style adaptation/no change).

6. Step6_TimingAdjusterAgent [To Be Open Sourced] (Reading Comfort Timeline Fine-tuning)

What it does: Automatically extends end_time based on translation length and next sentence start time to improve readability; only changes end time, does not touch start time, overlap not allowed.
Input: Translated text, original start/end, next sentence start (including 50ms safety buffer).
Output: KEEP/EXTEND, adjusted_end, reason, and applies adjustments back to subtitle object.

📅 Open Source Plan

February 2026: Open Step6_TimingAdjusterAgent
March 2026: Develop Windows / macOS / Web UI

🙏 Acknowledgements

This project uses the following excellent open-source projects:

Subtitle Edit (libse): Powerful subtitle editing and processing core library.
Microsoft Agents: Foundation framework for building intelligent Agents.
Mscc.GenerativeAI: Provides .NET support for Google Gemini models.

📖 Project Introduction

✨ Highlights

📖 Project Introduction

✨ Highlights

Quick Start

Environment Requirements

Run

Interactive Use

Output Files

Custom Models and Endpoints

Recommended Models (Cost/Context Trade-off)

1. Step1_DirectorAgent (Global Understanding / Style Guide)

2. Step2_GlossaryAgent (Terminology Extraction / Consistency Constraint)

3. Step3_TranslatorAgent (Sliding Window Translation / Strong Format Validation)

4. Step4_ReviewerAgent [To Be Open Sourced] (Semantic Audit / Back-Translation Protocol)

5. Step5_PolisherAgent [To Be Open Sourced] (Terminology Compliance + Flow Polishing)

6. Step6_TimingAdjusterAgent [To Be Open Sourced] (Reading Comfort Timeline Fine-tuning)

📅 Open Source Plan

🙏 Acknowledgements

Similar Posts