In today’s digital workplace, the ability to quickly scan, process, and understand documents is crucial. Whether you’re digitizing invoices, processing legal documents, or managing medical records, having an efficient document workflow can save hours of manual work. In this comprehensive tutorial, you’ll learn how to build a modern web-based document scanner that not only scans and extracts text but also uses AI to automatically summarize documents.
Demo Video: Web Document Scanner with OCR and AI Summarization
Prerequisites
- A 30-day free trial license for Dynamic Web TWAIN
- The OCR add-on, which is required for text extraction (Windows only). Download DynamicWebTWAINOCRResources.zip from Dynamsoft’s website and run the installer as administrator.
Technology Stack Overview
Our intelligent document scanner combines three powerful technologies:
1. Dynamic Web TWAIN - Professional Document Scanning
Dynamic Web TWAIN is a JavaScript SDK that enables direct scanner access from web browsers. It provides:
- Direct Scanner Control - Access TWAIN/ICA/SANE scanners from the browser
- Advanced Image Processing - Auto-deskew, noise reduction, border detection
- OCR Integration - Built-in text extraction in multiple languages
- Document Editing - Crop, rotate, adjust brightness/contrast
- Multiple Output Formats - PDF, JPEG, PNG, TIFF, and more
2. Chrome’s Gemini Nano - Built-in AI
Google’s Gemini Nano is a lightweight AI model that runs entirely in the browser:
- Privacy-First - All processing happens locally, no data sent to servers
- Zero Cost - Free to use, no API keys or quotas
- Fast Performance - No network latency
- Offline Capable - Works without an internet connection once the model has been downloaded
- Multiple APIs - Summarization, translation, question-answering, and more
3. simple-dynamsoft-mcp - Development Accelerator
The simple-dynamsoft-mcp is a Model Context Protocol (MCP) server that provides:
- Quick Code Snippets - Get Dynamic Web TWAIN code examples instantly
- API Documentation - Access SDK documentation through your AI assistant
- Best Practices - Get guidance on common implementation patterns
- Faster Development - Reduce time spent searching documentation
Step-by-Step Development Tutorial
Step 1: Set Up Your Development Environment
First, create your project structure:
mkdir web-document-scanner
cd web-document-scanner
Create the following files:
- index.html - Main application UI
- css/style.css - Custom styles
- js/app.js - Application logic
Step 2: Install Dynamic Web TWAIN
Add the Dynamic Web TWAIN SDK to your HTML:
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Document Scanner App</title>
    <!-- Bootstrap CSS -->
    <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/css/bootstrap.min.css" rel="stylesheet">
    <!-- Font Awesome -->
    <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.4.0/css/all.min.css">
    <!-- Dynamic Web TWAIN SDK -->
    <script src="https://cdn.jsdelivr.net/npm/dwt@latest/dist/dynamsoft.webtwain.min.js"></script>
    <!-- Custom CSS -->
    <link rel="stylesheet" href="css/style.css">
</head>
<body>
    <!-- Your UI will go here -->
    <script src="js/app.js"></script>
</body>
</html>
Step 3: Initialize Dynamic Web TWAIN
In your js/app.js, initialize the SDK:
// Configure Dynamic Web TWAIN
Dynamsoft.DWT.ProductKey = "YOUR_LICENSE_KEY_HERE"; // Get free trial at dynamsoft.com
Dynamsoft.DWT.ResourcesPath = 'https://cdn.jsdelivr.net/npm/dwt@latest/dist';
Dynamsoft.DWT.AutoLoad = false;
Dynamsoft.DWT.UseDefaultViewer = false;

let DWTObject;

// Initialize SDK
Dynamsoft.DWT.CreateDWTObjectEx({
    WebTwainId: 'dwtId',
}, function (object) {
    DWTObject = object;
    console.log('Dynamic Web TWAIN initialized successfully');
    loadScanners();
}, function (error) {
    console.error('DWT initialization failed:', error);
    alert('Failed to initialize scanner. Please refresh and try again.');
});
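If you prefer async/await to callbacks, the initialization call can be wrapped in a Promise. The following is only a sketch: `createScanner` is a hypothetical helper name, not an SDK function, and it receives the `Dynamsoft.DWT` namespace as a parameter so it can be exercised without the SDK loaded. It assumes `CreateDWTObjectEx` keeps the (config, success, failure) callback shape used above.

```javascript
// Hypothetical helper (not part of the SDK): wraps the callback-style
// CreateDWTObjectEx call shown above in a Promise.
function createScanner(dwt, config) {
    return new Promise((resolve, reject) => {
        // The success callback resolves with the DWT object,
        // the failure callback rejects with the reported error.
        dwt.CreateDWTObjectEx(config, resolve, reject);
    });
}

// Usage in the browser, inside an async function:
//   DWTObject = await createScanner(Dynamsoft.DWT, { WebTwainId: 'dwtId' });
//   loadScanners();
```

The behavior is the same as the callback version; the wrapper only changes how the result is consumed.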
Step 4: Load Available Scanners
Detect and list available scanners:
// Devices returned by the last GetDevicesAsync() call; acquireImage() reads this list
let deviceList = [];

function loadScanners() {
    const sourceSelect = document.getElementById('sourceSelect');
    DWTObject.GetDevicesAsync().then(function (devices) {
        deviceList = devices;
        sourceSelect.innerHTML = '';
        if (devices.length === 0) {
            sourceSelect.innerHTML = '<option>No scanners found</option>';
            return;
        }
        devices.forEach((device, index) => {
            const option = document.createElement('option');
            option.value = index;
            option.textContent = device.displayName;
            sourceSelect.appendChild(option);
        });
        console.log(`Found ${devices.length} scanner(s)`);
    }).catch(function (error) {
        console.error('Failed to load scanners:', error);
    });
}
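Separating the data shaping from the DOM updates makes the dropdown logic easier to test. The sketch below is one way to do that; `toScannerOptions` is a hypothetical helper, and the only device property it reads is `displayName`, the one used in `loadScanners` above.

```javascript
// Hypothetical helper: turns a device list into plain option descriptors
// that a render function can map onto <option> elements.
function toScannerOptions(devices) {
    if (!Array.isArray(devices) || devices.length === 0) {
        return [{ value: '', label: 'No scanners found', disabled: true }];
    }
    return devices.map((device, index) => ({
        value: String(index),
        // Fall back to a generic label if a device reports no display name.
        label: device.displayName || `Scanner ${index + 1}`,
        disabled: false
    }));
}
```

Each descriptor's `value` is the index into the device list, matching how `loadScanners` fills the select element.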
Step 5: Implement Document Scanning
Add the scanning functionality:
function acquireImage() {
    const selectedIndex = document.getElementById('sourceSelect').value;
    if (!deviceList[selectedIndex]) {
        alert('Please select a scanner first.');
        return;
    }
    DWTObject.SelectDeviceAsync(deviceList[selectedIndex])
        .then(function () {
            return DWTObject.AcquireImageAsync({
                IfShowUI: false,
                IfCloseSourceAfterAcquire: true
            });
        })
        .then(function () {
            console.log('Scan completed successfully');
            updateImageDisplay();
        })
        .catch(function (error) {
            alert('Scan failed: ' + error.message);
        });
}
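The settings object passed to `AcquireImageAsync` can carry more than the two flags above. The sketch below builds one from a few user-facing choices. `buildScanSettings` and its parameter names are hypothetical; `PixelType`, `Resolution`, `IfFeederEnabled`, and `IfDuplexEnabled` are the device configuration property names as I understand them, so verify them against the SDK version you load. The defaults (color, 200 DPI) are arbitrary.

```javascript
// Standard TWAIN pixel type codes: TWPT_BW = 0, TWPT_GRAY = 1, TWPT_RGB = 2.
const PIXEL_TYPES = { bw: 0, gray: 1, color: 2 };

// Hypothetical helper: builds the settings object for AcquireImageAsync.
function buildScanSettings({ colorMode = 'color', dpi = 200, useFeeder = false, duplex = false } = {}) {
    if (!Object.prototype.hasOwnProperty.call(PIXEL_TYPES, colorMode)) {
        throw new RangeError(`Unknown color mode: ${colorMode}`);
    }
    return {
        IfShowUI: false,
        IfCloseSourceAfterAcquire: true,
        PixelType: PIXEL_TYPES[colorMode],
        Resolution: dpi,
        IfFeederEnabled: useFeeder,
        // Duplex is only requested when pages come from the feeder.
        IfDuplexEnabled: useFeeder && duplex
    };
}

// Usage inside acquireImage():
//   return DWTObject.AcquireImageAsync(buildScanSettings({ colorMode: 'gray', dpi: 300 }));
```

Whether a given scanner honors each setting depends on its driver, so treat these values as requests.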
Step 6: Display Scanned Documents
Create a thumbnail view and large preview:
// Index of the image shown in the large preview (-1 means nothing is selected)
let selectedImageIndex = -1;

function updateImageDisplay() {
    const thumbnailContainer = document.getElementById('thumbnailContainer');
    thumbnailContainer.innerHTML = '';
    if (DWTObject.HowManyImagesInBuffer === 0) {
        thumbnailContainer.innerHTML = '<div class="empty-state">No documents yet</div>';
        return;
    }
    // Create thumbnails for each image
    for (let i = 0; i < DWTObject.HowManyImagesInBuffer; i++) {
        const imgUrl = DWTObject.GetImageURL(i);
        const thumbnail = document.createElement('div');
        thumbnail.className = 'thumbnail-item';
        thumbnail.onclick = () => selectImage(i);
        const img = document.createElement('img');
        img.src = imgUrl;
        img.alt = `Document ${i + 1}`;
        thumbnail.appendChild(img);
        thumbnailContainer.appendChild(thumbnail);
    }
}

function selectImage(index) {
    selectedImageIndex = index;
    const imgUrl = DWTObject.GetImageURL(index);
    document.getElementById('largeImageDisplay').src = imgUrl;
    document.getElementById('largeImageDisplay').style.display = 'block';
    document.getElementById('ocrBtnLarge').disabled = false;
}
Step 7: Implement OCR Text Extraction
Add OCR functionality using Dynamic Web TWAIN’s OCR add-on:
async function performOCR(imageIndex) {
    const modal = document.getElementById('ocrModal');
    const ocrText = document.getElementById('ocrText');
    const language = document.getElementById('ocrLanguage').value;
    modal.classList.add('show');
    ocrText.textContent = 'Processing OCR...';
    try {
        // Perform OCR
        const result = await DWTObject.Addon.OCRKit.Recognize(imageIndex, {
            settings: { language: language }
        });
        // Extract text from result
        const extractedText = formatOCRResult(result);
        ocrText.textContent = extractedText || 'No text found.';
    } catch (error) {
        console.error('OCR Error:', error);
        ocrText.textContent = `OCR Error: ${error.message}`;
    }
}

function formatOCRResult(result) {
    if (!result || !result.blocks) {
        return '';
    }
    // Join words with spaces, lines with newlines, and blocks with blank lines
    return result.blocks
        .map(block => (block.lines || [])
            .map(line => (line.words || []).map(word => word.value).join(' '))
            .join('\n'))
        .join('\n\n')
        .trim();
}
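Raw OCR output often contains stray whitespace and words broken across lines, which can degrade a later summary. The sketch below shows one possible cleanup pass. `cleanOCRText` is a hypothetical helper, and the hyphen rule is a heuristic that will also merge genuine compounds split at a line end (for example "well-" followed by "known").

```javascript
// Hypothetical helper: tidies OCR text before display or summarization.
function cleanOCRText(text) {
    return String(text || '')
        .replace(/\r\n?/g, '\n')          // normalize line endings
        .replace(/(\w)-\n(\w)/g, '$1$2')  // rejoin words hyphenated across lines
        .replace(/[ \t]+/g, ' ')          // collapse runs of spaces and tabs
        .replace(/ ?\n ?/g, '\n')         // drop spaces around line breaks
        .replace(/\n{3,}/g, '\n\n')       // keep at most one blank line
        .trim();
}

// Usage inside performOCR():
//   const extractedText = cleanOCRText(formatOCRResult(result));
```

The function leaves paragraph breaks in place, so text that was grouped into blocks stays grouped.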
Step 8: Integrate Chrome’s Gemini Nano for Summarization
Now for the exciting part - adding AI-powered summarization:
async function summarizeOCRText() {
    const ocrText = document.getElementById('ocrText').textContent;
    if (!ocrText || ocrText.startsWith('OCR Error')) {
        alert('No valid text to summarize.');
        return;
    }
    const summaryModal = document.getElementById('summaryModal');
    const summaryText = document.getElementById('summaryText');
    const progressBar = document.getElementById('summaryProgress');
    summaryModal.classList.add('show');
    summaryText.textContent = 'Preparing summary...';
    try {
        // Check if Summarizer API is supported
        if (!('Summarizer' in self)) {
            throw new Error('Summarizer API not supported. Please use Chrome 138+.');
        }
        // Check availability
        const availability = await Summarizer.availability();
        if (availability === 'unavailable') {
            throw new Error('Summarizer API is not available.');
        }
        // Download model if needed (first time only)
        summaryText.textContent = 'Loading AI model...';
        const summarizer = await Summarizer.create({
            type: 'tldr', // Options: 'tldr', 'key-points', 'headline'
            length: 'medium', // Options: 'short', 'medium', 'long'
            format: 'plain-text', // Options: 'plain-text', 'markdown'
            monitor(m) {
                // Track download progress
                m.addEventListener('downloadprogress', (e) => {
                    const progress = Math.round(e.loaded * 100);
                    summaryText.textContent = `Loading AI model... ${progress}%`;
                });
            }
        });
        summaryText.textContent = 'Generating summary...';
        // Generate summary
        const summary = await summarizer.summarize(ocrText, {
            context: 'This is text extracted from a scanned document via OCR.'
        });
        summaryText.textContent = summary;
    } catch (error) {
        console.error('Summarization Error:', error);
        summaryText.textContent = `Error: ${error.message}`;
    }
}
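On-device models accept a limited amount of input, so a long multi-page scan may not fit in a single call. A common workaround, which is not part of the code above, is to split the text, summarize each chunk, and then summarize the partial summaries. The sketch below covers the splitting step. `splitIntoChunks` is a hypothetical helper, and the 3000-character default is an arbitrary assumption, since character count is only a rough proxy for what the model accepts.

```javascript
// Hypothetical helper: splits text into chunks of at most maxChars characters,
// preferring paragraph boundaries (blank lines).
function splitIntoChunks(text, maxChars = 3000) {
    const chunks = [];
    let current = '';
    for (const paragraph of text.split(/\n{2,}/)) {
        let rest = paragraph.trim();
        if (!rest) continue;
        // Hard-split a single paragraph that is longer than the limit.
        while (rest.length > maxChars) {
            if (current) { chunks.push(current); current = ''; }
            chunks.push(rest.slice(0, maxChars));
            rest = rest.slice(maxChars);
        }
        if (!rest) continue;
        const candidate = current ? `${current}\n\n${rest}` : rest;
        if (candidate.length > maxChars) {
            // The paragraph does not fit: close the current chunk, start a new one.
            chunks.push(current);
            current = rest;
        } else {
            current = candidate;
        }
    }
    if (current) chunks.push(current);
    return chunks;
}

// Sketch of use with the summarizer created above (inside an async function):
//   const parts = [];
//   for (const chunk of splitIntoChunks(ocrText)) {
//       parts.push(await summarizer.summarize(chunk));
//   }
//   const summary = parts.length > 1
//       ? await summarizer.summarize(parts.join('\n'))
//       : parts[0];
```

Summarizing summaries loses detail at each level, so for short documents a single call is preferable.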
Step 9: Add Save Functionality
Allow users to save scanned documents as PDF:
function saveImages() {
    if (DWTObject.HowManyImagesInBuffer === 0) {
        alert('No images to save.');
        return;
    }
    DWTObject.IfShowFileDialog = true;
    DWTObject.SaveAllAsPDF(
        'scanned_documents.pdf',
        function () {
            alert('Documents saved successfully!');
        },
        function (errorCode, errorString) {
            alert('Failed to save: ' + errorString);
        }
    );
}
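A fixed file name means every export suggests the same name. One option is to derive the name from the current time, as in the sketch below. `buildPdfFileName` and the `scan` prefix are hypothetical choices for illustration.

```javascript
// Hypothetical helper: builds a name such as scan_20250131_093005.pdf
// from the local date and time.
function buildPdfFileName(date = new Date(), prefix = 'scan') {
    const pad = (n) => String(n).padStart(2, '0');
    const day = `${date.getFullYear()}${pad(date.getMonth() + 1)}${pad(date.getDate())}`;
    const time = `${pad(date.getHours())}${pad(date.getMinutes())}${pad(date.getSeconds())}`;
    return `${prefix}_${day}_${time}.pdf`;
}

// Usage inside saveImages():
//   DWTObject.SaveAllAsPDF(buildPdfFileName(), onSuccess, onFailure);
```

Because the date comes first in year-month-day order, the files sort chronologically by name.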
Step 10: Build the User Interface
Create a clean, modern interface in index.html:
<div class="container scanner-container">
    <div class="scanner-header">
        <h1><i class="fas fa-scanner"></i> Document Scanner</h1>
        <p class="lead">Scan, extract, and summarize documents with AI</p>
    </div>
    <div class="card scanner-card">
        <!-- Scanner Controls -->
        <div class="scanner-controls">
            <div class="row">
                <div class="col-md-6">
                    <label for="sourceSelect">Select Scanner:</label>
                    <select class="form-select" id="sourceSelect">
                        <option>Loading scanners...</option>
                    </select>
                </div>
                <div class="col-md-6 text-center">
                    <button id="scanBtn" class="btn btn-primary btn-lg">
                        <i class="fas fa-play-circle"></i> Start Scanning
                    </button>
                </div>
            </div>
        </div>
        <!-- Document Viewer -->
        <div class="viewer-container">
            <div class="document-viewer">
                <!-- Thumbnail Panel -->
                <div class="thumbnail-panel">
                    <h6>Documents</h6>
                    <div id="thumbnailContainer"></div>
                </div>
                <!-- Large Image Panel -->
                <div class="large-image-panel">
                    <h6>Document View</h6>
                    <div id="largeImageContainer">
                        <img id="largeImageDisplay" alt="Selected document">
                    </div>
                    <div class="controls">
                        <button id="ocrBtnLarge">
                            <i class="fas fa-search"></i> OCR
                        </button>
                        <button id="editBtnLarge">
                            <i class="fas fa-pen"></i> Edit
                        </button>
                    </div>
                </div>
            </div>
        </div>
    </div>
</div>
<!-- OCR Modal -->
<div id="ocrModal" class="modal">
    <div class="modal-content">
        <div class="modal-header">
            <h4>OCR Result</h4>
            <button class="close" onclick="closeOCRModal()">×</button>
        </div>
        <div class="modal-body">
            <div id="ocrText">Processing OCR...</div>
        </div>
        <div class="modal-footer">
            <select id="ocrLanguage">
                <option value="en">English</option>
                <option value="fr">French</option>
                <option value="es">Spanish</option>
                <option value="de">German</option>
            </select>
            <button onclick="copyOCRText()">Copy Text</button>
            <button onclick="summarizeOCRText()">Summarize</button>
        </div>
    </div>
</div>
<!-- Summary Modal -->
<div id="summaryModal" class="modal">
    <div class="modal-content">
        <div class="modal-header">
            <h4>Document Summary</h4>
            <button class="close" onclick="closeSummaryModal()">×</button>
        </div>
        <div class="modal-body">
            <div id="summaryProgress" style="display:none;">
                <div class="progress-fill"></div>
            </div>
            <div id="summaryText">Preparing summary...</div>
        </div>
        <div class="modal-footer">
            <button onclick="copySummaryText()">Copy Summary</button>
        </div>
    </div>
</div>
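The markup above calls `closeOCRModal()`, `copyOCRText()`, `closeSummaryModal()`, and `copySummaryText()`, and it leaves the scan and OCR buttons without inline handlers. The sketch below shows one possible way to fill those gaps in js/app.js. The bodies are assumptions rather than the original project's code, and `closeModal` and `copyElementText` are hypothetical helpers whose `doc` and `clipboard` parameters default to the browser globals so the functions can also be tested outside a browser.

```javascript
// Hides a modal by removing the 'show' class that performOCR() and
// summarizeOCRText() add when they open it.
function closeModal(id, doc = document) {
    const modal = doc.getElementById(id);
    if (modal) modal.classList.remove('show');
}

// Copies an element's text to the clipboard; resolves to false if it is empty.
function copyElementText(id, doc = document, clipboard = navigator.clipboard) {
    const element = doc.getElementById(id);
    const text = element ? element.textContent.trim() : '';
    if (!text) return Promise.resolve(false);
    return clipboard.writeText(text).then(() => true);
}

// Global names expected by the inline onclick attributes.
function closeOCRModal() { closeModal('ocrModal'); }
function closeSummaryModal() { closeModal('summaryModal'); }
function copyOCRText() { return copyElementText('ocrText'); }
function copySummaryText() { return copyElementText('summaryText'); }

// Wire the buttons that have no inline handler (browser only).
if (typeof document !== 'undefined') {
    document.getElementById('scanBtn').addEventListener('click', acquireImage);
    document.getElementById('ocrBtnLarge')
        .addEventListener('click', () => performOCR(selectedImageIndex));
}
```

`navigator.clipboard` is only available in secure contexts; `http://localhost` counts as one, so the copy buttons should work during local testing.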
Using simple-dynamsoft-mcp for Faster Development
To accelerate development with Dynamic Web TWAIN, use the simple-dynamsoft-mcp server with your AI assistant:
Installation
npm install -g simple-dynamsoft-mcp
Configuration for VS Code
Add to your MCP settings:
{
    "servers": {
        "dynamsoft": {
            "command": "npx",
            "args": ["-y", "simple-dynamsoft-mcp"]
        }
    }
}
Benefits of Using MCP
Once configured, you can ask your AI assistant:
- "Show me how to scan documents with Dynamic Web TWAIN"
- "How do I perform OCR on a scanned image?"
- "What’s the code to save images as PDF?"
- "How to load images from local files?"
The MCP server returns code examples and API documentation directly in your AI assistant, so you spend less time searching the documentation.
Enabling Gemini Nano in Chrome
To use AI summarization, enable Gemini Nano:
- Update Chrome to version 138 or later
- Enable flags:
  - Go to chrome://flags/#optimization-guide-on-device-model and set it to "Enabled BypassPerfRequirement"
  - Go to chrome://flags/#summarization-api-for-gemini-nano and set it to "Enabled"
- Restart Chrome
- Download model:
  - Visit chrome://components/
  - Find "Optimization Guide On Device Model"
  - Click "Check for update"
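After these steps, you can confirm the setup from the DevTools console by checking what `Summarizer.availability()` reports. The sketch below maps that value to a readable message. `describeAvailability` is a hypothetical helper, and the four states it handles are the ones Chrome documents at the time of writing, which is an assumption worth verifying for your version.

```javascript
// Hypothetical helper: maps the string returned by Summarizer.availability()
// to a message for the user. Unknown values fall through to a generic message.
function describeAvailability(state) {
    switch (state) {
        case 'available':
            return 'The model is ready to use.';
        case 'downloadable':
            return 'The model will be downloaded on first use.';
        case 'downloading':
            return 'The model download is in progress.';
        case 'unavailable':
            return 'This device or browser cannot run the model.';
        default:
            return `Unexpected availability state: ${state}`;
    }
}

// In the DevTools console:
//   describeAvailability(await Summarizer.availability());
```

A result of "downloadable" or "downloading" is not an error; the model is fetched when `Summarizer.create()` is first called from a user action such as a button click.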
Testing Your Application
1. Start a Local Server
# Using Python
python -m http.server 8000
# Using Node.js
npx http-server -p 8000
2. Open in Browser
Navigate to http://localhost:8000
3. Test Workflow
- Select Scanner - Choose from the dropdown
- Scan Document - Click "Start Scanning"
- View Document - Click thumbnail to preview
- Extract Text - Click "OCR" button
- Summarize - Click "Summarize" in OCR modal
- Save - Click "Save All" to export as PDF
Source Code
https://github.com/yushulx/web-twain-document-scan-management/tree/main/examples/scan_ocr_summarize