Vissper OSS
Real-time speech-to-text transcription for macOS with AI-powered transcript polishing. Built with Rust and native macOS frameworks.
Overview
Vissper OSS is an open-source desktop application that captures microphone audio and transcribes it in real time using Azure OpenAI's Realtime API. It runs discreetly in the macOS menu bar, with a transparent overlay window for live transcription display.
Key highlights:
- Direct connection to Azure OpenAI (no intermediary servers)
- Native macOS UI via objc2 bindings to AppKit
- Real-time WebSocket streaming for instant transcription
- Secure credential storage in macOS Keychain
Features
Transcription
- Real-time speech-to-text via Azure OpenAI Realtime API (GPT-4o Transcribe)
- Multi-language support: English, Norwegian, Danish, Finnish, German
- Live partial and final transcript display
- Automatic reconnection with retry logic
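The retry logic behind automatic reconnection is not spelled out here. As a minimal sketch, assuming an exponential backoff with a cap (the `RetryPolicy` type, its fields, and the values used with it are hypothetical, not taken from the Vissper source), the delay between reconnection attempts could be computed like this:

```rust
use std::time::Duration;

/// Hypothetical retry policy; names and values are illustrative only
/// and are not taken from the Vissper source.
#[derive(Debug, Clone, Copy)]
pub struct RetryPolicy {
    pub base_delay: Duration,
    pub max_delay: Duration,
    pub max_attempts: u32,
}

impl RetryPolicy {
    /// Delay before retry number `attempt` (0-based). The delay doubles on
    /// each attempt and is capped at `max_delay`. Returns `None` once
    /// `max_attempts` has been reached.
    pub fn delay_for(&self, attempt: u32) -> Option<Duration> {
        if attempt >= self.max_attempts {
            return None;
        }
        // 2^attempt, saturating so large attempt numbers cannot overflow.
        let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
        let delay = self
            .base_delay
            .checked_mul(factor)
            .unwrap_or(self.max_delay);
        Some(delay.min(self.max_delay))
    }
}
```

A caller would sleep for the returned duration before reopening the WebSocket, and give up when `delay_for` returns `None`.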
AI-Powered Polishing
- Basic Polish: Copyediting for grammar and readability
- Meeting Notes: Structured summaries with action items, decisions, and key points
- Preserves original language and meaning
User Interface
- Menu bar integration (NSStatusBar)
- Transparent overlay window that floats above other applications
- Multi-tab view: Raw transcript, Basic polish, Meeting notes
- Customizable transparency and appearance
Screenshot Integration
- Full-screen and region-based screenshot capture
- Screenshots embedded in transcripts as markdown images
- Timestamped filenames for organization
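To illustrate how a screenshot might end up in a transcript, here is a small sketch. The `screenshot-` prefix, the Unix-seconds timestamp format, and both function names are assumptions made for the example; the actual naming scheme may differ.

```rust
use std::path::Path;

/// Builds a timestamped file name such as `screenshot-1700000000.png`.
/// The prefix and the Unix-seconds format are assumptions for illustration.
pub fn screenshot_file_name(unix_secs: u64) -> String {
    format!("screenshot-{}.png", unix_secs)
}

/// Renders a Markdown image reference pointing at a saved screenshot.
pub fn markdown_image(alt: &str, path: &Path) -> String {
    format!("![{}]({})", alt, path.display())
}
```

The string returned by `markdown_image` can be appended to the transcript text at the point where the screenshot was taken.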
Export Options
- Copy to clipboard (automatic on stop)
- Save as Markdown files
- Export to PDF
Requirements
- macOS 12.0+ (Monterey or later)
- Azure OpenAI account with:
  - GPT-4o Transcribe deployment (for speech-to-text)
  - GPT-4o or similar deployment (for polishing)
- Rust toolchain (latest stable) - Install Rust
Quick Start
1. Clone and Build
git clone https://github.com/vissper/vissper-oss.git
cd vissper-oss
cargo build --release
2. Run
cargo run --release
Or run the binary directly:
./target/release/vissper
3. Configure Azure OpenAI
- Click the Vissper icon in the menu bar
- Select Settings
- Enter your Azure OpenAI credentials:
  - Endpoint URL: https://your-resource.openai.azure.com
  - STT Deployment: Your GPT-4o Transcribe deployment name
  - Polish Deployment: Your GPT-4o deployment name
  - API Key: Your Azure OpenAI API key
- Click Save
Credentials are stored securely in the macOS Keychain.
4. Start Recording
- Click Start Recording in the menu bar, or
- Press Control + Space (global hotkey)
Keyboard Shortcuts
| Shortcut | Action |
|---|---|
| Control + Space | Start/Stop recording (raw transcript) |
| Control + Shift + 1 | Stop with basic polishing |
| Control + Shift + 2 | Stop with meeting notes |
| Control + Shift + 0 | Full-screen screenshot |
| Control + Shift + 9 | Region screenshot |
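The shortcut table above can be read as a lookup from key combination to action. The sketch below models that mapping in plain Rust; `HotkeyAction` and `action_for` are hypothetical names, and the real application registers its shortcuts through the global-hotkey crate rather than through a function like this.

```rust
/// Actions from the shortcut table. Type and variant names are hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HotkeyAction {
    ToggleRecording,
    StopWithBasicPolish,
    StopWithMeetingNotes,
    FullScreenScreenshot,
    RegionScreenshot,
}

/// Maps a Control-based key combination to its action.
/// `shift` says whether Shift is held; `key` is the non-modifier key.
pub fn action_for(shift: bool, key: char) -> Option<HotkeyAction> {
    match (shift, key) {
        (false, ' ') => Some(HotkeyAction::ToggleRecording),
        (true, '1') => Some(HotkeyAction::StopWithBasicPolish),
        (true, '2') => Some(HotkeyAction::StopWithMeetingNotes),
        (true, '0') => Some(HotkeyAction::FullScreenScreenshot),
        (true, '9') => Some(HotkeyAction::RegionScreenshot),
        _ => None, // Any other combination is not bound.
    }
}
```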
Project Structure
vissper-oss/
├── src/
│   ├── main.rs                      # Application entry point
│   ├── audio/                       # CoreAudio microphone capture
│   │   ├── mod.rs                   # Audio capture implementation
│   │   ├── types.rs                 # AudioChunk, AudioCaptureHandle
│   │   └── resampler.rs             # 16kHz resampling
│   ├── menubar/                     # macOS menu bar (NSStatusBar)
│   │   ├── mod.rs                   # Menu bar manager
│   │   ├── builder.rs               # Menu item construction
│   │   ├── delegate.rs              # Objective-C delegate
│   │   ├── state.rs                 # App state and callbacks
│   │   └── updates/                 # Dynamic menu updates
│   ├── transcription_window/        # Transparent overlay window
│   │   ├── mod.rs                   # Window manager
│   │   ├── window.rs                # Window creation/layout
│   │   ├── state.rs                 # Tab types, window state
│   │   ├── components/              # UI components
│   │   └── api/                     # Public window APIs
│   ├── recording/                   # Recording session management
│   │   ├── mod.rs                   # Start/stop logic
│   │   ├── transcription_task.rs    # Background transcription
│   │   ├── polish.rs                # Transcript polishing
│   │   └── clipboard.rs             # Clipboard operations
│   ├── transcription/               # Azure OpenAI Realtime API
│   │   ├── mod.rs                   # TranscriptionClient
│   │   ├── azure_connection.rs      # WebSocket management
│   │   └── azure_messages.rs        # Message serialization
│   ├── azure_openai.rs              # Azure OpenAI Chat API client
│   ├── keychain.rs                  # macOS Keychain storage
│   ├── settings_window/             # Settings UI
│   ├── hotkeys.rs                   # Global keyboard shortcuts
│   ├── screenshot.rs                # Screenshot capture
│   ├── storage.rs                   # Local file storage
│   └── preferences.rs               # User preferences
├── config.toml                      # Application configuration
├── Cargo.toml                       # Rust dependencies
└── README.md
Architecture
┌─────────────────────────────────────────────────────────────┐
│                   Vissper (Rust + objc2)                    │
├─────────────────────────────────────────────────────────────┤
│  Menu Bar          │  Transcription Window  │  Settings     │
│  (NSStatusBar)     │  (NSWindow overlay)    │  (NSWindow)   │
├────────────────────┴────────────────────────┴───────────────┤
│                      Recording Manager                      │
├─────────────────────────────────────────────────────────────┤
│  Audio Capture     │  Transcription Client  │  Polish API   │
│  (CoreAudio)       │  (WebSocket)           │  (REST)       │
├────────────────────┴────────────────────────┴───────────────┤
│                    Azure OpenAI Services                    │
│          Realtime API (STT)  │  Chat API (Polish)           │
└─────────────────────────────────────────────────────────────┘
Data Flow
- Audio Capture: CoreAudio captures microphone input, resampled to 16kHz mono PCM
- Streaming: Audio chunks sent via WebSocket to Azure OpenAI Realtime API
- Transcription: Partial and final transcripts received in real-time
- Display: Transcripts shown in transparent overlay window
- Polishing: On stop, optionally polish via Azure OpenAI Chat API
- Output: Copy to clipboard and save to local storage
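The first step of the data flow (capture, then conversion to 16kHz mono PCM) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's code: the function names are hypothetical, and the resampler here is a naive linear interpolation without an anti-aliasing filter, whereas the project lists rubato for resampling.

```rust
/// Averages interleaved multi-channel samples into a single mono channel.
pub fn downmix_to_mono(interleaved: &[f32], channels: usize) -> Vec<f32> {
    assert!(channels > 0, "channel count must be non-zero");
    interleaved
        .chunks_exact(channels)
        .map(|frame| frame.iter().sum::<f32>() / channels as f32)
        .collect()
}

/// Naive linear-interpolation resampler. Illustrative only: it has no
/// anti-aliasing filter, unlike a proper resampler such as rubato.
pub fn resample_linear(input: &[f32], from_hz: u32, to_hz: u32) -> Vec<f32> {
    if input.is_empty() || from_hz == 0 || to_hz == 0 {
        return Vec::new();
    }
    let out_len = (input.len() as u64 * to_hz as u64 / from_hz as u64) as usize;
    let step = from_hz as f64 / to_hz as f64;
    let last = input.len() - 1;
    (0..out_len)
        .map(|i| {
            // Fractional position of output sample `i` in the input.
            let pos = i as f64 * step;
            let idx = pos.floor() as usize;
            let frac = (pos - idx as f64) as f32;
            let a = input[idx.min(last)];
            let b = input[(idx + 1).min(last)];
            a + (b - a) * frac
        })
        .collect()
}

/// Converts normalized f32 samples (-1.0..=1.0) to signed 16-bit PCM,
/// clamping anything outside that range.
pub fn to_pcm16(samples: &[f32]) -> Vec<i16> {
    samples
        .iter()
        .map(|s| (s.clamp(-1.0, 1.0) * i16::MAX as f32).round() as i16)
        .collect()
}
```

Chaining the three functions on a captured buffer, for example 48 kHz stereo input, yields 16 kHz mono 16-bit samples ready to be chunked and streamed.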
Development
Build Commands
# Development build
cargo build
# Release build (optimized)
cargo build --release
# Run tests
cargo test
# Lint
cargo clippy --all-targets --all-features
# Format code
cargo fmt
Key Dependencies
| Category | Crate | Purpose |
|---|---|---|
| Async | tokio | Async runtime |
| HTTP | reqwest | REST API client |
| WebSocket | tokio-tungstenite | Realtime API |
| Audio | cpal, rubato | Capture and resampling |
| macOS UI | objc2, objc2-app-kit | Native AppKit bindings |
| Security | security-framework | Keychain integration |
| Hotkeys | global-hotkey | System-wide shortcuts |
Thread Safety
All AppKit operations must run on the main thread. The codebase uses MainThreadMarker to prove main thread execution:
if let Some(mtm) = MainThreadMarker::new() {
    // On main thread - safe to call AppKit
    Self::update_ui(mtm);
} else {
    // Dispatch to main thread
    dispatch::Queue::main().exec_async(|| {
        if let Some(mtm) = MainThreadMarker::new() {
            Self::update_ui(mtm);
        }
    });
}
Configuration
config.toml
[version_check]
url = "https://vissper.com/version.json"
enabled = true
User Preferences
Stored in ~/.config/Vissper/preferences.json:
- Language preference
- Transcript/screenshot storage paths
- Overlay transparency (0.3-1.0)
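Since overlay transparency is limited to the 0.3-1.0 range, a value read from the preferences file presumably needs to be validated before use. A minimal sketch of such a check follows; the constant names, the function name, and the choice to fall back to fully opaque on invalid input are all assumptions.

```rust
/// Bounds from the documented overlay transparency range.
pub const MIN_OPACITY: f64 = 0.3;
pub const MAX_OPACITY: f64 = 1.0;

/// Forces a stored opacity value into the documented 0.3-1.0 range.
/// Falling back to fully opaque for NaN is an assumption of this sketch.
pub fn clamp_opacity(value: f64) -> f64 {
    if value.is_nan() {
        return MAX_OPACITY;
    }
    value.clamp(MIN_OPACITY, MAX_OPACITY)
}
```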
Azure Credentials
Stored securely in macOS Keychain under service com.vissper.desktop.
Azure OpenAI Setup
- Create an Azure OpenAI resource in the Azure Portal
- Deploy the required models:
- GPT-4o Transcribe - for real-time speech-to-text
- GPT-4o (or similar) - for transcript polishing
- Copy your API key from the Azure Portal
- Enter credentials in Vissper Settings
Security
- Azure credentials stored in macOS Keychain (encrypted)
- No data sent to Vissper servers (direct Azure connection)
- Audio and transcripts stored only locally
- Sensitive data cleared from memory using zeroize
- No API keys or credentials in code or logs
License
Dual-licensed under MIT and Apache 2.0. See LICENSE-MIT and LICENSE-APACHE.
Contributing
Contributions welcome! Please read CONTRIBUTING.md before submitting pull requests.
Support
- GitHub Issues - Bug reports and feature requests
- Discussions - Questions and community support