Announcing the fastest inference for realtime voice AI agents
together.ai·23h

November 4, 2025

By Rajas Bansal, Sahil Yadav, Garima Dhanania, Sri Yanamandra, Charles Zedlewski, Zain Hasan, Derek Petersen, Blaine Kasten, Sonny Khan, Rishabh Bhargava

Summary

  • Streaming Whisper speech-to-text (STT): Continuous transcription over WebSocket APIs optimized for voice agents
  • First serverless open-source text-to-speech (TTS): Orpheus (high-fidelity) and Kokoro (ultra-low latency) available through REST and WebSocket APIs without dedicated infrastructure
  • Voxtral transcription and speaker diarization: Premium multilingual transcription model and automatic speaker identification for batch processing

Voice interfaces are one of the hallmarks of a …
