/PRNewswire/ – aiOla, a voice-AI lab advancing speech recognition technology, today announced Drax, an open-source AI model that brings flow-matching-based generative methods to speech. By rethinking how speech models are trained, Drax captures the nuance of real-world audio without the delay that slows traditional systems. With its novel approach, Drax delivers accuracy on par with, or better than, leading models like OpenAI’s Whisper while achieving five times lower latency than other major speech systems, such as Alibaba’s Qwen2-Audio – reaching up to 32× faster-than-real-time performance.
Modern speech recognition systems struggle to balance speed and accuracy. Models like OpenAI’s Whisper process speech token by token – producing strong results but taking too long for large-scale or long-form audio, such as hour-long meetings or customer calls. To improve speed, some researchers have shifted toward diffusion-based models that process multiple tokens simultaneously. These systems are faster but often less accurate, since they’re typically trained on clean, idealized data rather than the noisy and unpredictable speech found in real-world settings. As a result, transcription tools continue to fall short in enterprise environments like call centers, healthcare, and manufacturing, where conversations are lengthy and even minor errors can affect compliance, analytics, and customer experience.
Drax introduces a new way to train speech recognition systems that achieves both speed and accuracy. While most automatic speech recognition (ASR) models, like OpenAI’s Whisper, process speech sequentially, predicting one token at a time, Drax outputs the entire token sequence in parallel, capturing full context at once. This parallel, flow-based approach dramatically reduces latency and prevents the compounding errors that often occur in long transcriptions.
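To make the latency argument concrete, the back-of-envelope comparison below is purely illustrative and not aiOla's code: the sequence length, per-pass cost, and number of refinement passes are assumed numbers, not measurements of Drax or Whisper. The point it captures is that an autoregressive decoder pays one forward pass per output token, while a parallel flow-style decoder pays only a small, fixed number of refinement passes over the whole sequence.

    # Illustrative arithmetic only: all constants are assumptions, not benchmarks.
    SEQ_LEN = 1500        # tokens in a long transcript (assumed)
    PASS_MS = 20          # cost of one decoder forward pass, in ms (assumed)
    REFINE_STEPS = 8      # parallel refinement passes (assumed)

    autoregressive_ms = SEQ_LEN * PASS_MS      # one pass per token
    parallel_ms = REFINE_STEPS * PASS_MS       # a few passes over all tokens
    print(f"autoregressive ~ {autoregressive_ms/1000:.1f}s, parallel ~ {parallel_ms/1000:.2f}s")

Under these assumed numbers the parallel decoder finishes in a fraction of a second while the token-by-token decoder takes tens of seconds, which is the gap the press release describes.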
“Gone are the days of choosing between transcription accuracy and speed,” said Prof. Dr. Yossi Keshet, Chief Scientist at aiOla. “With Drax, we’ve achieved a real breakthrough in speech recognition technology that delivers both technical innovation and immediate real-world impact. And, by open-sourcing it, we hope to spark further discovery and collaboration from the community.”
Like image diffusion models that refine a picture from random noise, Drax learns to reconstruct speech from an initial noisy representation. During training, it moves through a three-step probability path: starting with meaningless noise, transitioning through a speech-like but imperfect middle state, and finally converging on the correct transcript. This middle stage exposes Drax to realistic, acoustically plausible errors, helping it stay robust to accents, background noise, and natural speech variability.
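For readers who want a picture of what this style of training looks like in code, the following is a minimal, generic sketch of flow-matching-style training for transcription, assuming a PyTorch setup. The module names, shapes, and the simple linear noise-to-transcript path are illustrative assumptions, not aiOla's published implementation.

    import torch
    import torch.nn as nn

    vocab_size, dim, seq_len, batch = 1000, 256, 64, 8

    embed = nn.Embedding(vocab_size, dim)        # target-transcript embeddings
    velocity_net = nn.Sequential(                # stand-in for the real denoiser,
        nn.Linear(dim * 2, dim), nn.GELU(),      # which would condition on much
        nn.Linear(dim, dim),                     # richer encoded audio features
    )

    tokens = torch.randint(0, vocab_size, (batch, seq_len))   # ground-truth transcript
    audio_cond = torch.randn(batch, seq_len, dim)             # placeholder audio features

    x1 = embed(tokens)                 # "clean" endpoint of the probability path
    x0 = torch.randn_like(x1)          # pure-noise endpoint
    t = torch.rand(batch, 1, 1)        # random point along the path
    xt = (1 - t) * x0 + t * x1         # intermediate, imperfect "speech-like" state

    # The network learns the velocity that carries the noisy state toward the transcript.
    pred = velocity_net(torch.cat([xt, audio_cond], dim=-1))
    loss = ((pred - (x1 - x0)) ** 2).mean()
    loss.backward()

At inference time, a model trained this way starts from noise and takes only a handful of such velocity steps over the whole sequence at once, which is where the latency savings described above come from.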
“There’s no room for error in critical applications of speech technology,” said Gil Hetz, VP of AI at aiOla. “That’s why Drax is such a breakthrough; it combines accuracy and speed without compromise. It handles real-world speech, no matter the background noise, accent, or jargon, making it not only technologically impressive but also the kind of reliable system modern enterprises need.”
“Voice is the most natural and efficient means of data entry, and it will become the default way for human-machine interaction,” said Amir Haramaty, President of aiOla. “Today, transcription often fails to keep up with the pace of real-world operations, from customer service to compliance. With Drax, we’re closing that gap and making voice technology practical at scale. That’s why advancing speech recognition is so important – it’s the future of the enterprise, and I’m incredibly proud to be part of aiOla, where we’re continuously pushing the boundaries of what’s possible.”
aiOla has published research detailing how this single-step training approach enables Drax to generate accurate transcriptions in just a few quick processing steps, dramatically reducing latency without sacrificing precision. In English benchmarks, Drax achieved an average word error rate (WER) of 7.4%, on par with Whisper-large-v3 at 7.6%, and outperformed both OpenAI’s Whisper and Alibaba’s Qwen2-Audio on select datasets, while running up to five times faster. Across Spanish, French, German, and Mandarin, Drax maintained comparable or better accuracy, demonstrating consistent performance across languages and environments.
Alongside the research paper, aiOla will release Drax under a permissive open-source license, making the model publicly available on GitHub and Hugging Face. The release includes three model sizes, from the lightweight Flash version to the full-scale base model, allowing others to run, test, and extend Drax for their own applications. By sharing the model openly, aiOla aims to establish a new benchmark for future ASR research and innovation.
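As a rough illustration of what access might look like once the weights are live on Hugging Face, the snippet below simply downloads a model repository with the standard huggingface_hub client. The repository name shown is a placeholder, not a confirmed identifier, and the actual loading and inference API will be whatever aiOla documents in the release.

    from huggingface_hub import snapshot_download

    # Hypothetical repo id; check aiOla's GitHub and Hugging Face pages for the real one.
    local_dir = snapshot_download(repo_id="aiola/drax-flash")
    print("model files downloaded to", local_dir)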
About aiOla:
aiOla is a deep tech Voice AI lab redefining how enterprises interact with their systems by making voice the natural interface for executing workflows and capturing data. Built on patented foundation models, aiOla’s technology goes beyond standard speech-to-text, transforming manual processes by merging voice recognition with intelligent workflow agents to turn natural speech into real-time, structured data. This enables hands-free process automation across CRMs, ERPs, QMS platforms, and more, allowing enterprises in critical industries to embed voice-driven workflow agents directly into operations.
aiOla supports over 100 languages and accurately interprets jargon, accents, abbreviations, and acronyms even in noisy environments. Backed by $58 million in funding from New Era, Hamilton Lane, and United Airlines, and led by a world-class professional and research team, aiOla is advancing voice as the primary interface for enterprise systems, modernizing workflows, and reinventing how data entry gets done.
Logo - https://mma.prnewswire.com/media/2811968/aiOla_Logo.jpg
Contact:
Ali Goldberg, Concrete Media for aiOla, [email protected]
SOURCE aiOla
