She's not happy about it, but I’ve been speaking to my computer more than my wife for the last month.
Voice + Agents have completely changed how I use my machine. It started with coding, then it was research, and now I’m looking to change how I work entirely.
I’m using a local speech-to-text (STT) model that transcribes what I’m saying. It’s blazing fast. Text is then sent to one of my apps in rotation (Cursor, Gemini, Claude Code or Claude.ai). Quality of the model matters; things just started clicking after the Sonnet 4.5 release and Opus took this to another level. With the tooling around these models in coding agents, I’m treating AI like an actual colleague instead of a chatbot.
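For the curious, here's roughly what that loop looks like. This is a minimal sketch rather than my actual setup: it assumes faster-whisper as the local STT model, sounddevice/soundfile for mic capture, and pyperclip to hand the transcript to whichever app is focused.

```python
# Minimal sketch of the voice -> text -> agent loop described above.
# Assumed components (not from the post): faster-whisper for local STT,
# sounddevice/soundfile for mic capture, pyperclip for the handoff.

import sounddevice as sd
import soundfile as sf
import pyperclip
from faster_whisper import WhisperModel

SAMPLE_RATE = 16_000    # 16 kHz mono is what Whisper-family models expect
RECORD_SECONDS = 10     # fixed-length capture keeps the sketch simple

def record_clip(path: str = "clip.wav") -> str:
    """Record a short clip from the default microphone and save it to disk."""
    frames = sd.rec(int(RECORD_SECONDS * SAMPLE_RATE),
                    samplerate=SAMPLE_RATE, channels=1)
    sd.wait()  # block until recording finishes
    sf.write(path, frames, SAMPLE_RATE)
    return path

def transcribe(path: str) -> str:
    """Run the local STT model and join its segments into one string."""
    model = WhisperModel("small", device="cpu", compute_type="int8")
    segments, _info = model.transcribe(path)
    return " ".join(segment.text.strip() for segment in segments)

if __name__ == "__main__":
    text = transcribe(record_clip())
    pyperclip.copy(text)  # paste into Cursor / Claude Code / Gemini / Claude.ai
    print(f"Transcribed {len(text.split())} words; copied to clipboard.")
```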
Here are some specific features that helped a lot:
- Plan mode
- Longer context
- Much better tools
- Interleaved thinking
But the interface with the machine is changing. I can talk about 3x faster than typing (169 vs ~60 words per minute). I can provide three times more context in the same amount of time. Instead of being more precise with my words, I just speak more words. It’s a completely different way of working. These models are so good that they just seem to pull out my intent consistently, even when it’s just implied. When it doesn't work, I just try again.
Looking back to when I was a sales engineer, my days were full of tasks like preparing notes for a sales call, updating the CRM, and drafting follow-up emails. With the right scaffolding and personalization, today's agents can do 90% of these tasks. I can see a future where white-collar work involves delegating tasks to agents by voice and reviewing their outputs. When this happens, the bottleneck is how quickly you can think.
It's hard to tell who will build the killer apps in this space. Are these a product or a feature? Either way, only a handful of companies are already building the rails. Those building voice models, voice agents, and the infrastructure to make this a reality are really interesting right now.
If I've got one prediction for 2026, it's that this is the year voice goes mainstream.
Post-script: [1] There’s a huge market for incorporating these features into non-coding use-cases.
Things Voice AI will disrupt
- Personalized therapy
- Personalized coaching
- Fast food drive-throughs
- Initial recruiter screenings
- Automated phone call routing
- Learning languages and training accents
- Prioritizing and routing emergency services
- Pre-screening calls for doctor appointments
Things that I don’t want to change
- Writing. I like writing and I don’t want to lose it.
- Replacing people with computers. I love connecting with people.
- Reading. Narration helps comprehension, but it can be a worse experience online.
- Thinking and decision making. When you move fast with a voice agent, you can lose too much of both.