Speak, Edit, Repeat: High-Fidelity Voice Editing and Zero-Shot TTS with Cross-Attentive Mamba
arxiv.org·4d
🎚️Voice AI Systems
A gentle introduction to Generative AI: Historical perspective
medium.com·4h
🏗️AI Infrastructure
Show HN: I built a video-to-text tool – 10 min free daily, no signup
harku.io·15h
🎚️Audio Codecs
🧠 Real-Time Smart Speech Assistant with Python, Whisper & LLMs
dev.to·8h
🎙️Whisper
Revisiting Karpathy's 'Unreasonable Effectiveness of Recurrent Neural Networks'
gilesthomas.com·4h
🧠Neuromorphic Hardware
Show HN: AI Voice AudioBook – Convert ebooks to audio with your cloned voice
zan.chat·16h
🎚️Voice AI Systems
Show HN: Nanowakeword – Automates custom wake word model training
github.com·17h
🎙️Whisper
AI receptionist that answers real phone calls
news.ycombinator.com·11h
🧠AI
Making Machines Sound Sarcastic: LLM-Enhanced and Retrieval-Guided Sarcastic Speech Synthesis
arxiv.org·2d
🎤Voice Interfaces
Towards a Typology of Strange LLM Chains-of-Thought
lesswrong.com·1d
💻Local LLMs
The key to conversational speech recognition
datasciencecentral.com·1d
🎤Voice Interfaces
Own your AI: Learn how to fine-tune Gemma 3 270M and run it on-device
developers.googleblog.com·2d
💻Local LLMs
Detecting and Mitigating Insertion Hallucination in Video-to-Audio Generation
arxiv.org·1d
🎤Voice Interfaces
I built a translator for spatial thinking (because I can't interview in Python)
graemefawcett.ca·10h
vibe-coding
MuFFIN: Multifaceted Pronunciation Feedback Model with Interactive Hierarchical Neural Modeling
arxiv.org·4d
🎙️Whisper
From RNNs to ChatGPT: The Paper That Changed How AI Thinks 🤖
dev.to·12h
🏗️AI Infrastructure
Prompt Engineering Templates That Work: 7 Copy-Paste Recipes for LLMs
kdnuggets.com·1d
🧩Low-code
How Google Translate & ChatGPT Work: The Transformer, Unboxed
dev.to·1d
🎙️Whisper