MuFFIN: Multifaceted Pronunciation Feedback Model with Interactive Hierarchical Neural Modeling
arxiv.org·2h
🎙️Whisper
Speak, Edit, Repeat: High-Fidelity Voice Editing and Zero-Shot TTS with Cross-Attentive Mamba
arxiv.org·2h
🎙️Whisper
Master Pronunciation with AI: How Tech Helps
future.forem.com·4h·
Discuss: DEV
🎙️Whisper
[P] Looking to interview people who’ve worked on audio labeling for ML (PhD research project)
reddit.com·1d·
🎵Audio ML
Show HN: My first finished audio plugin. Minimal Bloat, Under 1000 LOC
news.ycombinator.com·1h·
Discuss: Hacker News
💿FLAC Archaeology
SpeechCT-CLIP: Distilling Text-Image Knowledge to Speech for Voice-Native Multimodal CT Analysis
arxiv.org·1d
🎙️Whisper
Hume AI Octave 2: new text-to-speech model, 11+ languages
hume.ai·3d·
Discuss: Hacker News
🎙️Whisper
Why do LLMs freak out over the seahorse emoji?
vgel.me·1d·
🗜️LZW Variants
Probing Whisper for Dysarthric Speech in Detection and Assessment
arxiv.org·2h
🎙️Whisper
Paper2Video: Automatic Video Generation from Scientific Papers
arxiv.org·2h
📊Document Wavelets
A multi-platform GPU accelerated library for signal analysis using Apple MLX
byron-the-bulb.github.io·10h·
Discuss: Hacker News
📊Spectrograms
Mouse Sensors Can Pick Up Speech From Surface Vibrations, Researchers Show
it.slashdot.org·1d
🔍Audio Forensics
Investigating LLM Variability in Personalized Conversational Information Retrieval
arxiv.org·2h
👤Search Personalization
🎙️ Building an AI-Powered Interview Analyzer on GCP
dev.to·14h·
Discuss: DEV
🎙️Whisper
Show HN: I Built a Transcription CLI Because Uploading 4GB Videos Was Killing Me
medium.com·12h·
Discuss: Hacker News
💿FLAC Archaeology
LLM Optimization Notes: Memory, Compute and Inference Techniques
gaurigupta19.github.io·14h·
Discuss: Hacker News
💻Local LLMs
Taming Text-to-Sounding Video Generation via Advanced Modality Condition and Interaction
arxiv.org·1d
🎧Vorbis Encoding
Certifiable Safe RLHF: Fixed-Penalty Constraint Optimization for Safer Language Models
arxiv.org·2h
🔗Parser Combinators