HiFi-HARP: A High-Fidelity 7th-Order Ambisonic Room Impulse Response Dataset
arxiv.orgยท1d
๐Ÿ‘‚Psychoacoustics
Flag this post
Swift Sings: MDX-Net Vocal Splits and RVC Voice Conversion On-Device with ONNX/CoreML
web.navan.devยท1d
๐ŸŽงLearned Audio
Flag this post
Taming Text-to-Sounding Video Generation via Advanced Modality Condition andInteraction
dev.toยท1dยท
Discuss: DEV
๐Ÿง Neural Compression
Flag this post
Frequency-Spatial Interaction Driven Network for Low-Light Image Enhancement
arxiv.orgยท2h
๐Ÿ“ŠRate-Distortion Theory
Flag this post
Enhanced Spectral Decomposition for High-Dimensional Bio-Signal Classification
dev.toยท5hยท
Discuss: DEV
๐ŸŒˆSpectral Methods
Flag this post
Compositional Bias Control in Large Language Models: Preference Learning Fails, Supervision Succeeds
arxiv.orgยท2h
๐ŸŽ›๏ธFeed Filtering
Flag this post
LibriConvo: Simulating Conversations from Read Literature for ASR and Diarization
arxiv.orgยท2h
๐ŸŽตAudio ML
Flag this post
Parallel Sampling from Masked Diffusion Models via Conditional Independence Testing
arxiv.orgยท2h
๐Ÿง Machine Learning
Flag this post
How to Apply Powerful AI Audio Models to Real-World Applications
towardsdatascience.comยท10h
๐ŸŽตAudio ML
Flag this post
The Limits of Data Scaling: Sub-token Utilization and Acoustic Saturation in Multilingual ASR
arxiv.orgยท2h
๐ŸŽ™๏ธWhisper
Flag this post
Mitigating Attention Sinks and Massive Activations in Audio-Visual Speech Recognition with LLMS
arxiv.orgยท2h
๐ŸŽตAudio ML
Flag this post
Poisson Flow Consistency Training
arxiv.orgยท2h
๐Ÿง Machine Learning
Flag this post
Region-Adaptive Learned Hierarchical Encoding for 3D Gaussian Splatting Data
arxiv.orgยท2h
๐Ÿง Learned Codecs
Flag this post
Transformer Key-Value Memories Are Nearly as Interpretable as Sparse Autoencoders
arxiv.orgยท2h
๐Ÿง Machine Learning
Flag this post
FAME: Fairness-aware Attention-modulated Video Editing
arxiv.orgยท2h
๐Ÿง Learned Codecs
Flag this post
RatioWaveNet: A Learnable RDWT Front-End for Robust and Interpretable EEG Motor-Imagery Classification
arxiv.orgยท2h
๐Ÿ“ŠLearned Metrics
Flag this post
Foley Control: Aligning a Frozen Latent Text-to-Audio Model to Video
arxiv.orgยท1d
๐ŸŽงVorbis Encoding
Flag this post
OFFSIDE: Benchmarking Unlearning Misinformation in Multimodal Large Language Models
arxiv.orgยท2h
๐Ÿง Machine Learning
Flag this post