Neural TTS, Voice Cloning, Real-time Audio, Kitten TTS
Can Large Language Models (LLMs) Describe Pictures Like Children? A Comparative Corpus Study
arxiv.org·8h
I built an Amazon Echo killer with this ESP32-powered quad microphone array
xda-developers.com·5d
MOON: Generative MLLM-based Multimodal Representation Learning for E-commerce Product Understanding
arxiv.org·1d
CardAIc-Agents: A Multimodal Framework with Hierarchical Adaptation for Cardiac Care Support
arxiv.org·8h
MuSACo: Multimodal Subject-Specific Selection and Adaptation for Expression Recognition with Co-Training
arxiv.org·1d