Neural TTS, Voice Cloning, Real-time Audio, Kitten TTS
Can Large Language Models (LLMs) Describe Pictures Like Children? A Comparative Corpus Study
arxiv.org·14h
MOON: Generative MLLM-based Multimodal Representation Learning for E-commerce Product Understanding
arxiv.org·1d
Sycophancy under Pressure: Evaluating and Mitigating Sycophantic Bias via Adversarial Dialogues in Scientific QA
arxiv.org·14h
CardAIc-Agents: A Multimodal Framework with Hierarchical Adaptation for Cardiac Care Support
arxiv.org·14h
MuSACo: Multimodal Subject-Specific Selection and Adaptation for Expression Recognition with Co-Training
arxiv.org·1d