speech-to-text, model, local, whisper, whisper.cpp, input, voice, recognition, ai
OpenM3D: Open Vocabulary Multi-view Indoor 3D Object Detection without Human Annotations
arxiv.org·2d
Knowing or Guessing? Robust Medical Visual Question Answering via Joint Consistency and Contrastive Learning
arxiv.org·3d
Inference-Time Alignment Control for Diffusion Models with Reinforcement Learning Guidance
arxiv.org·1d