Knowledge-based Visual Question Answer with Multimodal Processing, Retrieval and Filtering
paperium.net

Overview: Advancing Knowledge-based Visual Question Answering with Wiki-PRF

This research introduces Wiki-PRF, a three-stage method for improving Knowledge-based Visual Question Answering (KB-VQA). Existing Visual Language Models (VLMs) and Retrieval-Augmented Generation (RAG) systems struggle on two fronts: the quality of the multimodal queries they issue and the relevance of the external knowledge they retrieve. Wiki-PRF addresses both by combining dynamic visual tool invocation, multimodal knowledge retrieval, and relevance filtering. The framework consists of Processing, Retrieval, and Filtering stages and uses a Visual Language Model (VLM-PRF) trained with reinforcement learning to orchestrate tool …
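The Processing, Retrieval, and Filtering flow described above can be pictured with a toy Python sketch. Everything in it is an assumption made for illustration, not Wiki-PRF's actual interface: the names `process`, `retrieve`, `filter_passages`, and `Passage` are hypothetical, the "image" is a dictionary of precomputed tool outputs, the knowledge base is invented, and plain word overlap stands in for both the multimodal retriever and the reinforcement-learning-trained VLM-PRF.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

# Hypothetical stand-ins: none of these names come from Wiki-PRF itself.
Image = Dict[str, str]         # toy "image": precomputed visual tool outputs
Tool = Callable[[Image], str]  # a visual tool maps an image to a text snippet


@dataclass(frozen=True)
class Passage:
    title: str
    text: str


def tokens(text: str) -> Set[str]:
    """Lowercased word set; a crude relevance signal for this sketch."""
    return {w.strip(".,?!").lower() for w in text.split()}


def score(query: str, passage: Passage) -> int:
    """Number of words shared by the query and the passage."""
    return len(tokens(query) & tokens(passage.title + " " + passage.text))


def process(question: str, image: Image, tools: List[Tool]) -> str:
    """Stage 1 (Processing): run visual tools and build an enriched query."""
    return " ".join([question] + [tool(image) for tool in tools])


def retrieve(query: str, kb: List[Passage], top_k: int = 2) -> List[Passage]:
    """Stage 2 (Retrieval): keep the top_k passages by overlap score."""
    return sorted(kb, key=lambda p: score(query, p), reverse=True)[:top_k]


def filter_passages(query: str, passages: List[Passage],
                    min_overlap: int = 3) -> List[Passage]:
    """Stage 3 (Filtering): drop passages below an overlap threshold."""
    return [p for p in passages if score(query, p) >= min_overlap]


# Invented toy data, for illustration only.
image = {"caption": "a striped lighthouse on a cliff", "ocr": "Harbor Point"}
tools = [lambda img: img["caption"], lambda img: img["ocr"]]
kb = [
    Passage("Harbor Point Lighthouse",
            "The Harbor Point lighthouse was built on a cliff in 1890."),
    Passage("Striped animals", "A zebra is a striped animal."),
    Passage("Bread", "Sourdough bread uses a starter culture."),
]

query = process("When was this lighthouse built?", image, tools)
candidates = retrieve(query, kb)
kept = filter_passages(query, candidates)
```

In this toy run, retrieval returns two candidates and filtering keeps only the lighthouse passage, which is the kind of pruning the Filtering stage is meant to provide before an answer is generated. A real system would replace the overlap score with learned retrieval and model-based relevance judgments.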
