Semore: VLM-guided Enhanced Semantic Motion Representations for Visual Reinforcement Learning
arxiv.org·8h

Abstract: The growing exploration of Large Language Models (LLMs) and Vision-Language Models (VLMs) has opened avenues for enhancing the effectiveness of reinforcement learning (RL). However, existing LLM-based RL methods often focus on guiding the control policy and are held back by the limited representations of their backbone networks. To tackle this problem, we introduce Enhanced Semantic Motion Representations (Semore), a new VLM-based framework for visual RL that simultaneously extracts semantic and motion representations from RGB flows through a dual-path backbone. Semore utilizes a VLM with common-sense knowledge to retrieve key information from observ…
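
To make the "dual-path backbone" idea concrete, here is a minimal toy sketch. It is not the paper's architecture: the abstract is truncated and gives no implementation details, so every name here (`semantic_path`, `motion_path`, `dual_path_features`) and both feature definitions are assumptions chosen purely for illustration. One path summarizes the latest frame (a stand-in for semantic features), the other summarizes frame-to-frame differences (a stand-in for motion features), and the two are concatenated into one representation.

```python
from typing import List

# Hypothetical stand-in for an RGB frame: a small grayscale grid of floats.
Frame = List[List[float]]


def semantic_path(frames: List[Frame]) -> List[float]:
    """Toy 'semantic' features: per-row mean intensity of the latest frame."""
    latest = frames[-1]
    return [sum(row) / len(row) for row in latest]


def motion_path(frames: List[Frame]) -> List[float]:
    """Toy 'motion' features: mean absolute change between consecutive frames."""
    feats = []
    for prev, curr in zip(frames, frames[1:]):
        diffs = [
            abs(c - p)
            for prev_row, curr_row in zip(prev, curr)
            for p, c in zip(prev_row, curr_row)
        ]
        feats.append(sum(diffs) / len(diffs))
    return feats


def dual_path_features(frames: List[Frame]) -> List[float]:
    """Concatenate both paths into a single representation (assumed fusion)."""
    if len(frames) < 2:
        raise ValueError("need at least two frames to compute motion features")
    return semantic_path(frames) + motion_path(frames)


# Three 2x2 frames in which rows light up one after another.
frames = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[1.0, 1.0], [0.0, 0.0]],
    [[1.0, 1.0], [1.0, 1.0]],
]
features = dual_path_features(frames)
```

In a real visual RL agent each path would presumably be a learned network and the fused vector would feed the policy; the hand-written averages above only show how two complementary views of the same frame stack can be combined.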
