MCASEN: Boosting Referring Remote Sensing Image Segmentation With Multimodal Semantic Guided Decoding and Recursive Multiscale Fusion

Referring remote sensing image segmentation (RRSIS) aims to segment the target that a textual description refers to in a remote sensing (RS) image. Existing methods underuse multimodal contextual information because they lack textual semantic guidance during the decoding stage, and their coarse multiscale feature fusion strategies cause information loss and spatial structure distortion. To mitigate these limitations, we propose the multimodal context-aware sema...
