A Semantic Transformer for Self-Supervised Deep Clustering of Large Hyperspectral Images

Local window-based vision transformers (ViTs) have recently attracted significant attention in hyperspectral image (HSI) analysis for their balance between global modeling and efficiency. However, window-based self-attention in unsupervised HSI clustering still suffers from two key limitations: 1) implicitly restricting semantic correlations to a local window, overlooking fragmented and noncontiguous yet semantically homogeneous regions; and 2) using fixed grid windows, causing boundary mixin...
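The first limitation can be illustrated with a minimal NumPy sketch of self-attention confined to fixed grid windows. This is not the paper's method: it assumes a single head with identity query/key/value projections, and the names `window_ids` and `windowed_self_attention` are hypothetical. It shows only that two pixels with identical spectra receive zero mutual attention weight when they fall in different windows.

```python
import numpy as np

def window_ids(height, width, window):
    """Assign each pixel the index of the fixed grid window that contains it."""
    rows = np.arange(height) // window
    cols = np.arange(width) // window
    n_cols = -(-width // window)  # number of windows per row (ceil division)
    return (rows[:, None] * n_cols + cols[None, :]).reshape(-1)

def windowed_self_attention(features, window):
    """Single-head self-attention restricted to fixed grid windows.

    features: array of shape (H, W, C). Assumption for illustration:
    queries, keys, and values are the raw features (no learned projections).
    """
    height, width, channels = features.shape
    tokens = features.reshape(-1, channels)
    ids = window_ids(height, width, window)

    scores = tokens @ tokens.T / np.sqrt(channels)
    same_window = ids[:, None] == ids[None, :]
    # Pairs in different windows are masked out before the softmax.
    scores = np.where(same_window, scores, -np.inf)

    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ tokens, weights

rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8, 4))   # toy 8x8 image with 4 spectral bands
image[0, 0] = image[7, 7]            # same spectrum at two distant pixels

out, weights = windowed_self_attention(image, window=4)
print(weights[0, 63])  # 0.0: pixel (0, 0) never attends to pixel (7, 7)
```

Because the mask depends only on grid position, no degree of spectral similarity can link the two pixels within a single attention layer of this kind.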
