Building Reasonable Inference for Vision-Language Models in Blind Image Quality Assessment
arxiv.org·2d

Abstract: Recent progress in blind image quality assessment (BIQA) has been driven by vision-language models (VLMs), whose semantic reasoning abilities suggest that they might extract visual features, generate descriptive text, and infer quality in a human-like manner. However, these models often produce textual descriptions that contradict their final quality predictions, and the predicted scores can fluctuate unstably during inference; neither behavior is aligned with human reasoning. To understand these issues, we analyze the factors that cause contradictory assessments and instability. We first estimate the relationship between the final quality predictions and the generated visual features, finding that the predictions are not fully grounde…
