Abstract:Emotional responses during advertising video viewing are recognized as essential for understanding media effects because they have influenced attention, memory, and purchase intention. To establish a methodological basis for explainable emotion estimation without relying on external information such as physiological signals or subjective ratings, we have quantified "pleasantness," "surprise," and "habituation" solely from scene-level expression features of advertising videos, drawing on the free energy(FE) principle, which has provided a unified account of perception, learning, and behavior. In this framework, Kullback-Leibler divergence (KLD) has captured predi…
Abstract:Emotional responses during advertising video viewing are recognized as essential for understanding media effects because they have influenced attention, memory, and purchase intention. To establish a methodological basis for explainable emotion estimation without relying on external information such as physiological signals or subjective ratings, we have quantified "pleasantness," "surprise," and "habituation" solely from scene-level expression features of advertising videos, drawing on the free energy(FE) principle, which has provided a unified account of perception, learning, and behavior. In this framework, Kullback-Leibler divergence (KLD) has captured prediction error, Bayesian surprise (BS) has captured belief updates, and uncertainty (UN) has reflected prior ambiguity, and together they have formed the core components of FE. Using 1,059 15 s food video advertisements, the experiments have shown that KLD has reflected "pleasantness" associated with brand presentation, BS has captured "surprise" arising from informational complexity, and UN has reflected "surprise" driven by uncertainty in element types and spatial arrangements, as well as by the variability and quantity of presented elements. This study also identified three characteristic emotional patterns, namely uncertain stimulus, sustained high emotion, and momentary peak and decay, demonstrating the usefulness of the proposed method. Robustness across nine hyperparameter settings and generalization tests with six types of Japanese advertising videos (three genres and two durations) confirmed that these tendencies remained stable. This work can be extended by integrating a wider range of expression elements and validating the approach through subjective ratings, ultimately guiding the development of technologies that can support the creation of more engaging advertising videos.
| Comments: | This article has been accepted for publication in IEEE Access and will be published shortly |
| Subjects: | Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI) |
| Cite as: | arXiv:2601.00812 [cs.CV] |
| (or arXiv:2601.00812v1 [cs.CV] for this version) | |
| https://doi.org/10.48550/arXiv.2601.00812 arXiv-issued DOI via DataCite |
Submission history
From: Takashi Ushio [view email] [v1] Mon, 22 Dec 2025 06:51:41 UTC (2,399 KB)