RecSal-Net: Recursive Saliency Network for video saliency prediction
- Woo, ChaeEun;
- Lee, SuMin;
- Park, Soo Min;
- Kim, Byung Hyung
Abstract
Video saliency prediction, which emulates the selective visual processing mechanisms of the human visual system, has found widespread applications across various domains. However, existing models often struggle to effectively integrate spatiotemporal features, particularly due to challenges associated with multi-resolution feature processing. To address these limitations, we propose the Recursive Saliency Network (RecSal-Net), a model designed to enhance the performance of video saliency prediction. The model adopts the Video Swin Transformer as its backbone to extract rich spatiotemporal features. In addition, a Recursive Feature Pyramid structure is introduced to integrate multi-resolution features while minimizing information loss. To further improve feature representation, a top-down feature integration strategy is employed, transferring high-level semantic features to lower-level feature maps. This is complemented by iterative upsampling operations, which enrich both high- and low-resolution representations. Experimental results demonstrate that RecSal-Net outperforms state-of-the-art methods on the DHF1K, Hollywood-2, and UCF Sports datasets, achieving superior performance across key evaluation metrics such as AUC-J, CC, and NSS. These findings validate the model's effectiveness in capturing long-range spatiotemporal dependencies and integrating multi-resolution features. Overall, our work underscores the potential of recursive feature modeling to advance future video saliency prediction frameworks. The code is available at https://github.com/affctivai/RecSal-Net.
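For readers unfamiliar with the metrics named in the abstract, the following is a minimal sketch of how CC and NSS are commonly computed for a single frame. It follows the standard definitions and is not taken from the RecSal-Net repository; the function names `cc` and `nss`, the small `eps` stabilizer, and the array shapes are assumptions made for illustration. AUC-J is omitted for brevity.

```python
import numpy as np


def cc(pred, gt, eps=1e-8):
    """Linear correlation coefficient (CC) between a predicted saliency
    map and a ground-truth saliency map of the same shape."""
    p = (pred - pred.mean()) / (pred.std() + eps)
    g = (gt - gt.mean()) / (gt.std() + eps)
    return float((p * g).mean())


def nss(pred, fixations, eps=1e-8):
    """Normalized Scanpath Saliency (NSS): mean of the z-scored
    prediction at the fixated pixels (nonzero entries of `fixations`)."""
    p = (pred - pred.mean()) / (pred.std() + eps)
    mask = fixations.astype(bool)
    if not mask.any():
        # No fixations in this frame, so the score is undefined.
        return float("nan")
    return float(p[mask].mean())
```

Under these definitions, higher values indicate better agreement with human fixations for both metrics. Video benchmarks typically report the average over all frames, although the exact evaluation protocol depends on the dataset.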
Keywords
- Title: RecSal-Net: Recursive Saliency Network for video saliency prediction
- Authors: Woo, ChaeEun; Lee, SuMin; Park, Soo Min; Kim, Byung Hyung
- Publication date: 2025-10
- Type: Article
- Journal: Neurocomputing
- Volume: 650