SPPT: Siamese Pyramid Pooling Transformer for Visual Object Tracking
- Fang, Yang;
- Xie, Bailian;
- Jiang, Bingbing;
- Ke, Xuhui;
- Li, Yan
Web of Science: 8
Scopus: 0
Abstract
Recently, visual transformer-based tracking has achieved significant success owing to its effective attention modeling strategies and global context feature extraction. However, most transformer trackers follow the canonical Siamese, correlation-based tracking paradigm, which comprises three stages: feature extraction, feature fusion, and similarity function learning. This paradigm is thought to weaken the cross-correlation between the template and search features while increasing the computational cost of the tracking model. Hence, we propose a Siamese pyramid pooling transformer (SPPT) that implements a one-stream, end-to-end visual object tracking framework with two newly proposed modules: an iterative pooling attention-based feature extraction and correlation (P-FEC) module and an iterative enhanced correlation block (ECB). The P-FEC module performs feature extraction and correlation simultaneously, whereas the ECB enhances feature integration and target-aware feature embedding learning. The SPPT has a much shorter attention sequence length, fewer parameters, and fewer floating-point operations (FLOPs) than existing transformer-based trackers. Extensive experiments on the LaSOT, TrackingNet, and GOT-10k benchmarks demonstrate that the proposed SPPT tracker achieves state-of-the-art precision and success scores compared with most convolutional neural network-based and transformer-based trackers.
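The abstract does not specify how the P-FEC module is built, so the following is only a minimal sketch of the general idea it alludes to: template and search tokens are attended to jointly in one stream, and the keys and values are spatially pooled so that the attention sequence becomes shorter. Everything here is an assumption made for illustration, not the authors' implementation: the function names, the 8x8 and 16x16 token grids, the pooling stride of 2, average pooling, a single head, and the omission of learned projections.

```python
import numpy as np

def avg_pool_tokens(tokens, side, stride):
    """Average-pool a (side*side, dim) token grid with a stride x stride window."""
    dim = tokens.shape[1]
    out = side // stride
    grid = tokens.reshape(side, side, dim)[: out * stride, : out * stride]
    # Split each spatial axis into (block index, offset inside block), then average the offsets.
    pooled = grid.reshape(out, stride, out, stride, dim).mean(axis=(1, 3))
    return pooled.reshape(out * out, dim)

def softmax(x):
    """Numerically stable softmax over the last axis."""
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def pooled_joint_attention(template, search, t_side, s_side, stride):
    """Single-head attention: full-length joint queries, pooled joint keys/values.

    Hypothetical helper; learned Q/K/V projections are left out for brevity.
    """
    queries = np.concatenate([template, search], axis=0)
    kv = np.concatenate(
        [avg_pool_tokens(template, t_side, stride),
         avg_pool_tokens(search, s_side, stride)],
        axis=0,
    )
    scores = queries @ kv.T / np.sqrt(queries.shape[1])
    weights = softmax(scores)
    return weights @ kv, weights

rng = np.random.default_rng(0)
template = rng.standard_normal((8 * 8, 32))    # assumed 8x8 template token grid
search = rng.standard_normal((16 * 16, 32))    # assumed 16x16 search token grid
out, weights = pooled_joint_attention(template, search, 8, 16, stride=2)
print(out.shape, weights.shape)  # (320, 32) (320, 80)
```

Under these assumed sizes, every one of the 320 joint tokens attends to 80 pooled tokens instead of 320, so the score matrix is four times smaller, while template and search features are still mixed in the same attention step.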
- Title: SPPT: Siamese Pyramid Pooling Transformer for Visual Object Tracking
- Authors: Fang, Yang; Xie, Bailian; Jiang, Bingbing; Ke, Xuhui; Li, Yan
- Issue date: 2023-12-30
- Type: Article
- Volume: 13
- Pages: 1-17