SPPT: Siamese Pyramid Pooling Transformer for Visual Object Tracking

  • Fang, Yang
  • Xie, Bailian
  • Jiang, Bingbing
  • Ke, Xuhui
  • Li, Yan

Abstract

Recently, visual transformer-based tracking has achieved significant success owing to its effective attention modeling strategies and global context feature extraction. However, most transformer trackers are based on the canonical Siamese and correlation-based tracking paradigm, which comprises three stages: feature extraction, feature fusion, and similarity function learning. This paradigm is speculated to weaken the cross-correlation between the template and search features while increasing the computational cost of the tracking model. Hence, we propose a Siamese pyramid pooling transformer (SPPT) to implement a one-stream end-to-end visual object tracking framework with two newly proposed modules: an iterative pooling attention-based feature extraction and correlation (P-FEC) module and an iterative enhanced correlation block (ECB). The P-FEC module can simultaneously perform feature extraction and correlation, whereas the ECB can enhance feature integration and target-aware feature embedding learning. The SPPT has a much shorter attention sequence length, fewer parameters, and fewer floating-point operations (FLOPs) than existing transformer-based trackers. Extensive experiments on the LaSOT, TrackingNet, and GOT-10k benchmarks demonstrate that our proposed SPPT tracker achieves state-of-the-art tracking performance in terms of precision and success scores compared with most convolutional neural network-based and transformer-based trackers.
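
The claim of a shorter attention sequence length can be illustrated with simple arithmetic. The sketch below is not the authors' implementation: it assumes that keys and values are obtained by average-pooling the feature map at several ratios and concatenating the results, and the feature-map size, the pooling ratios, and the helper names `pooled_token_count` and `attention_cost` are all hypothetical choices made for illustration.

```python
import math


def pooled_token_count(height, width, pool_ratios):
    """Count the key/value tokens left after pooling a height x width
    feature map at each ratio and concatenating the pooled maps."""
    return sum(math.ceil(height / r) * math.ceil(width / r) for r in pool_ratios)


def attention_cost(num_queries, num_keys):
    """Number of entries in the query-key attention matrix,
    used here as a rough proxy for attention cost."""
    return num_queries * num_keys


# Hypothetical sizes, chosen only for illustration (not taken from the paper).
h, w = 32, 32          # assumed feature-map size
ratios = (2, 4, 8)     # assumed pyramid pooling ratios

queries = h * w                                   # one query per location
full_keys = h * w                                 # standard self-attention
pooled_keys = pooled_token_count(h, w, ratios)    # pooled keys/values

print("key/value tokens:", full_keys, "vs", pooled_keys)
print("attention entries:",
      attention_cost(queries, full_keys), "vs",
      attention_cost(queries, pooled_keys))
```

Under these assumed sizes the pooled key/value sequence is roughly a third of the full one, and the attention matrix shrinks by the same factor; the actual savings in SPPT depend on the configuration reported in the paper.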

Keywords

Visual Transformer Tracking; Pyramid Pooling Attention; Feature Extraction and Correlation; Enhanced Correlation Block
DOI
10.22967/HCIS.2023.13.059
Publication date
2023-12-30
Type
Article
Journal
Human-centric Computing and Information Sciences, vol. 13
Pages
1-17