SPPT: Siamese Pyramid Pooling Transformer for Visual Object Tracking

Fang, Yang; Xie, Bailian; Jiang, Bingbing; Ke, Xuhui; Li, Yan

doi:10.22967/HCIS.2023.13.059

Citations (Web of Science): 8

Citations (Scopus): 0

Abstract

Recently, visual transformer-based tracking has achieved significant success owing to its effective attention modeling strategies and global context feature extraction. However, most transformer trackers are based on the canonical Siamese and correlation-based tracking paradigm, which comprises three stages: feature extraction, feature fusion, and similarity function learning. This paradigm is speculated to weaken the cross-correlation between the template and search features while increasing the computational cost of the tracking model. Hence, we propose a Siamese pyramid pooling transformer (SPPT) to implement a one-stream end-to-end visual object tracking framework with two newly proposed modules: an iterative pooling attention-based feature extraction and correlation (P-FEC) module and an iterative enhanced correlation block (ECB). The P-FEC module can simultaneously perform feature extraction and correlation, whereas the ECB can enhance feature integration and target-aware feature embedding learning. The SPPT has a much shorter attention sequence length, fewer parameters, and fewer floating-point operations (FLOPs) than existing transformer-based trackers. Extensive experiments on the LaSOT, TrackingNet, and GOT-10k benchmarks demonstrate that our proposed SPPT tracker achieves state-of-the-art tracking performance in terms of precision and success scores compared with most convolutional neural network-based and transformer-based trackers.

Keywords

Visual Transformer Tracking; Pyramid Pooling Attention; Feature Extraction and Correlation; Enhanced Correlation Block

Published: 2023-12-30

Type: Article

Journal: Human-centric Computing and Information Sciences

Volume: 13

Pages: 1-17