DPT-tracker: Dual pooling transformer for efficient visual tracking

Fang, Yang; Xie, Bailian; Khairuddin, Uswah; Min, Zijian; Jiang, Bingbing; Li, Weisheng

doi:10.1049/cit2.12296

Citations: Web of Science 7; Scopus 8

Abstract

Transformer tracking always takes paired template and search images as encoder input and conducts feature extraction and target-search feature correlation through self- and/or cross-attention operations, so the model complexity grows quadratically with the number of input images. To alleviate the burden of this tracking paradigm and facilitate practical deployment of Transformer-based trackers, we propose a dual pooling transformer tracking framework, dubbed DPT, which consists of three components: a simple yet efficient spatiotemporal attention model (SAM), a mutual correlation pooling Transformer (MCPT) and a multiscale aggregation pooling Transformer (MAPT). SAM is designed to gracefully aggregate the temporal dynamics and spatial appearance information of multi-frame templates along the space-time dimensions. MCPT aims to capture multi-scale pooled and correlated contextual features, and is followed by MAPT, which aggregates the multi-scale features into a unified feature representation for tracking prediction. The DPT tracker achieves an AUC score of 69.5 on LaSOT and a precision score of 82.8 on TrackingNet while maintaining a shorter attention-token sequence length and fewer parameters and FLOPs than existing state-of-the-art (SOTA) Transformer tracking methods. Extensive experiments demonstrate that the DPT tracker yields a strong real-time tracking baseline with a good trade-off between tracking performance and inference efficiency.
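The efficiency argument in the abstract rests on shortening the attention-token sequence through pooling. The following is a minimal back-of-the-envelope sketch of that idea, not the authors' implementation: the feature-map sizes, the pool sizes, and the function names `pooled_token_count` and `score_matrix_entries` are all hypothetical and chosen only for illustration. It compares the number of attention-score entries for full self-attention over concatenated template and search tokens with attention from search tokens to multi-scale pooled keys.

```python
from math import ceil


def pooled_token_count(side, pool_sizes):
    """Tokens left after pooling a side x side feature map at each pool size
    (stride equal to the pool size) and concatenating the pooled maps."""
    return sum(ceil(side / p) ** 2 for p in pool_sizes)


def score_matrix_entries(num_queries, num_keys):
    """Number of entries in the query-key attention score matrix."""
    return num_queries * num_keys


# Hypothetical sizes, NOT taken from the paper.
search_side = 16        # assumed 16 x 16 search feature map
template_side = 8       # assumed 8 x 8 template feature map
pool_sizes = (2, 4, 8)  # assumed multi-scale pooling ratios

search_tokens = search_side ** 2      # 256 tokens
template_tokens = template_side ** 2  # 64 tokens

# Full self-attention over the concatenated template and search tokens.
joint_tokens = search_tokens + template_tokens
full_entries = score_matrix_entries(joint_tokens, joint_tokens)

# Attention from search tokens to multi-scale pooled keys only.
pooled_keys = pooled_token_count(search_side, pool_sizes)
pooled_entries = score_matrix_entries(search_tokens, pooled_keys)

print(full_entries, pooled_keys, pooled_entries)
```

Under these assumed sizes the pooled key set is far shorter than the joint token sequence, which is the kind of reduction the abstract attributes to the pooling Transformers; the actual ratios in DPT depend on design details not given in this record.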

Keywords

human-computer interfacing; image motion analysis; pattern recognition; signal processing; tracking

Publication date: 2024-08

Type: Article

Journal: CAAI Transactions on Intelligence Technology

Volume: 9

Issue: 4

Pages: 948-959