Improving Vision Transformers to Learn Small-Size Dataset From Scratch

Citations: Web of Science 38; Scopus 51

Abstract

This paper proposes several techniques that help the Vision Transformer (ViT) successfully learn small-size datasets from scratch. ViT, which applies the transformer architecture to image classification, has recently outperformed convolutional neural networks. However, ViT's high performance depends on pre-training with large-size datasets, and this dependence stems from its weak locality inductive bias. In addition, conventional ViT cannot attend effectively to the target class because a rather high, constant temperature factor causes redundant attention. To strengthen the locality inductive bias of ViT, this paper proposes a novel tokenization using shifted patches (Shifted Patch Tokenization: SPT) and a position encoding using CoordConv (CoordConv Position Encoding: CPE). To remedy poor attention, we also propose a new self-attention mechanism (Locality Self-Attention: LSA) based on a learnable temperature and self-relation masking. SPT, CPE, and LSA are intuitive techniques, yet they successfully improve the performance of ViT even on small-size datasets. We show qualitatively that each technique attends to more important areas and contributes to a flatter loss landscape. Moreover, the proposed techniques are generic add-on modules applicable to various ViT backbones. Our experiments show that, when learning Tiny-ImageNet from scratch, the proposed scheme based on SPT, CPE, and LSA increases the accuracy of ViT backbones by +3.66 on average and by up to +5.7. The performance improvements of ViT backbones in ImageNet-1K classification, in learning COCO from scratch, and in transfer learning on classification datasets further verify the strong generalization ability of the proposed method.
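The abstract names SPT and LSA without spelling out their mechanics, so the NumPy sketch below illustrates one plausible reading under stated assumptions; it is not the authors' implementation. SPT is modeled as concatenating the image with four copies shifted diagonally by half a patch (the shift directions and amount are assumptions), then splitting into patches, flattening, layer-normalizing, and linearly projecting. LSA is modeled as single-head attention that divides by a temperature `tau` (learnable during training, fixed here and initialized to √d_k by assumption) and masks the diagonal self-relation scores to −∞ before the softmax. The function names `shift`, `shifted_patch_tokens`, and `locality_self_attention`, and all sizes, are hypothetical. CPE is not modeled.

```python
import numpy as np


def shift(img, dy, dx):
    """Translate an (H, W, C) image by (dy, dx) pixels; vacated pixels become 0."""
    h, w, _ = img.shape
    out = np.zeros_like(img)
    # Source/destination slices: positive dy moves content down, positive dx moves it right.
    ys, yd = (slice(0, h - dy), slice(dy, h)) if dy >= 0 else (slice(-dy, h), slice(0, h + dy))
    xs, xd = (slice(0, w - dx), slice(dx, w)) if dx >= 0 else (slice(-dx, w), slice(0, w + dx))
    out[yd, xd] = img[ys, xs]
    return out


def shifted_patch_tokens(img, patch, w_embed, eps=1e-5):
    """Hypothetical SPT sketch: concat diagonal shifts, patchify, normalize, project."""
    s = patch // 2  # assumed shift: half a patch in each diagonal direction
    views = [img] + [shift(img, dy, dx) for dy, dx in [(-s, -s), (-s, s), (s, -s), (s, s)]]
    x = np.concatenate(views, axis=-1)  # (H, W, 5C)
    h, w, c = x.shape
    # Split into non-overlapping patch x patch blocks and flatten each block.
    x = x.reshape(h // patch, patch, w // patch, patch, c).transpose(0, 2, 1, 3, 4)
    tokens = x.reshape(-1, patch * patch * c)  # (num_patches, patch*patch*5C)
    # Layer normalization over each token (no learnable scale/shift, for brevity).
    mu = tokens.mean(axis=-1, keepdims=True)
    var = tokens.var(axis=-1, keepdims=True)
    tokens = (tokens - mu) / np.sqrt(var + eps)
    return tokens @ w_embed  # linear projection to the embedding dimension


def locality_self_attention(x, wq, wk, wv, tau):
    """Hypothetical single-head LSA sketch: temperature tau plus diagonal masking."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = (q @ k.T) / tau  # tau would be a learnable scalar during training
    np.fill_diagonal(scores, -np.inf)  # self-relation masking: no token attends to itself
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ v, attn


rng = np.random.default_rng(0)
patch, dim = 8, 24
img = rng.standard_normal((32, 32, 3))
w_embed = rng.standard_normal((patch * patch * 3 * 5, dim)) * 0.02
tokens = shifted_patch_tokens(img, patch, w_embed)

wq, wk, wv = (rng.standard_normal((dim, dim)) * 0.1 for _ in range(3))
tau = np.sqrt(dim)  # assumed initial value of the learnable temperature
out, attn = locality_self_attention(tokens, wq, wk, wv, tau)
print(tokens.shape, out.shape)  # (16, 24) (16, 24)
```

In this sketch, masking the diagonal forces each row of `attn` to spread its weight over the other tokens, and a learned `tau` smaller than √d_k would sharpen the softmax. These are the two levers the abstract attributes to LSA; whether they help in this toy setting is not demonstrated here.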

Keywords

Vision transformer; attention mechanism; data-efficient learning
Title
Improving Vision Transformers to Learn Small-Size Dataset From Scratch
Authors
Lee, Seunghoon; Lee, Seunghyun; Song, Byung Cheol
DOI
10.1109/ACCESS.2022.3224044
Publication date
2022
Type
Article
Journal
IEEE Access
Volume
10
Pages
123212-123224