Knowledge Transfer via Decomposing Essential Information in Convolutional Neural Networks

Citations

Web of Science: 9
Scopus: 9

Abstract

Knowledge distillation (KD) extracts knowledge from a "teacher" neural network and transfers it to a small "student" network to improve the student's performance. It is one of the most popular techniques for making convolutional neural networks (CNNs) lighter. Many KD algorithms have been proposed recently, but they still cannot properly distill the essential knowledge of the teacher network, and the transfer tends to depend on the spatial shape of the teacher's feature map. To solve these problems, we propose a method that transfers the major information obtained by decomposing the teacher's feature map through singular value decomposition (SVD), so that the transfer is independent of the feature map's spatial shape. In addition, we present a multitask learning method that enables the student to learn the teacher's knowledge effectively by adaptively adjusting the teacher's constraints to the student's learning speed. Experimental results show that the proposed method performs 2.37% better on the CIFAR100 data set and 2.89% better on the TinyImageNet data set than the state-of-the-art method. The source code is publicly available at https://github.com/sseung0703/KD_methods_with_TF.
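The idea of decomposing a feature map with SVD so that the result no longer depends on its spatial shape can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the repository linked above for that). The function name `decompose_feature_map`, the (height, width, channels) layout, the random inputs, and the choice of k = 4 are all assumptions made for illustration.

```python
import numpy as np


def decompose_feature_map(fmap, k):
    """Hypothetical helper: truncated SVD of one feature map.

    fmap is assumed to have shape (height, width, channels).
    Returns the top-k singular values and the top-k right singular
    vectors, whose shape (k, channels) does not depend on height or width.
    """
    h, w, c = fmap.shape
    # Flatten the spatial dimensions: one row per spatial position.
    mat = fmap.reshape(h * w, c)
    # Reduced SVD: mat = u @ diag(s) @ vt, with s sorted in descending order.
    u, s, vt = np.linalg.svd(mat, full_matrices=False)
    return s[:k], vt[:k]


rng = np.random.default_rng(0)
# Assumed shapes: the teacher map is spatially larger than the student map.
teacher_fmap = rng.standard_normal((8, 8, 16))
student_fmap = rng.standard_normal((4, 4, 16))

t_s, t_v = decompose_feature_map(teacher_fmap, k=4)
s_s, s_v = decompose_feature_map(student_fmap, k=4)

# Both sets of right singular vectors have shape (4, 16),
# even though the spatial sizes of the two maps differ.
print(t_v.shape, s_v.shape)
```

Because the right singular vectors live in channel space, a teacher and a student with the same channel count but different spatial resolutions yield directly comparable quantities in this sketch. How the paper actually compares and transfers them is not stated in the abstract.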

Keywords

Knowledge engineering; Shape; Learning systems; Neural networks; Feature extraction; Knowledge transfer; Task analysis; Deep neural network (DNN); knowledge transfer; multitask learning; smaller network
Title
Knowledge Transfer via Decomposing Essential Information in Convolutional Neural Networks
Authors
Lee, Seunghyun; Song, Byung Cheol
DOI
10.1109/TNNLS.2020.3027837
Issue Date
2022-01
Type
Article
Journal
IEEE Transactions on Neural Networks and Learning Systems
Volume
33
Issue
1
Pages
366-377