Compressing Neural Networks on Limited Computing Resources

Abstract

Network compression is a crucial technique for deploying deep learning models on edge and mobile devices. However, the cost of reaching higher benchmark performance through compression keeps rising. This makes network compression a significant burden, especially for small companies focused on developing compact models. In particular, existing network compression techniques often require extensive computational resources, which makes them impractical for edge devices and small-scale applications. To democratize network compression, we propose a general-purpose framework that combines novel filter pruning and knowledge distillation techniques. First, conventional filter pruning methods rely on static heuristics or costly neural architecture search (NAS). Our method instead uses meta-learning to examine the importance of each gate quickly and at a fine granularity. This enables rapid and stable sub-network discovery and significantly improves the pruning process. Second, to minimize the computational cost of knowledge distillation, we introduce a synthetic teacher assistant that leverages precomputed fixed knowledge, that is, the stored feature maps/logits of the teacher network. Using fixed knowledge reduces the cost incurred by the teacher network. The synthetic teacher assistants then transfer this fixed knowledge to the student, which prevents distribution collapse. The proposed framework dramatically reduces compression overhead while maintaining high accuracy. It reduces the FLOPs of ResNet-50 trained on ImageNet by 55.2% while preserving 76.2% top-1 accuracy, using only 199 GPU hours. This is significantly fewer than previous state-of-the-art methods require. Overall, our framework democratizes deep learning compression by offering a cost-effective and computationally feasible solution, enabling broader adoption in low-resource environments.
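
The "fixed knowledge" idea above means the teacher runs once, its outputs are stored, and later distillation reads only the stored values. A minimal standard-library Python sketch of that caching pattern is shown below. It assumes plain Hinton-style knowledge distillation, meaning a temperature-scaled softmax followed by KL divergence. The function names, the toy teacher, and the temperature of 4.0 are hypothetical illustrations. The sketch does not reproduce the paper's synthetic teacher assistant, whose construction the abstract does not specify.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def build_fixed_knowledge(teacher_fn, dataset):
    """Hypothetical helper: run the (expensive) teacher once per sample and store its logits.

    teacher_fn maps an input to a list of logits; dataset yields (sample_id, input) pairs.
    """
    return {sample_id: teacher_fn(x) for sample_id, x in dataset}

def distillation_loss(student_logits, cached_teacher_logits, temperature=4.0):
    """Hinton-style KD loss computed from stored teacher logits; no teacher forward pass."""
    p_teacher = softmax(cached_teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return temperature ** 2 * kl_divergence(p_teacher, p_student)

# Usage: a toy teacher that counts how often it is called.
calls = {"n": 0}

def toy_teacher(x):
    calls["n"] += 1
    return [x, 0.0, -x]

data = [(0, 1.0), (1, 2.0)]
cache = build_fixed_knowledge(toy_teacher, data)   # teacher runs once per sample here
loss_same = distillation_loss(cache[0], cache[0])  # student matches teacher exactly
loss_diff = distillation_loss([0.0, 0.0, 0.0], cache[1])  # uniform student vs. teacher
```

The payoff is in how often the teacher runs. Every later epoch reads `cache[...]` instead of calling the teacher, so the teacher's cost is paid once rather than once per training step. This sketch covers only logits; caching feature maps would follow the same pattern but needs far more storage.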

Keywords

Costs; Knowledge engineering; Logic gates; Filtering algorithms; Graphics processing units; Computational modeling; Accuracy; Image coding; Deep learning; Workstations; Deep neural network compression; filter pruning; knowledge transfer; efficient neural networks; lightweight deep learning models
Authors
Lee, Seunghyun; Lee, Dongjun; Hyun, Minju; Kim, Heeje; Song, Byung Cheol
DOI
10.1109/ACCESS.2025.3567102
Publication Date
2025
Type
Article
Journal
IEEE Access
Volume
13
Pages
80063-80075