QuantEdge: A Hybrid Quantization Approach for Optimized AI Deployment Across Edge Devices
- Mahmudov, Rasim;
- Kim, Deok-Hwan
Web of Science: 3
Scopus: 4
Abstract
Deploying artificial intelligence (AI) models on edge devices introduces significant challenges due to limited computational resources, strict latency requirements, and energy constraints. These limitations hinder the performance of traditional deep learning models in real-time applications. This study addresses the pressing problem of optimizing AI inference for heterogeneous and resource-constrained edge environments by introducing QuantEdge, a hybrid quantization approach that combines post-training quantization (PTQ) and quantization-aware training (QAT). The proposed method dynamically adapts model precision and computational load based on device-specific constraints, making it suitable for a wide spectrum of hardware from low-power IoT nodes to advanced embedded systems. Experiments conducted on devices such as Jetson AGX Xavier, Asus Tinker Edge T, Raspberry Pi, and AGX clusters show that QuantEdge reduces inference latency by up to 31.8% while maintaining high accuracy. Additionally, it significantly improves energy efficiency and memory usage. The research is motivated by the growing demand for efficient on-device AI in real-world domains such as autonomous vehicles, mobile health diagnostics, smart surveillance, and edge-enabled IoT. QuantEdge presents a robust solution to real-time AI deployment challenges by tailoring quantization dynamically to hardware capabilities, thus enhancing the practicality and scalability of edge AI systems.
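The abstract describes choosing model precision per device. The following is a minimal sketch of that general idea, not the authors' implementation: it picks a weight bit width from a device's memory budget and then applies plain symmetric uniform post-training quantization. The `DeviceProfile` class, the `choose_bits` thresholds, and the device names are all hypothetical and do not come from the paper. In quantization-aware training, the same rounding step would be simulated during training rather than applied once afterwards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceProfile:
    """Hypothetical description of an edge device (not from the paper)."""
    name: str
    memory_mb: int


def choose_bits(device: DeviceProfile) -> int:
    """Pick a weight bit width from available memory.

    The thresholds are illustrative assumptions only.
    """
    if device.memory_mb < 1024:
        return 4
    if device.memory_mb < 8192:
        return 8
    return 16


def quantize(weights, bits):
    """Symmetric uniform quantization: w is approximated by q * scale.

    Returns the integer codes (each in [-qmax, qmax]) and the scale.
    """
    qmax = 2 ** (bits - 1) - 1
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Guard against an all-zero tensor, where any scale works.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    weights = [0.5, -1.0, 0.25, 0.0]
    # Hypothetical devices with made-up memory budgets.
    for device in (DeviceProfile("small-iot-node", 512),
                   DeviceProfile("embedded-board", 4096)):
        bits = choose_bits(device)
        codes, scale = quantize(weights, bits)
        print(device.name, bits, codes, dequantize(codes, scale))
```

With this scheme each weight is reconstructed to within half a quantization step (`scale / 2`), so a lower bit width trades accuracy for a smaller memory footprint.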
- Title: QuantEdge: A Hybrid Quantization Approach for Optimized AI Deployment Across Edge Devices
- Authors: Mahmudov, Rasim; Kim, Deok-Hwan
- Publication date: 2025
- Type: Article
- Journal: IEEE Access
- Volume: 13
- Pages: 161605-161618