QuantEdge: A Hybrid Quantization Approach for Optimized AI Deployment Across Edge Devices

Citations

Web of Science: 3
Scopus: 4
Abstract

Deploying artificial intelligence (AI) models on edge devices introduces significant challenges due to limited computational resources, strict latency requirements, and energy constraints. These limitations hinder the performance of traditional deep learning models in real-time applications. This study addresses the pressing problem of optimizing AI inference for heterogeneous and resource-constrained edge environments by introducing QuantEdge, a hybrid quantization approach that combines post-training quantization (PTQ) and quantization-aware training (QAT). The proposed method dynamically adapts model precision and computational load based on device-specific constraints, making it suitable for a wide spectrum of hardware from low-power IoT nodes to advanced embedded systems. Experiments conducted on devices such as Jetson AGX Xavier, Asus Tinker Edge T, Raspberry Pi, and AGX clusters show that QuantEdge reduces inference latency by up to 31.8% while maintaining high accuracy. Additionally, it significantly improves energy efficiency and memory usage. The research is motivated by the growing demand for efficient on-device AI in real-world domains such as autonomous vehicles, mobile health diagnostics, smart surveillance, and edge-enabled IoT. QuantEdge presents a robust solution to real-time AI deployment challenges by tailoring quantization dynamically to hardware capabilities, thus enhancing the practicality and scalability of edge AI systems.

Keywords

Edge computing; hybrid quantization; neural processing unit (NPU); latency optimization; QuantEdge; on-device AI
Title
QuantEdge: A Hybrid Quantization Approach for Optimized AI Deployment Across Edge Devices
Authors
Mahmudov, Rasim; Kim, Deok-Hwan
DOI
10.1109/ACCESS.2025.3609798
Publication date
2025
Type
Article
Journal
IEEE Access, vol. 13
Pages
161605-161618