QuantEdge: A Hybrid Quantization Approach for Optimized AI Deployment Across Edge Devices

Mahmudov, Rasim; Kim, Deok-Hwan

doi:10.1109/ACCESS.2025.3609798

Citations: Web of Science 3; Scopus 4

Abstract

Deploying artificial intelligence (AI) models on edge devices introduces significant challenges due to limited computational resources, strict latency requirements, and energy constraints. These limitations hinder the performance of traditional deep learning models in real-time applications. This study addresses the pressing problem of optimizing AI inference for heterogeneous and resource-constrained edge environments by introducing QuantEdge, a hybrid quantization approach that combines post-training quantization (PTQ) and quantization-aware training (QAT). The proposed method dynamically adapts model precision and computational load based on device-specific constraints, making it suitable for a wide spectrum of hardware, from low-power IoT nodes to advanced embedded systems. Experiments conducted on devices such as Jetson AGX Xavier, Asus Tinker Edge T, Raspberry Pi, and AGX clusters show that QuantEdge reduces inference latency by up to 31.8% while maintaining high accuracy. It also significantly improves energy efficiency and reduces memory usage. The research is motivated by the growing demand for efficient on-device AI in real-world domains such as autonomous vehicles, mobile health diagnostics, smart surveillance, and edge-enabled IoT. By tailoring quantization dynamically to hardware capabilities, QuantEdge offers a robust solution to real-time AI deployment challenges and enhances the practicality and scalability of edge AI systems.
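The abstract names two standard ingredients, post-training quantization (PTQ) and quantization-aware training (QAT), together with device-dependent precision selection. The Python sketch below illustrates those generic building blocks under stated assumptions. It is not the QuantEdge algorithm, whose details are not given in this record. The sketch makes three assumptions. PTQ is represented by affine (asymmetric) per-tensor quantization with min/max calibration. QAT is represented by its usual quantize-then-dequantize ("fake quantization") forward pass. The device-aware step is a hypothetical rule that picks the highest bit width whose weight storage fits a memory budget. `DeviceProfile`, `choose_bit_width`, the device names, and all budgets and weights are illustrative inventions.

```python
from dataclasses import dataclass


@dataclass
class QuantParams:
    scale: float      # real-valued size of one quantization step
    zero_point: int   # integer code that represents the real value 0.0
    num_bits: int


def calibrate(values, num_bits=8):
    """PTQ calibration: derive affine (asymmetric) parameters from the
    observed min/max of a tensor, given here as a flat list of floats."""
    qmin, qmax = 0, 2 ** num_bits - 1
    lo = min(min(values), 0.0)  # widen the range to include 0.0 so it maps to an exact code
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
    zero_point = max(qmin, min(qmax, round(qmin - lo / scale)))
    return QuantParams(scale, zero_point, num_bits)


def quantize(values, p):
    """Map floats to unsigned integer codes, clamped to [0, 2**num_bits - 1]."""
    qmax = 2 ** p.num_bits - 1
    return [max(0, min(qmax, round(v / p.scale) + p.zero_point)) for v in values]


def dequantize(codes, p):
    """Map integer codes back to approximate real values."""
    return [(q - p.zero_point) * p.scale for q in codes]


def fake_quantize(values, p):
    """Quantize then dequantize. A QAT forward pass uses this so the model
    is trained against the rounding error it will see at inference time."""
    return dequantize(quantize(values, p), p)


# Hypothetical device description; fields and budgets are illustrative only.
@dataclass
class DeviceProfile:
    name: str
    memory_budget_bytes: int


def choose_bit_width(num_params, device, candidates=(16, 8, 4)):
    """Hypothetical policy: return the highest precision (candidates are
    listed from high to low) whose weight storage fits the device budget."""
    for bits in candidates:
        if num_params * bits / 8 <= device.memory_budget_bytes:
            return bits
    raise ValueError(f"model does not fit on {device.name} even at {candidates[-1]} bits")


# Usage with made-up weights and budgets.
weights = [-0.42, 0.0, 0.13, 0.91, -1.2]
params = calibrate(weights, num_bits=8)
approx = fake_quantize(weights, params)
print(params.zero_point, max(abs(a - b) for a, b in zip(weights, approx)))

iot_node = DeviceProfile("hypothetical-iot-node", memory_budget_bytes=1_000_000)
gpu_board = DeviceProfile("hypothetical-embedded-gpu", memory_budget_bytes=4_000_000)
print(choose_bit_width(1_500_000, iot_node), choose_bit_width(1_500_000, gpu_board))
```

Min/max calibration is the simplest PTQ choice. Percentile or entropy-based calibration is often preferred when tensors contain outliers. In practical QAT, the rounding inside `fake_quantize` is usually paired with a straight-through estimator so that gradients can flow through the non-differentiable step.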

Keywords

Edge computing; hybrid quantization; neural processing unit (NPU); latency optimization; QuantEdge; on-device AI

Publication date: 2025

Type: Article

Journal: IEEE Access

Volume: 13

Pages: 161605–161618