Fold-PIM: A Cost-Efficient LPDDR5-Based PIM for On-Device SLMs

Citations

Web of Science: 1
Scopus: 1

Abstract

The increasing demand for on-device AI applications has shifted focus to Small Language Models (SLMs) optimized for mobile environments. However, the limited memory bandwidth of LPDDR5-based systems makes it difficult to efficiently execute memory-bound matrix-vector multiplication, a core operation in SLM inference. In this paper, we propose Fold-PIM, an LPDDR5-based Processing-in-Memory (PIM) architecture designed to address this challenge. Fold-PIM features a shared processing unit (PU) architecture that leverages subarray-level parallelism, and it employs key techniques, including in-tile transposition, adaptive tiling, and a tailored protocol, to reduce vector replacement latency. Our evaluation shows that Fold-PIM achieves up to a 3.9x speedup in token generation time for SLM inference compared to a baseline system without PIM.
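
As a rough illustration of why matrix-vector multiplication is memory-bound during token generation, the following sketch computes the arithmetic intensity of y = W x and a bandwidth-limited lower bound on per-token latency. It is a back-of-the-envelope model, not the paper's evaluation method; the function names, matrix size, parameter count, weight precision, and bandwidth figure are all hypothetical values chosen for illustration.

```python
def gemv_arithmetic_intensity(rows: int, cols: int, bytes_per_weight: float) -> float:
    """FLOPs per byte of weight traffic for y = W @ x.

    Each weight is read once and used for one multiply and one add.
    """
    flops = 2 * rows * cols
    weight_bytes = rows * cols * bytes_per_weight
    return flops / weight_bytes


def min_token_time_s(num_params: float, bytes_per_weight: float,
                     bandwidth_bytes_per_s: float) -> float:
    """Lower bound on per-token latency if every weight is streamed once per token."""
    return num_params * bytes_per_weight / bandwidth_bytes_per_s


# Hypothetical numbers for illustration only (not taken from the paper):
# a 4096 x 4096 layer with 16-bit weights, a 1e9-parameter model,
# and 50 GB/s of sustained memory bandwidth.
intensity = gemv_arithmetic_intensity(4096, 4096, 2.0)
token_time = min_token_time_s(1e9, 2.0, 50e9)

print(f"arithmetic intensity: {intensity:.1f} FLOP/byte")
print(f"bandwidth-bound token time: {token_time * 1e3:.1f} ms")
```

Under these assumptions the intensity is about 1 FLOP per byte regardless of matrix size, so per-token latency is set by how fast weights can be moved rather than by compute throughput. This is the bottleneck that moving computation into memory is meant to relieve.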

Keywords

Random access memory; Parallel processing; Bandwidth; Vectors; Protocols; Costs; Throughput; Transistors; Memory management; Training; Processing-in-memory; SLM; on-device AI; LPDDR5; Architecture
Title
Fold-PIM: A Cost-Efficient LPDDR5-Based PIM for On-Device SLMs
Authors
Jeun, Kyoungho; Kim, Hyeonu; Lee, Eojin
DOI
10.1109/LCA.2025.3566692
Publication date
2025-01
Type
Article
Journal
IEEE Computer Architecture Letters
Volume
24
Issue
1
Pages
185-188