Fold-PIM: A Cost-Efficient LPDDR5-Based PIM for On-Device SLMs

Jeun, Kyoungho; Kim, Hyeonu; Lee, Eojin

doi:10.1109/LCA.2025.3566692

Citations (Web of Science): 1

Citations (Scopus): 1

Abstract

The growing demand for on-device AI applications has shifted attention to Small Language Models (SLMs) optimized for mobile environments. However, the limited memory bandwidth of LPDDR5-based systems makes it difficult to efficiently execute memory-bound matrix-vector multiplication, a core operation in SLM inference. In this paper, we propose Fold-PIM, an LPDDR5-based Processing-in-Memory (PIM) architecture designed to address this challenge. Fold-PIM features a shared processing unit (PU) architecture that exploits subarray-level parallelism and employs three key techniques: in-tile transposition, adaptive tiling, and a tailored protocol to reduce vector replacement latency. Our evaluation shows that Fold-PIM achieves up to a 3.9x speedup in token generation time for SLM inference compared to a baseline system without PIM.
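Why matrix-vector multiplication is memory-bound during token generation can be seen with a back-of-the-envelope roofline estimate. The sketch below is illustrative and not taken from the paper: the function names, the 4096 x 4096 layer size, the FP16 element width, and the peak-compute and bandwidth figures are all hypothetical assumptions. It compares the arithmetic intensity (FLOPs per byte of DRAM traffic) of a single-token GEMV with a batched GEMM over the same weights.

```python
def gemv_intensity(m: int, n: int, bytes_per_elem: int = 2) -> float:
    """FLOPs per byte of DRAM traffic for y = W @ x, with W of shape (m, n).

    Assumes every weight, input, and output element crosses the memory bus once.
    """
    flops = 2 * m * n  # one multiply and one add per weight
    traffic = bytes_per_elem * (m * n + n + m)  # weights + input vector + output vector
    return flops / traffic


def gemm_intensity(m: int, n: int, batch: int, bytes_per_elem: int = 2) -> float:
    """Same metric when `batch` input vectors share a single pass over W."""
    flops = 2 * m * n * batch
    traffic = bytes_per_elem * (m * n + n * batch + m * batch)
    return flops / traffic


def attainable_gflops(intensity: float, peak_gflops: float, bw_gbs: float) -> float:
    """Roofline model: throughput is capped by peak compute or by bandwidth * intensity."""
    return min(peak_gflops, intensity * bw_gbs)


if __name__ == "__main__":
    m, n = 4096, 4096        # hypothetical SLM projection-layer shape
    peak, bw = 2000.0, 51.2  # hypothetical host peak (GFLOP/s) and DRAM bandwidth (GB/s)
    for name, ai in [("GEMV (1 token)", gemv_intensity(m, n)),
                     ("GEMM (batch 64)", gemm_intensity(m, n, 64))]:
        print(f"{name}: {ai:.2f} FLOP/B -> {attainable_gflops(ai, peak, bw):.1f} GFLOP/s")
```

Under these assumptions, the GEMV reaches only about 1 FLOP per byte, so its throughput is bounded by memory bandwidth rather than compute, whereas batching many vectors amortizes the weight traffic. Because on-device decoding typically generates one token at a time, the weight matrix must be streamed for each token, which is the bottleneck that placing compute inside the memory, as PIM does, aims to relieve.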

Keywords

Random access memory; Parallel processing; Bandwidth; Vectors; Protocols; Costs; Throughput; Transistors; Memory management; Training; Processing-in-memory; SLM; on-device AI; LPDDR5; Architecture

Publication date: 2025-01

Type: Article

Journal: IEEE Computer Architecture Letters

Volume: 24

Issue: 1

Pages: 185-188