Fold-PIM: A Cost-Efficient LPDDR5-Based PIM for On-Device SLMs
- Jeun, Kyoungho;
- Kim, Hyeonu;
- Lee, Eojin
Web of Science citations: 1
Scopus citations: 1
Abstract
The increasing demand for on-device AI applications has shifted focus to Small Language Models (SLMs) optimized for mobile environments. However, the limited memory bandwidth of LPDDR5-based systems makes it difficult to efficiently execute memory-bound matrix-vector multiplication, a core component of SLM inference. In this paper, we propose Fold-PIM, an LPDDR5-based Processing-in-Memory (PIM) architecture designed to address these challenges. Fold-PIM features a shared processing unit (PU) architecture that leverages subarray-level parallelism, and it employs in-tile transposition, adaptive tiling, and a tailored protocol as key techniques to reduce vector replacement latency. Our evaluation results demonstrate that Fold-PIM achieves up to a 3.9x speedup in token generation time for SLM inference compared to the baseline system without PIM.
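To illustrate why the abstract calls matrix-vector multiplication memory-bound, the following is a minimal sketch, not taken from the paper. It estimates the arithmetic intensity (FLOPs per byte moved) of y = W x and the bandwidth-limited time to stream a set of weights once. The function names and all numeric inputs are hypothetical and chosen only for illustration; they are not Fold-PIM parameters or results.

```python
def gemv_arithmetic_intensity(rows: int, cols: int, bytes_per_weight: float) -> float:
    """FLOPs per byte of memory traffic for y = W @ x, with W of shape (rows, cols).

    Assumes every weight is read exactly once and that the input and output
    vectors use the same element size as the weights (an illustrative assumption).
    """
    flops = 2 * rows * cols                            # one multiply and one add per weight
    weight_bytes = rows * cols * bytes_per_weight      # the matrix is streamed once
    vector_bytes = (rows + cols) * bytes_per_weight    # read x, write y
    return flops / (weight_bytes + vector_bytes)


def bandwidth_bound_seconds(num_weights: int, bytes_per_weight: float,
                            bandwidth_bytes_per_s: float) -> float:
    """Lower bound on the time to read all weights once at a given bandwidth."""
    return num_weights * bytes_per_weight / bandwidth_bytes_per_s


if __name__ == "__main__":
    # Hypothetical layer shape and 2-byte weights, for illustration only.
    print(gemv_arithmetic_intensity(4096, 4096, 2.0))
    # Hypothetical model size and bandwidth, for illustration only.
    print(bandwidth_bound_seconds(10**9, 1.0, 50e9))
```

Under these assumptions the intensity stays below one FLOP per byte regardless of matrix size, because each weight is used for a single multiply-add. Token generation time is then governed by how fast weights can be read rather than by compute throughput, which is the bottleneck that PIM designs aim to relieve.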
- Title: Fold-PIM: A Cost-Efficient LPDDR5-Based PIM for On-Device SLMs
- Authors: Jeun, Kyoungho; Kim, Hyeonu; Lee, Eojin
- Publication date: 2025-01
- Type: Article
- Volume: 24
- Issue: 1
- Pages: 185-188