DRAM ACT 및 PRE 지연 숨김을 통한 HBM-PIM의 처리량 최적화

Optimizing HBM-PIM Throughput through DRAM ACT and PRE Hiding

Abstract

Modern applications such as Large Language Models (LLMs) demand ever higher memory bandwidth, which conventional memory devices alone struggle to provide. Insufficient memory bandwidth creates significant performance bottlenecks because excessive time is spent transferring data between the host processor and memory. Processing-in-memory (PIM) architectures address this challenge by placing processing units (PUs) near memory banks, offloading tasks from the host and exploiting DRAM internal bandwidth. HBM-PIM, one of the PIM devices that has been built in practice, provides one PU per two banks and supports parallel operation across all PUs. In this paper, we analyze HBM-PIM operations in depth, taking the DRAM microarchitecture into account. The analysis reveals that the characteristics of HBM-PIM's microarchitecture are not fully exploited. Based on this insight, we propose optimization techniques that leverage the structural features of HBM-PIM and require no hardware modification. By reordering instructions, adjusting data mapping, and loosening memory barriers, we minimize the latency caused by DRAM row conflicts and improve the performance of HBM-PIM. Our optimizations yield average performance improvements of 1.15×, 1.43×, and 1.29× for GEMV, ADD/MUL, and ReLU operations, respectively.
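
The idea of hiding ACT and PRE latency can be illustrated with a toy timing model. The sketch below is not the paper's method or simulator: the `Timing` values, the function names, and the simple two-bank alternation are all assumptions chosen for illustration, and real HBM timing constraints (for example tRAS, tRRD, or refresh) are ignored. It only compares a schedule in which every row switch stalls the PU against one in which the idle bank of a PU's bank pair is precharged and activated while the PU streams columns from the other bank.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timing:
    """Hypothetical DRAM timing values in cycles (not taken from the paper)."""
    t_rp: int = 14   # PRE (precharge) latency
    t_rcd: int = 14  # ACT (activate) to column-command latency
    t_ccd: int = 2   # gap between consecutive column commands


def serialized_cycles(rows: int, cols_per_row: int, t: Timing) -> int:
    """Every row pays PRE + ACT in full before its column accesses start."""
    return rows * (t.t_rp + t.t_rcd + cols_per_row * t.t_ccd)


def overlapped_cycles(rows: int, cols_per_row: int, t: Timing) -> int:
    """Rows alternate between the two banks that share one PU.

    While the PU streams columns from one bank, the other bank is
    precharged and activated, so a row switch costs extra time only
    when PRE + ACT takes longer than the column burst.
    """
    if rows == 0:
        return 0
    switch = t.t_rp + t.t_rcd          # cost of opening a new row
    burst = cols_per_row * t.t_ccd     # time spent streaming one row
    # The first row's open cannot be hidden; later opens overlap a burst.
    return switch + burst + (rows - 1) * max(burst, switch)


if __name__ == "__main__":
    timing = Timing()
    base = serialized_cycles(8, 32, timing)
    hidden = overlapped_cycles(8, 32, timing)
    print(f"serialized: {base} cycles, overlapped: {hidden} cycles")
```

In this model the benefit depends on the ratio between the column burst and the row-switch cost: when the burst is at least as long as PRE + ACT, every row switch after the first is fully hidden, and otherwise only part of it is.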

Keywords

processing in memory, HBM-PIM, DRAM microarchitecture, performance optimization
Authors
김현우, 이어진
Publication date
2025-07
Type
Y
Journal
정보과학회논문지
Volume
52
Issue
7
Pages
557 ~ 571