CLAY: CXL-based Scalable NDP Architecture Accelerating Embedding Layers

  • Yun, Sungmin
  • Nam, Hwayong
  • Kyung, Kwanhee
  • Park, Jaehyun
  • Kim, Byeongho
  • ... Lee, Eojin
  • and 2 others
Citations

Web of Science: 6
Scopus: 10

Abstract

An embedding layer is one of the most critical building blocks of deep neural networks (DNNs), especially for recommender systems and graph neural networks. The embedding layer accounts for a large portion of the total execution time due to its large memory requirements and little data reuse in operations. To accelerate embedding layers, dual in-line memory module (DIMM) based near-data processing (NDP) architectures have been proposed. They amplify bandwidth by adding a processing unit to the DIMM's buffer. However, prior architectures offer limited capacity scalability due to the limited number of memory channels. Crucially, their performance improvement is limited by the load imbalance problem and by the constraints of DIMM-based memory systems, which use a multi-drop bus structure between the processing units and the host. In this paper, we propose CLAY, a CXL-based scalable near-data processing architecture that accelerates general embedding layers in DNNs. Breaking away from conventional memory channel structures, CLAY interconnects the DRAM modules to reduce the data transfer overhead among DRAM modules. Furthermore, we devise a dedicated memory address mapping to mitigate load imbalance in CLAY and a packet duplication scheme that enables full utilization of CLAY by reducing the required instruction transmission bandwidth. We also propose a method for scaling CLAY and a software stack for using it. Compared to the state-of-the-art NDP architectures FeaNMP and G-NMP, CLAY achieves an end-to-end speedup of up to 1.87x and 2.77x for recommender systems and graph neural networks, respectively.

Keywords

Memory system; Embedding layer; Near-data processing; DRAM; Compute Express Link; MEMORY
Title
CLAY: CXL-based Scalable NDP Architecture Accelerating Embedding Layers
Authors
Yun, Sungmin; Nam, Hwayong; Kyung, Kwanhee; Park, Jaehyun; Kim, Byeongho; Kwon, Yongsuk; Lee, Eojin; Ahn, Jung Ho
DOI
10.1145/3650200.3656595
Publication Date
2024
Type
Proceedings Paper
Journal
PROCEEDINGS OF THE 38TH ACM INTERNATIONAL CONFERENCE ON SUPERCOMPUTING, ACM ICS 2024
Pages
338-351