CLAY: CXL-based Scalable NDP Architecture Accelerating Embedding Layers

Yun, Sungmin; Nam, Hwayong; Kyung, Kwanhee; Park, Jaehyun; Kim, Byeongho; Kwon, Yongsuk; Lee, Eojin; Ahn, Jung Ho

doi:10.1145/3650200.3656595

Citations: Web of Science 6; Scopus 10

Abstract

An embedding layer is one of the most critical building blocks of deep neural networks (DNNs), especially for recommender systems and graph neural networks. The embedding layer accounts for a large portion of the total execution time due to its large memory requirements and little data reuse in its operations. To accelerate embedding layers, dual in-line memory module (DIMM) based near-data processing (NDP) architectures have been proposed. They amplify bandwidth by adding a processing unit to the DIMM's buffer. However, prior architectures offer limited capacity scalability due to the limited number of memory channels. Crucially, their performance improvement is limited by the load imbalance problem and by the constraints of DIMM-based memory systems, which place a multi-drop bus between the processing units and the host. In this paper, we propose CLAY, a CXL-based scalable near-data processing architecture that accelerates general embedding layers in DNNs. Breaking away from conventional memory channel structures, CLAY interconnects the DRAM modules to reduce the data transfer overhead among them. Furthermore, we devise a dedicated memory address mapping to mitigate load imbalance in CLAY and a packet duplication scheme that enables full utilization of CLAY by reducing the required instruction transmission bandwidth. We propose a method of scaling CLAY and a software stack to use CLAY. Compared to the state-of-the-art NDP architectures FeaNMP and G-NMP, CLAY achieves an end-to-end speedup of up to 1.87x and 2.77x for recommender systems and graph neural networks, respectively.
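
The gather-and-reduce pattern and the load imbalance problem mentioned in the abstract can be illustrated with a small sketch. This is not CLAY's address mapping or its software stack: the function names (`embedding_bag`, `module_loads`, `skewed_indices`), the two mappings (contiguous blocks versus row interleaving), and the synthetic skewed access pattern are all assumptions chosen only to show why the placement of embedding rows across memory modules affects balance.

```python
import random


def embedding_bag(table, indices):
    """Gather the rows named by `indices` and sum them element-wise.

    Each gathered row is read once and used once, which is why the
    operation has little data reuse.
    """
    dim = len(table[0])
    out = [0.0] * dim
    for i in indices:
        for d in range(dim):
            out[d] += table[i][d]
    return out


def module_loads(indices, num_rows, num_modules, interleaved):
    """Count lookups per memory module under a hypothetical mapping.

    interleaved=False: contiguous blocks of rows per module.
    interleaved=True:  row i is placed on module i % num_modules.
    """
    loads = [0] * num_modules
    rows_per_module = -(-num_rows // num_modules)  # ceiling division
    for i in indices:
        m = i % num_modules if interleaved else i // rows_per_module
        loads[m] += 1
    return loads


def skewed_indices(num_rows, count, seed=0):
    """Synthetic access pattern (assumed): low row indices are 'hot'."""
    rng = random.Random(seed)
    return [int(num_rows * rng.random() ** 2) for _ in range(count)]


if __name__ == "__main__":
    idx = skewed_indices(num_rows=1024, count=10_000)
    print("block      :", module_loads(idx, 1024, 4, interleaved=False))
    print("interleaved:", module_loads(idx, 1024, 4, interleaved=True))
```

Under this assumed access pattern, the block mapping sends about half of the lookups to the first module, while interleaving spreads them more evenly. Individual hot rows can still leave some residual imbalance.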

Keywords

Memory system; Embedding layer; Near-data processing; DRAM; Compute Express Link; MEMORY

Title: CLAY: CXL-based Scalable NDP Architecture Accelerating Embedding Layers

Authors: Yun, Sungmin; Nam, Hwayong; Kyung, Kwanhee; Park, Jaehyun; Kim, Byeongho; Kwon, Yongsuk; Lee, Eojin; Ahn, Jung Ho

DOI: 10.1145/3650200.3656595

Publication year: 2024

Type: Proceedings Paper

Venue: Proceedings of the 38th ACM International Conference on Supercomputing (ACM ICS 2024)

Pages: 338-351