Architecture Optimizations for Memory Systems of Throughput Processors

Gu, Yongbin

Graduate Thesis Or Dissertation

Public Deposited

Citeable URL: https://ir.library.oregonstate.edu/concern/graduate_thesis_or_dissertations/np193h506

Descriptions

Attribute Name	Values
Creator	Gu, Yongbin
Abstract	Throughput-oriented processors, such as graphics processing units (GPUs), are increasingly used to accelerate general-purpose computing, including machine learning models applied across numerous disciplines. The thousands of threads running concurrently in a GPU demand a highly efficient memory subsystem to supply data. In this dissertation, we study the memory architecture of traditional GPUs and show that this architecture, initially designed for graphics processing, is less efficient at handling general-purpose computing tasks. We propose several memory architecture optimizations with two primary objectives: (1) optimize the current memory architecture to handle general-purpose computing tasks more efficiently; (2) improve the overall performance of GPUs. This dissertation has four major parts. (1) The first part deals with L2 cache inefficiency. A key factor affecting the memory subsystem is the order of memory accesses. While reordering memory accesses at the L2 cache has large potential benefits for both the cache and DRAM, little work has been conducted to exploit it. We investigate this largely unexplored opportunity and propose Cache Access Reordering Tree (CART), a novel architecture that improves memory subsystem efficiency by actively reordering memory accesses at the L2 cache to be cache-friendly and DRAM-friendly. (2) The second part deals with the miss handling architecture (MHA) in GPUs. A conventional MHA is static in the sense that it provides a fixed number of miss status holding register (MSHR) entries to track primary misses, and a fixed number of slots within each entry to track secondary misses. This leads to severe entry or slot under-utilization and a poor match to practical workloads, as the number of memory requests to different cache lines can vary significantly. We propose Dynamically Linked MSHR (DL-MSHR), a novel approach that dynamically forms MSHR entries from a pool of available slots. This approach self-adapts to primary-miss-predominant applications by forming more entries with fewer slots, and to secondary-miss-predominant applications by forming fewer entries with more slots per entry. (3) The third part aims to improve the performance of Unified Virtual Memory (UVM), which was recently introduced into GPUs. We propose CAPTURE (Capacity-Aware Prefetch with True Usage Reflected Eviction), a novel microarchitecture scheme that implements coordinated prefetch-eviction for GPU UVM management. CAPTURE uses GPU memory status and memory access history to dynamically adjust prefetching and to "capture" accurate remaining page reuse opportunities for improved eviction. (4) In the fourth part, we propose a comprehensive UVM benchmark suite named UVMBench to facilitate future research on UVM.
License	All rights reserved
Resource Type	Dissertation
Date Issued	2020-12-02
Degree Level	Doctoral
Degree Name	Doctor of Philosophy (Ph.D.)
Degree Field	Electrical and Computer Engineering
Degree Grantor	Oregon State University
Commencement Year	2021
Advisor	Chen, Lizhong
Committee Member	Louis, Joseph; Bose, Bella; Lee, Ben; Nguyen, Thinh
Academic Affiliation	Electrical Engineering and Computer Science
Rights Statement	In Copyright
Funding Statement (additional comments about funding)	NSF1619456 NSF1750047 NSF1619472 NSF1566637
Publisher	Oregon State University
Peer Reviewed	No
Language	English [eng]

Relationships

Parents:

This work has no parents.

In Collection:

Graduate Theses and Dissertations (GTD)

Items

Thumbnail	Title	Date Uploaded	Visibility	Actions
	GuYongbin2020.pdf	2020-12-03	Public	Download
