
dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Keller, G.K.
dc.contributor.author: Prasetya, Naraenda
dc.date.accessioned: 2022-08-17T23:00:32Z
dc.date.available: 2022-08-17T23:00:32Z
dc.date.issued: 2022
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/42329
dc.description.abstract: Efficient cache utilization is critical in programs with high data throughput. Improving performance in this area often requires niche knowledge of computer architecture, extensive benchmarking, and algorithms that do more work than intuitively required. By changing the order in which tasks are executed, we change the order in which memory is accessed, and thus how caches are used. This thesis proposes a column iterator that reschedules a 2D workload. We show that performance can be improved over the naive method by implementing the proposed approach in C++ with CUDA and as an extension to the data-parallel DSL Accelerate.
dc.description.sponsorship: Utrecht University
dc.language.iso: EN
dc.subject: This thesis shows that cache efficiency for stencil operations and matrix multiplication can be improved over both the naive implementation and the more common tiling approach by rescheduling via index mapping. While the main focus is on improving GPU performance, the presented techniques can also be applied on a CPU.
dc.title: Improving Cache Performance in Structured GPGPU Workloads via Specialized Thread Schedules
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.keywords: GPU, GPGPU, parallel computing, cache, optimization, scheduling, multi-threading
dc.subject.courseuu: Computing Science
dc.thesis.id: 8765
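The rescheduling idea in the abstract — remapping each linear task ID to a different 2D position so that consecutive tasks touch different memory locations — can be sketched in plain C++. This is only a minimal illustration of index mapping, not the thesis implementation; the function names are hypothetical:

```cpp
#include <cstddef>

// Row-major mapping: consecutive task IDs walk along a row,
// so neighbouring tasks touch neighbouring columns.
inline void row_major(std::size_t id, std::size_t cols,
                      std::size_t& row, std::size_t& col) {
    row = id / cols;
    col = id % cols;
}

// Column-major remapping: consecutive task IDs walk down a column,
// so neighbouring tasks touch neighbouring rows instead. Swapping
// one mapping for the other changes the memory-access order of the
// whole workload without touching the per-task computation.
inline void col_major(std::size_t id, std::size_t rows,
                      std::size_t& row, std::size_t& col) {
    col = id / rows;
    row = id % rows;
}
```

Iterating IDs 0..rows*cols-1 under col_major visits the array column by column; on a GPU, the analogous remap of thread IDs decides which threads access adjacent addresses and therefore which ones share cache lines.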

