
dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Keller, G.K.
dc.contributor.author: Prasetya, Naraenda
dc.date.accessioned: 2022-08-17T23:00:32Z
dc.date.available: 2022-08-17T23:00:32Z
dc.date.issued: 2022
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/42329
dc.description.abstract: Efficient cache utilization is critical in programs with high data throughput. Improving performance in this area often requires niche knowledge of computer architecture, extensive benchmarking, and algorithms that do more work than intuitively required. By changing the order in which tasks are executed, we change the order in which memory is accessed, and thus how caches are used. This thesis proposes a column iterator that reschedules a 2D workload. We show that performance can be improved over the naive method by implementing the proposed approach in C++ with CUDA and as an extension to the data-parallel DSL Accelerate.
dc.description.sponsorship: Utrecht University
dc.language.iso: EN
dc.subject: This thesis shows that cache efficiency for stencil operations and matrix multiplication can be improved over both the naive implementation and the more common tiling approach by rescheduling via index mapping. While the main focus is on improving GPU performance, the presented techniques can also be applied on a CPU.
dc.title: Improving Cache Performance in Structured GPGPU Workloads via Specialized Thread Schedules
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.keywords: GPU, GPGPU, parallel computing, cache, optimization, scheduling, multi-threading
dc.subject.courseuu: Computing Science
dc.thesis.id: 8765
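The rescheduling idea in the abstract — remapping each linear task ID to a different 2D position so that consecutive tasks touch different memory locations — can be sketched in plain C++. This is only a minimal illustration of index mapping, not the thesis implementation; the function names are hypothetical:

```cpp
#include <cstddef>

// Row-major mapping: consecutive task IDs walk along a row,
// so neighbouring tasks touch neighbouring columns.
inline void row_major(std::size_t id, std::size_t cols,
                      std::size_t& row, std::size_t& col) {
    row = id / cols;
    col = id % cols;
}

// Column-major remapping: consecutive task IDs walk down a column,
// so neighbouring tasks touch neighbouring rows instead. Swapping
// one mapping for the other changes the memory-access order of the
// whole workload without touching the per-task computation.
inline void col_major(std::size_t id, std::size_t rows,
                      std::size_t& row, std::size_t& col) {
    col = id / rows;
    row = id % rows;
}
```

Iterating IDs 0..rows*cols-1 under col_major visits the array column by column; on a GPU, the analogous remap of thread IDs decides which threads access adjacent addresses and therefore which ones share cache lines.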

