A Robust Multidimensional Parallel Scan Algorithm for Multi-Core CPUs
Summary
Scan operations are commonly applied in a wide variety of parallel applications and programming languages. Existing scan algorithms, for the CPU architecture, are either limited to one-dimensional input sequences or have some edge-case input shapes in which they lose their performance. The adaptive chained scan algorithm transformed the original chained scan with decoupled look-back to efficiently work on multi-core CPUs. This algorithm outperforms existing three-phase scan algorithms and offers flexibility in the number of working threads and has zero overhead compared to the sequential scan during single-threaded executions.
We present the assisting column-wise chained scan algorithm, which offers the same properties as the adaptive chained scan algorithm and performs efficiently and similarly regardless of the shape of the multidimensional input sequence. The algorithm is implemented and evaluated in Rust, by comparing it with the existing multidimensional scan implementation of Accelerate and the onedimensional adaptive chained scan. Our results show that the assisting column-wise chained scan algorithm performs significantly better in edge-case scenarios at the cost of a slight performance loss in normal scenarios, which introduces a trade-off between robustness and maximal performance.