Automatic task parallelism in Accelerate
Summary
Parallel execution has a large potential to improve the speed of a program. To make good use of parallelism
we need a program that is properly split into several tasks that can be performed in parallel. High level
frameworks like Accelerate make it easier to create such programs. Accelerate creates programs using
data parallel array computations. For more parallelism we propose a way to automatically add task
parallelism to programs as part of the Accelerate compiler. We define a transformation to introduce forks
to a program. We formalize this transformation using inference rules and we implement it in the Accelerate
compiler. To reduce the overhead of the added task parallelism, we propose several optimizations to not
introduce forks where they do not introduce actual opportunities of added parallelism, like not forking
trivial statements and combining statements that directly wait on the result of the previous statement.
Our results show that the compilation of this implementation is not significantly slower than the current
Accelerate compiler.