Parallelization of the cost-distance algorithm
Summary
Cost distance tool are embedded in various geographic information systems (GIS) giving insights into
spatial relationships. Most GIS software use a serial cost distance algorithm. Cost distance calculations
have a strong sequential nature due to the order we access cells in the raster. This limits the development
of a parallel algorithm, hindering the usage of multiple processes for faster computation. This paper
proposes a parallelization framework, accounting for the sequential nature while running in parallel.
By dynamically distributing partitions to different processes, we ensure full workload distribution and
minimising idle times for processes. Relative strong and weak scaling efficiencies drop below 80% when
run with more than 3 and 2 workers respectively. We notice that this is partly caused by the fact that the
size of the input data and the amount of work a worker has to perform, do not scale linearly. When scaling
to more workers, we expect to run into a performance bottleneck caused by input output operations of the
root node. Recommendations are made for future research to limit the amount of input output operations
by statically assigning partitions to workers.