Finding dense and well-shaped convex clusters in 2-dimensional point sets
Summary
We address the problem of computing a convex region with bounded area and diameter, enclosing the maximum number of points from a planar point set P.
This problem originated from previous work on an automatic pipeline for computing the mitotic count (MC) from histological images. The MC counts mitotic cells within a specific region and serves as a key indicator for determining cancer patient risk levels and treatment regimes. In this context, mitotic cells appear as points in a plane, and we need to identify the optimal mitotic hotspot region - a convex region that is not too elongated (bounded diameter), has a bounded area and contains the maximum number of points.
We developed a novel dynamic programming algorithm, the Area-Diameter (AD) algorithm, which solves this problem in O(n^6k) time and O(n^3k) space, where n is the size of P and k is the maximum number of enclosed points. To the best of our knowledge, this is the first polynomial time and exact algorithm to solve this problem.
We implemented the AD algorithm and an existing Area-only (A) algorithm that performs the same task without the diameter constraint in O(n^3k) time and O(n^2k) space. We experimentally compared their performance and solutions for three types of point sets: uniformly distributed sets with increasing size, normally distributed sets with increasing standard deviation and real-world medical data. For the AD algorithm, we test different diameters.
Our results show that despite the higher worst-case complexity of the AD algorithm, the bound on the diameter enables significant pruning that often makes our algorithm practically faster than the A algorithm for moderately dense point sets. The performance of the A algorithm does not change with different point set distributions, but the solutions tend to be too elongated and beyond acceptable limits for medical applications. Finally, we show that our algorithm can process real-world medical datasets in a reasonable time, delivering exact solutions that are better - in terms of the number of enclosed points - than those found by existing methods.