Morphological Cell Classification under Weak Supervision: A Learning from Label Proportions Approach
Summary
Abstract
Cell classification is an important task in biology and pathology research, and many
effective approaches combine it with machine learning techniques. In the
supervised learning setting, cell annotations are crucial for cell classification, and they
can be obtained by cell image analysis and biological staining. However, due to constraints
on cost and time and the limitations of certain microscopes, obtaining annotations for
all individual cells to train fully supervised machine learning models is often
challenging. It is therefore necessary to investigate the feasibility of
applying machine learning models to classify cells from their morphological features
when only proportion labels for groups of cells are known. The models developed and tested
in this thesis belong to Learning from Label Proportions (LLP), a subfield of weakly
supervised learning. LLP models treat data as bags rather than individual instances: they
aim to predict instance-level labels while assuming that only the proportion of each class
in each bag is known. We developed several LLP-based machine learning models specifically
for tabular data containing morphological features of cells. We tested and compared these
models, examined their technical details, and identified directions for future research. Our findings indicate
that LLP models can achieve competitive performance on morphological cell classification
when only proportion labels are known, which benefits further research in biology by reducing
the need for comprehensive cell labeling.
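
To make the LLP setting described above concrete, the following is a minimal sketch in Python (NumPy only) of how bag-level proportion labels can be formed and compared with model predictions. It is an illustration under stated assumptions, not the models developed in this thesis: the function names `bag_proportions` and `proportion_loss`, the toy data, and the choice of a cross-entropy between the known proportions and the mean predicted class probabilities per bag are all hypothetical choices made for this example.

```python
import numpy as np

def bag_proportions(labels, bag_ids, n_classes):
    """Compute per-bag class proportions from instance labels.

    In a real LLP setting the instance labels are unavailable; they are
    used here only to construct the weak (proportion) labels for a toy example.
    """
    bags = np.unique(bag_ids)
    props = np.zeros((len(bags), n_classes))
    for i, b in enumerate(bags):
        counts = np.bincount(labels[bag_ids == b], minlength=n_classes)
        props[i] = counts / counts.sum()
    return bags, props

def proportion_loss(probs, bag_ids, bags, props, eps=1e-12):
    """Mean cross-entropy between known bag proportions and the
    average predicted class probabilities of the instances in each bag
    (an assumed, commonly used bag-level objective)."""
    loss = 0.0
    for i, b in enumerate(bags):
        mean_pred = probs[bag_ids == b].mean(axis=0)
        loss += -np.sum(props[i] * np.log(mean_pred + eps))
    return loss / len(bags)

# Toy data: six cells, two classes, two bags of three cells each.
labels = np.array([0, 0, 1, 1, 1, 0])
bag_ids = np.array([0, 0, 0, 1, 1, 1])
bags, props = bag_proportions(labels, bag_ids, n_classes=2)

# Uninformative predictions: every cell gets probability 0.5 per class.
uniform_probs = np.full((6, 2), 0.5)
uniform_loss = proportion_loss(uniform_probs, bag_ids, bags, props)
```

In this sketch, a classifier would be trained by minimizing `proportion_loss` over its predicted probabilities, so that supervision enters only through `props` and never through the instance-level `labels`.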