Extended Video2Report Database: Object Detection & Medical Action Recognition for Medical Consultations
Summary
Many healthcare professionals carry a heavy administrative load, which raises the question of whether the current approach is sustainable, even though there is consensus that reporting improves the quality of healthcare. Part of this administration can be automated by analyzing and documenting the events that take place during medical appointments.
Computer vision is used to detect medical actions by analyzing the poses of both patient and care provider and by detecting medical objects. OpenPose (pose estimation) and Faster R-CNN ResNet-101 (object detection) are used, and the outputs of both models are processed and analyzed with two machine learning models, Random Forest and Long Short-Term Memory (LSTM). The Video2Report dataset, which contains videos of medical actions, has been extended with several new actions, and existing action classes have been supplemented with additional recordings. An image dataset of medical objects was collected (2117 images) and annotated (2956 annotations). Our experiments with object detection models did not result in improvements, possibly because few of the images resemble the actual usage scenario. The best-performing model proved to be Random Forest, with a cross-validated test score of 75.43%. The LSTM models reached an accuracy of 63.08%.
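To illustrate how pose and object-detection outputs could be combined before classification, the following is a minimal sketch and not the implementation used in this work. It assumes each pose is a list of (x, y, confidence) keypoints, as OpenPose produces, and that each detection is a (class name, score) pair. The names frame_features, clip_features, and OBJECT_CLASSES are hypothetical, as is the choice to average frames into one vector per clip for a non-sequential classifier such as Random Forest.

```python
from statistics import mean

# Hypothetical object classes; the actual label set of the image dataset is not listed here.
OBJECT_CLASSES = ["stethoscope", "thermometer"]


def frame_features(patient_pose, provider_pose, detections, object_classes=OBJECT_CLASSES):
    """Flatten one video frame into a fixed-length feature vector.

    patient_pose / provider_pose: lists of (x, y, confidence) keypoints.
    detections: list of (class_name, score) pairs from the object detector.
    """
    features = []
    for pose in (patient_pose, provider_pose):
        for x, y, conf in pose:
            features.extend([x, y, conf])
    # One slot per object class: highest detection score in this frame, 0.0 if absent.
    for name in object_classes:
        scores = [score for cls, score in detections if cls == name]
        features.append(max(scores, default=0.0))
    return features


def clip_features(frames):
    """Average per-frame vectors into a single vector per clip (an assumed aggregation)."""
    return [mean(column) for column in zip(*frames)]


# Toy example with one keypoint per person (real poses have many more keypoints).
frame_a = frame_features([(0.1, 0.2, 0.9)], [(0.5, 0.6, 0.8)], [("stethoscope", 0.7)])
frame_b = frame_features([(0.3, 0.4, 0.7)], [(0.5, 0.6, 0.6)], [])
clip = clip_features([frame_a, frame_b])
```

Under these assumptions, the averaged clip vector would suit a classifier that expects one fixed-length input per sample, while the unaveraged sequence of per-frame vectors would be the natural input for a sequence model such as an LSTM.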