Computer vision based recognition of doctor's actions during medical consultations.
Summary
In the Netherlands, general practitioners must write a report for each consultation and store it in the electronic medical record. This is time-consuming, and automating the reporting procedure could reduce that burden. However, work on recognising medical actions to support the automatic storage of patient information in the electronic medical record is limited, because no medical databases are publicly available. Therefore, we present Video2Report, a database of one-on-one medical consultations between a general practitioner and a patient. Our method consists of selecting the most important medical actions and carefully recording and annotating the sessions. From the videos, we extract skeleton positions using OpenPose. From these skeleton positions we compute mathematical descriptors, which we combine into feature sets. With these feature sets we train and test three basic classifiers: a decision tree, a random forest, and a k-nearest neighbor classifier. The database consists of 192 sessions recorded with up to three cameras, for a total of 451 videos, of which 332 contain single actions and 119 contain multiple-action sequences. Although Video2Report is too small for end-to-end deep learning, the basic classifiers show promising results.
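The step from skeleton positions to a classifier can be illustrated with a minimal sketch. It is not the feature set or implementation used for Video2Report: the summary does not specify which quantities are computed, so the two features below (an elbow angle and a scaled wrist-to-nose distance), the BODY_25 joint indices, the helper names, and the hand-written k-nearest neighbor vote are all assumptions chosen for illustration. The input is assumed to be a flat list of (x, y, confidence) values per joint, as OpenPose writes in its `pose_keypoints_2d` JSON field.

```python
import math
from collections import Counter

# Assumed BODY_25 joint indices; the summary does not say which model was used.
NOSE, NECK, R_SHOULDER, R_ELBOW, R_WRIST, L_SHOULDER = 0, 1, 2, 3, 4, 5


def to_joints(flat):
    """Turn a flat [x0, y0, c0, x1, y1, c1, ...] list into (x, y, confidence) triples."""
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


def distance(a, b):
    """Euclidean distance between two joints, ignoring the confidence value."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def angle(a, b, c):
    """Angle in radians at joint b, between the segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        return 0.0  # degenerate case: two joints coincide
    cos = (v1[0] * v2[0] + v1[1] * v2[1]) / norm
    return math.acos(max(-1.0, min(1.0, cos)))


def frame_features(joints):
    """Hypothetical per-frame features: right elbow angle, scaled wrist-to-nose distance."""
    # Dividing by shoulder width makes the distance independent of camera zoom.
    shoulder_width = distance(joints[R_SHOULDER], joints[L_SHOULDER]) or 1.0
    return [
        angle(joints[R_SHOULDER], joints[R_ELBOW], joints[R_WRIST]),
        distance(joints[R_WRIST], joints[NOSE]) / shoulder_width,
    ]


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training samples closest to the query."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy usage with made-up feature vectors and placeholder action labels.
train_x = [[0.3, 0.4], [0.4, 0.5], [0.3, 0.5], [2.8, 1.9], [2.9, 2.0], [2.7, 2.1]]
train_y = ["action_a", "action_a", "action_a", "action_b", "action_b", "action_b"]
predicted = knn_predict(train_x, train_y, [0.35, 0.45])
```

In practice the decision tree and random forest would come from a library rather than being written by hand; scikit-learn, for example, provides `DecisionTreeClassifier`, `RandomForestClassifier`, and `KNeighborsClassifier` with a shared `fit`/`predict` interface, which makes it straightforward to compare all three on the same feature sets.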