dc.rights.license CC-BY-NC-ND
dc.contributor.advisor Siebes, A.P.J.M.
dc.contributor.author Sternheim, A.M.
dc.date.accessioned 2018-08-28T17:00:44Z
dc.date.available 2018-08-28T17:00:44Z
dc.date.issued 2018
dc.identifier.uri https://studenttheses.uu.nl/handle/20.500.12932/30697
dc.description.abstract Patient organisations often maintain an online community for the patients they represent. The Dutch breast cancer organisation (BVN), for example, hosts the forum ‘de Amazones’, which is aimed at breast cancer patients. The forum is an important medium for patients and the association alike. For the association, it is a means of helping patients achieve the best possible quality of life. For patients, participating in the forum is empowering. In the past year, however, activity on ‘de Amazones’ was observed to decrease strongly. This thesis applies the concept of customer churn (unsubscription) to the forum in an effort to identify the users who will leave it within one, two or three months. Identifying churners is a first step towards a programme that lets social communities such as the Dutch breast cancer association recognise users who are likely to churn and respond accordingly. Churn prediction was approached as a supervised machine learning problem. Each forum post was described by twelve simple, easy-to-annotate variables. Half of them describe a single point in time (static features), while the other half summarise the past month (retrospective features). The variables were grouped into five feature groups: inactivity, textual, textual (retrospective), opinion, and opinion (retrospective). Different combinations of these variables were used to predict whether the writer of a post would churn within one, two or three months. The algorithm used is XGBoost, which builds an ensemble of trees through gradient boosting. The resulting models were compared to determine which feature groups were the most influential. Predictive performance was measured with ROC-AUC. The results show that, on a realistic test set, churn within one month can be predicted most accurately (AUC = 0.670).
This result is further examined with the false negative rate (FNR), which reflects the proportion of actual churners that the model failed to identify. Both scores were visibly affected whenever only a few data samples were available. The two retrospective feature groups were also shown to be the most influential feature groups.
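The abstract reports two evaluation metrics, ROC-AUC and the false negative rate. As a rough illustration of what each one measures for a churn classifier, here is a minimal standard-library Python sketch. The function names `roc_auc` and `false_negative_rate`, the 0.5 decision threshold and the toy labels and scores are hypothetical and are not taken from the thesis.

```python
from typing import Sequence

# Hypothetical helpers for illustration only; not the thesis's own code.
# Label 1 = the post's writer churned within the horizon, 0 = did not.


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC via the Mann-Whitney formulation: the probability that a
    randomly chosen churner is scored higher than a randomly chosen
    non-churner (ties count as half a win). O(n_pos * n_neg)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one churner and one non-churner")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def false_negative_rate(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> float:
    """Share of actual churners the model misses: FN / (FN + TP).
    A post is predicted as 'churn' when its score >= threshold
    (the 0.5 default is an assumption)."""
    churner_scores = [s for y, s in zip(labels, scores) if y == 1]
    if not churner_scores:
        raise ValueError("FNR is undefined when there are no churners")
    missed = sum(1 for s in churner_scores if s < threshold)  # false negatives
    return missed / len(churner_scores)  # denominator = FN + TP


# Toy data (invented): three churners, three non-churners.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.8, 0.3, 0.4, 0.6, 0.2, 0.7]
print(round(roc_auc(labels, scores), 3))              # 0.889
print(round(false_negative_rate(labels, scores), 3))  # 0.333
```

The pairwise loop is chosen for readability. On real data, `sklearn.metrics.roc_auc_score(y_true, y_score)` computes the same quantity efficiently. With only three churners in the toy data, each missed churner shifts the FNR by a third. This shows why such scores become unstable when few positive samples are available.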
dc.description.sponsorship Utrecht University
dc.format.extent 2172088
dc.format.mimetype application/pdf
dc.language.iso en
dc.title Predicting Patient Churn: Features that predict when Breast Cancer Patients leave their online community
dc.type.content Master Thesis
dc.rights.accessrights Open Access
dc.subject.keywords Supervised machine learning; XGBoost; Unbalanced data; Forum posts
dc.subject.courseuu Artificial Intelligence

