Predictive machine learning for a housing corporation
Summary
When houses are rented out, some tenants fall into payment arrears. Normally, the housing corporation can only take action after the problems have occurred. In this thesis, several machine learning and subgroup discovery algorithms are used to identify in advance the tenants who are more likely to cause payment problems. The chosen machine learning algorithms include logistic regression, random forests, k-nearest neighbors, naive Bayes and neural networks using model averaging, while the PRIM algorithm is selected for subgroup discovery. Because the class distribution in the datasets is skewed, we use the synthetic minority over-sampling technique (SMOTE) to generate more reasonable results. Additionally, feature selection and several ensemble methods, such as averaging, majority voting and stacking, are used to improve model performance. With all these approaches, we obtain a few models that are significantly better than the preliminary one. However, since the available data is limited and incomplete, and important time-based information is missing, we cannot obtain a model that is good enough.
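As an illustration of the over-sampling idea mentioned above, the following is a minimal sketch of SMOTE-style interpolation. It is not the implementation used in the thesis. The function name `smote_sketch`, its parameters and the toy data are hypothetical and chosen only for this example. Each synthetic point is placed at a random position on the line segment between a minority-class sample and one of its nearest minority-class neighbors.

```python
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbors.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbor = rng.choice(others[:k])
        # gap in [0, 1) fixes the position on the segment base -> neighbor.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbor)])
    return synthetic


# Toy minority class with two made-up numeric features (assumed data).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1]]
new_points = smote_sketch(minority, n_new=4)
```

In practice the synthetic points would be added to the training set only, so that the evaluation data keeps its original class distribution.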