Deep Learning Model for Clustering Heterogeneous Data, Case Study on Recognizing Booking Behavior for a Major Airline
Summary
In revenue management, determining the right price for the right customer at the right time is an ongoing challenge. In the airline industry, inventory analysts continuously update ticket prices in response to changes in demand and willingness to pay. Some analysts actively do this for as many as 20,000 flights. A model that supports their decision making by grouping similar flights would save considerable time and effort. To this end, we propose a novel clustering approach that can handle heterogeneous data, using Sum-Product Networks (SPNs) implemented as an Expectation-Maximization (EM) algorithm. To handle categorical variables correctly, we present CATCLARA, our adapted version of CLARA. We investigate which parameter settings yield the best outcome for this approach. After validating the effectiveness of the model, we perform a case study in which we cluster flights based on their booking behavior. The results show that an SPN initialized with CATCLARA yields better results than CLARA on most datasets. The case study shows that our model can correctly cluster flights based on their booking behavior, mainly by grouping flights with deviating behavior. Although the results are not perfect, the model is a valuable first step toward supporting analysts in their decision making.