Using pattern set mining on Weigh-in-Motion data
This research aimed to determine to what extent pattern set mining can be useful for the Innovation and Datalab (ID-lab) of the Dutch Human Environment and Transportation Inspectorate (ILT). To study this, the unsupervised pattern set mining algorithm SLIM was used. SLIM finds interesting patterns by searching for the set of patterns that compresses the data best. The patterns in question were travel time patterns in Weigh-in-Motion data. This data is relevant for ILT because one of its primary tasks is to inspect commercial transportation vehicles. Weigh-in-Motion systems collect data about vehicles at multiple locations along the Dutch highways. Analysing this data reveals more about the particularities of the transportation industry, which is helpful for inspections.
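The compression idea behind SLIM can be illustrated with a small sketch. This is not the SLIM algorithm itself: it only compares the encoded size of a toy dataset with and without one candidate pattern, using Shannon-optimal code lengths and ignoring the cost of storing the patterns. The function names `cover` and `encoded_size` and the toy transactions are hypothetical and chosen for illustration only.

```python
from collections import Counter
from math import log2


def cover(transaction, patterns):
    """Greedily cover a transaction with non-overlapping patterns, longest first."""
    remaining = set(transaction)
    used = []
    for p in sorted(patterns, key=lambda p: (-len(p), sorted(p))):
        if p <= remaining:
            used.append(p)
            remaining -= p
    return used


def encoded_size(db, patterns):
    """Bits needed to encode db when each pattern gets a Shannon-optimal code.

    Simplification (assumption): the cost of the pattern table is ignored.
    """
    # Singleton items are always available so every transaction can be covered.
    singletons = {frozenset([i]) for t in db for i in t}
    table = set(patterns) | singletons
    usage = Counter(p for t in db for p in cover(t, table))
    total = sum(usage.values())
    return sum(-u * log2(u / total) for u in usage.values())


# Toy data: items "a" and "b" usually occur together.
db = [frozenset("ab"), frozenset("ab"), frozenset("ab"), frozenset("c")]
baseline = encoded_size(db, [])
with_pattern = encoded_size(db, [frozenset("ab")])
print(with_pattern < baseline)
```

In this toy case, adding the pattern {a, b} shortens the encoding, which is the sense in which a pattern counts as interesting under a compression-based criterion.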
To find these patterns, three approaches were taken. First, using a simplified dataset grouped by license plate, SLIM was compared with the well-known clustering technique K-Means. SLIM turned out to be more suitable, as it provided more distinctive results. Second, the dataset was made more complex by addressing the simplifications of the first approach, mainly by moving from binary to numeric variables. These had to be discretised before SLIM could use them. SLIM still performed well, but the resulting patterns were difficult to visualise. Finally, the data was grouped by business by adding more variables. The results were modelled as networks to visualise the patterns. In this visualisation, interesting patterns could be found with ease, especially with the filtering tools provided by Gephi.
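The discretisation step in the second approach can be sketched as follows. The abstract does not state which discretisation method was used, so equal-width binning is an assumption here, and the variable name `travel_time`, the number of bins, and the sample values are hypothetical.

```python
def equal_width_bins(values, n_bins):
    """Assign each numeric value to one of n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when all values are identical.
    width = (hi - lo) / n_bins or 1.0
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def to_items(name, values, n_bins=3):
    """Turn numeric values into categorical items usable in an itemset."""
    return [f"{name}=bin{b}" for b in equal_width_bins(values, n_bins)]


# Hypothetical travel times in minutes between two measurement locations.
travel_minutes = [10, 12, 30, 55, 60]
items = to_items("travel_time", travel_minutes)
print(items)
```

Each numeric value becomes a label such as `travel_time=bin0`, which an itemset miner can treat like any other item. The trade-off is that bin labels need to be mapped back to value ranges before a pattern can be interpreted.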
In conclusion, pattern set mining could prove to be very useful for ID-lab. The new network-based visualisation method made discovering interesting patterns easier and could also be used to find other connections in the data. Interesting travel time patterns were extracted from the data, although interpreting the discretised continuous data proved challenging.