Reliable normalization in resume information extraction
This thesis discusses reliable normalization in Information Extraction, specifically the normalization of items extracted from people's resumes. A meta-classifier approach is presented that is based on the Memory Based Learning (MBL) implementation of the k-Nearest Neighbor algorithm. To investigate whether this approach is a practical solution to reliable normalization, a literature study was carried out and various experiments were conducted on domain-specific data sets. The results show that the meta-classifier approach in combination with the MBL algorithm is one of the fastest implementations in comparison with other reliability measures and classifier algorithms. The meta-classifier reaches an F2-score of 86.1% on randomly created test sets.
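For reference, the F2-score mentioned above is the F-beta measure with beta = 2, which weights recall more heavily than precision. The following is a minimal sketch of that computation. The function name and the example precision and recall values are illustrative assumptions and are not taken from the thesis.

```python
def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """Return the F-beta score, a weighted harmonic mean of precision and recall.

    With beta = 2 (the default here) recall counts more heavily than precision.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        # Both precision and recall are zero; define the score as 0.
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# Hypothetical inputs, chosen only to show the effect of beta.
f2_example = f_beta(precision=0.5, recall=1.0)            # recall-weighted
f1_example = f_beta(precision=0.5, recall=1.0, beta=1.0)  # balanced
```

With these hypothetical inputs the F2 value is higher than the F1 value, because the high recall dominates when beta is greater than 1.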