View Item 
        •   Utrecht University Student Theses Repository Home
        • UU Theses Repository
        • Theses
        • View Item

        Identifying Product Entities in text with Conditional Random Fields

        View/Open
        Thesis.pdf (482.8Kb)
        Publication date
        2014
        Author
        Kuppevelt, D.E. van
        Summary
        With a growing portion of the web dedicated to the discussion, review and retail of consumer products, it is increasingly relevant to develop methods for the automated extraction of Product Entities from user-generated text. In addition, it is important that the extraction models provide feedback about the quality of their output, in the form of a confidence score associated with each entity. Conditional Random Fields, which are designed as a discriminative solution for structured output prediction, have been shown to be successful for the related problem of Named Entity Recognition. Furthermore, their probabilistic nature provides a natural way to obtain a confidence score. This thesis investigates how Conditional Random Fields can best be applied to the specific problem of identifying Product Entities. A set of experiments is designed and executed to compare different choices of feature sets. The results show that Conditional Random Fields perform better than heuristic models for this task. In addition, several existing methods for confidence scoring are experimentally compared, and an optimized algorithm to calculate the exact confidence estimate (known as the Constrained Forward Backward estimate) is introduced. The experiments show that the more heuristic Gamma Product method performs comparably to the Constrained Forward Backward method, and thus provides an alternative confidence estimate to use in practice. Finally, the F1 score for Product Entity Recognition is improved further by combining models, using the confidence score for voting.
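        To make the idea of an exact confidence estimate concrete, the following is a minimal sketch of a Constrained Forward Backward style score for a linear-chain model. It is not the optimized algorithm introduced in the thesis. It assumes the model is given as plain log-potentials, and the names `emit`, `trans`, `log_partition` and `segment_confidence` are hypothetical. The confidence of an entity is taken as the total weight of all label sequences that agree with the entity's labels, divided by the total weight of all label sequences. Constraints on the labels just outside the entity, which a full treatment may add, are left out.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_partition(emit, trans, constraints=None):
    """Forward pass over a linear chain, returning the log partition function.

    emit[t][y]  : log-potential of label y at position t (assumed input format)
    trans[a][b] : log-potential of moving from label a to label b
    constraints : optional {position: required_label}; any path that violates
                  a constraint gets zero weight (log-weight -inf)
    """
    constraints = constraints or {}
    n_labels = len(trans)

    def allowed(t, y):
        return t not in constraints or constraints[t] == y

    alpha = [emit[0][y] if allowed(0, y) else -math.inf for y in range(n_labels)]
    for t in range(1, len(emit)):
        alpha = [
            emit[t][y] + logsumexp([alpha[a] + trans[a][y] for a in range(n_labels)])
            if allowed(t, y)
            else -math.inf
            for y in range(n_labels)
        ]
    return logsumexp(alpha)


def segment_confidence(emit, trans, start, labels):
    """Probability that positions start, start+1, ... carry exactly `labels`."""
    constraints = {start + i: y for i, y in enumerate(labels)}
    constrained = log_partition(emit, trans, constraints)
    return math.exp(constrained - log_partition(emit, trans))
```

        With uniform potentials over two labels, for example `segment_confidence([[0.0, 0.0]] * 3, [[0.0, 0.0], [0.0, 0.0]], 1, [1])`, every label sequence is equally likely, so fixing one position leaves half of the total weight.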
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/16147
        Collections
        • Theses