View Item 
        •   Utrecht University Student Theses Repository Home
        • UU Theses Repository
        • Theses
        • View Item


        Contextualised Knowledge Base Creation through Phrase Extraction and Lexical Relation Classification

        View/Open
        AI_masters_thesis_Thomas_Hoek_7683650.pdf (654.0Kb)
        Publication date
        2024
        Author
        Hoek, Thomas
        Summary
        Large Language Models (LLMs) are used to solve many downstream tasks in Natural Language Processing (NLP). One area where they excel is lexical relation classification (LRC) between standalone words (e.g., 'house' and 'home' are related by synonymy). However, recent research raises doubts about how well LLMs handle out-of-scope data and how robust their generalisation is, so we set out to test how effectively LRC models generalise when dealing with lexical relations. Lexical relations matter for a variety of tasks, such as the creation of knowledge bases (KBs); KBs and similar resources are essential in many formal semantic systems across NLP fields, as they make it possible to infer lexical relations between words. In this thesis, we investigate the feasibility of building a hybrid of our semantically driven model and LLMs by replacing a static KB with one generated automatically by LLMs trained on context-dependent datasets. We first extract relevant phrase pairs from contrasting sentences by exploiting the sentences' syntactic trees; state-of-the-art (SOTA) models then assign a lexical relation to each extracted pair. This allows us to build a KB automatically from Natural Language Inference (NLI) data and use it in place of the original KB in a logic-based NLI prover: a higher-order theorem prover that uses the lexical relations in the KB to prove NLI problems. On the SICK test set, the prover scores 1.9% lower with our KB than with the original KB (WordNet), and 0.4% lower when our KB is combined with WordNet. Our method solves new problems that WordNet failed to solve, but at the cost of many problems being solved incorrectly, trading lower precision for higher recall. This is mainly caused by the limited availability of training data for determining the lexical relation between phrases, and by false predictions arising from low-quality lexical pairs.
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/48075
        Collections
        • Theses