
dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Chekol, Mel
dc.contributor.author: Kisjes, Adriaan
dc.date.accessioned: 2023-05-23T00:00:47Z
dc.date.available: 2023-05-23T00:00:47Z
dc.date.issued: 2023
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/43907
dc.description.abstract: Many organizations and data-dependent applications deal with the fact that data is often incomplete and siloed across multiple knowledge bases. The semantic web and knowledge graphs are powerful tools that mitigate this by allowing rule-based systems to complete and connect different knowledge bases. To enable the use of more advanced machine-learning algorithms such as logistic regression or neural networks, knowledge graphs need to be transformed into some kind of numeric input. In the field of natural language processing this has been solved with vector embeddings, where for each word a vector is learned that captures its semantic meaning. There exist many knowledge graph embedding techniques inspired by natural language processing, one of which is Triple2vec, where triples (two entities and the relation connecting them) are embedded as a whole. Triple2vec is innovative because it captures both the graph topology and the heterogeneity of knowledge graphs, where other methods often focus on just one of those aspects. This thesis proposes to build on triple embeddings by developing TripLan2vec: a triple embedding technique that uses a pre-trained language model to generate embeddings based on textual descriptions. This enriches the embeddings by capturing both graph structure and natural language semantics. Moreover, it enables triple embeddings to be generated inductively from just a description as input, which means that triples that are not part of the training process can still be embedded, unlike with Triple2vec. Evaluation showed that TripLan2vec performs well at discriminating between true and false triples and at predicting whether two triples are neighbours. In inductive evaluation, where only part of the training data was available, TripLan2vec outperforms most other methods.
dc.description.sponsorship: Utrecht University
dc.language.iso: EN
dc.subject: The goal of this thesis was to develop a method that generates triple embeddings combining natural language context with graph context, in order to learn more descriptive embeddings.
dc.title: TripLan2vec: Leveraging Pre-Trained Language Models for Inductive Triple Embeddings
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.keywords: Knowledge Graph; Natural Language Model; Triple Embedding
dc.subject.courseuu: Business Informatics
dc.thesis.id: 16783
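
The abstract describes generating triple embeddings from textual descriptions with a pre-trained language model, so that triples unseen during training can still be embedded. The code below is only a minimal sketch of that general idea, not the pipeline used in the thesis; the verbalization scheme, library, and model checkpoint ("all-MiniLM-L6-v2" via sentence-transformers) are illustrative assumptions.

# Minimal illustrative sketch: embed a triple inductively from its textual
# description using an off-the-shelf pre-trained sentence encoder.
# The verbalization and checkpoint below are assumptions, not the thesis's setup.
from sentence_transformers import SentenceTransformer

def verbalize(head: str, relation: str, tail: str, description: str = "") -> str:
    # Turn a (head, relation, tail) triple plus an optional description into one string.
    text = f"{head} {relation} {tail}."
    return f"{text} {description}".strip()

model = SentenceTransformer("all-MiniLM-L6-v2")

triples = [
    ("Utrecht University", "is located in", "Utrecht",
     "Utrecht University is a public research university in the Netherlands."),
]
texts = [verbalize(h, r, t, d) for h, r, t, d in triples]

# One fixed-size vector per triple; a triple never seen during training can be
# embedded the same way, which is what makes a description-based approach inductive.
embeddings = model.encode(texts)
print(embeddings.shape)  # (1, 384) for this particular checkpoint

Vectors produced this way could then feed downstream tasks such as the triple classification and neighbour prediction mentioned in the abstract.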

