
dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Chekol, Mel
dc.contributor.author: Kisjes, Adriaan
dc.date.accessioned: 2023-05-23T00:00:47Z
dc.date.available: 2023-05-23T00:00:47Z
dc.date.issued: 2023
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/43907
dc.description.abstract: Many organizations and data-dependent applications deal with the fact that data is often incomplete and siloed across multiple knowledge bases. The semantic web and knowledge graphs are powerful tools that mitigate this by allowing rule-based systems to complete and connect different knowledge bases. To enable the use of more advanced machine-learning algorithms such as logistic regression or neural networks, knowledge graphs need to be transformed into some kind of numeric input. In the field of natural language processing this has been solved with vector embeddings, where for each word a vector is learned that captures its semantic meaning. There exist many knowledge graph embedding techniques inspired by natural language processing, one of which is Triple2vec, where triples (two entities and the relation connecting them) are embedded as a whole. Triple2vec is innovative because it captures both the graph topology and the heterogeneity of knowledge graphs, where other methods often focus on just one of those aspects. This thesis proposes to build on triple embeddings by developing TripLan2vec: a triple embedding technique that uses a pre-trained language model to generate embeddings based on textual descriptions. This enriches the embeddings by capturing both graph structure and natural language semantics. Moreover, it enables triple embeddings to be generated inductively from just a description as input, which means that triples that are not part of the training process can still be embedded, unlike with Triple2vec. Evaluation showed that TripLan2vec performs well at discriminating between true and false triples and at predicting whether two triples are neighbours. In inductive evaluation, where only part of the training data was available, TripLan2vec outperforms most other methods.
dc.description.sponsorship: Utrecht University
dc.language.iso: EN
dc.subject: The goal of this thesis was to develop a method that generates triple embeddings combining natural language context with graph context, in order to learn more descriptive embeddings.
dc.title: TripLan2vec: Leveraging Pre-Trained Language Models for Inductive Triple Embeddings
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.keywords: Knowledge Graph; Natural Language Model; Triple Embedding
dc.subject.courseuu: Business Informatics
dc.thesis.id: 16783
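
The abstract describes generating triple embeddings from textual descriptions with a pre-trained language model, so that triples unseen during training can still be embedded. The code below is only a minimal sketch of that general idea, not the pipeline used in the thesis; the verbalization scheme, library, and model checkpoint ("all-MiniLM-L6-v2" via sentence-transformers) are illustrative assumptions.

# Minimal illustrative sketch: embed a triple inductively from its textual
# description using an off-the-shelf pre-trained sentence encoder.
# The verbalization and checkpoint below are assumptions, not the thesis's setup.
from sentence_transformers import SentenceTransformer

def verbalize(head: str, relation: str, tail: str, description: str = "") -> str:
    # Turn a (head, relation, tail) triple plus an optional description into one string.
    text = f"{head} {relation} {tail}."
    return f"{text} {description}".strip()

model = SentenceTransformer("all-MiniLM-L6-v2")

triples = [
    ("Utrecht University", "is located in", "Utrecht",
     "Utrecht University is a public research university in the Netherlands."),
]
texts = [verbalize(h, r, t, d) for h, r, t, d in triples]

# One fixed-size vector per triple; a triple never seen during training can be
# embedded the same way, which is what makes a description-based approach inductive.
embeddings = model.encode(texts)
print(embeddings.shape)  # (1, 384) for this particular checkpoint

Vectors produced this way could then feed downstream tasks such as the triple classification and neighbour prediction mentioned in the abstract.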

