Predicting Positive and Negative Ties in Signed Networks

Staals, Bo

dc.rights.license	CC-BY-NC-ND
dc.contributor.advisor	Garcia Bernardo, Javier
dc.contributor.author	Staals, Bo
dc.date.accessioned	2023-09-06T09:40:22Z
dc.date.available	2023-09-06T09:40:22Z
dc.date.issued	2023
dc.identifier.uri	https://studenttheses.uu.nl/handle/20.500.12932/44951
dc.description.abstract	Network analysis is a growing area of research in various fields. While most existing studies focus on unsigned networks, this research explores the coexistence of positive and negative ties in signed online and real world networks. By combining topological features and social theories, this research investigates the gap in previous research. This leads to the following research question: "To what extent can machine learning models predict positive and negative ties in online and real world networks?" The network analysis is carried out on two signed networks. The Wikipedia network represents the online network, where every user is allowed to vote for other users that request for adminship. The school network represents the real world network and is based on data collected in the "Children of Immigrants Longitudinal Survey in Four European Countries" project. The final datasets used for the classification model contain 4627 nodes and 122767 ties for the Wikipedia network and 4284 nodes and 27333 ties for the school network. Snowball sampling has been performed to obtain subsets of the networks due to computational bottlenecks. The final feature set includes every connection between nodes, which can be either negative, not observed or positive. It also includes important metrics such as node-level, network-level, similarity-based, social theory and path-based metrics. After model selection, hyperparameter tuning and feature selection on the subsets, the final datasets are evaluated on the tuned Light Gradient Boosting Machine model. The results show that the school network outperforms the Wikipedia network in terms of F1, recall and precision in all classes except the not observed class. Reasons for this difference could be the imbalanced datasets and different network structures. The Wikipedia network has biased predictions towards the not observed class while the school network shows consistent performance. In conclusion, the real world network has a higher performance compared to the online network. We must keep in mind that comparing these networks is challenging due to the different network structure. Therefore, future research could compare more similar networks to obtain more generalisable results. In addition, it is important to overcome the problem of imbalanced data in order to obtain reliable and consistent results.
dc.description.sponsorship	Utrecht University
dc.language.iso	EN
dc.subject	A Comparitive study on Online and Real World Networks
dc.title	Predicting Positive and Negative Ties in Signed Networks
dc.type.content	Master Thesis
dc.rights.accessrights	Open Access
dc.subject.keywords	Network analysis; signed networks; positive and negative ties; machine learning models; online and real world networks; Python
dc.subject.courseuu	Applied Data Science
dc.thesis.id	23508

Files in this item

Name:: Thesis_BoStaals.pdf
Size:: 1.327Mb
Format:: PDF

View/Open

This item appears in the following Collection(s)

Theses

Show simple item record