Osteoarthritis proteomics O2PLS clustering with a STRING database validation approach

Berg, Niels van den

View/Open

thesis_OA_dig_twins.docx (6.475Mb)

Publication date

2024

Author

Berg, Niels van den

Summary

Osteoarthritis (OA) is a common joint disease that causes pain, stiffness, and reduced joint function, especially in adults aged 35 to 54. Its prevalence is growing due to factors such as an aging population, more active lifestyles, and the global obesity epidemic. The cost of OA is significant, both in healthcare expenses and in lost productivity. So far, the burden that OA places on society has not been matched by effective treatment or by a thorough understanding of the disease. This study explores the relationship between proteomic data (protein abundance data) and the clinical classification of OA patients, with the aim of uncovering clues about the disease and improving our understanding of it. The research is based on data from the European IMI-APPROACH cohort, which aims to enhance knowledge of OA by assembling a well-designed patient group. This study focuses specifically on the proteomic data from this cohort, obtained from serum samples.
To analyze this complex data, we used a mathematical model called Two-way Orthogonal Partial Least Squares (O2PLS). The model reduces the dimensionality of the data, which made it possible to look for connections between the proteomic data and the clinical classifications of OA patients. Surprisingly, the results did not show a clear grouping of patients according to the clinical OA classification, although some alternative separation was present in the data. The workflow was validated with simulated "digital twin" data, that is, artificial patient data generated following an approach from previous research. A novel aspect of this study was the use of the STRING database to create these digital twins. STRING is a resource that predicts how proteins are associated with one another. Using this database, we successfully simulated data, and the simulated data clustered well when analyzed with the O2PLS model. This highlights the potential of STRING for creating artificial patient data in future research.
While the O2PLS model and the digital twin validation showed promise, there were some challenges. The small size of the IMI-APPROACH dataset limited the model's ability to generalize and to predict results on a larger scale. To address this limitation, the study suggests that combining data from different cohorts could improve the model's performance. Possible improvements to the digital twin workflow mainly involve integrating protein expression databases.
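As a rough illustration of the digital twin idea summarized above, the Python sketch below draws synthetic protein profiles whose covariance follows a matrix of association scores and then computes joint scores against a group label. It is not the workflow used in the thesis: the association scores are made up rather than taken from STRING, the function names `simulate_twins` and `joint_scores` are hypothetical, and only the first step of an O2PLS-style fit (an SVD of the cross-covariance between the two data blocks) is shown, without the removal of orthogonal variation.

```python
import numpy as np


def simulate_twins(assoc, n_per_group, shift, seed=0):
    """Draw two groups of synthetic protein profiles whose covariance
    follows a symmetric association-score matrix (scores in [0, 1])."""
    rng = np.random.default_rng(seed)
    cov = np.array(assoc, dtype=float)
    np.fill_diagonal(cov, 1.0)
    # Shift the eigenvalues if needed so the matrix is a valid covariance.
    min_eig = np.linalg.eigvalsh(cov).min()
    if min_eig < 1e-6:
        cov += (1e-6 - min_eig) * np.eye(cov.shape[0])
    p = cov.shape[0]
    group0 = rng.multivariate_normal(np.zeros(p), cov, size=n_per_group)
    group1 = rng.multivariate_normal(
        np.asarray(shift, dtype=float), cov, size=n_per_group
    )
    X = np.vstack([group0, group1])
    labels = np.repeat([0, 1], n_per_group)
    return X, labels


def joint_scores(X, Y, n_components=1):
    """SVD of the cross-covariance X^T Y gives joint loadings for X;
    projecting the centered X onto them gives joint scores."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    U, _, _ = np.linalg.svd(Xc.T @ Yc, full_matrices=False)
    W = U[:, :n_components]
    return Xc @ W


# Hypothetical association scores for four proteins (not real STRING values).
assoc = [
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.2, 0.0],
    [0.1, 0.2, 1.0, 0.7],
    [0.0, 0.0, 0.7, 1.0],
]
X, labels = simulate_twins(assoc, n_per_group=50, shift=[3.0, 3.0, 0.0, 0.0])
Y = labels.reshape(-1, 1).astype(float)  # group label as the second block
T = joint_scores(X, Y)
```

In this toy setting the joint scores in `T` separate the two simulated groups because the first two proteins were shifted in one group. With real data, the association matrix would come from a protein network resource and the second block would hold the clinical variables, both of which are assumptions here.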

URI

https://studenttheses.uu.nl/handle/20.500.12932/45810

Collections

Theses