Show simple item record

dc.rights.license CC-BY-NC-ND
dc.contributor.advisor Krempl, dr. Ing. habil G.
dc.contributor.advisor van Ommen, Dr. M. T.
dc.contributor.author Aas, B.F.
dc.date.accessioned 2020-07-30T18:00:26Z
dc.date.available 2020-07-30T18:00:26Z
dc.date.issued 2020
dc.identifier.uri https://studenttheses.uu.nl/handle/20.500.12932/36423
dc.description.abstract Toxicology is a field plagued by a lack of experimental data and labels: assessing chemical toxicity is a time-consuming and costly process, while the number of newly released substances keeps growing. This strengthens the need for robust screening tools and classification models. Chemical similarity screening traditionally relies on a two-dimensional fingerprint representation of chemical sub-structures, in which a distance measure between fingerprints determines similarity. This approach neglects any potential ordering of sub-structures by importance. The novel approach presented in this paper models so-called persistent, bioaccumulative and toxic (PBT) substances based on their physical chemical properties and examines whether doing so improves on related fingerprint-based approaches. Further aims are to inspect whether feature importance matches a priori expert expectations, and whether the results can be improved by applying active machine learning. Two baseline machine learning models, a Random Forest and a Support Vector Machine, were fit to naive and filtered physical chemical data. The best-performing model achieved a classification accuracy of 94.28% and was also able to pick up on existing legal guideline thresholds for substance evaluation. The hypothesis about expert feature importance was shown to hold, with additional importance found for features not previously considered. Using a "curious" machine learning approach known as Active Learning, it was further shown that a similar accuracy could be achieved with 40-50% less data. A demo for interactive annotation with a chemical expert is also presented, which could serve as a cross-referencing check on expert chemical evaluation.
Although further confirming data are needed, the main contribution of this paper is the novel approach of using physicochemical data, which shows the value of machine learning algorithms as a tool for classifying harmful chemicals.
dc.description.sponsorship Utrecht University
dc.format.extent 2235111
dc.format.mimetype application/pdf
dc.language.iso en_US
dc.title Chemical Similarity Screening With Machine Learning and Active Learning Using Physical Chemical Properties
dc.type.content Master Thesis
dc.rights.accessrights Open Access
dc.subject.keywords Toxicology, Machine Learning, Chemical Similarity, Random Forest, Support Vector Machine, Active Learning, PBT substances
dc.subject.courseuu Artificial Intelligence

