Finding Patterns in DNA Sequences for Prediction of Errors

Au Yeung, L.Y.

dc.rights.license	CC-BY-NC-ND
dc.contributor.advisor	Feelders, A.J.
dc.contributor.author	Au Yeung, L.Y.
dc.date.accessioned	2018-05-18T17:00:51Z
dc.date.available	2018-05-18T17:00:51Z
dc.date.issued	2018
dc.identifier.uri	https://studenttheses.uu.nl/handle/20.500.12932/29041
dc.description.abstract	DNA analysis makes it possible to determine whether a suspect can be placed at a crime scene. In forensics the DNA sample obtained is often of low quality: it may come from a trace of saliva or a few small hairs left at the crime scene. DNA amplification is therefore necessary. After amplification the DNA can be sequenced, a process that determines the order of the bases in the DNA, represented as a string of As, Cs, Gs and Ts. These methods are, however, prone to errors, which makes it more difficult to analyze the DNA correctly. For instance, in a mixed sample containing the DNA of more than one individual, it is harder to determine whether a sequence is erroneous or genuine, that is, belonging to an individual. We used sequential pattern mining to determine whether it is possible to predict which reads can be considered non-genuine sequences. This was done by taking segments of the DNA sequence before and after the position at which an error was made. Sequential pattern mining is a data mining method for finding relevant patterns in a database of sequences. A pattern is considered relevant if its frequency exceeds a user-defined threshold, where frequency is the number of times the pattern appears in the sequences in the database. We found several patterns that were associated with a particular position in the DNA sequence, and that position showed a bias towards one type of error. Using a pattern and its associated position, it might be possible to determine the number of non-genuine sequences in a new sample. However, many other factors might contribute to whether a pattern shows this type of error, for instance read orientation, error ratio and sequence length.
dc.description.sponsorship	Utrecht University
dc.format.extent	1462071
dc.format.mimetype	application/pdf
dc.language.iso	en
dc.title	Finding Patterns in DNA Sequences for Prediction of Errors
dc.type.content	Master Thesis
dc.rights.accessrights	Open Access
dc.subject.keywords	DNA sequences, sequential pattern mining
dc.subject.courseuu	Artificial Intelligence
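
The abstract describes keeping a pattern only when its count across the database of sequences passes a user-defined threshold. The sketch below illustrates that idea under simplifying assumptions that are not taken from the thesis: patterns are contiguous substrings of a fixed length (sequential pattern mining in general also allows gaps), support is counted once per sequence, and a pattern is kept when its support is at least the threshold. The function name `frequent_patterns`, its parameters, and the toy segments are hypothetical.

```python
from collections import Counter


def frequent_patterns(sequences, length, min_support):
    """Return contiguous patterns of the given length whose support
    (the number of sequences containing them) is at least min_support."""
    support = Counter()
    for seq in sequences:
        # Collect patterns in a set so each one counts once per sequence.
        seen = {seq[i:i + length] for i in range(len(seq) - length + 1)}
        support.update(seen)
    # Keep only the patterns that reach the user-defined threshold.
    return {p: c for p, c in support.items() if c >= min_support}


# Toy, made-up segments (not data from the thesis).
segments = ["ACGTAC", "TTACGA", "GACGTT"]
result = frequent_patterns(segments, 3, 3)
print(result)
```

Lowering `min_support` admits more patterns, which is why the choice of threshold determines how many candidate patterns have to be examined afterwards.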

Files in this item

Name: Masterthesis_LY_AuYeung.pdf
Size: 1.394Mb
Format: PDF


This item appears in the following Collection(s)

Theses
