Show simple item record

dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Renooij, S.
dc.contributor.author: Majoor, I.A.G.
dc.date.accessioned: 2017-08-04T17:02:13Z
dc.date.available: 2017-08-04T17:02:13Z
dc.date.issued: 2017
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/26756
dc.description.abstract: In this paper, the differences between training a Naïve Bayes classifier on normally distributed continuous attributes and training a Naïve Bayes classifier on the discretized version of those attributes are examined. The methods used in the experiment were chosen carefully. The Shapiro-Wilk test was used to test whether an attribute has a normal distribution. Discretization was performed with an unsupervised method, equal frequency, and with a supervised method based on the minimum description length principle. Monte-Carlo cross validation with 25 runs was used to obtain, per dataset, three mean percentages of wrongly classified unseen instances: one when using the continuous attributes themselves and one for each of the two discretization methods. The means were compared with the paired t-test. The conclusion is to keep the continuous attributes when dealing with a larger dataset (2,280 to 5,000 instances), as fewer unseen instances were wrongly classified and the true difference between the means was significant. For a smaller dataset (114 to 250 instances), the true difference between the means was not significant. Only when a smaller dataset had an unbalanced class distribution, with a ratio of 1:2.5 between the numbers of instances per class, did keeping the normally distributed continuous attributes result in better accuracy.
dc.description.sponsorship: Utrecht University
dc.format.extent: 1427890
dc.format.mimetype: application/pdf
dc.language.iso: en_US
dc.title: Naïve Bayes classifier: normally distributed continuous attributes versus the discretized version of those attributes
dc.type.content: Bachelor Thesis
dc.rights.accessrights: Open Access
dc.subject.keywords: naive bayes classifier, normal distributed, continuous attributes, discretization
dc.subject.courseuu: Kunstmatige Intelligentie
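
The abstract names equal frequency discretization, Monte-Carlo cross validation and the paired t-test without showing them. The following is a minimal standard-library sketch of those three steps, not the thesis author's implementation. The function names, the number of bins, the test fraction and the random seed are all assumptions made for illustration; only the 25 runs come from the abstract.

```python
import bisect
import math
import random
import statistics


def equal_frequency_cut_points(values, n_bins):
    """Return n_bins - 1 cut points so that each bin holds roughly
    the same number of values (n_bins is an assumed parameter)."""
    ordered = sorted(values)
    n = len(ordered)
    cuts = []
    for i in range(1, n_bins):
        idx = i * n // n_bins
        # Place the cut halfway between the two neighbouring values.
        cuts.append((ordered[idx - 1] + ordered[idx]) / 2)
    return cuts


def discretize(value, cut_points):
    """Map a continuous value to the index of the bin it falls in."""
    return bisect.bisect_right(cut_points, value)


def monte_carlo_splits(n_instances, n_runs=25, test_fraction=0.3, seed=0):
    """Yield (train_indices, test_indices) for repeated random splits.
    test_fraction and seed are assumptions, not taken from the thesis."""
    rng = random.Random(seed)
    n_test = max(1, round(n_instances * test_fraction))
    for _ in range(n_runs):
        indices = list(range(n_instances))
        rng.shuffle(indices)
        yield indices[n_test:], indices[:n_test]


def paired_t_statistic(errors_a, errors_b):
    """t statistic for paired samples, e.g. per-run error percentages
    of two classifiers evaluated on the same splits."""
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
```

In such a setup, each split would be used to train and test the classifier once per attribute representation, and the resulting per-run error percentages would be passed pairwise to `paired_t_statistic`. The statistic is then compared against a t distribution with `n_runs - 1` degrees of freedom.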

