        Utrecht University Student Theses Repository
        On the errors introduced by the naive Bayes independence assumption

        File
        Thesis.pdf (589.7Kb)
        Publication date
        2018
        Author
        Wachter, M.F. de
        Summary
        This research examines how well a naive Bayesian approach performs when the independence of the evidence cannot be guaranteed. It shows how large the error in the likelihood and in the posterior probability of a class variable, given certain evidence, can become under a naive Bayesian approach. I prove that complete dependency among the evidence variables is the worst-case scenario for the error in likelihood when there are two or three pieces of evidence. Based on these results, I introduce an equation that gives the maximum error in likelihood under complete dependency as a function of the number of observed evidence variables. I also show that there is no real bound on the error in posterior, except that it cannot become equal to one; the worst-case scenario arises when the class variable follows deterministically from the observed evidence once the dependencies among the evidence variables are taken into account. Finally, I present experimental results on how large the errors in likelihood and posterior typically are, and on how these errors correlate with various dependency measures. These results support my claim that complete dependency is the worst-case scenario for the error in likelihood, and that determinism is the worst-case scenario for the error in posterior.
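        The effect of complete dependency on the likelihood can be illustrated with a small sketch. It is not the equation from the thesis, which this record does not reproduce. It assumes that the error is the absolute difference between the true and the naive likelihood, and that complete dependency means all n evidence variables are copies of one variable with P(e | c) = p, so that the true likelihood is p while the naive product is p^n. The function names are hypothetical.

```python
def naive_likelihood_error(p: float, n: int) -> float:
    """Error p - p**n made by treating n fully dependent evidence
    variables as independent (assumed error definition: absolute difference)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    # True likelihood is p (all variables are copies of one);
    # the naive product of n identical factors is p**n.
    return p - p ** n


def worst_case_p(n: int) -> float:
    """Value of p that maximises p - p**n on [0, 1] for n >= 2.

    Setting the derivative 1 - n * p**(n - 1) to zero gives
    p = n ** (-1 / (n - 1)).
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    return n ** (-1.0 / (n - 1))


def max_likelihood_error(n: int) -> float:
    """Largest error under the assumptions stated above."""
    return naive_likelihood_error(worst_case_p(n), n)


if __name__ == "__main__":
    for n in (2, 3, 5, 10):
        print(n, round(worst_case_p(n), 4), round(max_likelihood_error(n), 4))
```

        Under these assumptions the largest error is 0.25 for two pieces of evidence (at p = 0.5) and grows toward one as n increases, since p^n vanishes while p can stay close to one.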
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/29180
        Collections
        • Theses