On the errors introduced by the naive Bayes independence assumption
Summary
This research investigates how well a naive Bayesian approach performs when the evidence cannot be assumed independent. It quantifies how large the error in the likelihood, and in the posterior probability of a class variable given certain evidence, can become under the naive Bayes independence assumption. I prove that complete dependency among the evidence variables is the worst-case scenario for the error in likelihood when two or three evidence variables are observed. Based on these results, I introduce an equation for the maximum error in likelihood under complete dependency, as a function of the number of observed evidence variables. I also show that the error in the posterior has no bound other than that it cannot reach one; here the worst case occurs when the class variable follows deterministically from the observed evidence once the dependencies among the evidence are taken into account. Finally, I present experimental results on how large the errors in likelihood and posterior typically are, and on how these errors correlate with various dependency measures. These results support my claims that complete dependency is the worst-case scenario for the error in likelihood, and that determinism is the worst-case scenario for the error in posterior.
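As an illustrative sketch of the likelihood error under complete dependency (the numbers and helper names below are my own toy example, not results from this research): with two binary evidence variables where E2 is a deterministic copy of E1, the true likelihood P(e1, e2 | c) collapses to P(e1 | c), while naive Bayes computes the product P(e1 | c) * P(e2 | c).

```python
def naive_likelihood(p_e1_given_c, p_e2_given_c):
    """Likelihood under the naive Bayes independence assumption."""
    return p_e1_given_c * p_e2_given_c

def true_likelihood_complete_dependency(p_e1_given_c):
    """True joint likelihood when E2 deterministically equals E1."""
    return p_e1_given_c

p = 0.5                                          # P(E1 = e1 | c); since E2 = E1, P(E2 = e2 | c) = p too
naive = naive_likelihood(p, p)                   # p**2 = 0.25
true = true_likelihood_complete_dependency(p)    # p = 0.5
error = abs(true - naive)                        # 0.25
print(naive, true, error)
```

In this two-variable toy case the error is p - p**2, which is maximized at p = 0.5, giving an error of 0.25.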