Isolation Forest - The accuracy of isolation forest and the possible effects of misclassified anomalies
Summary
Anomaly detection is a topic in data science that is receiving more and more attention, not only for its benefits in the business world but also in many other areas (e.g. cybersecurity, health care, behaviour). The purpose of this thesis is to answer whether isolation forest (a machine learning anomaly detection algorithm) is accurate in classifying these anomalies and what the effects of misclassification can be in a particular area (internet traffic). To assess accuracy, the algorithm is applied to a labelled data set to see whether it can find all the labelled anomalies. For the effects of misclassification, this thesis looks at a specific area and data set and discusses the possible outcomes and how likely each is to occur. Isolation forest proves to be a valuable algorithm that can minimize risk and maximize benefits.
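The evaluation idea described above (apply the algorithm to labelled data and check which labelled anomalies it finds) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation or data used in this thesis: the data set is synthetic, the function names (`build_tree`, `anomaly_scores`, `evaluate`) and the parameter values (100 trees, sub-sample size 64, flagging the 5 highest scores) are hypothetical choices, and the scoring follows the standard isolation forest formulation, in which points that are isolated after fewer random splits receive higher anomaly scores.

```python
import math
import random


def c(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of H(n-1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split points on a random feature at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # no split possible on this feature
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(point, node, depth=0):
    """Number of splits needed to isolate a point, adjusted for unsplit leaves."""
    if "split" not in node:
        return depth + c(node["size"])
    child = node["left"] if point[node["dim"]] < node["split"] else node["right"]
    return path_length(point, child, depth + 1)


def anomaly_scores(data, n_trees=100, sample_size=64, seed=0):
    """Score every point in (0, 1); higher means easier to isolate."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [
        build_tree(rng.sample(data, sample_size), 0, max_depth, rng)
        for _ in range(n_trees)
    ]
    norm = c(sample_size)
    return [
        2.0 ** (-sum(path_length(p, t) for t in trees) / (n_trees * norm))
        for p in data
    ]


def evaluate(scores, labels, n_flagged):
    """Flag the n_flagged highest scores and compare them with the labels."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flagged = set(ranked[:n_flagged])
    tp = sum(1 for i in flagged if labels[i] == 1)  # anomalies found
    fp = len(flagged) - tp                          # normal points flagged
    fn = sum(labels) - tp                           # anomalies missed
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": tp / len(flagged), "recall": tp / sum(labels)}


# Synthetic labelled data (an assumption for illustration only):
# 200 normal points around the origin and 5 anomalies far away from it.
data_rng = random.Random(42)
normal = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
anomalies = [
    (data_rng.choice([-1, 1]) * data_rng.uniform(6, 8),
     data_rng.choice([-1, 1]) * data_rng.uniform(6, 8))
    for _ in range(5)
]
data = normal + anomalies
labels = [0] * len(normal) + [1] * len(anomalies)

scores = anomaly_scores(data)
result = evaluate(scores, labels, n_flagged=5)
print(result)
```

The false negatives (`fn`, anomalies that were missed) and false positives (`fp`, normal points that were flagged) are the two kinds of misclassification whose effects are discussed for the internet traffic case.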