A User-Centered Explainable AI Visualization Study for Enhancing Decision Making in Law Enforcement
Summary
Law enforcement requires the police to analyze large amounts of data, a task that cannot be completed by humans alone and therefore calls for Machine Learning (ML) models. However, these models often lack transparency, which can have major consequences in high-stakes scenarios. Providing explanations is therefore critical if such models are to be used in fields such as law enforcement. Yet most visualizations that explain ML models have been developed for data scientists rather than decision makers. This is problematic because the two groups have different objectives when interacting with an ML model: data scientists possess technical knowledge but lack domain knowledge, and they examine a model in order to improve it, whereas decision makers possess domain knowledge but lack technical expertise, and they use the model’s output to inform their decisions. Because of these differing characteristics and requirements, the two groups also need different explanations.

In collaboration with the National Police Lab Artificial Intelligence (NPAI), this research developed a way to effectively visualize local explanations of ML models for decision makers in the law enforcement domain, focusing on decision makers in the public order and safety domain. Interviews with these decision makers revealed several requirements, which were incorporated into the design.

The evaluation showed that the decision makers understood the visualization and that the tool supported their decision making. Nevertheless, the explanation was not entirely comprehensible to them: they could identify the features that influenced the risk classification and the risk the model attributed to the incident, but they could not discern which features contributed more than others, and the uncertainty score proved difficult to interpret.
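To make the notion of a local explanation concrete, the sketch below shows, purely as an illustration and not as the study’s actual tool or model, the kind of output such an explanation summarizes for a single incident: a predicted risk, an uncertainty indicator, and per-feature contributions. The feature names, the synthetic data, the linear model, and the uncertainty measure (distance of the predicted probability from certainty) are all assumptions made for this example.

```python
# Illustrative sketch only: a local explanation for one incident scored by a
# hypothetical linear risk classifier. All names and data are placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical incident features (not taken from the study).
feature_names = ["prior_incidents", "crowd_size", "time_of_day", "alcohol_involved"]

# Synthetic training data standing in for historical incidents.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 4))
y_train = (X_train @ np.array([1.2, 0.8, -0.3, 0.9]) + rng.normal(size=200) > 0).astype(int)

model = LogisticRegression().fit(X_train, y_train)

# One new incident to explain.
x = np.array([1.5, 0.2, -1.0, 0.8])
risk = model.predict_proba(x.reshape(1, -1))[0, 1]

# For a linear model, coefficient * feature value gives each feature's
# additive contribution to the log-odds of the "high risk" class.
contributions = model.coef_[0] * x

print(f"Predicted risk: {risk:.2f}")
# Simple uncertainty proxy: how far the probability is from a certain 0 or 1.
print(f"Uncertainty (1 - |2p - 1|): {1 - abs(2 * risk - 1):.2f}")
for name, c in sorted(zip(feature_names, contributions), key=lambda t: -abs(t[1])):
    print(f"  {name:>18}: {c:+.2f}")
```

A visualization for decision makers would present this same information graphically (for example, as a sorted bar chart of contributions next to the risk and uncertainty scores) rather than as raw numbers.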