Compressing Object Detectors for Bear Detection on Edge Devices
Summary
Camera traps are deployed in Romania to keep bears from entering villages in search of food. These battery-powered, low-energy devices rely on a deep neural network for effective bear detection. However, neural networks typically require a large amount of RAM to store their parameters, which leads to high energy consumption. This makes deploying AI models on such power-constrained edge devices challenging.
To address this challenge, this thesis introduces a novel training approach that combines two model compression techniques and applies them to object detection, using YOLOv5, a battle-tested object detector based on the convolutional neural network architecture, as the base model. The first technique is self-compression, which allows a model to learn to convert its own parameters into smaller data types. The second is online knowledge distillation, in which a smaller model acquires knowledge from a larger, more complex model that is trained simultaneously. The novelty lies in combining the two: because distillation happens online, the larger model can account for the self-compression of the smaller model during training. This approach aims to maximize compression while maintaining performance, yielding an efficient object detector that can be deployed on devices with limited RAM.
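The interaction between the two techniques can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the squared-error stand-ins for the detection and distillation losses, and the per-parameter bit-width penalty are all illustrative assumptions, chosen only to show how a self-compression term and an online distillation term might be combined in one objective.

```python
import math

def fake_quantize(w, bits):
    """Round a weight in [-1, 1] onto a signed fixed-point grid with `bits` bits.
    A stand-in for self-compression, where the bit-width itself is learned."""
    levels = 2 ** (bits - 1)
    return max(-1.0, min(1.0, round(w * levels) / levels))

def combined_loss(student_out, teacher_out, target, bits, alpha=0.5, beta=0.01):
    """Hypothetical combined objective for the compressed student:
    task loss + online-distillation loss + penalty on parameter bit-width."""
    task = (student_out - target) ** 2          # detection loss (squared-error stand-in)
    distill = (student_out - teacher_out) ** 2  # match the teacher trained alongside
    size = beta * bits                          # encourage fewer bits per parameter
    return (1 - alpha) * task + alpha * distill + size
```

Because the teacher trains at the same time, its outputs (`teacher_out`) already reflect the student's quantized behavior from previous steps, which is the coupling the online setting provides over distilling from a frozen, pre-trained teacher.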
The proposed approach results in a model that requires only 1.4 MB of memory for its parameters. This is almost 60 times smaller than the 83 MB required by the medium-sized YOLOv5 model, and five times smaller than the 7.1 MB used by the nano-sized YOLOv5 model. Despite this substantial size reduction, the resulting model achieves an F1-score of 0.971 when classifying bears, comparable to the larger baseline models: the medium-sized YOLOv5 model scores 0.985 and the nano-sized model 0.977. These results demonstrate the potential of combining self-compression and knowledge distillation for energy-efficient object detectors.