EdgeBoost: Confidence boosting for resource constrained inference via selective offloading

Deploying large deep neural networks with state-of-the-art accuracy on edge devices is often impractical because of the devices' limited resources. This paper introduces EdgeBoost, a selective input offloading system designed to overcome the limited computational resources of edge devices. EdgeBoost trains and calibrates a lightweight model for deployment on the edge and, in addition, deploys a large, complex model in the cloud. During inference, the edge model makes an initial prediction for each input sample. If the confidence of that prediction is low, the sample is sent to the cloud model for further processing; otherwise, the local prediction is accepted. Through careful calibration, EdgeBoost reduces the communication cost by 55%, 27% and 20% for the CIFAR-100, ImageNet-1k and Stanford Cars datasets, respectively, compared to a cloud-only solution, while achieving on-par classification accuracy. Furthermore, EdgeBoost reduces the total latency from 148 ms to 123.84 ms per inference compared to a cloud-only solution. Our evaluation also shows that calibrating the edge model for such a collaborative edge–cloud setup yields accuracy gains of up to 8 percentage points over an uncalibrated edge model. Additionally, when used as an abstaining classifier, EdgeBoost can improve accuracy by up to 9 percentage points over an uncalibrated model. Finally, EdgeBoost outperforms the Early Exit and Entropy thresholding baselines and achieves accuracy comparable to state-of-the-art routing-based methods without the need to host the router on the edge.
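The offloading rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which calibration method or confidence measure EdgeBoost uses, so the sketch assumes temperature scaling and the maximum softmax probability. The names `softmax`, `route` and `cloud_predict`, and the default `threshold` and `temperature` values, are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling.

    Assumption: temperature scaling stands in for the calibration step;
    the abstract does not name the method EdgeBoost actually uses.
    """
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def route(edge_logits, cloud_predict, threshold=0.8, temperature=1.5):
    """Accept the edge prediction if it is confident, else offload.

    edge_logits   -- raw class scores from the lightweight edge model
    cloud_predict -- hypothetical callable that queries the cloud model
    threshold     -- illustrative confidence cutoff, not from the paper
    """
    probs = softmax(edge_logits, temperature)
    confidence = max(probs)
    label = probs.index(confidence)
    if confidence >= threshold:
        return label, "edge"       # local prediction accepted
    return cloud_predict(), "cloud"  # low confidence: send to cloud


if __name__ == "__main__":
    # A peaked score vector stays on the edge; a flat one is offloaded.
    print(route([5.0, 0.0, 0.0], cloud_predict=lambda: 2))
    print(route([1.0, 0.9, 0.8], cloud_predict=lambda: 2))
```

In this sketch, raising `threshold` sends more samples to the cloud (higher communication cost, accuracy closer to the cloud model), while lowering it keeps more predictions local. Replacing the `cloud_predict()` call with a "no answer" result would give the abstaining-classifier variant mentioned in the abstract.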

Subjects

DDC Class

006.3: Artificial Intelligence

License

https://creativecommons.org/licenses/by/4.0/

Publication version

publishedVersion

Name

1-s2.0-S1389128625004049-main.pdf

Size

2.28 MB

Format

Adobe PDF
