Inference serving with end-to-end latency SLOs over dynamic edge networks
V. Nigade, P. Bauszat, H. Bal, L. Wang, Real-Time Systems 60 (2024) 239–290.
Journal Article
| Published
| English
Author
Nigade, Vinod;
Bauszat, Pablo;
Bal, Henri;
Wang, Lin
Abstract
While high accuracy is of paramount importance for deep learning (DL) inference, serving inference requests on time is equally critical but has not been carefully studied, especially when requests must be served over a dynamic wireless network at the edge. In this paper, we propose Jellyfish, a novel edge DL inference serving system that achieves soft guarantees for end-to-end inference latency service-level objectives (SLOs). Jellyfish handles network variability by using both data and deep neural network (DNN) adaptation to trade off accuracy against latency. Jellyfish features a new design that enables collective adaptation policies, in which the decisions for data and DNN adaptation are aligned and coordinated among multiple users with varying network conditions. We propose efficient algorithms to continuously map users and adapt DNNs at runtime, so that latency SLOs are fulfilled while the overall inference accuracy is maximized. We further investigate dynamic DNNs, i.e., DNNs that encompass multiple architecture variants, and demonstrate their potential benefit through preliminary experiments. Our experiments, based on a prototype implementation and real-world WiFi and LTE network traces, show that Jellyfish can meet latency SLOs at around the 99th percentile while maintaining high accuracy.
Publishing Year
2024
Journal Title
Real-Time Systems
Volume
60
Issue
2
Page
239-290
Cite this
Nigade V, Bauszat P, Bal H, Wang L. Inference serving with end-to-end latency SLOs over dynamic edge networks. Real-Time Systems. 2024;60(2):239-290. doi:10.1007/s11241-024-09418-4
Nigade, V., Bauszat, P., Bal, H., & Wang, L. (2024). Inference serving with end-to-end latency SLOs over dynamic edge networks. Real-Time Systems, 60(2), 239–290. https://doi.org/10.1007/s11241-024-09418-4
@article{Nigade_Bauszat_Bal_Wang_2024, title={Inference serving with end-to-end latency SLOs over dynamic edge networks}, volume={60}, DOI={10.1007/s11241-024-09418-4}, number={2}, journal={Real-Time Systems}, publisher={Springer Science and Business Media LLC}, author={Nigade, Vinod and Bauszat, Pablo and Bal, Henri and Wang, Lin}, year={2024}, pages={239–290} }
Nigade, Vinod, Pablo Bauszat, Henri Bal, and Lin Wang. “Inference Serving with End-to-End Latency SLOs over Dynamic Edge Networks.” Real-Time Systems 60, no. 2 (2024): 239–90. https://doi.org/10.1007/s11241-024-09418-4.
V. Nigade, P. Bauszat, H. Bal, and L. Wang, “Inference serving with end-to-end latency SLOs over dynamic edge networks,” Real-Time Systems, vol. 60, no. 2, pp. 239–290, 2024, doi: 10.1007/s11241-024-09418-4.
Nigade, Vinod, et al. “Inference Serving with End-to-End Latency SLOs over Dynamic Edge Networks.” Real-Time Systems, vol. 60, no. 2, Springer Science and Business Media LLC, 2024, pp. 239–90, doi:10.1007/s11241-024-09418-4.