Distributed Online Service Coordination Using Deep Reinforcement Learning

Schneider, Stefan Balthasar; Qarawlus, Haydar; Karl, Holger

S.B. Schneider, H. Qarawlus, H. Karl, in: IEEE International Conference on Distributed Computing Systems (ICDCS), IEEE, 2021.

Conference Paper | English

Author

Schneider, Stefan Balthasar; Qarawlus, Haydar; Karl, Holger

Department

Rechnernetze

Project

SFB 901
SFB 901 - Project Area C
SFB 901 - Subproject C4

Abstract

Services often consist of multiple chained components such as microservices in a service mesh, or machine learning functions in a pipeline. Providing these services requires online coordination including scaling the service, placing instances of all components in the network, scheduling traffic to these instances, and routing traffic through the network. Optimized service coordination is still a hard problem due to many influencing factors such as rapidly arriving user demands and limited node and link capacity. Existing approaches to solve the problem are often built on rigid models and assumptions, tailored to specific scenarios. If the scenario changes and the assumptions no longer hold, they easily break and require manual adjustments by experts. Novel self-learning approaches using deep reinforcement learning (DRL) are promising but still have limitations as they only address simplified versions of the problem and are typically centralized and thus do not scale to practical large-scale networks. To address these issues, we propose a distributed self-learning service coordination approach using DRL. After centralized training, we deploy a distributed DRL agent at each node in the network, making fast coordination decisions locally in parallel with the other nodes. Each agent only observes its direct neighbors and does not need global knowledge. Hence, our approach scales independently of the size of the network. In our extensive evaluation using real-world network topologies and traffic traces, we show that our proposed approach outperforms a state-of-the-art conventional heuristic as well as a centralized DRL approach (60% higher throughput on average) while requiring less time per online decision (1 ms).
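
The idea that each agent observes only its direct neighbors can be illustrated with a minimal sketch. It is not the authors' implementation: the toy topology, the use of free capacity as the only observed quantity, and the names local_observation, greedy_policy, and decide are all assumptions made for illustration, and the greedy rule merely stands in for a trained DRL policy.

```python
from typing import Dict, List

# Hypothetical toy network: each node maps to its direct neighbors.
topology: Dict[str, List[str]] = {
    "a": ["b", "c"],
    "b": ["a"],
    "c": ["a", "d"],
    "d": ["c"],
}

# Hypothetical free compute capacity per node (arbitrary units).
free_capacity: Dict[str, float] = {"a": 1.0, "b": 3.0, "c": 2.0, "d": 9.0}


def local_observation(node: str) -> List[float]:
    """Free capacity of the node itself, followed by its direct neighbors.

    The length depends only on the node's degree, not on the network size.
    """
    return [free_capacity[node]] + [free_capacity[n] for n in topology[node]]


def greedy_policy(observation: List[float]) -> int:
    """Stand-in for a trained policy (assumption): pick the largest entry.

    Action 0 means "process locally"; action i > 0 means "forward to the
    i-th neighbor".
    """
    return max(range(len(observation)), key=lambda i: observation[i])


def decide(node: str) -> str:
    """Return the node that should handle an incoming flow next."""
    action = greedy_policy(local_observation(node))
    return node if action == 0 else topology[node][action - 1]
```

In this sketch, the agent at node "a" sees only "a", "b", and "c", so decide("a") selects "b" even though the non-neighbor "d" has more free capacity. Adding further nodes elsewhere in the topology leaves the observation at "a" unchanged, which is the intuition behind the scalability claim in the abstract.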

Keywords

network management; service management; coordination; reinforcement learning; distributed

Publishing Year

2021

Proceedings Title

IEEE International Conference on Distributed Computing Systems (ICDCS)

Conference

IEEE International Conference on Distributed Computing Systems (ICDCS)

Conference Location

Washington, DC, USA

LibreCat-ID

21543

Cite this

Schneider SB, Qarawlus H, Karl H. Distributed Online Service Coordination Using Deep Reinforcement Learning. In: IEEE International Conference on Distributed Computing Systems (ICDCS). IEEE; 2021.

Schneider, S. B., Qarawlus, H., & Karl, H. (2021). Distributed Online Service Coordination Using Deep Reinforcement Learning. In IEEE International Conference on Distributed Computing Systems (ICDCS). Washington, DC, USA: IEEE.

@inproceedings{Schneider_Qarawlus_Karl_2021, title={Distributed Online Service Coordination Using Deep Reinforcement Learning}, booktitle={IEEE International Conference on Distributed Computing Systems (ICDCS)}, publisher={IEEE}, author={Schneider, Stefan Balthasar and Qarawlus, Haydar and Karl, Holger}, year={2021} }

Schneider, Stefan Balthasar, Haydar Qarawlus, and Holger Karl. “Distributed Online Service Coordination Using Deep Reinforcement Learning.” In IEEE International Conference on Distributed Computing Systems (ICDCS). IEEE, 2021.

S. B. Schneider, H. Qarawlus, and H. Karl, “Distributed Online Service Coordination Using Deep Reinforcement Learning,” in IEEE International Conference on Distributed Computing Systems (ICDCS), Washington, DC, USA, 2021.

Schneider, Stefan Balthasar, et al. “Distributed Online Service Coordination Using Deep Reinforcement Learning.” IEEE International Conference on Distributed Computing Systems (ICDCS), IEEE, 2021.