Distributed Online Service Coordination Using Deep Reinforcement Learning
S.B. Schneider, H. Qarawlus, H. Karl, in: IEEE International Conference on Distributed Computing Systems (ICDCS), IEEE, 2021.
Conference Paper | English
Author
Stefan Balthasar Schneider, Haydar Qarawlus, Holger Karl
Abstract
Services often consist of multiple chained components, such as microservices in a service mesh or machine learning functions in a pipeline. Providing these services requires online coordination, including scaling the service, placing instances of all components in the network, scheduling traffic to these instances, and routing traffic through the network. Optimized service coordination is still a hard problem due to many influencing factors such as rapidly arriving user demands and limited node and link capacity. Existing approaches to solve the problem are often built on rigid models and assumptions, tailored to specific scenarios. If the scenario changes and the assumptions no longer hold, they easily break and require manual adjustments by experts. Novel self-learning approaches using deep reinforcement learning (DRL) are promising but still have limitations: they only address simplified versions of the problem and are typically centralized, and thus do not scale to practical large-scale networks.
To address these issues, we propose a distributed self-learning service coordination approach using DRL. After centralized training, we deploy a distributed DRL agent at each node in the network, making fast coordination decisions locally in parallel with the other nodes. Each agent only observes its direct neighbors and does not need global knowledge. Hence, our approach scales independently of the size of the network. In our extensive evaluation using real-world network topologies and traffic traces, we show that our proposed approach outperforms a state-of-the-art conventional heuristic as well as a centralized DRL approach (60% higher throughput on average) while requiring less time per online decision (1 ms).
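The deployment pattern described in the abstract (one shared policy, copied to an agent at every node, each deciding from a purely local view) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the names `NodeAgent`, `deploy_agents`, and `greedy_free_capacity_policy`, the use of free capacity as the only observed quantity, and the action encoding. In particular, the placeholder policy is a simple greedy rule standing in for a trained DRL model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Hypothetical stand-in for a trained policy: maps a local observation
# vector to an action index (0 = process locally, i > 0 = forward the
# flow to the i-th direct neighbor).
Policy = Callable[[Sequence[float]], int]


def greedy_free_capacity_policy(observation: Sequence[float]) -> int:
    """Placeholder policy (NOT a trained DRL model): choose the entry
    with the most free capacity; ties go to the lowest index."""
    return max(range(len(observation)), key=lambda i: observation[i])


@dataclass
class NodeAgent:
    """One agent per node, holding a copy of the shared policy."""
    node: str
    neighbors: List[str]
    policy: Policy

    def observe(self, free_capacity: Dict[str, float]) -> List[float]:
        # Local view only: the node's own free capacity followed by that
        # of its direct neighbors. No other entries are ever read.
        own = [free_capacity[self.node]]
        return own + [free_capacity[n] for n in self.neighbors]

    def decide(self, free_capacity: Dict[str, float]) -> str:
        # Return the node that should handle the incoming flow.
        action = self.policy(self.observe(free_capacity))
        return self.node if action == 0 else self.neighbors[action - 1]


def deploy_agents(topology: Dict[str, List[str]],
                  policy: Policy) -> Dict[str, NodeAgent]:
    """Give every node its own agent using the same (shared) policy."""
    return {node: NodeAgent(node, nbrs, policy)
            for node, nbrs in topology.items()}


if __name__ == "__main__":
    # Assumed toy line topology a - b - c with made-up free capacities.
    topology = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
    free_capacity = {"a": 0.1, "b": 0.5, "c": 0.9}
    agents = deploy_agents(topology, greedy_free_capacity_policy)
    for name, agent in agents.items():
        print(name, "->", agent.decide(free_capacity))
```

Because each agent's observation has one entry for itself plus one per direct neighbor, the per-decision work in this sketch depends on the node's degree rather than on the total number of nodes, which mirrors the scalability argument made in the abstract.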
Publishing Year
2021
Proceedings Title
IEEE International Conference on Distributed Computing Systems (ICDCS)
Conference
IEEE International Conference on Distributed Computing Systems (ICDCS)
Conference Location
Washington, DC, USA
Cite this
Schneider SB, Qarawlus H, Karl H. Distributed Online Service Coordination Using Deep Reinforcement Learning. In: IEEE International Conference on Distributed Computing Systems (ICDCS). IEEE; 2021.
Schneider, S. B., Qarawlus, H., & Karl, H. (2021). Distributed Online Service Coordination Using Deep Reinforcement Learning. In IEEE International Conference on Distributed Computing Systems (ICDCS). Washington, DC, USA: IEEE.
@inproceedings{Schneider_Qarawlus_Karl_2021,
  title={Distributed Online Service Coordination Using Deep Reinforcement Learning},
  booktitle={IEEE International Conference on Distributed Computing Systems (ICDCS)},
  publisher={IEEE},
  author={Schneider, Stefan Balthasar and Qarawlus, Haydar and Karl, Holger},
  year={2021}
}
Schneider, Stefan Balthasar, Haydar Qarawlus, and Holger Karl. “Distributed Online Service Coordination Using Deep Reinforcement Learning.” In IEEE International Conference on Distributed Computing Systems (ICDCS). IEEE, 2021.
S. B. Schneider, H. Qarawlus, and H. Karl, “Distributed Online Service Coordination Using Deep Reinforcement Learning,” in IEEE International Conference on Distributed Computing Systems (ICDCS), Washington, DC, USA, 2021.
Schneider, Stefan Balthasar, et al. “Distributed Online Service Coordination Using Deep Reinforcement Learning.” IEEE International Conference on Distributed Computing Systems (ICDCS), IEEE, 2021.
All files available under the following license(s):
Copyright Statement:
This Item is protected by copyright and/or related rights. [...]
Main File(s)
File Name
public_author_version.pdf
606.32 KB
File Title
Distributed Online Service Coordination Using Deep Reinforcement Learning
Access Level
Open Access
Last Uploaded
2021-03-18T17:12:56Z