{"file":[{"file_size":606321,"date_created":"2021-03-18T17:12:56Z","date_updated":"2021-03-18T17:12:56Z","access_level":"open_access","content_type":"application/pdf","file_id":"21544","relation":"main_file","title":"Distributed Online Service Coordination Using Deep Reinforcement Learning","creator":"stschn","file_name":"public_author_version.pdf"}],"type":"conference","_id":"21543","oa":"1","date_updated":"2022-01-06T06:55:04Z","conference":{"name":"IEEE International Conference on Distributed Computing Systems (ICDCS)","location":"Washington, DC, USA"},"abstract":[{"text":"Services often consist of multiple chained components such as microservices in a service mesh, or machine learning functions in a pipeline. Providing these services requires online coordination including scaling the service, placing instances of all components in the network, scheduling traffic to these instances, and routing traffic through the network. Optimized service coordination is still a hard problem due to many influencing factors such as rapidly arriving user demands and limited node and link capacity. Existing approaches to solve the problem are often built on rigid models and assumptions, tailored to specific scenarios. If the scenario changes and the assumptions no longer hold, they easily break and require manual adjustments by experts. Novel self-learning approaches using deep reinforcement learning (DRL) are promising but still have limitations as they only address simplified versions of the problem and are typically centralized and thus do not scale to practical large-scale networks.\r\n\r\nTo address these issues, we propose a distributed self-learning service coordination approach using DRL. After centralized training, we deploy a distributed DRL agent at each node in the network, making fast coordination decisions locally in parallel with the other nodes. Each agent only observes its direct neighbors and does not need global knowledge. 
Hence, our approach scales independently from the size of the network. In our extensive evaluation using real-world network topologies and traffic traces, we show that our proposed approach outperforms a state-of-the-art conventional heuristic as well as a centralized DRL approach (60% higher throughput on average) while requiring less time per online decision (1 ms).","lang":"eng"}],"file_date_updated":"2021-03-18T17:12:56Z","publisher":"IEEE","keyword":["network management","service management","coordination","reinforcement learning","distributed"],"date_created":"2021-03-18T17:15:47Z","status":"public","ddc":["000"],"author":[{"full_name":"Schneider, Stefan Balthasar","orcid":"0000-0001-8210-4011","id":"35343","first_name":"Stefan Balthasar","last_name":"Schneider"},{"first_name":"Haydar","last_name":"Qarawlus","full_name":"Qarawlus, Haydar"},{"id":"126","full_name":"Karl, Holger","last_name":"Karl","first_name":"Holger"}],"title":"Distributed Online Service Coordination Using Deep Reinforcement Learning","has_accepted_license":"1","related_material":{"link":[{"relation":"software","url":"https://github.com/RealVNF/distributed-drl-coordination"}]},"publication":"IEEE International Conference on Distributed Computing Systems (ICDCS)","project":[{"_id":"1","name":"SFB 901"},{"_id":"4","name":"SFB 901 - Project Area C"},{"_id":"16","name":"SFB 901 - Subproject C4"}],"year":"2021","language":[{"iso":"eng"}],"user_id":"35343","department":[{"_id":"75"}],"citation":{"ieee":"S. B. Schneider, H. Qarawlus, and H. Karl, “Distributed Online Service Coordination Using Deep Reinforcement Learning,” in IEEE International Conference on Distributed Computing Systems (ICDCS), Washington, DC, USA, 2021.","ama":"Schneider SB, Qarawlus H, Karl H. Distributed Online Service Coordination Using Deep Reinforcement Learning. In: IEEE International Conference on Distributed Computing Systems (ICDCS). IEEE; 2021.","mla":"Schneider, Stefan Balthasar, et al. 
“Distributed Online Service Coordination Using Deep Reinforcement Learning.” IEEE International Conference on Distributed Computing Systems (ICDCS), IEEE, 2021.","apa":"Schneider, S. B., Qarawlus, H., & Karl, H. (2021). Distributed Online Service Coordination Using Deep Reinforcement Learning. In IEEE International Conference on Distributed Computing Systems (ICDCS). Washington, DC, USA: IEEE.","chicago":"Schneider, Stefan Balthasar, Haydar Qarawlus, and Holger Karl. “Distributed Online Service Coordination Using Deep Reinforcement Learning.” In IEEE International Conference on Distributed Computing Systems (ICDCS). IEEE, 2021.","short":"S.B. Schneider, H. Qarawlus, H. Karl, in: IEEE International Conference on Distributed Computing Systems (ICDCS), IEEE, 2021.","bibtex":"@inproceedings{Schneider_Qarawlus_Karl_2021, title={Distributed Online Service Coordination Using Deep Reinforcement Learning}, booktitle={IEEE International Conference on Distributed Computing Systems (ICDCS)}, publisher={IEEE}, author={Schneider, Stefan Balthasar and Qarawlus, Haydar and Karl, Holger}, year={2021} }"}}