{"user_id":"477","citation":{"short":"A. Redder, A. Ramaswamy, H. Karl, ArXiv:2201.00570 (2022).","mla":"Redder, Adrian, et al. “Asymptotic Convergence of Deep Multi-Agent Actor-Critic Algorithms.” ArXiv:2201.00570, 2022.","ieee":"A. Redder, A. Ramaswamy, and H. Karl, “Asymptotic Convergence of Deep Multi-Agent Actor-Critic Algorithms,” arXiv:2201.00570. 2022.","bibtex":"@article{Redder_Ramaswamy_Karl_2022, title={Asymptotic Convergence of Deep Multi-Agent Actor-Critic Algorithms}, journal={arXiv:2201.00570}, author={Redder, Adrian and Ramaswamy, Arunselvan and Karl, Holger}, year={2022} }","chicago":"Redder, Adrian, Arunselvan Ramaswamy, and Holger Karl. “Asymptotic Convergence of Deep Multi-Agent Actor-Critic Algorithms.” ArXiv:2201.00570, 2022.","ama":"Redder A, Ramaswamy A, Karl H. Asymptotic Convergence of Deep Multi-Agent Actor-Critic Algorithms. arXiv:220100570. Published online 2022.","apa":"Redder, A., Ramaswamy, A., & Karl, H. (2022). Asymptotic Convergence of Deep Multi-Agent Actor-Critic Algorithms. In arXiv:2201.00570."},"project":[{"name":"SFB 901 - C4: SFB 901 - Subproject C4","_id":"16"},{"_id":"1","name":"SFB 901: SFB 901"},{"name":"SFB 901 - C: SFB 901 - Project Area C","_id":"4"}],"department":[{"_id":"75"}],"abstract":[{"text":"We present sufficient conditions that ensure convergence of the multi-agent\r\nDeep Deterministic Policy Gradient (DDPG) algorithm. It is an example of one of\r\nthe most popular paradigms of Deep Reinforcement Learning (DeepRL) for tackling\r\ncontinuous action spaces: the actor-critic paradigm. In the setting considered\r\nherein, each agent observes a part of the global state space in order to take\r\nlocal actions, for which it receives local rewards. For every agent, DDPG\r\ntrains a local actor (policy) and a local critic (Q-function). 
The analysis\r\nshows that multi-agent DDPG using neural networks to approximate the local\r\npolicies and critics converges to limits with the following properties: The\r\ncritic limits minimize the average squared Bellman loss; the actor limits\r\nparameterize a policy that maximizes the local critic's approximation of\r\n$Q_i^*$, where $i$ is the agent index. The averaging is with respect to a\r\nprobability distribution over the global state-action space. This distribution captures the\r\nasymptotics of all local training processes. Finally, we extend the analysis to\r\na fully decentralized setting where agents communicate over a wireless network\r\nprone to delays and losses; a typical scenario in, e.g., robotic applications.","lang":"eng"}],"year":"2022","language":[{"iso":"eng"}],"_id":"30791","date_updated":"2022-11-18T09:33:42Z","status":"public","author":[{"last_name":"Redder","full_name":"Redder, Adrian","first_name":"Adrian","orcid":"https://orcid.org/0000-0001-7391-4688","id":"52265"},{"full_name":"Ramaswamy, Arunselvan","last_name":"Ramaswamy","id":"66937","orcid":"https://orcid.org/0000-0001-7547-8111","first_name":"Arunselvan"},{"last_name":"Karl","full_name":"Karl, Holger","id":"126","first_name":"Holger"}],"external_id":{"arxiv":["2201.00570"]},"title":"Asymptotic Convergence of Deep Multi-Agent Actor-Critic Algorithms","date_created":"2022-04-06T06:53:52Z","publication":"arXiv:2201.00570","type":"preprint"}