---
res:
bibo_abstract:
- "We present sufficient conditions that ensure convergence of the multi-agent Deep Deterministic Policy Gradient (DDPG) algorithm. It is an example of one of the most popular paradigms of Deep Reinforcement Learning (DeepRL) for tackling continuous action spaces: the actor-critic paradigm. In the setting considered herein, each agent observes a part of the global state space in order to take local actions, for which it receives local rewards. For every agent, DDPG trains a local actor (policy) and a local critic (Q-function). The analysis shows that multi-agent DDPG using neural networks to approximate the local policies and critics converges to limits with the following properties: the critic limits minimize the average squared Bellman loss; the actor limits parameterize a policy that maximizes the local critic's approximation of $Q_i^*$, where $i$ is the agent index. The averaging is with respect to a probability distribution over the global state-action space that captures the asymptotics of all local training processes. Finally, we extend the analysis to a fully decentralized setting where agents communicate over a wireless network prone to delays and losses; a typical scenario in, e.g., robotic applications.@eng"
bibo_authorlist:
- foaf_Person:
foaf_givenName: Adrian
foaf_name: Redder, Adrian
foaf_surname: Redder
foaf_workInfoHomepage: http://www.librecat.org/personId=52265
orcid: https://orcid.org/0000-0001-7391-4688
- foaf_Person:
foaf_givenName: Arunselvan
foaf_name: Ramaswamy, Arunselvan
foaf_surname: Ramaswamy
foaf_workInfoHomepage: http://www.librecat.org/personId=66937
orcid: https://orcid.org/0000-0001-7547-8111
- foaf_Person:
foaf_givenName: Holger
foaf_name: Karl, Holger
foaf_surname: Karl
foaf_workInfoHomepage: http://www.librecat.org/personId=126
dct_date: 2022^xs_gYear
dct_language: eng
dct_title: Asymptotic Convergence of Deep Multi-Agent Actor-Critic Algorithms@eng
...