TY - GEN
AB - We present sufficient conditions that ensure convergence of the multi-agent
Deep Deterministic Policy Gradient (DDPG) algorithm. It is an example of one of
the most popular paradigms of Deep Reinforcement Learning (DeepRL) for tackling
continuous action spaces: the actor-critic paradigm. In the setting considered
herein, each agent observes a part of the global state space in order to take
local actions, for which it receives local rewards. For every agent, DDPG
trains a local actor (policy) and a local critic (Q-function). The analysis
shows that multi-agent DDPG, using neural networks to approximate the local
policies and critics, converges to limits with the following properties: The
critic limits minimize the average squared Bellman loss; the actor limits
parameterize a policy that maximizes the local critic's approximation of
$Q_i^*$, where $i$ is the agent index. The averaging is with respect to a
probability distribution over the global state-action space that captures the
asymptotics of all local training processes. Finally, we extend the analysis to
a fully decentralized setting where agents communicate over a wireless network
prone to delays and losses, a typical scenario in, e.g., robotic applications.
AU - Redder, Adrian
AU - Ramaswamy, Arunselvan
AU - Karl, Holger
ID - 30791
T2 - arXiv:2201.00570
TI - Asymptotic Convergence of Deep Multi-Agent Actor-Critic Algorithms
ER -