Peng Mun Siew, Massachusetts Institute of Technology; Tory Smith, United States Space Force, Massachusetts Institute of Technology; Richard Linares, Massachusetts Institute of Technology; Ravi Ponmalai, Aerospace Corporation
Keywords: Multi-agent Reinforcement Learning (MARL), Graph Neural Network (GNN), Sensor Tasking, Artificial Intelligence (AI), Machine Learning (ML), Deep Reinforcement Learning (DRL)
Abstract:
Satellite launches have seen a dramatic increase in recent years, driven by the growth of commercial and government
constellations for a range of applications, including communication, navigation, and Earth observation. While this
trend provides numerous benefits, it also puts pressure on existing ground-based sensor networks to keep pace with
the growing volume of objects. As a result, there is a need for more robust and sophisticated methods for monitoring
and managing the space environment to ensure the safe and efficient use of this valuable resource. Current methods
for allocating sensors to specific tasks are often manual, time-consuming, and prone to human error.
Artificial Intelligence (AI), and more specifically Multi-Agent Reinforcement Learning (MARL), has been shown to
compete with and surpass human and traditional coordination methods in a variety of domains, from video games
to collision avoidance, using surprisingly simple adaptations of Proximal Policy Optimization (PPO) algorithms to
multi-agent environments. This performance has also been shown to scale even when the MARL algorithms are
trained with relatively few agents.
Previous works have explored Reinforcement Learning (RL) techniques for the sensor tasking problem, showing
improved performance over myopic policies. In traditional single-agent RL, an agent learns a decision-making
strategy, known as a policy, by interacting with its environment and discovering the best actions to take. After each
interaction with the environment, the agent updates its policy based on the outcomes observed when taking a particular
action from a particular state. For more complex environments, RL has been combined with deep neural networks
(DNNs) in an actor-critic formulation, in which an actor DNN represents the agent's policy and a critic network
estimates the value of the agent's actions. State-of-the-art (SOTA) RL formulations have also included techniques
such as PPO, which both simplifies and stabilizes the learning process by preventing the actor's policy from deviating
too far from the previous policy, so that it does not make large and potentially harmful updates.
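For concreteness, the sketch below shows PPO's clipped surrogate loss in PyTorch; the tensor names (`log_probs`, `old_log_probs`, `advantages`) and the clip parameter `eps` are illustrative assumptions, not the exact formulation used in this work.

```python
import torch

def ppo_clip_loss(log_probs: torch.Tensor,
                  old_log_probs: torch.Tensor,
                  advantages: torch.Tensor,
                  eps: float = 0.2) -> torch.Tensor:
    """Clipped surrogate loss from Schulman et al. (2017).

    log_probs:     log pi_theta(a_t | s_t) under the current policy
    old_log_probs: log pi_theta_old(a_t | s_t) under the policy that
                   collected the data (held fixed, no gradient)
    advantages:    advantage estimates A_t (e.g., from GAE)
    eps:           clip range bounding how far the new policy may move
    """
    # Probability ratio r_t(theta) = pi_theta / pi_theta_old
    ratio = torch.exp(log_probs - old_log_probs.detach())
    # Unclipped and clipped surrogate objectives
    surr1 = ratio * advantages
    surr2 = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantages
    # Pessimistic bound: take the elementwise minimum, negate to minimize
    return -torch.min(surr1, surr2).mean()
```

Clipping the ratio removes the incentive to move the policy outside the interval [1 - eps, 1 + eps], which is what keeps individual updates small and stable.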
As its name implies, single-agent RL trains only a single agent to act in an environment, which limits its applicability
to real-world settings in which multiple agents often act cooperatively or competitively toward shared or separate
goals. The multi-agent setting vastly increases the complexity of finding optimal policies for the agents interacting
with the environment and with each other. In previous works, a simple scheduler-based MARL algorithm was applied
to the sensor tasking problem for space-based sensors, wherein a scheduler informed copies of a single trained agent
when it was their turn to act (sketched below). While this approach showed improvement over myopic policies, it
lacks cooperation between the agents: no information is shared between them, leading to redundant observations,
especially in scenarios with overlapping Fields of Regard (FORs).
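A rough illustration of that earlier scheduler-based scheme follows; the `env` and `policy` interfaces and the round-robin rule are our assumptions for exposition, not the original implementation.

```python
import itertools

def run_scheduled_episode(env, policy, n_agents: int, horizon: int):
    """Round-robin scheduler: copies of one trained policy take turns.

    Each agent acts only on its own local observation; no information
    is exchanged between agents, which is what permits redundant
    observations when Fields of Regard overlap.
    """
    obs = env.reset()
    turn_order = itertools.cycle(range(n_agents))  # the "scheduler"
    for _ in range(horizon):
        agent_id = next(turn_order)        # whose turn it is to act
        action = policy(obs[agent_id])     # shared policy, local obs only
        obs, reward, done, _ = env.step(agent_id, action)
        if done:
            break
```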
In this paper, we explore MARL formulations using multi-agent PPO (MAPPO) as the basis for coordinating and
evaluating the tasking of a group of sensors, learning a policy for each agent that reduces the mean trace of the state
covariance of the Space Objects (SOs) within the sensors' combined FOR. We conduct experiments in a simulated
environment using varying numbers of agents placed at real-world ground-sensor locations with overlapping FORs,
and we evaluate the performance of our trained agents against myopic policies over a number of performance metrics
while scaling the number of SOs in the environment. Monte Carlo simulations are carried out to collect aggregate
statistics and to better characterize performance.
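The objective above can be made concrete with a short sketch; the function name and the shared-reward framing are assumptions for illustration. The per-step reward is the negative mean trace of the estimated state covariances of the SOs inside the combined FOR, so reducing aggregate state uncertainty increases reward.

```python
import numpy as np

def mean_trace_covariance_reward(covariances: list[np.ndarray]) -> float:
    """Negative mean trace of the SO state covariance matrices.

    covariances: one 6x6 (position/velocity) covariance estimate per
                 space object inside the combined field of regard.
    Larger traces mean more state uncertainty, so the reward is the
    negative mean trace: reducing uncertainty increases reward.
    """
    traces = [np.trace(P) for P in covariances]
    return -float(np.mean(traces))
```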
Date of Conference: September 19-22, 2023
Track: Machine Learning for SDA Applications