Cislunar Space Situational Awareness Sensor Tasking using Deep Reinforcement Learning Agents

Peng Mun Siew, Massachusetts Institute of Technology; Daniel Jang, Massachusetts Institute of Technology; Thomas G. Roberts, Massachusetts Institute of Technology; Richard Linares, Massachusetts Institute of Technology; Justin Fletcher, United States Space Force Space Systems Command

Keywords: Deep Reinforcement Learning, Sensor Tasking, Proximal Policy Optimization, Cislunar, Space Domain Awareness

Abstract:

To maintain a robust catalog of resident space objects (RSOs), space situational awareness (SSA) mission operators depend on ground- and space-based sensors to repeatedly detect, characterize, and track objects in orbit. Although some sensors are capable of monitoring large swaths of the sky with wide fields of view (FOV), others—such as maneuverable optical telescopes, narrow-band and imaging radars, or satellite laser ranging (SLR) systems—are restricted to relatively narrow FOVs and must slew at a finite rate from object to object as they observe them. Since there are many objects that a narrow FOV sensor could choose to observe within its field of regard (FOR), it must algorithmically create a schedule that dictates which direction to point and for how long: a combinatorial optimization problem known as the sensor tasking problem (Erwin, Albuquerque, Jayaweera, & Hussein, 2010). As more RSOs are added to the United States Space Command's (USSPACECOM) RSO catalog with the advent of proliferated satellite constellations and the deployment of more accurate sensors that can detect smaller objects, the problem of tasking narrow FOV sensors becomes more pressing. For example, there are currently fewer than 3,000 active satellites in low Earth orbit (LEO) (Union of Concerned Scientists, 2022), and it is estimated that by 2025 over 1,000 satellites could be launched each year (Ryan-Mosley, Winick, & Kakaes, 2019). The number of satellites will likely greatly outpace any increase in SSA sensor capacity, making efficient tasking of existing sensors extremely valuable.
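
To make the scheduling problem concrete, the sketch below implements a simple myopic (greedy) tasking heuristic: from the current boresight, it repeatedly slews to whichever target offers the most priority per second of slew-plus-dwell time. This is a generic baseline for illustration, not the method developed in the paper, and the target list, priority values, and slew-rate limit are hypothetical placeholders.

```python
import numpy as np

def greedy_schedule(targets, pointing, slew_rate, dwell, horizon):
    """Myopic scheduler: repeatedly slew to the target offering the most
    priority per second of time spent (slew + dwell).

    targets  : list of dicts with 'direction' (unit vector) and 'priority'
    pointing : current boresight unit vector
    slew_rate: maximum slew rate [rad/s]
    dwell    : dwell time per observation [s]
    horizon  : total scheduling horizon [s]
    """
    schedule, t = [], 0.0
    remaining = list(targets)
    while remaining and t < horizon:
        # Time to slew from the current boresight to each candidate target.
        slew_times = [
            np.arccos(np.clip(np.dot(pointing, tgt["direction"]), -1.0, 1.0))
            / slew_rate
            for tgt in remaining
        ]
        # Rank candidates by priority earned per second spent.
        rates = [tgt["priority"] / (st + dwell)
                 for tgt, st in zip(remaining, slew_times)]
        best = int(np.argmax(rates))
        if t + slew_times[best] + dwell > horizon:
            break
        t += slew_times[best] + dwell
        pointing = remaining[best]["direction"]
        schedule.append((t, remaining.pop(best)))
    return schedule
```

A greedy rule like this is fast but myopic; because the full problem is combinatorial, learned schedulers such as the one described below aim to capture longer-horizon trade-offs that this heuristic misses.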

Cislunar space is gaining popularity, with numerous missions planned for the near future. However, operating in cislunar space poses additional risk to satellites due to the lack of space situational awareness solutions in this regime. Without a proper catalog of the RSOs currently residing in cislunar space, our space assets are susceptible to catastrophic collisions with untracked RSOs. The cislunar orbital regime is unique in that orbit propagation is neither easily predicted nor learned, owing to the complex three-body dynamics. In this paper, we describe a specific application of a trained scheduler, developed using deep reinforcement learning with the proximal policy optimization (PPO) algorithm and population-based training (Jang, Siew, Gondelach, & Linares, 2020; Roberts, Siew, Jang, & Linares, 2021), to a ground-based narrow FOV optical sensor for the tracking of cislunar RSOs.
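
The abstract does not specify a particular software stack, but wiring a Gym-style environment into an off-the-shelf PPO implementation could look like the sketch below, which uses Stable-Baselines3 with a standard Gym environment standing in for the cislunar one. All hyperparameters are illustrative, and the population-based training used in the paper is omitted.

```python
import gym
from stable_baselines3 import PPO  # one off-the-shelf PPO implementation

# CartPole stands in for the custom cislunar SSA environment sketched
# later in this abstract; swap that environment in once it is defined.
env = gym.make("CartPole-v1")

# Illustrative hyperparameters only; the paper additionally tunes
# training with population-based training, which this sketch omits.
model = PPO("MlpPolicy", env, learning_rate=3e-4, n_steps=2048, verbose=1)
model.learn(total_timesteps=100_000)
model.save("ppo_sensor_tasking_agent")
```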

A custom cislunar SSA environment is constructed using OpenAI's Gym library. The environment is responsible for keeping track of the RSOs' states, generating noisy measurements for each RSO within the sensor's FOV, propagating and updating the RSOs' covariances, generating the observation array for the deep reinforcement learning agent, and computing the instantaneous reward of the current action (pointing direction). For each training episode, the cislunar RSOs' states and covariances are randomly initialized from a cislunar RSO database. This database is constructed from precomputed periodic solutions in the NASA Jet Propulsion Laboratory's three-body periodic orbit catalog: each periodic solution is propagated over a full orbital period, and the portions of the trajectory that fall within the sensor's field of regard are extracted. This ensures that most of the randomly sampled cislunar RSOs will enter the sensor's field of regard during the two-hour observation window. Only cislunar RSOs from the following periodic orbit families are included in our study: L1 halo orbits, L2 halo orbits, distant retrograde orbits, and Earth-Moon 3:1 resonant orbits.

The RSOs are propagated using the circular restricted three-body problem (CR3BP) formulation without external perturbations, and an unscented Kalman filter (UKF) is used to propagate and update the RSOs' covariances. The optical sensor in the cislunar SSA environment is modeled on the Pan-STARRS system located at the Haleakala Observatory in Maui, Hawaii.

The deep reinforcement learning agent is then trained on the cislunar SSA environment using different reward functions. The reward function plays a significant role in training: it is the incentive mechanism that guides the agent toward desirable environment states or actions, and a poorly designed reward function can lead to slow training and poor performance. The agent's performance is then compared to two random schedulers across several figures of merit, including the cumulative number of unique RSOs observed and the mean uncertainty (mean trace of the covariance matrix) across all cislunar RSOs in the scenario.
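
As a rough illustration of how such an environment might be organized, the skeleton below follows the classic Gym reset/step interface, propagates each RSO with unperturbed CR3BP dynamics, and uses the negative mean covariance trace as one candidate reward. Everything here is an assumption for illustration: the class and method names, the discretized action space, the decision interval, and especially the placeholder covariance update, which stands in for the UKF and the Pan-STARRS-based sensor model described above.

```python
import numpy as np
import gym
from gym import spaces
from scipy.integrate import solve_ivp

MU = 0.01215  # Earth-Moon mass ratio (nondimensional CR3BP parameter)

def cr3bp_eom(t, s):
    """Unperturbed circular restricted three-body equations of motion,
    written in the rotating frame with nondimensional units."""
    x, y, z, vx, vy, vz = s
    r1 = np.sqrt((x + MU) ** 2 + y ** 2 + z ** 2)      # distance to Earth
    r2 = np.sqrt((x - 1 + MU) ** 2 + y ** 2 + z ** 2)  # distance to Moon
    ax = x + 2 * vy - (1 - MU) * (x + MU) / r1 ** 3 - MU * (x - 1 + MU) / r2 ** 3
    ay = y - 2 * vx - (1 - MU) * y / r1 ** 3 - MU * y / r2 ** 3
    az = -(1 - MU) * z / r1 ** 3 - MU * z / r2 ** 3
    return [vx, vy, vz, ax, ay, az]

class CislunarSSAEnv(gym.Env):
    """Skeleton of a cislunar SSA tasking environment (illustrative only)."""

    def __init__(self, rso_database, n_pointings=100, dt=0.01, horizon=120):
        self.rso_database = rso_database  # sampled periodic-orbit segments
        self.dt = dt                      # decision interval, nondim. time (assumed)
        self.horizon = horizon            # tasking steps per episode (assumed)
        self.action_space = spaces.Discrete(n_pointings)  # discretized pointings
        n = len(rso_database)
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(7 * n,))

    def reset(self):
        # Randomly initialize each RSO's state and covariance from the database
        # (sample_state / initial_cov are hypothetical database accessors).
        self.states = np.array([rso.sample_state() for rso in self.rso_database])
        self.covs = np.stack([rso.initial_cov() for rso in self.rso_database])
        self.steps = 0
        return self._observation()

    def step(self, action):
        # Propagate every RSO over one decision interval under CR3BP dynamics.
        for i, s in enumerate(self.states):
            sol = solve_ivp(cr3bp_eom, (0.0, self.dt), s, rtol=1e-9, atol=1e-12)
            self.states[i] = sol.y[:, -1]
        # Placeholder covariance update: inflate uncertainty everywhere, then
        # shrink it for RSOs inside the chosen FOV (the paper uses a UKF here).
        self.covs *= 1.01
        self.covs[self._in_fov(action)] *= 0.5
        # One candidate reward: drive down the mean covariance trace.
        reward = -np.mean(np.trace(self.covs, axis1=1, axis2=2))
        self.steps += 1
        return self._observation(), reward, self.steps >= self.horizon, {}

    def _in_fov(self, action):
        # Hypothetical visibility test for pointing direction index `action`.
        return np.zeros(len(self.states), dtype=bool)

    def _observation(self):
        # Flatten RSO states plus per-RSO covariance traces into one vector.
        traces = np.trace(self.covs, axis1=1, axis2=2)
        return np.concatenate([self.states.ravel(), traces]).astype(np.float32)
```

A reward built instead on the count of newly observed unique RSOs would bias the agent toward search rather than uncertainty reduction, which is one reason the paper trains and evaluates agents under different reward functions.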

Date of Conference: September 27-30, 2022

Track: Machine Learning for SSA Applications
