Benedict Oakes, University of Liverpool; Jason F. Ralph, University of Liverpool; Jordi Barr, Defence Science and Technology Laboratory
Keywords: Deep Reinforcement Learning, Machine Learning for SDA Applications, Space Situational Awareness
Abstract:
Satellites have become an essential element of society in the 21st century, with a wide array of uses and requirements, from communication and environmental monitoring to GPS and remote sensing. As such, it is of paramount importance to ensure that the space domain is well understood, so that we can maintain a functioning system for the custody and cataloguing of orbiting bodies. However, it is harder than ever to monitor the space domain properly due to the significant number of new satellite launches. With limited sensing availability, there is a mismatch between sensor resources and targets, as the number of targets exceeds the number of available sensors. This is difficult in all orbital regimes, but we focus here on the low Earth orbit (LEO) regime, where targets pass quickly over an individual sensor location.
In recent years, an increasing amount of research has been conducted into the applicability of reinforcement learning (RL) to a range of domains, including problems involving resource allocation or assignment, following successes in applying RL algorithms to games with complex action spaces, such as StarCraft and Go [1], [2]. These results are impressive precisely because of the complex action spaces for which the RL algorithms had to learn policies. RL is well suited to modelling complex, high-dimensional problems that may take classic rule-based algorithms a long time to solve. Another advantage of RL is that once a policy has been learned, it can be applied quickly to other scenarios, whereas a rule-based algorithm would be forced to recompute its solution.
RL has been successful in a wide range of problems, and there has been some research into its applicability to the sensor scheduling problem in space situational awareness (SSA), but this area remains in its infancy. Others have considered formulations of sensor scheduling for SSA, with some solutions for ground-based and space-based sensors [3–10]. Although RL is widely applicable to the SSA problem in many forms, there have been relatively few in-depth studies testing the suitability of RL for solving the types of assignment problems found in sensor scheduling for SSA, including its applicability to the LEO regime.
In this paper, we demonstrate the use of two RL environments. In both, we consider a controllable Earth-based sensor that is free to point in different directions within its field of regard (FoR). We simulate satellites in LEO orbits that pass through the FoR, and the sensor can measure a satellite when it appears in the sensor's field of view (FoV). The state of each target is estimated using an unscented Kalman filter (UKF). In the first environment, the RL agent (the ground-based sensor) tracks a known number of targets and must select which targets are most appropriate to observe. In the second environment, the agent does not know where the targets are and must move within the field of regard to locate potential targets. To generate sensor-scheduling policies for these scenarios, we apply RL algorithms including Double Deep Q-Network (DDQN), Soft Actor-Critic (SAC), and Proximal Policy Optimization (PPO) to both environments.
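For concreteness, the first environment can be sketched as a Gym-style class with a reset/step interface. The sketch below is an illustrative assumption rather than our implementation: the class name SensorTaskingEnv, the scalar variance that stands in for each target's UKF covariance, and all parameter values are hypothetical, and the orbital dynamics and UKF update are replaced by a simple linear covariance-propagation model.

```python
import numpy as np

class SensorTaskingEnv:
    """Illustrative sketch (not the paper's implementation) of the first
    environment: a single ground-based sensor chooses which of n_targets
    known targets to observe at each time-step. A scalar variance per
    target stands in for the full UKF covariance."""

    def __init__(self, n_targets=5, q=0.05, r=0.01, horizon=100, seed=0):
        self.n_targets = n_targets
        self.q = q              # process noise: variance growth per step
        self.r = r              # measurement noise variance when observed
        self.horizon = horizon
        self.rng = np.random.default_rng(seed)

    def reset(self):
        self.t = 0
        # Random initial uncertainty for each catalogued target.
        self.variances = self.rng.uniform(0.5, 2.0, size=self.n_targets)
        return self.variances.copy()

    def step(self, action):
        # Uncertainty on every target grows without measurements.
        self.variances += self.q
        # Observing a target applies a Kalman-style variance update,
        # P <- P * R / (P + R), which always shrinks P.
        p = self.variances[action]
        self.variances[action] = p * self.r / (p + self.r)
        reward = -float(self.variances.sum())  # keep total uncertainty low
        self.t += 1
        return self.variances.copy(), reward, self.t >= self.horizon, {}
```

Under this reward, a good policy must trade off revisiting well-tracked targets against targets whose uncertainty has grown; the RL algorithms named above would be trained on step() transitions of this form.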
To evaluate the performance of the RL algorithms in our environments, we compare the learned policies to a greedy policy: at each time-step of the scenario, every possible action is considered and the best myopic (one-step-ahead) action is chosen. We also inspect the results from the UKF and observe the change in target uncertainty throughout the observation period. We see a significant reduction in uncertainty for tracked objects when using the RL policies.
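Against the hypothetical environment sketched above, the greedy baseline can be written as a one-step look-ahead that scores each candidate action on a copy of the environment; this deep-copy roll-forward is an assumption made for clarity, not our evaluation code.

```python
import copy
import numpy as np

def greedy_action(env):
    """Myopic baseline: roll each pointing action forward one step on a
    copy of the (hypothetical) SensorTaskingEnv and pick the action with
    the highest one-step reward."""
    best_action, best_reward = 0, -np.inf
    for a in range(env.n_targets):
        trial = copy.deepcopy(env)   # leave the real environment untouched
        _, reward, _, _ = trial.step(a)
        if reward > best_reward:
            best_action, best_reward = a, reward
    return best_action

# Example rollout of the greedy policy:
env = SensorTaskingEnv()
obs, done = env.reset(), False
while not done:
    obs, reward, done, _ = env.step(greedy_action(env))
```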
This paper demonstrates the use of RL in the SSA domain. With some time-constrained offline learning, the agents learn policies that are capable of either selecting targets or locating them. These policies generalize well to targets with different orbits, and the time required to deploy a learned policy is very short. In future work, RL can be adapted to more advanced scenarios, such as those involving unexpected target manoeuvres or collision events.
[1] Oriol Vinyals et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature, 575(7782):350–354, 2019.
[2] David Silver et al. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484–489, 2016.
[3] Daniel Jang, Peng Mun Siew, David Gondelach, and Richard Linares. Space situational awareness tasking for narrow field of view sensors: A deep reinforcement learning approach. In 71st International Astronautical Congress. International Astronautical Federation, 2020.
[4] Peng Mun Siew, Daniel Jang, Thomas G. Roberts, Richard Linares, and Justin Fletcher. Cislunar space situational awareness sensor tasking using deep reinforcement learning agents. In 23rd Advanced Maui Optical and Space Surveillance Technologies Conference (AMOS), 2022.
[5] Peng Mun Siew, Tory Smith, Ravi Ponmalai, and Richard Linares. Scalable multi-agent sensor tasking using deep reinforcement learning. In 24th Advanced Maui Optical and Space Surveillance Technologies Conference (AMOS), 2023.
[6] Peng Mun Siew, Daniel Jang, Thomas G. Roberts, and Richard Linares. Space-based sensor tasking using deep reinforcement learning. The Journal of the Astronautical Sciences, 69(6):1855–1892, 2022.
[7] R. Linares and R. Furfaro. An autonomous sensor tasking approach for large scale space object cataloging. In Advanced Maui Optical and Space Surveillance Technologies Conference (AMOS), page 55, January 2017.
[8] R. Linares and R. Furfaro. Dynamic sensor tasking for space situational awareness via reinforcement learning. In Advanced Maui Optical and Space Surveillance Technologies Conference (AMOS), page 36, September 2016.
[9] Ashton Harvey, Kathryn Laskey, and Kuo-Chu Chang. Machine learning applications for sensor tasking with non-linear filtering. Sensors, 2021.
[10] Ashton E. Harvey and Kathryn B. Laskey. Online learning techniques for space situational awareness (poster). In 2019 22nd International Conference on Information Fusion (FUSION), pages 1–7, July 2019.
Date of Conference: September 17-20, 2024
Track: Machine Learning for SDA Applications