Johnathan Tucker, University of Colorado Boulder; Jackson Wagner, University of Colorado Boulder; Zachary Sunberg, University of Colorado Boulder
Keywords: Adaptive Stress Testing, Deep Reinforcement Learning, Machine Learning Validation, Validation
Abstract:
Future space domain awareness systems will increasingly rely on autonomous or semi-autonomous components. For example, cyber-physical systems in the form of sensors combined with machine learning are being proposed to improve space situational awareness. As these cyber-physical systems become more complex, it is increasingly important to understand how they might fail due to natural disturbances or the actions of an adversarial actor. The problem of finding such disturbances can be framed as a Markov Decision Process, and reinforcement learning can be used to find likely sequences of inputs that will cause a cyber-physical system to fail. This approach is known as adaptive stress testing (AST). The contribution of this work is to apply AST to the field of space domain awareness, specifically finding failures in a machine learning algorithm designed to detect anomalous space object behavior.
This work is relevant to Department of Defense (DoD) agencies and defense contractors that need to validate a wide range of autonomous systems, such as telescope tasking algorithms or automated anomaly detection, before integrating them into their space awareness systems. Given the increased interest in these systems, flexible methods for finding failures are highly desirable. Furthermore, once failures have been identified, the autonomous systems can be improved and made more robust and reliable for future use.
Integrating cyber-physical systems in the form of machine learning algorithms and autonomous systems into existing space systems architectures will relieve operator burden and increase operational efficiency. Despite this, adoption of such systems in the space domain is slow because they can fail silently and unexpectedly. Autonomous system failures can have especially drastic consequences in the space domain, where the architectures are critical for both safety and infrastructure. The issue of silent and unexpected failures, coupled with the wide range of potential applications for autonomous systems in the space domain, necessitates a flexible framework that can determine the inputs that lead such systems to fail. This paper aims to provide a flexible framework for validating space domain autonomous systems, thus making them more robust to failure. In addition, this work addresses a gap in the literature on autonomous system validation for the space domain.
Validating autonomous systems in the space domain is particularly difficult because testing them in the real world is intractable due to cost or accessibility constraints. For example, validating an autonomous telescope tasking network would drain telescope resources from their critical application of real-time space domain awareness. Past work has developed simulation-based validation methods for autonomous systems. While simulations are computationally expensive to run, especially in the space domain, they provide a robust environment that accurately describes how the autonomous system will behave. Adaptive stress testing (AST) is a simulation-based validation framework for autonomous systems that was developed by autonomous vehicle researchers. AST frames the problem of finding failures as a Markov Decision Process in which the actions are disturbance inputs and the state is a combination of the autonomous system state and the environment in which the system operates. Although the framework is flexible as originally formulated, it has not previously been extended to the domain of space situational awareness.
This paper will extend the above work by formulating a Markov Decision Process (MDP) that describes the interaction between a space domain awareness cyber-physical system and an adversarial reinforcement learning agent. In this MDP, the actions are disturbances to the system in the form of sensor noise and/or spacecraft maneuvers. The state transition model encompasses the operation of the spacecraft being observed and the cyber-physical space awareness system. Finally, a positive reward is given to the agent when it causes the system to fail, and a negative reward is given in proportion to the unlikeliness of the disturbances. This second component guides the agent toward disturbances that appear innocuous. Additional intrinsic rewards may be added to guide the agent in its learning. Following the development of the MDP, this work will show how it can be solved using both deep reinforcement learning and Monte Carlo Tree Search. In effect, the reinforcement learning agent will interact with the autonomous system by injecting a disturbance and will then receive the next state of the autonomous system and a reward. This process will continue until the reinforcement learning agent finds a sequence of disturbance inputs that leads to a failure in the autonomous system.
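To make the formulation concrete, the following is a minimal Python sketch of the AST MDP interface and reward structure described above. The dynamics, failure check, disturbance model, and all numerical values are illustrative placeholders (not the actual OCBE/one-class SVM pipeline or the simulator used in this work); only the reward structure and the agent-environment interaction loop follow the formulation in the text.

```python
# Minimal sketch of the AST MDP described above. The simulator, failure check,
# and disturbance likelihood model are placeholder stand-ins; only the reward
# structure (failure bonus plus unlikeliness penalty) and the interaction loop
# follow the formulation in the text.
import numpy as np


class ASTEnvironment:
    """AST MDP: actions are disturbances, the state combines the observed
    spacecraft and the awareness system under test."""

    def __init__(self, horizon=50, failure_bonus=100.0, likelihood_weight=1.0):
        self.horizon = horizon
        self.failure_bonus = failure_bonus          # reward for inducing a failure
        self.likelihood_weight = likelihood_weight  # penalty scale for unlikely disturbances
        self.reset()

    def reset(self):
        self.t = 0
        self.state = np.zeros(4)  # placeholder for [spacecraft state, detector state]
        return self.state

    def step(self, disturbance):
        # Propagate the system under test with the injected disturbance
        # (placeholder linear dynamics stand in for the real simulator).
        self.state = self.state + 0.1 * disturbance + np.random.normal(0.0, 0.01, 4)
        self.t += 1

        # Penalize unlikely disturbances via their negative log-likelihood
        # under an assumed zero-mean Gaussian disturbance model.
        log_likelihood = -0.5 * np.sum(disturbance ** 2)
        reward = self.likelihood_weight * log_likelihood

        # Placeholder failure check: the detector misses an anomalous maneuver.
        failure = np.linalg.norm(self.state) > 5.0
        if failure:
            reward += self.failure_bonus

        done = failure or self.t >= self.horizon
        return self.state, reward, done


# Interaction loop: an adversarial agent proposes disturbances until a failure
# trajectory is found or the horizon ends. A random policy stands in here for
# the deep reinforcement learning or Monte Carlo Tree Search solver.
env = ASTEnvironment()
state, done = env.reset(), False
while not done:
    disturbance = np.random.normal(0.0, 0.5, 4)
    state, reward, done = env.step(disturbance)
```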
Specifically, we will demonstrate this approach on a one-class support vector machine with an optimal control-based estimator (OCBE) developed by Rivera et al. for anomalous space object maneuver detection in both cislunar and geosynchronous orbit regimes. The theoretical MDP framework developed in this work will be validated by determining a sequence of disturbance inputs that leads the one-class support vector machine to misclassify anomalous space object maneuvers.
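As a simplified illustration of the failure event the AST agent searches for, the snippet below trains a generic scikit-learn one-class SVM on placeholder residual features and checks for a missed detection. It is not the detector of Rivera et al. or the OCBE residual pipeline; it only shows what a misclassification of an anomalous maneuver, i.e., the AST failure condition, would look like.

```python
# Illustrative failure check for a one-class SVM anomaly detector. This is a
# generic scikit-learn stand-in, not the OCBE-based detector of Rivera et al.:
# the detector is trained on nominal residual features, and a "failure" for
# AST is a missed detection of a truly anomalous (maneuvering) trajectory.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Nominal residual features from non-maneuvering trajectories (placeholder data).
nominal_features = rng.normal(0.0, 1.0, size=(500, 3))
detector = OneClassSVM(nu=0.05, kernel="rbf", gamma="scale").fit(nominal_features)

# Features from a trajectory that truly contains a maneuver, perturbed by the
# AST agent's disturbances so that it resembles the nominal class.
disturbed_anomaly = rng.normal(0.3, 1.0, size=(1, 3))

# OneClassSVM.predict returns +1 for inliers (nominal) and -1 for outliers
# (anomalous). A prediction of +1 here is a missed detection, i.e. the failure
# event that terminates the AST episode with the failure bonus.
missed_detection = detector.predict(disturbed_anomaly)[0] == 1
print("AST failure found (missed maneuver detection):", missed_detection)
```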
In summary, this paper contributes the following to the field of machine learning for space situational awareness:
Application of the adaptive stress testing framework to a space situational awareness-focused cyber-physical system.
A demonstration of how deep reinforcement learning and Monte Carlo Tree Search can find failure cases for a one-class support vector machine/OCBE designed to detect anomalous space object behavior.
Date of Conference: September 27-30, 2022
Track: Machine Learning for SSA Applications