David Witman, Slingshot Aerospace; Timothy Olson, Slingshot Aerospace; Brian Williams, Slingshot Aerospace; Dylan Kesler, Slingshot Aerospace; Belinda Marchand, Slingshot Aerospace
Keywords: Machine Learning, Anomaly Detection, Constellations
Abstract:
As the number and size of constellations operating in low Earth orbit expand, there is a growing need to understand the behavioral characteristics of interacting satellites within larger constellations. Near-real-time algorithms that quantify expected behaviors and detect anomalous departures from the norm will be required for owner-operators and constellation orbital neighbors to ensure safe operations in a congested and contested environment. Existing and planned space domain awareness data enable new methods for analyzing the behaviors of satellites. By modeling individual satellites as actors with implicit goals, the collected sequential data can be interpreted as a series of states visited by each actor. Through this lens, we introduce a novel application of Inverse Reinforcement Learning (IRL) as a framework for representing satellite behaviors and, crucially, identifying distinctive behaviors within larger constellations. In particular, we develop a new “action-free” IRL approach that enables behaviors to be studied even in the absence of ground-truth actions taken by the actors.
As a demonstration of the new IRL algorithm, we show that it can detect behavioral anomalies wherein the entity, or actor, of interest has a behavioral intent or characteristic that differs from that of the larger collective. Specifically, we identify individual satellites (actors) whose mission or characteristics may differ from those of their counterparts in the constellation. We developed this system on simulated collections of satellites and successfully transferred the method to real-world data, where we have detected distinctive behaviors in currently orbiting constellations.
Traditional forward Reinforcement Learning (RL), built on the framework of Markov Decision Processes, seeks a model (or policy) that ingests descriptive states and produces the actions that maximize a predefined long-term reward function. In contrast, IRL uses sequentially observed state/action information to infer a proxy reward function that can then be used to build a policy model or replicate behaviors. The sequential decision-making nature of RL/IRL provides a unique ability to weigh both the short- and long-term consequences of actions under given state information. Many existing IRL approaches infer the reward function in tandem with an action/policy mapping; these methods require information about both the observed states and the actions taken by the agent. However, in many settings, such as space situational awareness, definitive action data is not readily available and must instead be derived through additional processing of observation data. To circumvent this, we outline a novel action-free IRL technique that attempts to capture the objectives an actor appears to be prioritizing.
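To make the forward/inverse distinction concrete, the following toy sketch (not from the paper; an illustrative one-dimensional chain of states with an assumed discount factor) shows the forward-RL direction: given a known reward function, value iteration recovers the optimal values and a greedy policy. IRL runs this inference in the opposite direction, recovering a reward consistent with observed behavior.

```python
import numpy as np

# Toy forward-RL example: a 1-D chain of five states with a single
# rewarding goal state at the right end. Actions move left or right,
# clipped at the chain boundaries.
n_states, gamma = 5, 0.9
reward = np.array([0.0, 0.0, 0.0, 0.0, 1.0])   # goal at the last state
idx = np.arange(n_states)
left = np.clip(idx - 1, 0, n_states - 1)        # deterministic transitions,
right = np.clip(idx + 1, 0, n_states - 1)       # clipped at the ends

V = np.zeros(n_states)
for _ in range(100):                            # value iteration
    V = reward + gamma * np.maximum(V[left], V[right])

policy = np.where(V[right] >= V[left], 1, -1)   # +1 = move right (greedy)
```

Here the greedy policy moves every state toward the goal; an action-free IRL method instead observes only the resulting state sequences and infers a reward under which that behavior would be near-optimal.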
This method yields a state-dependent reward function for every actor that can be compared across a larger set of actors to assess similarity of behavioral objectives. For the space situational awareness case, we use observed state information to build an individualized reward function for each constituent satellite within a larger constellation. Each individualized reward function then serves as the basis for computing similarity measures between satellites across the entire constellation, yielding an overall similarity matrix. Given this descriptive similarity matrix for all satellites in a constellation, we apply established anomaly detection methods, such as the Local Outlier Factor (LOF), to extract a score that ranks how anomalous each satellite is.
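The scoring step above can be sketched as follows, assuming the per-satellite reward functions have already been inferred and are sampled on a shared state grid. All data here is synthetic and illustrative (three of thirty satellites are given a deliberately different reward profile), and Euclidean distance stands in for whatever similarity measure is used in practice.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
# Hypothetical inferred reward functions, one row per satellite, sampled
# on a shared 50-point state grid: most satellites share one behavioral
# profile, the first three differ.
rewards = rng.normal(0.0, 0.1, size=(30, 50)) + np.linspace(0, 1, 50)
rewards[:3] += np.linspace(1, 0, 50)            # three anomalous actors

# Pairwise dissimilarity between reward functions (Euclidean here; any
# similarity/distance measure could be substituted).
diff = rewards[:, None, :] - rewards[None, :, :]
dist = np.linalg.norm(diff, axis=-1)

# Local Outlier Factor on the precomputed distance matrix; more negative
# scores indicate more anomalous satellites.
lof = LocalOutlierFactor(n_neighbors=10, metric="precomputed")
lof.fit(dist)
scores = lof.negative_outlier_factor_
ranking = np.argsort(scores)                    # most anomalous first
```

Because only the pairwise distance matrix is passed to the detector, the reward-inference step for each satellite can run independently, which is consistent with the parallelizable, low-memory structure discussed below.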
Our action-free IRL method is implemented within a larger suite of scalable algorithms that consider many different features, feature cross-correlations, and time scales for behavioral anomaly detection. To condense these larger results, we treat each instantiation of the method as an ensemble member, so that many different instantiations can be represented. In this paper we present: (i) details of the novel action-free IRL method proposed here, (ii) an outline of the simulation capabilities used to develop the technique, and (iii) results on a simulated scenario in which a small set of satellites behaved distinctly from the larger constellation. In the results, we demonstrate the scalability of the method and show how our implementation reduces the memory footprint while maintaining a largely parallelizable structure.
Date of Conference: September 17-20, 2024
Track: Machine Learning for SDA Applications