On-board, Autonomous, Hybrid Spacecraft Subsystem Fault and Anomaly Detection, Diagnosis, Root Cause Determination, and Recovery

Richard Stottler, Stottler Henke Associates, Inc.; Sowmya Ramachandran, Stottler Henke Associates, Inc.; Chris Healy, Stottler Henke Associates, Inc.; Abhimanyu Singhal, Stottler Henke Associates, Inc.; Evan Finnigan, Stottler Henke Associates, Inc.

Keywords: Autonomous Fault Detection, Space Domain Awareness (SDA), Fault Detection Diagnosis and Recovery, Model Based Reasoning (MBR), Machine Learning (ML), Artificial Intelligence (AI), Thermodynamic Variables

Abstract:

An important component of Space Situational Awareness (SSA) / Space Domain Awareness (SDA) is knowledge of the true status of friendly assets and whether any assets are under attack. It is therefore important to detect faults and other anomalies, determine the components involved and the root cause, and assess whether that root cause is likely an external attack. Because such attacks may be physical, may interfere with communications, or both, the satellite must also carry an onboard, autonomous ability to determine, after diagnosis and root cause determination, the best method to recover mission capabilities, schedule the required recovery plan, and adaptively execute it.
Traditionally, Fault Detection, Isolation, and Recovery (FDIR) systems have utilized Model Based Reasoning (MBR), which requires knowledge of the subsystem design and the behavior of components down to the desired level of diagnosis. To the degree this information is readily available, it is important to make good use of it. However, the field of machine learning (ML) has shown that systems can also learn, off-line, the normal behavior of complex systems in many different environments and states, and then detect abnormal behavior in real time. These systems can also be trained with known abnormal states and recognize them more specifically when they occur.
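As a rough illustration of the learn-offline, detect-online pattern described above, the sketch below fits a per-channel mean and standard deviation to nominal telemetry and flags samples that fall outside that envelope. The z-score envelope is a deliberately simple stand-in for a learned model and is not the method used in this work; the channel names, the telemetry values, and the threshold of 4 standard deviations are hypothetical.

```python
from statistics import mean, stdev


def fit_nominal(samples):
    """Offline step: learn a per-channel (mean, std) envelope from nominal telemetry."""
    channels = samples[0].keys()
    return {
        c: (mean(s[c] for s in samples), stdev(s[c] for s in samples))
        for c in channels
    }


def detect(model, sample, threshold=4.0):
    """Online step: return {channel: z_score} for channels outside the envelope."""
    flagged = {}
    for channel, (mu, sigma) in model.items():
        deviation = abs(sample[channel] - mu)
        if sigma > 0:
            z = deviation / sigma
        else:
            # A channel that never varied in training: any change is anomalous.
            z = 0.0 if deviation == 0 else float("inf")
        if z > threshold:
            flagged[channel] = z
    return flagged


# Hypothetical nominal telemetry: two channels with small periodic variation.
nominal = [
    {
        "bus_voltage": 28.0 + 0.1 * ((i % 5) - 2),
        "radiator_temp": 290.0 + 0.5 * ((i % 7) - 3),
    }
    for i in range(70)
]
model = fit_nominal(nominal)

# A sample inside the learned envelope: nothing is flagged.
print(detect(model, {"bus_voltage": 28.05, "radiator_temp": 290.4}))
# A sample with a large voltage drop: only bus_voltage is flagged.
print(list(detect(model, {"bus_voltage": 24.0, "radiator_temp": 290.4})))
```

A detector of this kind only reports that a channel looks abnormal; it carries no knowledge of which component is at fault, which is one reason to pair it with a model-based approach.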
This paper will describe progress on this work since our last paper, presented at AMOS 2020. This includes further development and generalization of the hybrid approach to fault detection, diagnosis, and recovery; applying that approach to additional subsystems, including a hardware model of the Gateway Electrical Power System (EPS), the Exploration Extravehicular Mobility Unit (xEMU) Portable Life Support System (PLSS) CO2 Removal and Thermal Management Subsystems, the International Space Station (ISS) Urine Processing Assembly (UPA), and NASA Ames’ Graywater Recycling System; and integrating them with NASA’s core Flight System (cFS) and NASA JSC’s IRIS architecture.
The additional subsystems brought new challenges to be overcome. For example, the xEMU PLSS CO2 Removal subsystem included complex cyclic behavior and states due to the use of twin amine beds that were each alternately switched between exposure to the ventilation system to remove CO2 and exposure to the vacuum of space to off-gas the previously absorbed CO2, thus refreshing their absorbing capability. These challenges led to the development of a third, independent method for detecting anomalies, based on an analogy to thermodynamic variables. The Model-Agnostic Thermodynamic variable Anomaly Detection (MATAD) system performs automatic characterization and diagnosis of subsystem anomalies.
The hybridization emphasizes the benefits of each approach and mitigates the disadvantages. The benefits include the ability to detect and diagnose anomalies never before encountered; work well on Day One of in-space operations; effectively utilize existing design knowledge; succeed without large amounts of data; explain the reasoning and be human understandable; be flight certifiable; behave predictably; diagnose down to the lowest modeled component level; handle rare but modeled operating conditions; execute very quickly; discover unknown and subtle relationships (even across subsystems); and provide extra certainty of the diagnosis when all three approaches agree.
The design includes a Fault Detection module which consists of the three independent technologies for fault detection: MBR, Self-Organizing Maps (SOMs), and MATAD. The Fault Detection module receives fault detection notifications from each of the three technologies, over time, and executes a Confirmation/Reconciliation procedure which considers each input (across the three technologies and across different periods of time) and forwards the combined result to a Diagnosis module. The Diagnosis module uses the given information and MBR to identify the specific faulty component and likely root causes, if possible, or a candidate set of culprits otherwise. In the latter case, the Diagnosis module can often automatically narrow down this candidate set, over time, to the one responsible faulty component. In addition to their usual role detecting faults, SOMs can also supply diagnostic information, to the degree that the current fault happens to be one of the known faults that the SOMs were trained for. Otherwise, the SOMs will simply identify the telemetry data as anomalous and indicate which features are most important for this determination. Additionally, a Characterization module monitors the behavior of components over time, updates its models, and sends these updated models to the MBR modules to allow them greater precision in their determinations. Automatic planning, scheduling, and execution components determine a recovery plan, schedule the necessary actions, then adaptively execute the schedule.
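The Confirmation/Reconciliation step can be pictured roughly as follows. This is a minimal sketch under assumed rules, not the procedure used in this work: the class name, the sliding-window length, and the thresholds (a fault is confirmed when at least two detectors each flag it repeatedly, or when a single detector flags it persistently) are all hypothetical.

```python
from collections import deque

DETECTORS = ("MBR", "SOM", "MATAD")


class Reconciler:
    """Combine per-detector fault flags across detectors and across time."""

    def __init__(self, window=5, min_hits=2, min_detectors=2, min_persistence=4):
        self.min_hits = min_hits                # flags needed for a detector to count
        self.min_detectors = min_detectors      # detectors needed to confirm
        self.min_persistence = min_persistence  # flags from one detector that confirm alone
        # One fixed-length history of recent flags per detector.
        self.history = {d: deque(maxlen=window) for d in DETECTORS}

    def update(self, flags):
        """Ingest one time step of {detector: bool}; return the combined result."""
        for d in DETECTORS:
            # A detector that sent no notification this step counts as "no fault".
            self.history[d].append(bool(flags.get(d, False)))
        hits = {d: sum(self.history[d]) for d in DETECTORS}
        supporting = [d for d in DETECTORS if hits[d] >= self.min_hits]
        persistent = any(h >= self.min_persistence for h in hits.values())
        return {
            "confirmed": len(supporting) >= self.min_detectors or persistent,
            "supporting": supporting,
            "unanimous": len(supporting) == len(DETECTORS),
        }


# Hypothetical notification stream: MBR flags first, then MATAD, then the SOM.
stream = [
    {"MBR": True},
    {"MBR": True, "MATAD": True},
    {"MBR": True, "MATAD": True, "SOM": True},
    {"MBR": True, "MATAD": True, "SOM": True},
]
reconciler = Reconciler()
results = [reconciler.update(flags) for flags in stream]
for step, result in enumerate(results):
    print(step, result)
```

In this sketch the `supporting` list and the `unanimous` flag are what would be forwarded to a downstream diagnosis step, so that agreement among all three detectors can be treated as extra certainty.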
The paper will describe the design and algorithms and present experimental results associated with testing on the subsystems listed above.

Date of Conference: September 27-30, 2022

Track: Space-Based Assets
