Autonomous, Hybrid Space System Fault and Anomaly Detection, Diagnosis, Root Cause Determination, and Recovery

Richard Stottler, Stottler Henke Associates, Inc.; Sowmya Ramachandran, Stottler Henke Associates, Inc.; Chris Healy, Stottler Henke Associates, Inc.; Abhimanyu Singhal, Stottler Henke Associates, Inc.

Keywords: Machine Learning (ML), Model-Based Reasoning (MBR), Automatic Fault Detection and Diagnosis, Automatic Root Cause Determination, SSA/SDA, Lunar Rovers, Lunar Power Generation, Spacecraft Mechanical Systems

Abstract:

An important component of Space Situational Awareness (SSA) / Space Domain Awareness (SDA) is knowledge of the true status of friendly assets and whether any assets are under attack. Therefore, it is important to be able to detect faults and other anomalies, and determine the components involved and the root cause and whether that root cause is likely an external attack. During space conflict, communications to satellites may be disrupted, requiring them to intelligently and autonomously “take care of themselves”, i.e. effectively detect faults, diagnose the root cause, and develop and execute a recovery plan, autonomously, without necessarily being able to communicate with ground controllers. This lack of communication is analogous to lunar rovers and power systems where communication can be disrupted by terrain and other factors.

Astrobotic, for NASA, is developing a rover that traverses the lunar surface to an advantageous position, then unfurls a 60’ high photovoltaic mast to provide power for other lunar systems. Astrobotic’s Vertical Solar Array Technology (VSAT) will egress from its lander, transit to the desired location (near the lunar South Pole), “wiggle” into the lunar soil, and then deploy a 60’ high solar array to generate and then distribute power to other lunar systems. The VSAT will include several subsystems, such as mobility, internal and external (to provide power to external systems) electrical power systems, thermal management, and array deployment, each of which must work smoothly in order for the operation to succeed. As the VSAT moves around the surface of the Moon, sensors constantly provide information on how much traction is available and how quickly the rover is moving. As the solar array is unfurled, a gimbal system and inertial measurement units (IMUs) continuously monitor the array’s movement, including any lean. If the array leans too much, the entire rover is at risk of tipping over and failing the mission. Since the array is so tall compared to VSAT’s wheelbase, even just a few degrees of lean would be disastrous. In addition, driving on the Moon might cause communications to be lost, forcing the rover to navigate to a location where communications can be reestablished or to attempt to reach the intended destination autonomously, and large thermal swings between sunlight and shadow mean thermal management is always critical.

So it is important that VSAT be equipped with the means to quickly detect problems, perform diagnosis and root cause determination, determine the best method to recover mission capabilities, schedule the required recovery plan, and adaptively execute it. True SSA/SDA requires an understanding of the true state of all space-related assets and their ability (or lack thereof) to continue to perform their primary missions.

Traditionally, Fault Detection, Isolation, and Recovery (FDIR) systems have utilized Model-Based Reasoning (MBR), which requires knowledge of the subsystem design and the behavior of components down to the desired level of diagnosis. To the degree this information is readily available, it is important to make good use of it. However, the field of machine learning (ML) has shown that systems can also learn, offline, the normal behavior of complex systems in many different environments and states, and then detect abnormal behavior in real time. These systems can also be trained on known abnormal states and then recognize those states more specifically when they occur.
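As a rough illustration of the "learn normal behavior offline, then detect abnormal behavior in real time" idea, the sketch below fits a simple statistical baseline to nominal readings and flags later readings that fall far outside it. This is a deliberately simplified stand-in, not the ML approach used in this work; the class name `NominalBaseline`, the `k_sigma` threshold, and the sample temperature values are all hypothetical.

```python
import statistics


class NominalBaseline:
    """Hypothetical stand-in for a learned model of normal behavior."""

    def __init__(self, k_sigma=4.0):
        # k_sigma is an assumed tuning parameter, not a value from the paper.
        self.k_sigma = k_sigma
        self.mean = None
        self.std = None

    def fit(self, nominal_readings):
        """Offline step: summarize readings recorded during normal operation."""
        readings = list(nominal_readings)
        self.mean = statistics.fmean(readings)
        self.std = statistics.pstdev(readings)
        return self

    def is_anomalous(self, reading):
        """Online step: flag a reading that is far from the learned baseline."""
        if self.std == 0:
            return reading != self.mean
        return abs(reading - self.mean) > self.k_sigma * self.std


# Made-up nominal temperatures (degrees C) used only to exercise the sketch.
baseline = NominalBaseline(k_sigma=4.0).fit([20.0, 20.5, 19.5, 20.2, 19.8])
near_nominal = baseline.is_anomalous(20.3)
far_from_nominal = baseline.is_anomalous(25.0)
```

A real detector would model many sensors jointly and condition on operating state; the single-sensor threshold here only shows the split between an offline fitting step and a cheap online check.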

This paper will describe progress on this work since our last paper, presented at AMOS 2022. This includes further development and generalization of the hybrid approach to fault detection, diagnosis, and recovery, and the application of that approach to the most critical aspects of VSAT.

The new types of subsystems (such as mechanical components and related sensors) brought new challenges to be overcome. Some concerns included the quick reaction times needed to avoid tipping during mast deployment and, at the other end of the spectrum, detecting very slow changes that are hard to discern in sensor noise. (The mast moves very, very slowly while tracking the sun.) In some cases, data is severely limited, reducing the applicability of ML.

These challenges led to the development of a third, independent method for detecting anomalies, based on an analogy to thermodynamic variables: the Model-Agnostic Thermodynamic variable Anomaly Detection (MATAD) system performs automatic Characterization and Diagnosis of subsystem anomalies. Similar to how actual thermodynamic values such as pressure and temperature help to summarize in macro form the condition of a large number of micro aspects (e.g. the speeds of individual molecules), MATAD loosely uses this same concept to summarize groups of sensor values. Examples include the mean or variance over the last N received datapoints for a sensor. Others are the min, max, and max jump between two samples. Additional functions utilize Fourier Transforms (FTs) of the incoming data stream, generating additional variables such as average frequency, min frequency, max frequency, peak frequency, and amplitude of peak frequency. Note that every generating function is defined over the last N samples, so that one function may be “mean over the last 100 samples” and another may be “mean over the last 500 samples.” Because looking at multiple time scales can be helpful, TRIAD maintains multiple “versions” of each type of function, where each version corresponds to a different N.
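The windowed summary variables described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the MATAD implementation: the names `window_features` and `MultiScaleSummarizer`, the fixed sample rate, the naive discrete Fourier transform, and the choice to report only the peak frequency and its amplitude (rather than the full set of frequency variables) are all hypothetical.

```python
import cmath
import math
import statistics
from collections import deque


def window_features(samples, sample_rate_hz=1.0):
    """Summarize one window (at least 3 samples) with aggregate variables."""
    xs = list(samples)
    n = len(xs)
    mean = statistics.fmean(xs)
    feats = {
        "mean": mean,
        "variance": statistics.pvariance(xs),
        "min": min(xs),
        "max": max(xs),
        # Largest absolute change between two consecutive samples.
        "max_jump": max(abs(b - a) for a, b in zip(xs, xs[1:])),
    }
    # Naive O(n^2) DFT of the mean-removed window, skipping the DC and
    # Nyquist bins; adequate for a sketch, too slow for flight software.
    amps = {}
    for k in range(1, (n + 1) // 2):
        coeff = sum((x - mean) * cmath.exp(-2j * cmath.pi * k * i / n)
                    for i, x in enumerate(xs))
        amps[k] = 2.0 * abs(coeff) / n
    peak_k = max(amps, key=amps.get)
    feats["peak_frequency_hz"] = peak_k * sample_rate_hz / n
    feats["peak_amplitude"] = amps[peak_k]
    return feats


class MultiScaleSummarizer:
    """Keeps one rolling window per N and summarizes each full window."""

    def __init__(self, window_sizes=(100, 500), sample_rate_hz=1.0):
        if min(window_sizes) < 3:
            raise ValueError("each window needs at least 3 samples")
        self.sample_rate_hz = sample_rate_hz
        self.windows = {n: deque(maxlen=n) for n in window_sizes}

    def update(self, value):
        """Add one sample; return {N: features} for every window that is full."""
        out = {}
        for n, buf in self.windows.items():
            buf.append(value)
            if len(buf) == n:
                out[n] = window_features(buf, self.sample_rate_hz)
        return out


# Example: a 0.1 Hz unit sine sampled at 1 Hz, summarized at two time scales.
summarizer = MultiScaleSummarizer(window_sizes=(20, 100), sample_rate_hz=1.0)
latest = {}
for i in range(100):
    latest = summarizer.update(math.sin(2 * math.pi * 0.1 * i))
```

After the loop, `latest[20]` and `latest[100]` hold the same kinds of variables computed over the last 20 and the last 100 samples, which is the sense in which each function has one "version" per N.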

The hybridization emphasizes the benefits of each approach and mitigates the disadvantages. The benefits of the hybrid system include the ability to detect and diagnose anomalies never before encountered; working well on Day One of operations; effectively utilizing existing design knowledge; succeeding without large amounts of data; explaining the reasoning and being human understandable; being rigorously certifiable; behaving predictably; diagnosing down to the lowest modelled component level; handling rare but modeled operating conditions; executing very quickly; discovering unknown and subtle relationships (even across subsystems); and providing extra certainty of the diagnosis when all three approaches agree.

Date of Conference: September 19-22, 2023

Track: Space Domain Awareness
