Ofer Dagan, University of Colorado Boulder; Tyler Becker, University of Colorado Boulder; Zachary Sunberg, University of Colorado Boulder
Keywords: Hypothesis, Planning, POMDPs, Sensor Tasking
Abstract:
Introduction: Over the past few years there has been an exponential increase in the number of resident space objects (RSOs), from about 19,000 objects in 2017 to roughly 30,000 in 2024, with hundreds of thousands expected in the next decade. From a space domain awareness (SDA) perspective this poses a significant challenge. One of the core problems in SDA is catalog maintenance: tasking a set of sensors to maintain custody over an existing ‘catalog’ of RSOs. Consider a scenario in which, during routine monitoring under a sensor tasking schedule for catalog maintenance, an anomaly is detected in the orbit of one of the RSOs. As a result, the operator is interested in checking multiple hypotheses that might explain it, e.g., an engine misfire, deployment of solar panels, or possibly an intentional maneuver. To resolve the uncertainty over the correct hypothesis, the operator would want to gather more information about that specific RSO, the object of interest (OOI), and would therefore divert resources to collect additional observations. But which sensor should be tasked? When should the measurement be collected? And how does doing so affect the original tasking schedule and the catalog maintenance task?
This work explores the problem we define as hypothesis-driven planning by applying it to the sensor tasking problem. The core idea is to augment the original sensor tasking schedule with a discrete set of hypotheses and allow the planning algorithm to choose actions that collect data on the OOI while still performing well on the original catalog maintenance task. Note that the two objectives compete, as diverting a sensor to observe the OOI instead of scanning the catalog increases uncertainty over the catalog. The main challenge in solving this planning problem is reasoning about all possible courses of action, that is, deciding for every sensor and every time step in the current schedule whether it should change its task, and what the possible outcomes of doing so are.
This research addresses this challenge by extending a rigorous decision-making framework called a POMDP (partially observable Markov decision process). This enables reasoning over multiple hypotheses while allowing tractable solutions, agnostic to the size of the state and observation spaces, using existing tree search algorithms such as Monte Carlo tree search (MCTS). We focus on hypotheses that stem from the different dynamic models that might cause an anomaly in the OOI orbit, and we explore different objectives (reward functions) to balance the goals of determining the most likely hypothesis while performing well with respect to the original tasking schedule.
Technical Background: The multi-sensor tasking problem can be formulated as a POMDP, an optimization formalism for decision making under uncertainty. It allows reasoning over possible outcomes in cases where the true state of the system cannot be observed. Formally, a POMDP is defined by the tuple (S, A, T, O, Z, R, γ), where S and A are the sets of all possible states and actions, respectively. In the hypothesis-driven planning problem, the POMDP state includes both the orbital elements of the satellite and any information related to the hypothesis, such as the status of internal systems or intentions. T(s,a,s’)=p(s’|s,a) is a stochastic state transition model, which defines the probability of transitioning to state s’ from state s after taking action a, O is the set of all possible observations, and Z(s’,a,o)=p(o|s’,a) is the stochastic observation function, defined as the conditional probability of seeing observation o after taking action a and reaching state s’. Finally, the reward function R(s,a) determines the immediate reward the agent receives when taking action a in state s, and γ is a discount factor with values between 0 and 1.
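For illustration, a minimal generative-model sketch of this tuple in Python (the class name, field names, and default discount are our own illustrative choices, not the paper's implementation):

from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class SensorTaskingPOMDP:
    """Illustrative container for the POMDP tuple (S, A, T, O, Z, R, gamma).

    States s bundle the orbital elements of each RSO with hypothesis-related
    variables (e.g., internal system status); actions a assign sensors to targets.
    """
    actions: Sequence      # A: candidate sensor-to-target assignments
    transition: Callable   # T: samples s' from p(s' | s, a)
    observation: Callable  # Z: samples o from p(o | s', a)
    reward: Callable       # R(s, a): immediate reward
    gamma: float = 0.95    # discount factor in (0, 1)

    def step(self, s, a):
        """Simulate one step: sample s' ~ T, o ~ Z, and compute r = R(s, a)."""
        sp = self.transition(s, a)
        o = self.observation(sp, a)
        r = self.reward(s, a)
        return sp, o, r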
Solving a POMDP requires reasoning over every possible action-observation pair at each time step in the planning horizon. In continuous domains, such as the SDA problem, the problem becomes intractable, as there are uncountably many possible action-observation sequences from the current time step to the planning horizon, each resulting in a different belief over the unknown state. The problem is exacerbated in the hypothesis-driven context, since each action-observation pair spawns multiple beliefs, one for each hypothesis.
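As a concrete sketch of how each action-observation pair spawns one belief per hypothesis, the following hypothesis-conditioned particle update is one possible illustration (the particle representation, resampling scheme, and helper functions are assumptions made for this example, not the paper's filter):

import numpy as np

def update_hypothesis_beliefs(beliefs, weights, a, o, propagate, likelihood):
    """Hypothesis-conditioned belief update (illustrative sketch).

    beliefs[h]  : particle array approximating p(s | h, history)
    weights[h]  : current posterior probability of hypothesis h
    propagate   : propagate(particles, a, h) -> predicted particles under hypothesis h
    likelihood  : likelihood(o, particles, a) -> per-particle observation likelihoods
    """
    new_beliefs, new_weights = {}, {}
    for h, particles in beliefs.items():
        pred = propagate(particles, a, h)          # predict under hypothesis h's dynamics
        lik = likelihood(o, pred, a)               # weight particles by the observation
        evidence = lik.mean()                      # estimate of p(o | a, h, history)
        idx = np.random.choice(len(pred), size=len(pred), p=lik / lik.sum())
        new_beliefs[h] = pred[idx]                 # resampled posterior particles
        new_weights[h] = weights[h] * evidence     # Bayes update of hypothesis weight
    total = sum(new_weights.values())
    new_weights = {h: w / total for h, w in new_weights.items()}
    return new_beliefs, new_weights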
Problem statement: Consider a ‘base’ sensor tasking plan, or planning problem P, where each sensor is tasked with a series of observations of different RSOs over some time horizon (e.g., 24 hours). Suppose that at time step 0 the operator observes a surprising behavior of one of the RSOs, the OOI. The operator raises three questions, or hypotheses, that might explain the origin of the orbit anomaly: (i) did the OOI deploy solar panels, thus changing the drag of the object? (ii) did it fire its thrusters to deliberately change orbits? Or (iii) is it still trying to maintain its nominal orbit?
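Schematically, each hypothesis corresponds to a different dynamics model for the OOI. A minimal sketch with placeholder perturbation magnitudes (the actual force models and parameters in this work may differ):

import numpy as np

# Illustrative hypothesis-dependent perturbing accelerations for the OOI (placeholders).
def drag_perturbation(r, v):
    # larger area-to-mass ratio after panel deployment -> stronger drag opposing velocity
    return -1e-7 * np.linalg.norm(v) * v

def thrust_perturbation(r, v):
    # small continuous along-track thrust representing a deliberate maneuver
    return 1e-6 * v / np.linalg.norm(v)

def no_perturbation(r, v):
    return np.zeros(3)

HYPOTHESES = {
    "panel_deployment": drag_perturbation,  # (i) changed drag
    "maneuver": thrust_perturbation,        # (ii) thruster firing
    "nominal": no_perturbation,             # (iii) nominal orbit keeping
}

def acceleration(r, v, hypothesis, mu=3.986004418e14):
    """Two-body acceleration plus the hypothesis-specific perturbation (schematic)."""
    return -mu * r / np.linalg.norm(r) ** 3 + HYPOTHESES[hypothesis](r, v)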
The hypothesis-driven POMDP problem posed in this work searches for an optimal policy π* (a new tasking schedule) that maximizes the reward over the two competing requirements: deciding which of hypotheses (i)-(iii) is most likely correct, while making the least significant changes to the original tasking plan P.
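In belief-MDP notation, this objective can be sketched as π* = argmax_π E[ Σ_{t=0}^{H} γ^t R(b_t, π(b_t)) ], where b_t is the joint belief over the OOI state and hypotheses (i)-(iii) at time t, H is the planning horizon, and R is a combined reward trading off hypothesis resolution against deviation from the base plan P.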
Contributions: To solve the hypothesis-driven tasking problem, we suggest a new formulation that we call the hypothesis-driven belief MDP. A belief MDP can be seen as a generalization of a POMDP that reasons about beliefs (distributions over states) instead of states. This enables reasoning about information-gathering actions, such as additional measurements or control inputs given to the system, that can help resolve uncertainty and determine the most probable hypothesis. To motivate such actions while still performing well in the underlying POMDP problem, we explore new reward functions that explicitly reward timely decisions. Simulation results demonstrate the advantage of the new reward functions over an entropy-based reward, balancing timely hypothesis decisions against the underlying problem objectives.
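As one hedged illustration of such a reward function (the entropy weighting, decision threshold, and decision bonus below are example choices, not necessarily those used in this work):

import numpy as np

def hypothesis_driven_reward(base_reward, hyp_probs, lam=1.0,
                             decision_bonus=10.0, decision_threshold=0.95):
    """Illustrative reward for the hypothesis-driven belief MDP (sketch).

    base_reward        : reward of the underlying catalog-maintenance task at this step
    hyp_probs          : posterior hypothesis probabilities (sum to 1)
    lam                : weight on the uncertainty (entropy) penalty
    decision_bonus     : bonus for confidently resolving the hypothesis in time
    decision_threshold : confidence required to declare a hypothesis
    """
    p = np.clip(np.asarray(hyp_probs, dtype=float), 1e-12, 1.0)
    entropy = -np.sum(p * np.log(p))      # uncertainty over hypotheses
    r = base_reward - lam * entropy       # entropy-based shaping term
    if p.max() >= decision_threshold:     # explicit reward for an in-time decision
        r += decision_bonus
    return r

Setting decision_bonus to 0 recovers a purely entropy-based reward, which serves as the baseline that the new reward functions are compared against in the simulations.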
Date of Conference: September 16-19, 2025
Track: Space Domain Awareness