Reinforcement Learning for Space-to-Space Surveillance: Autonomous Scheduling for Resident Space Object Imaging

Daniel Huterer Prats, University of Colorado Boulder; Hanspeter Schaub, University of Colorado Boulder; Chris Wheeler, Interactive Aptitude

Keywords: Space Situational Awareness, Space Domain Awareness, Reinforcement Learning, Autonomy, Scheduling, Space-to-Space Surveillance

Abstract:

The rapid growth of space traffic and increasing geopolitical interest in space activities necessitate advanced Space Situational Awareness (SSA) and Space Domain Awareness (SDA) capabilities. Traditional ground-based and space-based surveillance systems face limitations in coverage, resolution, and revisit rates. This work proposes a novel approach to SSA/SDA through autonomous space-to-space surveillance, where an inspecting spacecraft actively images Resident Space Objects (RSOs) and other spacecraft. Recent advancements in space-based sensor tasking for Earth observation have demonstrated the feasibility of using neural networks for autonomous on-board scheduling, optimizing imaging sequences while dynamically adapting to mission conditions such as cloud coverage [1][2][3]. By leveraging Reinforcement Learning (RL), the imaging spacecraft dynamically schedules imaging actions while managing onboard power and momentum resources, thus optimizing long-term observational efficiency. Prior work on RL-based satellite autonomy primarily focuses on Earth observation, where imaging targets exhibit easily predictable motion relative to an Earth-centered reference frame [4]. In contrast, RSOs follow diverse and evolving orbital trajectories, because their altitude, eccentricity, and inclination can differ from those of the imaging spacecraft. This introduces new challenges in observation scheduling and visibility management.

Our approach models the space-to-space surveillance problem as a Markov Decision Process (MDP) in which the RL agent selects from a set of actions: imaging a target spacecraft, downlinking, charging its battery, or performing momentum dumping. The core methodology of this research involves developing and training an RL-based scheduling agent within Basilisk*, a high-fidelity simulation framework that provides precise orbital dynamics modeling, sensor emulation, and spacecraft subsystem interactions. Imaging is conducted by aligning the inspecting spacecraft’s camera boresight axis with the target vector. Targets are assigned priorities, which weight the reward function and thereby influence the RL agent’s decision-making process.
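The action set and priority-weighted reward described above can be illustrated with a minimal Python sketch. This is not the Basilisk-based implementation used in this work: the names `Action`, `Target`, and `imaging_reward`, the assumption that priorities lie in [0, 1], and the rule that each target is rewarded only once per episode are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Action(Enum):
    """The four action types named in the abstract (names are illustrative)."""
    IMAGE = auto()
    DOWNLINK = auto()
    CHARGE = auto()
    DESATURATE = auto()  # momentum dumping


@dataclass(frozen=True)
class Target:
    name: str
    priority: float  # assumed here to lie in [0, 1]


def imaging_reward(target: Target, already_imaged: set) -> float:
    """Priority-weighted imaging reward.

    Assumption: a target yields its priority as reward the first time it
    is imaged in an episode, and zero afterwards.
    """
    if target.name in already_imaged:
        return 0.0
    return target.priority
```

Under these assumptions, a higher-priority target contributes more to the episode return, which is one simple way a priority could weight the reward function.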

Unlike traditional Earth observation scheduling [1-4], where ground targets typically remain fixed relative to an Earth-centered reference frame, RSO surveillance must handle targets that move relative to the observer. Targets may traverse a wide range of orbital regimes, exhibiting diverse inclinations, velocities, and relative motion patterns. The RL agent must continuously assess and predict target movement to select the optimal imaging sequence while ensuring safe spacecraft operations. Moreover, only some targets can be imaged at any given moment: a target must be within line-of-sight of the inspecting spacecraft and must not be in eclipse, and only targets within visible range are considered. Determining which RSOs are visible at a given point in time is a major challenge because of their dynamic nature and because Earth's horizon blocks a large portion of the field of view. Targets that are eclipsed by Earth are not illuminated and therefore cannot be successfully imaged. Other constraints, such as restricting attitude maneuvers to avoid exposing the imaging instrument to direct sunlight, are an additional complication not present for most Earth-observing satellites and will be considered in future work.
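The two visibility constraints above (line-of-sight and target illumination) can be sketched geometrically. The following is a simplified illustration, not the model used in the simulation: it assumes a spherical Earth, positions given in kilometers in an Earth-centered inertial frame, and a cylindrical Earth shadow with no penumbra. All function names are hypothetical.

```python
import math

R_EARTH_KM = 6378.137  # equatorial radius; spherical Earth is an assumption


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def has_line_of_sight(r_obs, r_tgt, radius=R_EARTH_KM):
    """True if the straight segment observer -> target misses the Earth sphere."""
    d = tuple(t - o for t, o in zip(r_tgt, r_obs))
    dd = _dot(d, d)
    if dd == 0.0:
        return True
    # Parameter of the point on the segment closest to Earth's center (origin),
    # clamped so the point stays between observer and target.
    s = max(0.0, min(1.0, -_dot(r_obs, d) / dd))
    closest = tuple(o + s * di for o, di in zip(r_obs, d))
    return math.sqrt(_dot(closest, closest)) > radius


def is_sunlit(r_tgt, sun_dir, radius=R_EARTH_KM):
    """Cylindrical shadow test; sun_dir is the unit vector from Earth to the Sun."""
    along = _dot(r_tgt, sun_dir)
    if along >= 0.0:
        return True  # target is on the sunward side of Earth
    perp = math.sqrt(max(0.0, _dot(r_tgt, r_tgt) - along * along))
    return perp > radius  # outside the shadow cylinder


def can_image(r_obs, r_tgt, sun_dir):
    """A target is a candidate only if it is both visible and illuminated."""
    return has_line_of_sight(r_obs, r_tgt) and is_sunlit(r_tgt, sun_dir)
```

For example, with the observer at (7000, 0, 0) km and the Sun along +x, a target at (-7000, 0, 0) km fails both tests, while a target at (7000, 500, 0) km passes both. A higher-fidelity treatment would add penumbra, an oblate Earth, and sensor range limits.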

One key aspect of the methodology is the integration of onboard resource management. The spacecraft must balance imaging operations with energy constraints, accounting for battery charge levels, reaction wheel states, and data storage limitations. Excessive reaction wheel momentum accumulation necessitates momentum dumping maneuvers, which can temporarily limit imaging capabilities. The RL agent must autonomously navigate these constraints, maximizing surveillance effectiveness while ensuring long-term operational sustainability. The main metrics used to compare overall surveillance performance are the rewards obtained by the agent as well as the total number of imaged targets over an episode. A further investigation will examine gradually rewarding the downlinking of illuminated images. This serves to simulate the utility of having the most recent images on the ground, where operators can use them for space intelligence and SSA.
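To make the resource trade-off concrete, the sketch below shows a conservative rule-based guard that limits which actions are permitted for a given resource state. It is only an illustration of the constraints the learned policy must respect: the `ResourceState` fields, the threshold values, and the `safe_actions` function are hypothetical and are not taken from this work, where the agent is expected to learn such behavior rather than follow fixed rules.

```python
from dataclasses import dataclass


@dataclass
class ResourceState:
    battery_fraction: float      # 0 (empty) to 1 (full)
    wheel_speed_fraction: float  # |wheel speed| / maximum wheel speed
    storage_fraction: float      # 0 (empty) to 1 (full)


def safe_actions(state, min_battery=0.2, max_wheel=0.8, max_storage=0.95):
    """Return the actions a conservative guard would permit.

    All thresholds are illustrative assumptions.
    """
    if state.battery_fraction < min_battery:
        return ["charge"]        # low battery: recover power first
    if state.wheel_speed_fraction > max_wheel:
        return ["desaturate"]    # wheels near saturation: dump momentum
    actions = ["charge", "desaturate", "downlink"]
    if state.storage_fraction < max_storage:
        actions.append("image")  # image only if storage has room
    return actions
```

With these thresholds, a nominal state permits all four actions, while a state with full storage permits everything except imaging until data has been downlinked.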

Expected outcomes of this research include the development of an RL-driven scheduling policy capable of dynamically prioritizing RSO targets while maintaining continuous awareness of multiple RSOs. The trained agent should exhibit the ability to:

Efficiently select and switch between targets in a highly dynamic orbital regime (LEO).
Maintain onboard resource balance while ensuring mission longevity.
Adapt to changing reward balances, including downlink and imaging thresholds.

The importance of this work for SSA and SDA lies in applying state-of-the-art task scheduling for Earth-observing satellites to space-to-space surveillance. Current space surveillance techniques predominantly rely on ground-based sensors, which have an Earth-fixed field of view and are obstructed by atmospheric effects. This research contributes to a shift toward active, autonomous, and adaptive space-based surveillance, which benefits from a dynamic field of view and imaging unobstructed by Earth’s atmosphere.

This work contributes to the goal of maintaining a secure and sustainable space environment by enhancing the understanding of targeting RSOs from a spacecraft, a task significantly more challenging than pointing at an Earth-fixed target due to the complex dynamics, lighting, and line-of-sight constraints. Future extensions of this work could introduce uncertainty in the RSO targets’ states, to be reduced through continued imaging by the primary spacecraft. Later expansions could involve multi-agent collaboration, where multiple inspecting spacecraft coordinate to provide persistent coverage of a specific area of interest above the ground. Finally, the integration of additional sensors, such as radar, IR, or hyperspectral imaging, could further enhance tracking fidelity, especially when the targets are in eclipse. Ultimately, the proposed methodology represents a step toward a more resilient and intelligent SSA/SDA infrastructure, supporting continued space security in an increasingly congested orbital domain.

*https://avslab.github.com/basilisk

[1]       P. M. Siew, D. Jang, T. G. Roberts, and R. Linares, “Space-Based Sensor Tasking Using Deep Reinforcement Learning,” J. Astronaut. Sci., vol. 69, no. 6, pp. 1855–1892, Dec. 2022, doi: 10.1007/s40295-022-00354-8.

[2]       L. Q. Mantovani, Y. Nagano, and H. Schaub, “Reinforcement Learning for Satellite Autonomy Under Different Cloud Coverage Probability Observations”.

[3]       M. A. Stephenson and H. Schaub, “Optimal Agile Satellite Target Scheduling with Learned Dynamics,” J. Spacecr. Rockets, vol. 0, no. 0, pp. 1–12, doi: 10.2514/1.A36097.

[4]       M. Stephenson and H. Schaub, “Reinforcement Learning for Earth-Observing Satellite Autonomy with Event-Based Task Intervals”.

Date of Conference: September 16-19, 2025

Track: Machine Learning for SDA Applications
