Tyler Becker, University of Colorado Boulder; Zachary Sunberg, University of Colorado Boulder
Keywords: counterfactual regret minimization, game theory, sensor tasking, custody maintenance
Abstract:
As space becomes an increasingly contested domain, it will become more important for competitors to manage their space domain awareness (SDA) resources shrewdly. Game theory provides a principled mathematical framework for choosing actions that accomplish a goal while interacting with other actors. This work will model and solve a satellite custody maintenance problem as a partially observable game, formalized as an imperfect-information extensive-form game. In particular, the proposed method will yield sensor tasking strategies that are unexploitable; that is, a satellite operator who wishes to escape the sensor's custody can have no better strategy than playing according to the Nash equilibrium that the method computes.
The resulting strategies are stochastic so that they can keep the satellite operator guessing as to when their satellite is being observed, and they have the potential to take complex orbital dynamics into account, a consideration that would be difficult for human planners to handle without computational aids.
Space domain awareness can involve rapidly changing strategies and deception. Space superiority will be determined by one's ability to automatically account for both shifts in other actors' strategies and any possible deception behind those strategies. Either player in a non-cooperative SDA game will undoubtedly have some uncertainty about the physical state of the other actor's equipment, their strategy, or how their strategy may evolve. For this reason, this paper aims to both plan for and engage in difficult-to-exploit actions by implementing counterfactual regret minimization (CFR), which has enjoyed success in challenging imperfect-information games such as poker and accounts for all of the aforementioned uncertainties. Uncertainty in the physical state and in the other player's current strategy is handled by planning over information states rather than the physical states themselves. Each information state is the set of all physical states that are plausible given the agent's observations. Uncertainty in how the other player's strategy may evolve is handled by playing a Nash equilibrium strategy.
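To make the information-state abstraction concrete, the sketch below uses a hypothetical discretized custody game in which the satellite occupies one of a few orbital slots and sensors are assumed to detect perfectly; it is an illustration of the idea, not the representation used in this work.

```python
# Minimal sketch of an information state: the set of physical states
# (orbital slots, in this toy discretization) that are consistent with
# the tasker's own observation history. Slot names and the perfect-
# detection assumption are purely illustrative.

SLOTS = ("slot_A", "slot_B", "slot_C")

def consistent(slot, observation_history):
    """True if a satellite occupying `slot` could have produced the
    observations. Each observation is a (scanned_slot, detected) pair:
    a detection pins the satellite to the scanned slot, while a miss
    rules that slot out (no missed detections are assumed)."""
    for scanned_slot, detected in observation_history:
        if detected and slot != scanned_slot:
            return False
        if not detected and slot == scanned_slot:
            return False
    return True

def information_state(observation_history):
    """All physical states that are plausible given the observations."""
    return frozenset(s for s in SLOTS if consistent(s, observation_history))

history = (("slot_A", False), ("slot_B", False))
print(information_state(history))  # frozenset({'slot_C'})
```

A strategy computed by CFR is conditioned only on quantities of this kind, so it never relies on information the tasker could not actually possess.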
Optimal control techniques yield a control policy that is a deterministic mapping from state to action. In contrast, when a sequential game is partially observable, a Nash equilibrium strategy may be stochastic. To illustrate why stochastic strategies might be useful in the satellite custody problem, consider what could happen if a known deterministic strategy were used for sensor tasking: the satellite operator could, deliberately or accidentally, find a strategy that directly exploits this deterministic mapping by making a custody-breaking maneuver between measurements. A stochastic strategy, on the other hand, allows a player to feign advantageous or disadvantageous states by bluffing with some probability, preventing the other player from immediately inferring state information and developing exploitative counter-strategies.
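To make the exploitability argument concrete, consider the toy two-epoch "scan versus maneuver" game sketched below. The payoff values and action names are hypothetical and chosen only to show how a known deterministic tasking strategy is driven to its worst case while the mixed equilibrium is not.

```python
# Toy zero-sum game: the tasker keeps custody (payoff 1) only if its scan
# falls in the same epoch as the operator's maneuver. All numbers are
# illustrative, not derived from orbital dynamics.

ACTIONS = ("epoch_1", "epoch_2")
payoff = {("epoch_1", "epoch_1"): 1.0, ("epoch_1", "epoch_2"): 0.0,
          ("epoch_2", "epoch_1"): 0.0, ("epoch_2", "epoch_2"): 1.0}

def value_vs_best_response(tasker_strategy):
    """Tasker's expected payoff when the operator knows the strategy and
    best-responds, i.e., picks the epoch minimizing the tasker's payoff."""
    return min(sum(p * payoff[(t, op)] for t, p in tasker_strategy.items())
               for op in ACTIONS)

deterministic = {"epoch_1": 1.0, "epoch_2": 0.0}
mixed_equilibrium = {"epoch_1": 0.5, "epoch_2": 0.5}

print(value_vs_best_response(deterministic))      # 0.0 -> fully exploited
print(value_vs_best_response(mixed_equilibrium))  # 0.5 -> game value guaranteed
```

Against a best-responding operator, the deterministic schedule never observes the maneuver, whereas the 50/50 mixture guarantees the game value no matter how the operator responds.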
One particular challenge in applying game theory to space domain awareness is the relatively expensive dynamics integration required to construct the game tree. Previous methods for solving games with differentiable dynamics have used iterative linear-quadratic game formulations to compute a continuous-control Nash equilibrium, but they do not account for partial observability and the consequent possibility of deception. Conversely, previous CFR methods do account for partial observability, but only with relatively simple generative models centered on tabletop games. To bridge this gap, we are devising a method that handles both computationally costly orbital dynamics and partial state observability. This work uses sparse Monte Carlo sampling to avoid exhaustive game-tree traversals.
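The sketch below illustrates the sparse-sampling idea on a one-shot version of the tasking game, not the full method developed in this work: regret-matching self-play in which each payoff evaluation stands in for a costly orbit propagation, and sampling each player's action means payoffs are propagated only against the sampled actions rather than over the entire action cross product. The `expensive_payoff` placeholder and the action labels are hypothetical.

```python
import random

TASKER_ACTIONS = ("scan_slot_A", "scan_slot_B")
OPERATOR_ACTIONS = ("maneuver_early", "maneuver_late")

def expensive_payoff(tasker_action, operator_action):
    """Stand-in for a payoff whose evaluation would require propagating
    orbital dynamics; here custody is kept (payoff 1) only when the scan
    matches the maneuver window."""
    match = {("scan_slot_A", "maneuver_early"), ("scan_slot_B", "maneuver_late")}
    return 1.0 if (tasker_action, operator_action) in match else 0.0

def regret_matching(regret, actions):
    """Mix actions in proportion to positive cumulative regret."""
    positive = [max(regret[a], 0.0) for a in actions]
    total = sum(positive)
    return ([r / total for r in positive] if total > 0.0
            else [1.0 / len(actions)] * len(actions))

tasker_regret = {a: 0.0 for a in TASKER_ACTIONS}
operator_regret = {a: 0.0 for a in OPERATOR_ACTIONS}
tasker_strategy_sum = {a: 0.0 for a in TASKER_ACTIONS}

for _ in range(20000):
    sigma_t = regret_matching(tasker_regret, TASKER_ACTIONS)
    sigma_o = regret_matching(operator_regret, OPERATOR_ACTIONS)
    a_t = random.choices(TASKER_ACTIONS, weights=sigma_t)[0]
    a_o = random.choices(OPERATOR_ACTIONS, weights=sigma_o)[0]

    # Tasker update: propagate payoffs only against the sampled operator
    # action (one column of the matrix), not against every operator action.
    u_played = expensive_payoff(a_t, a_o)
    for a, p in zip(TASKER_ACTIONS, sigma_t):
        tasker_regret[a] += expensive_payoff(a, a_o) - u_played
        tasker_strategy_sum[a] += p

    # Operator update (zero-sum, so its payoff is the negative).
    for a in OPERATOR_ACTIONS:
        operator_regret[a] += u_played - expensive_payoff(a_t, a)

# The time-averaged strategy approaches the mixed equilibrium (~50/50 here).
total = sum(tasker_strategy_sum.values())
print({a: round(s / total, 2) for a, s in tasker_strategy_sum.items()})
```

In a full extensive-form treatment the same pattern is applied along sampled trajectories through the game tree, so the expensive propagation is invoked only where the sampled play actually visits.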
Ultimately, the Nash equilibrium strategies yielded by solving this problem will require less oversight due to their robustness against all possible counter-strategies. Furthermore, even if the strategies recommended by this game-theoretic approach are not used directly in practice, the solutions can be used to estimate how much sensor budget will be needed to detect or deter unwanted behavior.
At an even higher level, game theory can be used to analyze the performance and deterrence capability of entire sensor networks or to inform policy decisions and enforcement strategies. For instance, when designing sensor networks, game-theoretic results can inform how many and what type of sensors will be needed to accomplish situational awareness goals and where to place those sensors for maximum effect. Because the set of mixed strategies contains every deterministic pure strategy, the optimal mixed strategy found with CFR is guaranteed to perform at least as well as any deterministic strategy against an agent attempting to exploit it. For SDA, the implication is more effective sensor tasking at lower cost, whether through fewer required scanning actions or through fewer total sensors and, hence, lower maintenance costs.
In summary, this paper contributes the following to the field of space domain awareness:
1. New methods for solving imperfect-information extensive-form games on continuous state and action spaces.
2. Unexploitable, game-theoretic SDA sensor tasking strategies.
3. Tools for analyzing the effectiveness of SDA sensors and sensor networks in a non-cooperative, multiplayer environment.
Date of Conference: September 27-30, 2022
Track: SSA/SDA