Weston Faber, L3Harris
Keywords: Deep Reinforcement Learning, Game Theory, Sensor Tasking, Sensor Network Management
Abstract:
The number of sensor resources available to track Resident Space Objects has increased dramatically in recent years due to the growth of commercial sensor providers, investments in government sensor capabilities, and the need for data supporting Space Domain Awareness (SDA) and Space Traffic Management (STM). The current state of the practice is for sensor providers to perform sensor collection campaigns in an independent, distributed fashion. However, end users leverage multiple sensor providers to meet their SDA and/or STM goals. Since each provider acts independently of the others, the end user receives a sub-optimal data set that contains redundant data. Redundancy can be beneficial when it is needed to ensure task completion, but the end user has limited control over which tasks are redundant. One solution to this problem would be to centralize tasking, i.e., to require that all tasking be performed at a single location. This is not a feasible approach, since providers would then have to relinquish control of their networks to a single end user, limiting their ability to meet other end users' requirements.
Instead of a centralized approach, we encourage sensor providers to take a responsible tasking approach that meets the needs of both the provider and the end user. This responsible approach seeks end user objective optimality within a distributed network paradigm. To achieve this, we frame the problem as a repeated simultaneous game. In doing so, we show a connection between our problem in SDA/STM sensor network management and the classical Prisoner's Dilemma. The Prisoner's Dilemma is a two-player simultaneous game in which each player has an opportunity to confess or to lie. When both players confess, they each receive a moderately negative sentence. If one player lies and the other confesses, then the liar receives a severe sentence while the confessor receives no penalty. However, if both players lie, then they each receive a mild sentence. In this game, the Pareto optimal outcome occurs when both players choose to lie, resulting in mild sentences for both. However, when a player attempts to determine a dominant strategy, all indications point to confessing. This leads to a Nash equilibrium in which both players confess and each receives a moderate sentence. In this paper, we model each sensor network provider as a player in the game and develop each player's strategy using Deep Reinforcement Learning (DRL). This approach allows us to avoid Nash equilibria and improve overall reward. Of course, the sensor management problem is more complex, since different sensor networks have different capabilities and do not always have the same options. Furthermore, players do not know all of the capabilities of the other players, making this a game of incomplete and imperfect information. These complexities and the generalization of the problem are discussed at length in the paper. The method is applied to a real global optical network, and performance improvements are shown against a myopic policy.
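The Prisoner's Dilemma structure described above can be sketched in a few lines; the numeric sentence values below are assumptions chosen only to illustrate the ordering of outcomes (more negative means a worse sentence), not values used in the paper.

```python
# Illustrative Prisoner's Dilemma payoffs (assumed values, more negative = worse).
# payoffs[(a1, a2)] = (player-1 payoff, player-2 payoff)
payoffs = {
    ("confess", "confess"): (-5, -5),    # both confess: moderate sentences
    ("confess", "lie"):     (0, -10),    # confessor goes free, liar sentenced severely
    ("lie",     "confess"): (-10, 0),
    ("lie",     "lie"):     (-1, -1),    # both lie: mild sentences
}

def best_response(opponent_action):
    """Player 1's best reply to a fixed opponent action."""
    return max(("confess", "lie"),
               key=lambda a: payoffs[(a, opponent_action)][0])

# "confess" is the best reply to either opponent action, i.e., a dominant
# strategy, so the Nash equilibrium is (confess, confess) even though
# (lie, lie) is Pareto superior for both players.
assert best_response("confess") == "confess"
assert best_response("lie") == "confess"
```

The same dominance check generalizes to the sensor-tasking game, where each provider's "actions" are candidate collection plans rather than confess/lie.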
Date of Conference: September 15-18, 2020
Track: Machine Learning for SSA Applications