Deep Learning for Cislunar Object Detection

Luca Ghilardi, University of Arizona; Roberto Furfaro, University of Arizona; Vishnu Reddy, University of Arizona

Keywords: Cislunar, image processing, transformer, simulation, object identification

Abstract:

Astronomical image processing for objects in cislunar space can be very challenging due to the low signal-to-noise ratio (SNR) caused by the high brightness of the Moon, and due to the significant errors in the predicted positions of such objects. Furthermore, with the increasing amount of data gathered by spacecraft and telescopes, the need for efficient and accurate automated identification systems has become more urgent. Most astronomical image processing techniques for objects beyond near-Earth orbit (NEO) are based on taking multiple exposures at regular intervals and determining whether the target exhibits consistent movement from one frame to the next. The limitation of this technique is that it requires advance knowledge of the object's motion. Nevertheless, it is a valid approach for conducting follow-up observations of a known object.
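As a concrete illustration of this frame-to-frame consistency check, the sketch below fits a constant pixel velocity to per-exposure centroids and accepts the track only if the fit residuals stay small. The function name, tolerance, and least-squares formulation are our own illustrative assumptions, not the method of any specific pipeline.

```python
import numpy as np

def is_consistent_track(detections, max_residual_px=1.5):
    """Check whether per-frame detections (x, y), taken at regular
    intervals, move with an approximately constant pixel velocity.

    `detections` is an (N, 2) array of centroids, one per exposure.
    Names and tolerance are illustrative, not from the paper.
    """
    detections = np.asarray(detections, dtype=float)
    t = np.arange(len(detections))  # regular cadence: frame index as time
    # Least-squares linear fits x(t) and y(t) for a constant-rate mover.
    coeffs_x = np.polyfit(t, detections[:, 0], 1)
    coeffs_y = np.polyfit(t, detections[:, 1], 1)
    residuals = np.hypot(detections[:, 0] - np.polyval(coeffs_x, t),
                         detections[:, 1] - np.polyval(coeffs_y, t))
    return bool(np.all(residuals < max_residual_px))
```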

In this paper, we test two encoder architectures for autonomously identifying objects in cislunar space. The first encoder is a convolutional encoder, characteristic of convolutional neural networks (CNNs). Specifically, we adopt the well-known UNet, an architecture noted for its flexibility and performance on small datasets. The UNet has already been studied in various vision-based space applications with good results.
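For reference, the following is a minimal PyTorch sketch of a UNet-style contracting path; the channel widths, depth, and single-channel input are illustrative assumptions, not the configuration used in this work.

```python
import torch.nn as nn

class ConvBlock(nn.Module):
    """Two 3x3 convolutions with ReLU: the basic UNet building block."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

class UNetEncoder(nn.Module):
    """Contracting path only: each stage doubles the channel count and
    halves the resolution; skip outputs would feed a symmetric decoder."""
    def __init__(self, in_ch=1, widths=(64, 128, 256, 512)):
        super().__init__()
        self.stages = nn.ModuleList()
        for w in widths:
            self.stages.append(ConvBlock(in_ch, w))
            in_ch = w
        self.pool = nn.MaxPool2d(2)

    def forward(self, x):
        skips = []
        for stage in self.stages:
            x = stage(x)
            skips.append(x)  # kept for the decoder's skip connections
            x = self.pool(x)
        return x, skips
```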

The second method is based on the vision transformer (ViT) introduced by Dosovitskiy et al. in their paper “An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale.” In recent years, machine learning techniques have shown great promise in this field, particularly in object identification tasks. CNNs have been the most popular approach for object identification in astronomical images due to their flexibility and good performance on small datasets. More recently, encoder architectures based on vision transformers have quickly become state-of-the-art in image processing tasks. The ViT is a deep neural network architecture that replaces traditional convolutional layers with a series of the self-attention layers introduced by Vaswani et al. in “Attention Is All You Need.” It has shown superior performance in image classification tasks when large datasets are available. ViT’s key advantages are its handling of high-resolution images and variable-size inputs and its better generalization, since it maintains a global receptive field throughout the encoder.
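A minimal PyTorch sketch of the ViT encoder idea follows: the image is split into 16×16 patches, each patch is linearly embedded, learned positional embeddings are added, and the token sequence passes through standard self-attention layers. The image size, embedding dimension, depth, and head count are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ViTEncoder(nn.Module):
    """Patchify, embed, add positions, then run self-attention layers."""
    def __init__(self, img_size=256, patch=16, dim=384, depth=6, heads=6):
        super().__init__()
        n_patches = (img_size // patch) ** 2
        # Patch embedding as a strided convolution: one token per patch.
        self.embed = nn.Conv2d(1, dim, kernel_size=patch, stride=patch)
        self.pos = nn.Parameter(torch.zeros(1, n_patches, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, x):                                   # x: (B, 1, H, W)
        tokens = self.embed(x).flatten(2).transpose(1, 2)   # (B, N, dim)
        return self.encoder(tokens + self.pos)
```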
Since the size of the dataset strongly influences the model’s performance, our first idea was to build a large one by simulating optical observations with the open-source ray-tracing software Blender. However, the final goal of this research is to develop a deployable algorithm that can be used for real observations, and recent in-house tests showed that algorithms that perform well on synthetic data perform poorly on real data. Therefore, we decided to take real observations of the area around the Moon and synthetically add a target using a 2D Gaussian distribution.
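The target-injection step can be sketched as adding a 2D Gaussian point source to a real background frame, as below. The parameter names and default width are illustrative assumptions, not the values used to build the dataset.

```python
import numpy as np

def inject_gaussian_target(image, x0, y0, amplitude, sigma=1.2):
    """Add a point-source target, modeled as a 2D Gaussian, to a real
    background frame. Defaults are illustrative assumptions."""
    h, w = image.shape
    yy, xx = np.mgrid[0:h, 0:w]
    psf = amplitude * np.exp(-((xx - x0) ** 2 + (yy - y0) ** 2)
                             / (2.0 * sigma ** 2))
    return image + psf
```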
Once the dataset has been created, we will explore different approaches to modifying the network architectures. We will evaluate the models’ performance both on a single image fed to the network and on a sequence of images of the same target. The models’ performance will also be evaluated at different noise levels to explore the SNR limits for reliable object identification in cislunar space.
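One simple way to realize such a noise-level sweep is to inject the same Gaussian target at several peak-amplitude-to-noise ratios, as in the sketch below. Treating SNR as the Gaussian peak over the frame’s noise standard deviation is our own simplifying assumption, not the paper’s definition.

```python
import numpy as np

def frames_at_snr_levels(frame, x0, y0, snr_levels, sigma=1.2):
    """Copies of a real background frame with the same synthetic target
    injected at several SNR levels, where SNR is taken as the Gaussian
    peak amplitude over the frame's noise sigma (a simple proxy)."""
    h, w = frame.shape
    yy, xx = np.mgrid[0:h, 0:w]
    kernel = np.exp(-((xx - x0) ** 2 + (yy - y0) ** 2) / (2.0 * sigma ** 2))
    noise = np.std(frame)
    return {snr: frame + snr * noise * kernel for snr in snr_levels}

# e.g. frames_at_snr_levels(frame, 128.0, 96.5, snr_levels=[1, 2, 5, 10])
```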

Date of Conference: September 19-22, 2023

Track: Cislunar SDA
