Kyle Merry, Sandia National Laboratories; John Ossorgin, Sandia National Laboratories; Zachary Mekus, Sandia National Laboratories
Keywords: ML/AI, SDA, Transformer, Swin Transformer, Convolutional Neural Network, CNN, Feature Extraction, Backbone
Abstract:
The computer vision community is continuously developing new architectures and methodologies for neural network processing of imagery, but not all of these advancements transfer to Space Domain Awareness (SDA). Many popular neural network components and best practices, such as choices in normalization and preprocessing, can degrade a neural network’s ability to process SDA data. Of particular interest are decisions in the feature extraction, or “backbone”, layers of a neural network, which can prevent it from capturing the salient information in an SDA scene. In this paper, we analyze several popular architecture choices, observe their effects on model performance in SDA image processing tasks, and make recommendations for designing backbone architectures for processing SDA data.
Neural networks provide state-of-the-art performance for nearly all common image processing tasks, such as detection, segmentation, classification, and de-noising. However, most publications and improvements to these models focus on the kind of imagery common on social media. Imagery of people, buildings, and vehicles taken from a camera phone differs fundamentally from the imagery produced for SDA, and models must differ accordingly. Model architectures codify assumptions about the data they process, such as scale, pixel value distribution, feature size, and object shape. Many commonly available backbone models include components that rely on properties absent from SDA imagery, and these components can prevent a network from learning its task. By identifying these components and potential alternatives, we can modify advanced image processing architectures to improve their suitability for SDA. In comparison to social media or ImageNet-like imagery, SDA imagery is sparse, has high dynamic range, and contains small features and thin objects that occupy few of the pixels in their bounding boxes. Sparse, high dynamic range imagery interacts poorly with the most popular normalization layers, and SDA neural networks must survey large, high-resolution images while perceiving small, pixel-scale features.
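To make the normalization issue concrete, the sketch below contrasts standard zero-mean/unit-variance scaling with a robust median/MAD alternative on a synthetic sparse, high-dynamic-range frame. The frame statistics, outlier magnitudes, and the robust-scaling choice are illustrative assumptions for this sketch, not values or methods taken from the paper.

```python
import numpy as np

# Hypothetical sparse, high-dynamic-range SDA-like frame: mostly background
# noise with a handful of bright point sources (stars / satellites).
rng = np.random.default_rng(0)
frame = rng.normal(loc=100.0, scale=5.0, size=(512, 512))   # sensor background
ys = rng.integers(0, 512, size=8)
xs = rng.integers(0, 512, size=8)
frame[ys, xs] += rng.uniform(1e3, 1e5, size=8)              # bright point targets

# Standard zero-mean / unit-variance scaling (the statistics BatchNorm-style
# layers effectively apply): a few extreme pixels inflate the variance,
# compressing the faint background structure that detection depends on.
standardized = (frame - frame.mean()) / frame.std()
print("background spread after standardization:",
      standardized[frame < 200].std())   # collapses toward zero

# One common alternative for heavy-tailed data: robust scaling with
# median/MAD statistics, which is insensitive to the sparse bright outliers.
med = np.median(frame)
mad = np.median(np.abs(frame - med))
robust = (frame - med) / (1.4826 * mad)  # 1.4826 maps MAD to a Gaussian sigma
print("background spread after robust scaling:",
      robust[frame < 200].std())         # stays near 1
```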
In this paper, we consider components that are common in backbone models designed for ImageNet-like and remote sensing data, such as preprocessing, attention, and normalization layers, as well as properties such as receptive field. We analyze the suitability of these components for processing SDA data and identify practices that significantly hinder or enhance performance. We provide recommendations for modifying popular image processing backbones to improve their performance on SDA data, and we present a backbone architecture designed for general-purpose SDA image processing.
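As one hedged illustration of the kind of backbone modification discussed, the sketch below swaps BatchNorm for GroupNorm in a standard torchvision ResNet-50 and widens the stem for single-channel sensor data. GroupNorm, the group count of 32, and the one-channel stem are assumptions made for this example, not the paper’s recommended architecture.

```python
import torch
from torch import nn
from torchvision.models import resnet50

# Replace BatchNorm (whose batch statistics are skewed by sparse, bright
# point sources) with GroupNorm, whose statistics are computed per sample
# over channel groups. 32 groups is a common default, not a value from
# the paper; every ResNet-50 channel count is divisible by 32.
def group_norm(num_channels: int) -> nn.GroupNorm:
    return nn.GroupNorm(num_groups=32, num_channels=num_channels)

backbone = resnet50(weights=None, norm_layer=group_norm)

# Single-channel sensor data: rebuild the stem conv for one input channel
# rather than tiling the image to three channels (illustrative choice).
backbone.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)

x = torch.randn(2, 1, 512, 512)   # stand-in for a batch of SDA frames
features = backbone(x)
print(features.shape)             # torch.Size([2, 1000]) with the default head
```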
Date of Conference: September 17-20, 2024
Track: Machine Learning for SDA Applications