A Modular Benchmarking Framework for Evaluating Large Language Models in Space Situational Awareness using Notice to Space Operators Data

Trier Mortlock, Lawrence Livermore National Laboratory; Ronit Agarwala, Lawrence Livermore National Laboratory; Jayson Luc Peterson, Lawrence Livermore National Laboratory; Imene Goumiri, Lawrence Livermore National Laboratory; Jason Bernstein, Lawrence Livermore National Laboratory

Keywords: Natural Language Processing, Large Language Models, Space Domain Awareness, Notice to Space Operators

Abstract:

The rapid proliferation of space assets has generated an unprecedented volume of publicly available data, presenting both opportunities and challenges for advancing Space Situational Awareness (SSA). Notice to Space Operators (NOTSOs), unclassified messages issued by the Joint Commercial Operations Cell (JCO), play a critical role in this domain by providing timely updates on significant space events such as maneuvers, launches, photometric changes, and signature anomalies. Despite their utility, the creation and analysis of NOTSOs remain labor-intensive, and their potential to inform predictive insights into future SSA events is underutilized. This work explores how large language models (LLMs), with their advanced language comprehension and reasoning capabilities, can address these challenges and enhance SSA efforts. We propose a modular benchmarking framework to evaluate the application of LLMs in SSA tasks using NOTSO data. Our framework consists of three primary components: (i) a NOTSO data ingestion pipeline that organizes unstructured NOTSO data into a searchable database, (ii) a semantic retrieval algorithm for extracting relevant contextual information, and (iii) LLMs fine-tuned or adapted to SSA-specific tasks using techniques such as supervised fine-tuning (SFT) and retrieval-augmented generation (RAG). We showcase the framework's ability to generate a detailed knowledge graph from NOTSOs, enhancing SSA through automated relationship extraction. We also present a case study on satellite maneuver prediction from real-world NOTSOs. Using our framework, we conduct extensive ablation studies to assess the impact of retrieval strategies and LLM configurations, testing models ranging from smaller-scale (1 billion parameters) to large-scale (70 billion parameters), as well as larger proprietary frontier models. This work represents the first systematic investigation of LLMs for SSA applications using NOTSO data.
By introducing a modular and extensible evaluation framework, we provide a foundation for benchmarking LLM performance in SSA-related tasks. Our results highlight the potential of LLMs to enhance SSA event notification, analysis, and prediction, offering valuable tools and insights for the broader SSA community. This research ultimately aims to advance the integration of AI-driven approaches in SSA, fostering collaboration and innovation in addressing the challenges of an increasingly congested space environment.
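The retrieval-augmented generation loop described in the abstract (semantic retrieval over a NOTSO database, then prompting an LLM with the retrieved context) can be sketched in a few lines. This is an illustrative sketch only, not the paper's implementation: the record texts, function names, and toy bag-of-words embedding below are assumptions, and a real system would use a trained sentence encoder and an actual LLM call in place of the placeholder prompt construction.

```python
import math
from collections import Counter

# Hypothetical NOTSO-style records (illustrative text, not real notices).
NOTSOS = [
    "SAT-A performed an unscheduled maneuver raising its orbit",
    "Launch observed placing three payloads into low Earth orbit",
    "Photometric signature change detected on SAT-B",
]

def embed(text):
    """Toy bag-of-words embedding; a real pipeline would use a sentence encoder."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, corpus, k=1):
    """Semantic retrieval step: rank records by similarity to the query."""
    q = embed(query)
    ranked = sorted(corpus, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, context):
    """RAG step: prepend retrieved NOTSO context to the question for an LLM."""
    ctx = "\n".join(f"- {c}" for c in context)
    return f"Context:\n{ctx}\n\nQuestion: {query}\nAnswer:"

context = retrieve("Did any satellite maneuver recently?", NOTSOS, k=1)
print(build_prompt("Did any satellite maneuver recently?", context))
```

The same retrieve-then-prompt structure supports the ablations the abstract mentions: swapping the retrieval strategy or the downstream model changes only one component of the pipeline.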

This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.

Date of Conference: September 16-19, 2025

Track: Machine Learning for SDA Applications
