Resolving Conflicts in Anthropogenic Space Object Data Through Weight Distribution Networks with Embedded Data Curation

Nevan Simone, The University of Texas at Austin; Maria Esteva, The University of Texas at Austin; Moriba Jah, The University of Texas at Austin

Keywords: object resolution, data curation, PageRank, space domain awareness

Abstract:

An Anthropogenic Space Object (ASO) has properties that characterize it, denote its origin, and describe its orbit at an epoch. These properties are provided by data collections from different sources to ASTRIAGraph (http://astria.tacc.utexas.edu/AstriaGraph/), a knowledge system to explore space sustainability. Across collections, data for a particular ASO may vary significantly due to modeling, observational, or human errors. In ASTRIAGraph, different collections provide information about the same ASO many times, allowing for conflicting inputs on its characteristics and orbital state. While data may be generated with the best knowledge and instrumentation available to each source, conflicts in the data must be resolved to achieve transparency, predictability, and accountability in Space Domain Awareness (SDA). To address this problem, we devised metrics to suggest which collection is more reliable in providing a given field value. Informed by data curation best practices, the metrics use the PageRank algorithm (PR) to assign and distribute weight across the fields present in the different collections. The outcome is a resolution of conflicting values to enable more precise ASO identification and characterization. 

PR is used by search engines to rank websites using a weight distribution network that determines the probability of arriving at one web page from another through embedded hyperlinks. Pages are represented as nodes, and hyperlinks are modeled as the linking edges between them. Each node contains a weight, and a loop through all nodes distributes weight along the edges. A node’s new weight is the sum of all weights on incoming edges. The process ends when each node weight converges, and the resulting values are interpreted as the relative significance of web pages. A page’s weight can increase when more pages link to it, or it can decrease if hyperlinks referencing it break.
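
The weight distribution loop described above can be illustrated with a minimal sketch of the standard PageRank iteration. This is not the implementation used in ASTRIAGraph: the function name `pagerank`, the damping factor of 0.85, the convergence tolerance, and the three-page toy graph are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-9, max_iter=200):
    """Rank nodes of a directed graph.

    `links` maps each node to the list of nodes it points to.
    `damping`, `tol`, and `max_iter` are illustrative defaults (assumptions).
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    weight = {node: 1.0 / n for node in nodes}  # start from a uniform weight

    for _ in range(max_iter):
        # Weight held by nodes without outgoing links is spread evenly.
        dangling = sum(weight[node] for node in nodes if not links.get(node))
        new_weight = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        # Each node splits its weight equally along its outgoing edges;
        # a node's new weight is the sum of what arrives on incoming edges.
        for source in nodes:
            targets = links.get(source, [])
            for target in targets:
                new_weight[target] += damping * weight[source] / len(targets)

        change = sum(abs(new_weight[node] - weight[node]) for node in nodes)
        weight = new_weight
        if change < tol:  # stop once the weights have converged
            break
    return weight


# Hypothetical toy graph: page A links to B and C, B links to C, C links to A.
toy_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(toy_links)
```

In this toy graph, page C is linked to by both A and B, so it ends up with more weight than B, which is linked to only by A.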

ASTRIAGraph gathers data from seven collections on a continuous basis. The data is either static (e.g., an ASO’s identifiers and launching state), with values that do not change, or dynamic (e.g., orbital parameters), with values that change often, even daily. ASTRIAGraph is implemented in a Neo4j graph database, where data from the different collections is integrated using a unifying data model that normalizes the field labels during the ingest process (see the data model field labels at http://astriaservices.tacc.utexas.edu/liveschema). To resolve differences between the data values contributed by different collections about an ASO, we adapted PR to the structure of the ASTRIAGraph database. In our implementation, a node is defined as a field present in a collection, including its current and historical data; each node is linked to the other nodes in the same collection and to similar nodes across collections. The methods used to assign and distribute weights are based on curation best practices.

Data curation entails maintaining, preserving, and assuring the quality of research data. Fundamental to curation is evaluating the degree of reliability of a collection to provide data that can be used with confidence. In this work we implement reliability through three metrics: a) completeness, b) coincidence, and c) consistency. Completeness measures exhaustiveness in the number and variety of nodes in a collection, and whether their values are blank. Coincidence refers to whether or not nodes defined by the same property contain agreeing values across collections. Consistency examines changes and blanks throughout the historical data provided by a collection. Our weight distribution network connects all nodes in the same collection and similar nodes across the different collections. Every link between nodes is assigned a scaling factor that sets what portion of the distributed weight is sent to each connected node. Completeness is a set weight that is penalized for every blank value in the node’s data. Completeness and consistency sum to generate each node’s initial weight. Coincidence determines the link scaling factors across collections by computing the percentage of shared values between nodes. The new weight of a node is the sum of all weights on incoming links, completing the loop of weight distribution. This loop terminates when the weights of all nodes converge to within an acceptable limit.
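
As a rough illustration of how the three metrics could feed the network, the sketch below computes an initial weight and a cross-collection scaling factor for two toy nodes. The abstract does not give formulas, so everything here is an assumption: the function names, the base weight of 1.0, the penalty of 0.1 per blank or change, the representation of a node as a mapping from ASO identifier to a non-empty, time-ordered list of values, and the toy data. Only the blank-value aspect of completeness is modeled, not the number and variety of nodes in a collection.

```python
def current_values(node):
    """Latest value per ASO; each history is assumed non-empty, oldest first."""
    return {aso: history[-1] for aso, history in node.items()}


def completeness(node, base=1.0, penalty=0.1):
    """Set weight, penalized for every blank (None) current value."""
    blanks = sum(1 for value in current_values(node).values() if value is None)
    return max(0.0, base - penalty * blanks)


def consistency(node, base=1.0, penalty=0.1):
    """Set weight, penalized for every blank and every change in the history."""
    events = 0
    for history in node.values():
        events += sum(1 for value in history if value is None)
        events += sum(1 for prev, cur in zip(history, history[1:]) if prev != cur)
    return max(0.0, base - penalty * events)


def initial_weight(node):
    """Completeness and consistency sum to the node's initial weight."""
    return completeness(node) + consistency(node)


def coincidence(node_a, node_b):
    """Fraction of agreeing values among ASOs with non-blank values in both nodes."""
    cur_a, cur_b = current_values(node_a), current_values(node_b)
    shared = [
        aso for aso in cur_a
        if cur_a[aso] is not None and cur_b.get(aso) is not None
    ]
    if not shared:
        return 0.0
    return sum(cur_a[aso] == cur_b[aso] for aso in shared) / len(shared)


# Hypothetical "launching state" field as reported by two collections.
node_a = {"ASO-1": ["US", "US"], "ASO-2": ["FR", "FR"], "ASO-3": ["JP", "JP"]}
node_b = {"ASO-1": ["US", "US"], "ASO-2": [None, "DE"], "ASO-3": ["JP", None]}

weight_a = initial_weight(node_a)
weight_b = initial_weight(node_b)
link_factor = coincidence(node_a, node_b)
```

Under these assumed penalties, `node_a` starts with more weight than `node_b` because it has no blanks or changes, and the link between them is scaled by the share of ASOs on which both nodes report the same non-blank value. These quantities would then seed a distribution loop of the kind described above.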

The results of the study show that nodes’ final weights cluster collections into three tiers of reliability. A collection aggregates more weight if it contains more nodes with data for a large number of ASOs, and shows fewer blank values and value changes over time. These traits maximize the weight gained from all three metrics. Collections that share fewer similar nodes with other collections form a second tier of reliability. In this case, node weights are more strongly impacted by completeness and consistency than by coincidence. If one collection’s data is a subset of another, the lesser coincidence for one collection leads to its nodes having the lowest final weights, along with collections with nodes with many blanks. Results of the three metrics were evaluated qualitatively by a curator and a domain expert. They concluded that the metrics are useful to resolve data conflicts by suggesting which of the collections involved provides a more reliable data value, especially when assessing static data. They also noted the complexities of evaluating dynamic nodes due to the precision of their values, which are sensitive to physical models and time of observations (e.g., eccentricity, ballistic coefficient). They suggest that this is an area of further study requiring the design of more elaborate comparison algorithms.

Due to the sheer volume and diversity of data collections, ASO data curation cannot be conducted manually. Instead, patterns in different collections over time can be used to resolve data discrepancies according to a weighting informed by curation best practices. The metrics are holistic because they indicate the reliability of a collection’s nodes in relation to the quality of the collections providing data to a knowledge system. Curation-based weight distribution metrics can provide ever-evolving insights and fair assessments of the data that supports space domain awareness at a scale not currently available.
 

Date of Conference: September 19-22, 2023

Track: Space Domain Awareness
