Science - Space Science scenario
Semantic models of data dependencies
Image: SOLAR on the ISS Columbus module, courtesy of the European Space Agency
One of the PERICLES case study partners, B.USOC, supports experiments on the International Space Station (ISS) and is the curator of both the raw science data and the operations data. Space science experiments are typically very expensive to design, develop, and operate, and their results are costly to disseminate. Many different stakeholders are involved in setting up and running the experiments: the engineers who take care of design, development, and maintenance; the mission operators who plan, coordinate, operate, and monitor experiments and accompanying activities; the space agencies that own data and provide infrastructure; and finally, the scientists who process the results to gain scientific knowledge.
Each of these stakeholder groups has its own specific experimental objectives, yet they all need to follow the rules and guidelines set within the context of the broader mission. In particular, solar scientists typically concentrate on analyzing only the data generated by their particular experiment. Mission operators, on the other hand, often consider only the data that affects the daily operation of the payload. Ancillary data is used only for exceptional troubleshooting or failure analysis. The data identified or pre-defined as preservation-relevant is therefore specific to each experiment. As a result, the breadth of data collected or used during operations is very wide and varied. It may include, for example, commands sent to the ISS, telemetry coming back (containing scientific data but also status information on the experiment), operator documentation (logs, meeting minutes, anomaly reports, regulations, and schedules), various types of engineering documents (design, manuals, certifications, and reports), and scientific outputs (processed raw data, papers).
Typically, the end-to-end cycle of a space science experiment, from initial concept to final dissemination of the scientific results, runs over multiple years or even decades. In most cases it is practically impossible to replicate an experiment and obtain results identical to those of previous runs, because conditions in space are continually changing. For example, the Space Station constantly shifts in position, characteristics of the space environment such as solar radiation are constantly in flux, and experimental hardware deteriorates.
Many types of change can influence and alter aspects of these experiment lifecycles in one way or another: changes in regulations and rules, processes and procedures, and data protocols; movement of people and teams; changes in an experiment’s input variables; and changes in experimental hardware status and configuration. All of these factors can ultimately affect the scientific outputs of the experiments. However, for experiment results to be useful to other scientific communities (e.g. climate scientists), as well as to future engineers, operators, and data owners, the raw data alone are not enough. Appropriate environmental metadata and documentation are also required to interpret and potentially re-use this data.
One of the science scenarios within PERICLES is to investigate and propose solutions to the problem of preserving the knowledge that is relevant to, and captured in the context of, space experiments or operational tasks. This preservation requires modelling the experiment and its environment to create a semantic model of what knowledge is relevant and how different pieces of knowledge interact with or depend on each other. Various tools can then link to knowledge and data sources (such as manuals, incoming telemetry links, and logbooks) and perform semantic extraction on them. These tools help populate the semantic model with the collected data, using all sources, including the scientific products delivered by the scientists and their production process.
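The idea of a model that records how pieces of knowledge depend on each other can be illustrated with a minimal Python sketch. This is not the PERICLES model or tooling. The class name `DependencyModel`, its methods, and the example items are all hypothetical, chosen only to show how a recorded dependency makes it possible to ask which items are affected when one of them changes.

```python
from collections import defaultdict, deque


class DependencyModel:
    """Toy directed graph (hypothetical): an edge A -> B means 'B depends on A'."""

    def __init__(self):
        # Maps each item to the set of items that directly depend on it.
        self._dependents = defaultdict(set)

    def add_dependency(self, source, dependent):
        """Record that `dependent` relies on `source`."""
        self._dependents[source].add(dependent)

    def impacted_by(self, changed):
        """Return every item that directly or indirectly depends on `changed`."""
        seen = set()
        queue = deque([changed])
        while queue:
            current = queue.popleft()
            # .get avoids creating empty entries for items with no dependents.
            for item in self._dependents.get(current, ()):
                if item not in seen:
                    seen.add(item)
                    queue.append(item)
        return seen


# Illustrative items only; the names are assumptions, not actual PERICLES entities.
model = DependencyModel()
model.add_dependency("calibration procedure", "calibrated spectra")
model.add_dependency("raw telemetry", "calibrated spectra")
model.add_dependency("calibrated spectra", "solar irradiance product")
model.add_dependency("operator logbook", "anomaly report")

impacted = model.impacted_by("calibration procedure")
print(sorted(impacted))
```

In this sketch, a change to the calibration procedure flags the calibrated spectra and the irradiance product derived from them, while the logbook and anomaly report are left untouched. A real semantic model would carry typed relations and metadata rather than bare edges.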
Re-use of the data constitutes the last step. This re-use can have several motivations. First, better knowledge of the physical data entering the scientific retrieval process leads to the production of a new version of the scientific product. This is frequent in Earth observation, where the spectroscopic databases used in the retrieval improve continuously; for example, in the case of terrestrial ozone mapped from a satellite, complete reprocessing happens on an average time scale of two years. Reprocessing can also be justified by improvements in retrieval algorithms. In the case of SOLAR ISS data, it is a consequence of new analysis of the in-flight calibration procedures.
A second case of re-use arises when new scientific objectives appear. This is the case for solar spectral irradiance, where short-term variations have been discovered. The final scientific results then change from long-term averages to a succession of observations in which the time fluctuations become important, as they could, for example, be the master clock of oscillators in the Earth system. It is in this second case that complete preservation of the entire data collection becomes mandatory. Such a complex repository can then only be handled efficiently using the concepts and tools developed in PERICLES.
David De Weerdt, SPACEAPPS - SpaceApps Project Manager (firstname.lastname@example.org);
Simon Waddington, KCL - Research Fellow, Centre for e-Research (email@example.com);
Christian Muller, B.USOC, Science Coordinator (firstname.lastname@example.org)