An exemplar in data reuse: 2015 International Space Apps Challenge
For the last four years, NASA have staged yearly Space Apps Hackathons as part of a NASA incubator innovation program. Space Apps events last two days, are held worldwide in a large number of locations, and bring together an extremely large number of people. The 2013 hackathon, for example, included 8300 participants from 83 cities in 44 countries across the world. Teams choose to tackle challenges set by the organisers, of which the majority are provided by NASA and some by partner organisations. To learn more about how challenges such as these could influence the preservation and lifecycle of space data, I attended the 2015 hackathon hosted by the Met Office in Exeter [https://2015.spaceappschallenge.org/location/exeter/].
Hackathons such as these are, among other things, a response to the growing need to demonstrate the large-scale impact of data gathered and created by heavily funded institutions. Events of this kind are clearly a form of data reuse, and also a type of research (experimental, emergent) that reflects the very nature of research itself. Both aspects make them challenging to deal with from a preservation perspective.
For active space data preservation, events such as the Space Apps Challenge mean that a dataset long considered dormant can suddenly attract a great deal of interest, analysis, interpretation, visualisation and purpose. That's a very good thing for the dataset and its originating agency, of course: funders, creators, scientists and the general public all hope that each dataset can and will eventually be used to its fullest potential and achieve the impact of which it is capable.
During each hackathon, space-related datasets, problems and tasks are explored by participants from a broad variety of domains. Some challenges are data-led: visualise data gathered by recent studies or projects; visualise the motion of celestial objects; innovatively reprocess or analyse information to enrich it or to increase its utility for other purposes. These may be based around deep space data, but may equally explore Earth-bound observations for purposes such as environmental monitoring, identification of various classes of natural events or even open-source air-traffic tracking. Others are targeted towards supporting activity by humans and robots, either on Earth or in space: build a better space suit; design software or hardware to help astronauts exercise – how do you build a FitBit for free fall? – or relax; make better use of available data to support active co-working between robots and humans, an increasingly relevant form of collaboration that will definitely benefit from well-preserved data and their context.
Two days is a very short time – in fact, much less time is available, since we began work at 11am on Saturday after making project pitches and forming groups and downed tools just 25 hours later, at midday on Sunday. Even forming groups is a lengthy task, especially for first-time participants. Attendees may work in various relevant industries such as electronics, computing or the geosciences; they may research, teach in higher or further education, or study in some field or another; many attend as hobbyists. Participants come from a wide breadth of backgrounds and fields ranging from art to textile engineering via mathematics and glaciology, almost all of which prove to be relevant to some project or another.
The Space Apps Challenge is short, sharp and computationally intensive. Nonetheless, it is long enough to achieve some impressive pieces of work. One of the winning groups at the Exeter campus took just 25 hours to build a neural network trained by backpropagation for the classification of asteroids, derive a set of metrics characterising asteroid light curves and apply discrete Fourier transforms to extract frequency information, resulting in 70% accuracy in classification over a sample set.
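To give a flavour of the frequency-extraction step mentioned above, here is a minimal sketch of how a discrete Fourier transform can pull a dominant period out of an evenly sampled light curve. This is not the winning team's code: the function names, the sampling interval and the synthetic light curve are all assumptions made up for illustration, and it uses the direct textbook DFT definition rather than a fast library routine.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Return |X[k]| for k = 0 .. N//2 using the direct O(N^2) DFT definition."""
    n = len(samples)
    mean = sum(samples) / n
    centred = [s - mean for s in samples]  # remove the constant brightness offset
    mags = []
    for k in range(n // 2 + 1):
        total = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centred)
        )
        mags.append(abs(total))
    return mags


def dominant_period(samples, sample_interval_hours):
    """Estimate the strongest period (in hours) in an evenly sampled light curve."""
    mags = dft_magnitudes(samples)
    # Skip k = 0: after mean removal it carries no periodic information.
    k_peak = max(range(1, len(mags)), key=lambda k: mags[k])
    return len(samples) * sample_interval_hours / k_peak


# Synthetic light curve (invented values, not real observations):
# brightness varying with a 6-hour period, sampled every 0.5 h for 48 h.
interval = 0.5
curve = [
    10.0 + 0.3 * math.sin(2 * math.pi * (i * interval) / 6.0)
    for i in range(96)
]
period = dominant_period(curve, interval)
print(f"Estimated period: {period:.2f} hours")
```

A period recovered this way could be one of several features fed to a classifier. Real light curves are noisy and unevenly sampled, so a working pipeline would need more careful handling than this sketch provides.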
Events such as this offer the possibility to raise the profile of space science data and to encourage innovative analysis, reuse and combination of datasets. The event continues to grow in popularity; statistics from this year's event show that 12574 participants took part across 135 locations, producing a total of 928 projects. From a preservation perspective, this type of rapid-innovation activity clearly raises some challenging reuse cases, which add further insight into space science, one of the research domains currently explored by the PERICLES project:
- Data must be accessible and appropriately documented for contemporary participants. You can’t expect people to reuse data if they are not able to find it and understand it.
- The wonderful chaos of heterogeneity that is the Space Apps toolkit is a challenge in itself – every web framework, C++ library, .NET file, R import or Processing script ever written is fair game for inclusion in a Space Apps project. As such, these projects typically resist straightforward classification or inclusion into an orderly API. The conscious decision not to impose a standardised toolkit on participants increases the accessibility of the event. However, the consequent fragmentation of projects into a multiplicity of programming languages may make it more difficult to understand what people have written, which in turn affects the ability to reuse material or to generate accurate archival metadata.
- Projects often make use of prototype materials, either software or hardware. This poses a question on how best to support experimentation in research with half-formed ideas and how best to relate to a moment of innovation that doesn’t prioritise production quality.
- The Space Apps Challenge does not require any particular output format, which means the results vary widely and would take some thought to normalise into a standardised form. In addition, project outcomes are mostly stored on cloud platforms such as GitHub, Dropbox and YouTube, which present some inherent risks in terms of long-term reliability and sustainability.
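As a rough illustration of what normalising project outcomes into a standardised form might involve, the sketch below maps loosely structured project descriptions onto a single record type. Everything in it is hypothetical: the `ProjectRecord` fields, the key aliases and the sample entry are invented for this example and do not reflect any schema used by Space Apps or PERICLES.

```python
from dataclasses import dataclass, field


@dataclass
class ProjectRecord:
    """Hypothetical minimal archival record for one hackathon project."""
    title: str = ""
    location: str = ""
    languages: list = field(default_factory=list)
    output_urls: list = field(default_factory=list)
    datasets_used: list = field(default_factory=list)


# Assumed spellings that different teams might use for the same field.
ALIASES = {
    "title": ("title", "name", "project"),
    "location": ("location", "site", "city"),
    "languages": ("languages", "language", "lang"),
    "output_urls": ("output_urls", "urls", "links", "repo"),
    "datasets_used": ("datasets_used", "datasets", "data"),
}
LIST_FIELDS = ("languages", "output_urls", "datasets_used")


def normalise(raw):
    """Map a free-form project description (a dict) onto a ProjectRecord."""
    record = ProjectRecord()
    for target, names in ALIASES.items():
        value = next((raw[n] for n in names if n in raw), None)
        if value is None:
            continue  # leave the default so gaps stay visible
        if target in LIST_FIELDS:
            items = [value] if isinstance(value, str) else list(value)
            setattr(record, target, [str(i).strip() for i in items])
        else:
            setattr(record, target, str(value).strip())
    return record


# Invented sample entry, for illustration only.
raw = {
    "name": "  Example Project ",
    "city": "Exeter",
    "lang": "Python",
    "links": ["https://example.org/repo", "https://example.org/video"],
}
record = normalise(raw)
print(record)
```

Fields that a team did not supply stay empty rather than being guessed, so an archivist can see at a glance which projects lack, say, a record of the datasets they used.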
In summary, events as ephemeral as weekend hackathons do offer a significant challenge in preservation and sustainability terms. They make us realise that people will reuse data in ways that we can’t predict, and that this needs to be taken into consideration and supported if we don’t want datasets to be perceived as static. They also focus our attention on the importance of preserving space data and actively working with new initiatives to define the criteria for sustainable reuse of project outcomes.
From a personal perspective, I enjoyed the event a great deal. Our team built a project that combined wearable biometric sensors, robotics and sensor data interpretation for human and artificial observers. We showed how an autonomous platform could investigate and respond to the plight of a human during a communications failure, using a variety of data sources and interaction modes. Project participant and Met Office Open Innovation manager Mike Saunby christened it 'the LASSIE protocol', and I'm happy to say that LASSIE [https://2015.spaceappschallenge.org/project/lassie/] won an award for Best Mission Impact. Thanks go to the project team, including Russell Taylor and Mitchel Wang on robotics, Nigel and me on biometric data collection and visualisation and Mike on flexible display technologies and comms. I'd also like to thank the Met Office for running an excellent hackathon, and of course NASA. Hope to see you all again - same time next year?
Emma Tonkin is a member of the KCL research team for PERICLES. She is an information science researcher with a background in physics and holds a PhD in social and physical sensors for wearable computing.