Paper: Data-driven Space Science at ESAC Science Data Centre
Volume: 523, Astronomical Data Analysis Software and Systems XXVIII
Page: 409
Authors: Martinez, B.; Barbarisi, I.; Gonzalez, J.; Fernandez, M.; Laantee, C.; Merin, B.; Nieto, S.; Perez, H.; Salgado, J.; de Teodoro, P.
Abstract: For many scientists today, the first step in doing science is exploring the data computationally. New approaches to data-driven science are needed because space science mission data is growing rapidly in volume, heterogeneity, velocity and complexity. This applies to ESA space science missions, whose archives are hosted at the ESAC Science Data Centre (ESDC). Examples include the Gaia archive, whose size is estimated to grow to 1 PB and 6000 billion objects; the Solar Orbiter archive, which is expected to handle several time series with more than 500 million records; and the Euclid archive, which must be able to handle up to 10 PB of data. A major objective of the ESDC is to maximize the scientific exploitation of the archived data. The challenges are not limited to managing the large volume of data: they also include enabling collaboration between scientists, providing tools for exploring and mining the data, integrating data (the value of data increases greatly when it can be linked with other data), and managing data in context (tracking provenance, handling uncertainty and error). This paper presents the solutions that the ESDC is exploring in different areas to address these challenges. Specifically: storing big catalogues in distributed databases (e.g., Greenplum, Postgres-XL); storing long, high-resolution time series in time-series-oriented databases (TimescaleDB); meeting data analysis requirements with Elasticsearch or Spark/Hadoop; and enabling scientific collaboration and closer access to data through JupyterLab, Python client libraries, and integration with pipelines using containers.