Paper: Data Sharing and Publication Using the SciDrive Service
Volume: 485, Astronomical Data Analysis Software and Systems XXIII
Page: 465
Authors: Mishin, D.; Medvedev, D.; Szalay, A. S.; Plante, R.; Graham, M.
Abstract:
Despite the progress made in recent years in cloud data storage, fast
and reliable data storage for the scientific community remains an open
problem. The SciDrive project meets the need for a free, open-source
platform for publishing scientific data. Although its primary target
audience is astronomers, as the largest data producers, the platform is
not bound to any scientific domain and can be used by different
communities. Our current installation provides a free and safe storage
platform where scientists can publish their data and share it with the
community with the simplicity of Dropbox. The system allows service
providers to harvest metadata from the files and derive their broader
context in a fairly automated fashion. Collecting various scientific
data files in a single location, or in multiple connected sites, makes
it possible to build an intelligent system of metadata extractors. Our
system aims to simplify the cataloging and processing of large file
collections for the long tail of scientific data. We propose an
extensible plugin architecture for automatic metadata extraction and
storage. The current implementation targets some of the data formats
commonly used by the astronomy communities, including FITS, ASCII and
Excel tables, TIFF images, and YT simulation data archives. Along with
generic metadata, format-specific metadata is also processed. For
example, basic information about celestial objects is extracted from
FITS files and TIFF images, if present. This approach turns simple BLOB
storage into a smart system that provides access to data in its own
representation, such as a database for files containing tables, and
offers additional search and access features such as full-text search,
creation of image pyramids or thumbnails, and extraction of simulation
dataset IDs for fast search. A 100TB implementation has just been put
into production at Johns Hopkins University.
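
The extensible plugin architecture mentioned in the abstract can be pictured with a minimal sketch. This is an illustration only, not SciDrive's actual code. The registry, the `register` decorator, the `extract_metadata` function, and the CSV plugin are all hypothetical names chosen for this example. The sketch assumes plugins are selected by file extension and that every file first receives generic metadata, to which format-specific metadata is then added.

```python
import csv
import io
from typing import Callable, Dict, List

# Hypothetical plugin type: takes a file name and its bytes, returns metadata.
Extractor = Callable[[str, bytes], Dict[str, object]]

# Hypothetical registry mapping a lowercase file extension to its plugins.
_REGISTRY: Dict[str, List[Extractor]] = {}


def register(*extensions: str) -> Callable[[Extractor], Extractor]:
    """Decorator that registers an extractor plugin for the given extensions."""
    def decorator(func: Extractor) -> Extractor:
        for ext in extensions:
            _REGISTRY.setdefault(ext.lower(), []).append(func)
        return func
    return decorator


def generic_metadata(name: str, data: bytes) -> Dict[str, object]:
    """Generic metadata computed for every file, whatever its format."""
    return {"name": name, "size_bytes": len(data)}


@register(".csv")
def csv_table_metadata(name: str, data: bytes) -> Dict[str, object]:
    """Format-specific plugin (assumed example): column names and row count."""
    text = data.decode("utf-8", errors="replace")
    rows = list(csv.reader(io.StringIO(text)))
    header = rows[0] if rows else []
    return {"columns": header, "row_count": max(len(rows) - 1, 0)}


def extract_metadata(name: str, data: bytes) -> Dict[str, object]:
    """Run the generic extractor, then every plugin registered for the extension."""
    metadata = generic_metadata(name, data)
    dot = name.rfind(".")
    ext = name[dot:].lower() if dot != -1 else ""
    for plugin in _REGISTRY.get(ext, []):
        metadata.update(plugin(name, data))
    return metadata


if __name__ == "__main__":
    sample = b"ra,dec,mag\n10.68,41.27,3.4\n83.82,-5.39,4.0\n"
    print(extract_metadata("catalog.csv", sample))
```

In this sketch, supporting a new format only requires writing one more function decorated with `register`. A file with no registered plugin still receives the generic metadata.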