Paper: From FITS to SQL - Loading and Publishing the SDSS Data
Volume: 314, Astronomical Data Analysis Software and Systems XIII
Page: 38
Authors: Thakar, A.; Szalay, A.S.; Gray, J.
Abstract: For large astronomical databases like the SDSS Science Archive, data loading is potentially the most time-consuming and labor-intensive part of archive operations, and it is also the most critical: it is the last chance to examine the data before it is published. We attempted to automate this job as much as possible, and to make it easy to diagnose data and loading errors. We describe the sqlLoader—a distributed workflow system of modules that check, load, validate and publish the data to the databases. The workflow is described by a directed acyclic graph whose nodes are the processing modules. It is designed for parallel loading and is controlled from a web interface (Load Monitor). The validation stage represents a systematic and thorough scrubbing of the data. Finally, the different data products are merged into a set of linked tables that can be efficiently searched with specialized indices and pre-computed joins.
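
The workflow idea in the abstract, a directed acyclic graph whose nodes are processing modules, can be illustrated with a minimal sketch. This is not the sqlLoader's actual implementation: the stage names are taken from the verbs in the abstract (check, load, validate, publish), while the `WORKFLOW` dictionary, the `run_workflow` function, and the handler callables are hypothetical names introduced only for illustration. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later) to run each stage only after its prerequisites have finished.

```python
from graphlib import TopologicalSorter

# Hypothetical stage graph: each key maps to the set of stages that must
# finish before it can run. Names follow the abstract; the real sqlLoader
# modules and their dependencies may differ.
WORKFLOW = {
    "check": set(),
    "load": {"check"},
    "validate": {"load"},
    "publish": {"validate"},
}


def run_workflow(graph, handlers):
    """Run stages in dependency order and stop at the first failure.

    `handlers` maps a stage name to a zero-argument callable returning
    True on success. Returns (completed_stages, failed_stage_or_None).
    """
    completed = []
    for stage in TopologicalSorter(graph).static_order():
        if not handlers[stage]():
            # Later stages depend on this one, so nothing further is run.
            return completed, stage
        completed.append(stage)
    return completed, None


# Example: every stage succeeds.
all_ok = {name: (lambda: True) for name in WORKFLOW}
completed_ok, failed_ok = run_workflow(WORKFLOW, all_ok)

# Example: validation fails, so the data is never published.
bad_validate = dict(all_ok, validate=lambda: False)
completed_bad, failed_bad = run_workflow(WORKFLOW, bad_validate)
```

Stopping at the first failed stage mirrors the abstract's point that loading is the last chance to examine the data before publication: in this sketch, data that fails validation never reaches the publish step.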