ASPCS
 
Paper: Scaling Up Data Cube Indexing Services for Content-based Searches In the Chilean Virtual Observatory
Volume: 521, Astronomical Data Analysis Software and Systems XXVI
Page: 535
Authors: Mendoza, M.; Barrientos, A.; Araya, M.; Solar, M.
Abstract: Content-based search tools are key building blocks for the construction of large-scale virtual observatories. Recently, we created an automatic method for data cube indexing (Araya et al. 2016) that detects and records regions of interest (ROIs) while reducing the necessary storage space. We are currently integrating our code into the production pipeline of ChiVO, the Chilean Virtual Observatory, an initiative that belongs to IVOA and seeks to provide the astronomical community with content-based searches on data cubes. In this paper we show how to scale up our first prototypes to a large-scale data center. The main focus of this paper is the effort involved in automatic molecular line labeling. Specifically, we propose a new method for spectra modeling that uses Splatalogue as a training data set to learn species and transitions in new, unseen data cubes. Our model is based on Latent Dirichlet Allocation, a probabilistic generative model capable of capturing the co-occurrence of emission lines in different channels. The model uses Splatalogue to create a channel vocabulary, treating each species as a document, and represents a collection of species and transitions over a comprehensive set of channel-energy pairs. In addition, we extend the model using Labeled Latent Dirichlet Allocation, exploring the capabilities of our approach to label lines in an unsupervised fashion. To the best of our knowledge, this is the first time that probabilistic generative models have been used to label spectra in astronomy; related work has been devoted to discriminative models (such as SVM-based classifiers). The main advantage of our proposal is the ability to model sparse, high-dimensional data in a comprehensive fashion, and to use posterior inference to label new, unseen data.
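The "channel vocabulary" idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the species names, frequencies, channel width, and the helper names `channel_index` and `build_corpus` are all hypothetical, and the energy half of the channel-energy pairs is left out for brevity. Each species is treated as a document whose words are the channels its lines fall into.

```python
from collections import Counter


def channel_index(freq_ghz, f_min_ghz, channel_width_ghz):
    """Map a line frequency to an integer channel on a uniform grid (assumed layout)."""
    return int((freq_ghz - f_min_ghz) // channel_width_ghz)


def build_corpus(line_lists, f_min_ghz, channel_width_ghz):
    """Turn per-species line lists into a document-term count matrix.

    Rows are species (documents), columns are channels (vocabulary words),
    and each entry counts how many lines of that species fall in that channel.
    """
    docs = {
        species: Counter(
            channel_index(f, f_min_ghz, channel_width_ghz) for f in freqs
        )
        for species, freqs in line_lists.items()
    }
    # The vocabulary is every channel hit by at least one line.
    vocab = sorted({ch for counts in docs.values() for ch in counts})
    names = sorted(docs)
    matrix = [[docs[sp].get(ch, 0) for ch in vocab] for sp in names]
    return names, vocab, matrix


# Made-up line lists in GHz, for illustration only (not Splatalogue data).
toy_lines = {
    "species_A": [100.1, 100.2, 101.3],
    "species_B": [100.7, 101.4],
}

names, vocab, matrix = build_corpus(toy_lines, f_min_ghz=100.0, channel_width_ghz=0.5)
for name, row in zip(names, matrix):
    print(name, row)
```

A count matrix of this shape is the usual input to off-the-shelf LDA implementations, so a topic model could be fitted to it in a later step under the same assumptions.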