Paper: |
Exorcising the Ghost in the Machine: Synthetic Spectral Data Cubes for Assessing Big Data Algorithms |
Volume: |
495, Astronomical Data Analysis Software and Systems XXIV (ADASS XXIV) |
Page: |
57 |
Authors: |
Araya, M.; Solar, M.; Mardones, D.; Hochfärber, T. |
Abstract: |
The size and quantity of the data being generated by large astronomical
projects like ALMA require a paradigm change in astronomical data analysis.
Complex data, such as highly sensitive spectroscopic data in the form of large
data cubes, are not only difficult to manage, transfer and visualize, but they
make traditional data analysis techniques unfeasible. Consequently, attention
has turned to machine learning and artificial intelligence techniques to develop
approximate and adaptive methods for astronomical data analysis within a
reasonable computational time.
Unfortunately, these techniques are usually suboptimal, stochastic and strongly
dependent on their parameters, which could easily turn them into “a ghost in the
machine” for astronomers and practitioners. Therefore, a proper assessment of
these methods is not only desirable but mandatory for trusting them in
large-scale usage. The problem is that positively verifiable results are scarce
in astronomy, and moreover, science using bleeding-edge instrumentation
naturally lacks reference values. We propose Astronomical SYnthetic Data
Observations (ASYDO), a virtual service that generates synthetic spectroscopic
data in the form of data cubes. The objective of the tool is not to produce
accurate astrophysical simulations, but to generate a large volume of labelled
synthetic data, to assess advanced computing algorithms for astronomy and to
develop novel Big Data algorithms. The synthetic data is generated using a set
of spectral lines, template functions for spatial and spectral distributions,
and simple models that produce reasonable synthetic observations. Emission lines
are obtained automatically using IVOA's SLAP protocol (or from a relational
database) and their spectral profiles correspond to distributions in the
exponential family. The spatial distributions correspond to simple functions
(e.g., 2D Gaussian), or to scalable template objects. The intensity, broadening
and radial velocity of each line are given by very simple and naive physical
models, yet ASYDO's generic implementation supports new user-made models, which
potentially allows adding more realistic simulations. The resulting data cube is
saved as a FITS file that also includes all the tables and images used for
generating the cube. We expect to implement ASYDO as a virtual observatory
service in the near future. |
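The generation scheme described in the abstract can be illustrated with a minimal sketch. This is not ASYDO's code: the function names, the dictionary layout of a line, and the parameter values are all hypothetical. It assumes a Gaussian spectral profile (one member of the exponential family), a 2D Gaussian spatial template, a non-relativistic Doppler shift as the "naive" velocity model, and a hard-coded line list in place of a SLAP query. Writing the result to a FITS file is omitted.

```python
import math
import random

C_KM_S = 299_792.458  # speed of light in km/s


def gaussian(x, mu, sigma):
    """Unnormalised Gaussian profile with peak value 1 at x == mu."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2)


def synthetic_cube(freqs, ny, nx, lines, center, spatial_sigma,
                   noise_sigma=0.0, seed=0):
    """Return (cube, labels), with cube[k][j][i] indexed by channel, y, x.

    Each entry of `lines` is a dict (hypothetical layout) with keys
    name, rest_freq, velocity (km/s), intensity and width, where
    rest_freq and width share the units of `freqs`.
    """
    rng = random.Random(seed)
    cy, cx = center

    # Spatial template: a separable 2D Gaussian shared by every line.
    spatial = [[gaussian(j, cy, spatial_sigma) * gaussian(i, cx, spatial_sigma)
                for i in range(nx)] for j in range(ny)]

    # Spectrum: sum of Gaussian line profiles, each Doppler-shifted.
    spectrum = [0.0] * len(freqs)
    labels = []
    for line in lines:
        # Assumed naive model: non-relativistic Doppler shift.
        observed = line["rest_freq"] * (1.0 - line["velocity"] / C_KM_S)
        for k, f in enumerate(freqs):
            spectrum[k] += line["intensity"] * gaussian(f, observed, line["width"])
        # The label records the ground truth needed to score an algorithm.
        labels.append({"name": line["name"], "observed_freq": observed})

    # Cube: outer product of spectrum and spatial template, plus noise.
    cube = [[[s * spatial[j][i] + rng.gauss(0.0, noise_sigma)
              for i in range(nx)] for j in range(ny)] for s in spectrum]
    return cube, labels


# Illustrative usage: one CO (1-0) line at rest on a 61-channel, 9x9 cube.
freqs = [115.0 + 0.01 * k for k in range(61)]  # GHz
lines = [{"name": "CO (1-0)", "rest_freq": 115.2712, "velocity": 0.0,
          "intensity": 1.0, "width": 0.02}]
cube, labels = synthetic_cube(freqs, 9, 9, lines, center=(4, 4),
                              spatial_sigma=2.0)
```

Because the labels are produced alongside the cube, a line-detection or source-finding algorithm run on `cube` can be scored directly against `labels`, which is the assessment use case the abstract motivates.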