Paper: The Distributed Cloud Based Engine for Knowledge Discovery in Massive Archives of Astronomical Spectra
Volume: 512, Astronomical Data Analysis Software and Systems XXV
Page: 689
Authors: Škoda, P.; Koza, J.; Palička, A.; Lopatovský, L.; Peterka, T.
Abstract:
The current archives of large-scale spectroscopic surveys, such as SDSS or
LAMOST, contain millions of spectra. Because some interesting objects (e.g.,
emission-line stars or quasars) can be identified only from the shapes of
certain spectral lines, machine learning techniques must be applied,
complemented by flexible visualisation of the results.
We present VO-CLOUD, a distributed cloud-based engine that provides the
user with a convenient web-based environment for conducting machine
learning experiments with different algorithms running on multiple nodes. It
allows visual backtracking of individual input spectra through the different
stages of preprocessing, which is important for checking the nature of outliers
or the precision of a classification.
The engine consists of a single master server, representing the user portal,
and several workers running various types of machine learning tasks. The
master holds the database of users and their experiments, predefined
configuration parameters for individual machine learning models, and a
repository for data to be preprocessed. The workers have different
capabilities depending on the installed libraries and the hardware configuration
of their host (e.g., the number of CPU cores or the GPU card type), and more
workers may be added dynamically to provide new machine learning methods.
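The master/worker split described above can be illustrated with a small capability-matching dispatcher. This is a minimal sketch, assuming each worker reports the machine learning methods its installed libraries support, its CPU core count and an optional GPU type, and that the master assigns a job to the least-loaded eligible worker. All names here (`Worker`, `Job`, `Master`, `register_worker`, `submit`) and the method labels used in the example are hypothetical and do not describe VO-CLOUD's real interfaces or scheduling policy.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Worker:
    # Hypothetical capability descriptor a worker reports to the master.
    name: str
    methods: Set[str]          # ML methods enabled by the libraries on the host
    cpu_cores: int
    gpu: Optional[str] = None  # GPU card type, if any
    queue: List[str] = field(default_factory=list)


@dataclass
class Job:
    experiment_id: str
    method: str
    min_cores: int = 1
    needs_gpu: bool = False


class Master:
    """Single master: keeps the worker registry and dispatches experiments."""

    def __init__(self) -> None:
        self.workers: Dict[str, Worker] = {}

    def register_worker(self, worker: Worker) -> None:
        # Workers may join at any time, extending the set of available methods.
        self.workers[worker.name] = worker

    def available_methods(self) -> Set[str]:
        return set().union(*(w.methods for w in self.workers.values()))

    def submit(self, job: Job) -> Optional[str]:
        # A worker is eligible if it supports the method and meets the hardware needs.
        eligible = [
            w for w in self.workers.values()
            if job.method in w.methods
            and w.cpu_cores >= job.min_cores
            and (w.gpu is not None or not job.needs_gpu)
        ]
        if not eligible:
            return None  # no registered worker can run this job yet
        # Simple load balancing: pick the eligible worker with the shortest queue.
        chosen = min(eligible, key=lambda w: len(w.queue))
        chosen.queue.append(job.experiment_id)
        return chosen.name
```

In use, a GPU-only method is rejected until a suitable worker registers:

```python
master = Master()
master.register_worker(Worker("cpu-node", {"method-a"}, cpu_cores=16))
master.submit(Job("exp-1", "method-b", needs_gpu=True))   # None
master.register_worker(Worker("gpu-node", {"method-b"}, cpu_cores=8, gpu="example-gpu"))
master.submit(Job("exp-1", "method-b", needs_gpu=True))   # "gpu-node"
```

Keeping capability matching on the master means new workers need only announce what they can do; no change to the portal is required to expose a new method.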