Paper: |
Pre-feasibility Study of Astronomical Data Archive Systems Powered by Public Cloud Computing and Hadoop Hive |
Volume: |
521, Astronomical Data Analysis Software and Systems XXVI |
Page: |
608 |
Authors: |
Eguchi, S. |
Abstract: |
The size of astronomical observational data is increasing yearly.
For example, the Atacama Large Millimeter/submillimeter Array (ALMA) is
expected to generate 200 TB of raw data every year, while the Large Synoptic
Survey Telescope (LSST) is estimated to produce 15 TB of raw data every night.
Since computing power is growing much more slowly than the volume of
astronomical data, providing high-performance computing (HPC)
resources together with scientific data will become common in the next decade.
However, the installation and maintenance costs of an HPC system can
be burdensome for the provider.
I consider public cloud computing as an alternative way to obtain
sufficient computing resources inexpensively.
I build Hadoop and Hive clusters using a virtual private server (VPS)
service and Amazon Elastic MapReduce (EMR), and measure their performance.
The VPS cluster behaves differently from day to day, while the EMR clusters
are relatively stable.
Since partitioning is essential for Hive, several partitioning algorithms
are evaluated.
In this paper, I report the results of the benchmarks
and the performance optimizations in a cloud computing environment. |