Paper: |
Content-Aware Data Discovery on VO Catalogs Using Succinct Representations |
Volume: |
527, Astronomical Data Analysis Software and Systems XXIX |
Page: |
13 |
Authors: |
Araya, M.; Arroyuelo, D.; Saldías, C.; Solar, M. |
Abstract: |
VO-services and online astronomical archives in general allow users to discover data
resources based on the metadata that each resource provides. Content-aware data
discovery is the process of searching for patterns within the content of the
resources, for example over the values of astronomical catalogs, and returning
how many matches each resource produces. While a combination of existing
protocols and services might produce this result, scaling up to a large number
of resources while maintaining reasonable query speeds is a challenging problem.
We propose using succinct representations to produce compressed intermediate
files where these queries can be performed with low computational complexity. In
particular, we focus on tabular data resources (i.e. catalogs), where a
content-aware query can be cast as an attribute-retrieval problem. We show
that these intermediate files can be computed directly from VOTable results from
TAP services, so a succinct (and compressed) representation of any catalog
available over this standard can be obtained. We compare our results with
standard SQL queries over a popular DBMS, showing that for most of the queries
our approach outperforms the state of the art. |