Paper: |
Web Scraping for Astronomy |
Volume: |
461, Astronomical Data Analysis Software and Systems XXI |
Page: |
319 |
Authors: |
Derriere, S.; Boch, T. |
Abstract: |
Astronomical web sites and portals are used daily by astronomers, and
are increasingly interactive and customizable, mainly through the use of
JavaScript. In addition, information often arises from linking remotely
distributed data and content. These potential links cannot always be
defined in advance and stored in a web document for at least two reasons: they
could potentially increase the size of the document source by a large fraction;
and sometimes only the user (and not the document creator) knows where relevant
links should be provided. Web scraping is the process of automatically
collecting information from the Web. In this context, we started developing a
method that retrieves remote information and displays it
(including links to remote websites) in the current document, triggered by a
very simple action from the user: the selection of a portion of text in the web
document. Our first prototype deals with astronomical object names. It is
written in JavaScript, and can easily be implemented in a web document, or used
as a bookmarklet. Whenever the user selects a portion of text in a web document,
a request to the Sesame name resolver is made to test if this is a valid object
identifier. On success, the information returned in JSON format is used to display a
tooltip with additional details on the object, such as its coordinates,
links to various CDS services, image thumbnails, etc. We present the current
status of this work, and discuss how it could be extended in the future to other
applications. |
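The selection-to-tooltip flow described in the abstract could be sketched roughly as follows. This is a minimal illustration and not the authors' prototype: the endpoint `RESOLVER_URL`, the `object` query parameter, the response shape `{ name, ra, dec }`, the 40-character length limit, and the helper names `normalizeSelection`, `buildResolverUrl` and `formatTooltip` are all assumptions made for the example.

```javascript
// Hypothetical endpoint: replace with the actual JSON-capable resolver URL.
const RESOLVER_URL = "https://example.org/sesame";

// Collapse whitespace and reject selections unlikely to be object names.
function normalizeSelection(text) {
  const name = String(text || "").replace(/\s+/g, " ").trim();
  // Assumed heuristic: object identifiers are short, so skip long selections.
  if (name.length === 0 || name.length > 40) return null;
  return name;
}

// Build the request URL; the "object" query parameter is an assumption.
function buildResolverUrl(baseUrl, name) {
  return baseUrl + "?object=" + encodeURIComponent(name);
}

// Turn an (assumed) response shape { name, ra, dec } into tooltip text.
function formatTooltip(result) {
  if (!result || typeof result.ra !== "number" || typeof result.dec !== "number") {
    return null; // the name was not resolved
  }
  return result.name + ": RA " + result.ra.toFixed(4) +
    " deg, Dec " + result.dec.toFixed(4) + " deg";
}

// Browser-only wiring: query the resolver whenever the user finishes a selection.
if (typeof document !== "undefined") {
  document.addEventListener("mouseup", async () => {
    const name = normalizeSelection(window.getSelection().toString());
    if (!name) return;
    try {
      const response = await fetch(buildResolverUrl(RESOLVER_URL, name));
      if (!response.ok) return;
      const text = formatTooltip(await response.json());
      if (text) console.log(text); // a real page would render a tooltip element here
    } catch (err) {
      // Network or parse failure: treat it like an unresolved name.
    }
  });
}
```

Keeping the three helpers free of DOM access makes them easy to check outside a browser. A cross-origin `fetch` only works if the resolver sends suitable CORS headers; otherwise a JSONP-style request or a same-origin proxy would be needed.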