Web Scraping for Astronomy


Paper:	Web Scraping for Astronomy
Volume:	461, Astronomical Data Analysis Software and Systems XXI
Page:	319
Authors:	Derriere, S.; Boch, T.
Abstract:	Astronomical web sites and portals are used daily by astronomers, and are increasingly interactive and customizable, mainly through the use of JavaScript. In addition, information often arises from the linking of remotely distributed data and content. These potential links cannot always be defined in advance and stored in a web document, for at least two reasons: they could increase the size of the document source by a large fraction; and sometimes only the user (and not the document creator) knows where relevant links should be provided. Web scraping is the process of automatically collecting Web information. In this context, we have started developing a method that retrieves remote information and displays it (including links to remote websites) in the current document, triggered by a very simple action from the user: the selection of a portion of text in the web document. Our first prototype deals with astronomical object names. It is written in JavaScript, and can easily be embedded in a web document or used as a bookmarklet. Whenever the user selects a portion of text in a web document, a request is made to the Sesame name resolver to test whether the selection is a valid object identifier. On success, the information retrieved as JSON is used to display a tooltip with additional information on the object, such as its coordinates, links to various CDS services, image thumbnails, etc. We present the current status of this work, and discuss how it could be extended in the future to other applications.
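
The select-then-resolve flow described in the abstract can be illustrated with a minimal JavaScript sketch. This is not the authors' prototype: the endpoint `RESOLVER_URL`, its `name` and `format` query parameters, and the JSON fields `name`, `ra` and `dec` are all hypothetical placeholders, since the abstract does not specify the Sesame request or response format. The sketch only shows the general shape: clean the selected text, query a resolver, and build tooltip text on success.

```javascript
// Hypothetical placeholder endpoint; NOT the real Sesame URL.
const RESOLVER_URL = "https://example.org/sesame";

// Collapse whitespace and reject selections unlikely to be object names.
function normalizeSelection(text) {
  const name = String(text).replace(/\s+/g, " ").trim();
  return name.length > 0 && name.length <= 40 ? name : null;
}

// Build the request URL (the query parameter names are assumptions).
function buildResolverUrl(baseUrl, name) {
  return baseUrl + "?name=" + encodeURIComponent(name) + "&format=json";
}

// Turn a resolver answer into tooltip text (the field names are assumptions).
function formatTooltip(info) {
  return info.name + ": RA " + info.ra.toFixed(4) +
    " deg, Dec " + info.dec.toFixed(4) + " deg";
}

// Resolve a selection; returns tooltip text, or null if nothing was resolved.
// fetchFn is injected so the function can run outside a browser.
async function resolveSelection(text, fetchFn) {
  const name = normalizeSelection(text);
  if (name === null) return null;
  const response = await fetchFn(buildResolverUrl(RESOLVER_URL, name));
  if (!response.ok) return null;
  return formatTooltip(await response.json());
}

// Browser wiring: react to the end of a mouse selection.
if (typeof document !== "undefined") {
  document.addEventListener("mouseup", async () => {
    try {
      const tip = await resolveSelection(window.getSelection().toString(), fetch);
      // A real page would render a tooltip element; logging keeps this short.
      if (tip !== null) console.log(tip);
    } catch (err) {
      // Network or parsing failure: leave the page untouched.
    }
  });
}
```

Injecting `fetchFn` keeps the resolving logic separate from the page wiring, so the same function could be reused in an embedded script or a bookmarklet.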