ShadeMS: Rapid Plotting of Big Radio Interferometry Data


Paper:	ShadeMS: Rapid Plotting of Big Radio Interferometry Data
Volume:	532, ASTRONOMICAL DATA ANALYSIS SOFTWARE AND SYSTEMS XXX
Page:	385
Authors:	Smirnov, O. M.; Heywood, I.; Perkins, S. J.; van Rooyen, R.
Abstract:	The raw outputs of a radio interferometer, i.e. the complex visibility data and all associated metadata, while of little interest to the end-user astronomer per se, contain a wealth of information about the functioning of the instrument and software pipelines, and can provide vital diagnostics during the entire data reduction process. It is therefore important to be able to visualize them in all sorts of ways. However, the sheer size of these datasets (e.g. upwards of a billion measurements for even a short MeerKAT observation) calls for fairly sophisticated plotting techniques that can represent both dense data and outliers, and do it in a reasonable timeframe. This is well beyond the capabilities of our trusted workhorse Matplotlib. The shadeMS tool addresses this problem, drawing on two recently developed technologies: the Datashader suite, which provides functionality for rendering huge datasets onto two-dimensional canvases using a variety of aggregation and categorization options, and the DaskMS library, which provides a native mapping from the Measurement Set, the standard radio astronomy data format, to Dask arrays, which facilitate massively parallel computation, and are natively supported by Datashader. The premise of shadeMS is to support the plotting of anything versus anything, aggregated by anything and coloured (i.e. categorized) by anything, via a straightforward command-line or Python interface. The use of Dask means that a large number of cores can be efficiently exploited, making the plotting process I/O-limited in many cases. This allows data processing pipelines to produce a rich variety of diagnostic plots with relatively little overhead.
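The aggregation-and-categorization idea described in the abstract can be illustrated with a minimal pure-Python sketch. This is not shadeMS or Datashader code, and it calls no API from either package; the function name `aggregate_counts`, the toy coordinates, and the category labels are all hypothetical. It only shows the underlying principle: each point is binned into a pixel of a fixed-size canvas, and a count is kept per pixel and per category, so memory use depends on the canvas size rather than on the number of measurements.

```python
# Illustrative sketch only: not shadeMS or Datashader code.
from collections import defaultdict


def aggregate_counts(xs, ys, cats, x_range, y_range, width, height):
    """Bin points onto a width x height canvas, counting points per pixel per category."""
    x_min, x_max = x_range
    y_min, y_max = y_range
    # canvas[category][row][col] holds the number of points that fell into that pixel
    canvas = defaultdict(lambda: [[0] * width for _ in range(height)])
    for x, y, cat in zip(xs, ys, cats):
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue  # points outside the plot limits are dropped
        # Map data coordinates to pixel indices; the upper edge lands in the last pixel
        col = min(int((x - x_min) / (x_max - x_min) * width), width - 1)
        row = min(int((y - y_min) / (y_max - y_min) * height), height - 1)
        canvas[cat][row][col] += 1
    return dict(canvas)


# Hypothetical toy data: four points coloured by a made-up category label
xs = [0.1, 0.15, 0.9, 0.95]
ys = [0.2, 0.22, 0.8, 0.1]
cats = ["corr_XX", "corr_XX", "corr_YY", "corr_XX"]
grid = aggregate_counts(xs, ys, cats, (0.0, 1.0), (0.0, 1.0), width=4, height=4)
```

Because a count aggregate is additive, per-pixel counts computed independently on separate chunks of the data can simply be summed afterwards, which is one reason this style of rendering suits chunked, parallel array frameworks such as Dask.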