GPUs and Multiclustering for Big Data Computing


Paper:	GPUs and Multiclustering for Big Data Computing
Volume:	538, ADASS XXXII
Page:	63
Authors:	A. S. Anku; M. A. Diego; V. Navarro; J. Reerink
DOI:	10.26624/BPNU2117
Abstract:	Space missions produce an enormous amount of data, which at some point has to pass through pipelines to be cleaned, processed and transformed. Later, the data are stored and analysed in one way or another. Multiplying that amount by the number of ongoing and planned space exploration missions at the European Space Agency (ESA) shows that not only the tools but also the architecture must support the immense volume of information. An extension project for ESA Datalabs was started with the aim of offering the science community tools for working with large amounts of data without relying on their own devices. When computational operations are accelerated with graphics processing units (GPUs), the user sees an immediate benefit in the speed at which results are obtained, while behind the scenes the libraries make effective use of threads, memory, and concurrent access to resources. Multiclustering addresses the sharing of data with the community. Most of the time, moving or copying the data across the Internet is both complex and time-consuming, so a good solution is to bring the user to the data. With ESA Datalabs built on Kubernetes clusters, users are no longer tied to a particular operating system. Managed properly, the clusters offer persistence and, most importantly, scalability: if users need more resources or the platform has to scale, this can be handled by adding new clusters, for example, one with a GPU.