Paper: Analyzing Scientific Big Data with HeAT
Page: 89
Authors: Comito, C.; Götz, M.; Debus, C.; Coquelin, D.; Krajsek, K.; Knechtges, P.; Hagemeier, B.; Tarnawa, M.; Blind, L.
Abstract: The exponential growth in data volumes in recent years has researchers in all fields of science scrambling to port their data analysis to high-performance computing (HPC) applications. Python is the standard programming language within the scientific community, with the SciPy stack as the clear reference for data analysis. While parallelizing SciPy code can be relatively straightforward when the problem separates easily into parallel tasks, scientists today are still mostly on their own when it comes to more complex problems that require ad hoc communication among CPUs/GPUs. The Helmholtz Analytics Toolkit (HeAT) is meant to bridge this gap. HeAT is an open-source Python tensor library for scientific parallel computing and machine learning. Under the hood, low-level operations and high-level algorithms are optimized to exploit the available resources, be it a dual-core laptop or a supercomputer. At the same time, HeAT's NumPy-like API makes it straightforward for SciPy users to implement HPC applications or to parallelize their existing ones. HeAT relies on PyTorch for its data objects, which implies fast on-process operations and GPU support. Our recent benchmarks show that HeAT, although still in an early phase, can achieve a significant speed-up compared to popular parallel Python frameworks.
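To make concrete the kind of problem the abstract refers to, where partial results held by different processes must be combined through communication, the following is a minimal conceptual sketch. It uses plain NumPy only and simulates three processes within a single program. It is not HeAT code and does not reflect HeAT's implementation; the function name `distributed_mean` and the chunking scheme are assumptions made purely for illustration.

```python
import numpy as np


def distributed_mean(chunks):
    """Column-wise mean over all rows, built from per-chunk partial results.

    `chunks` is a hypothetical stand-in for the local pieces of one array
    that would live on separate processes in a real parallel run.
    """
    # Step 1: each simulated process reduces only its own local chunk.
    local_sums = [chunk.sum(axis=0) for chunk in chunks]
    local_counts = [chunk.shape[0] for chunk in chunks]

    # Step 2: combine the partial results. In a real distributed setting
    # this is the communication step (an all-reduce in MPI terms).
    total_sum = np.sum(local_sums, axis=0)
    total_count = sum(local_counts)

    return total_sum / total_count


# A small 6x2 array, split row-wise into 3 chunks to simulate 3 processes.
data = np.arange(12, dtype=np.float64).reshape(6, 2)
chunks = np.array_split(data, 3, axis=0)

result = distributed_mean(chunks)
print(result)
```

The result matches `data.mean(axis=0)` computed on the undivided array. A library with a NumPy-like distributed API aims to hide the chunking and the combining step behind a single call of that kind.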