Paper: Self-describing Portable Dataset Container
Volume: 527, Astronomical Data Analysis Software and Systems XXIX
Page: 543
Authors: Huang, M.
Abstract: With the Self-describing Portable Dataset Container (SPDC), the user can pack data of different formats into modular datasets and Products, together with annotation (descriptions and units) and metadata (Parameters describing the data). SPDC accommodates highly complex associated and nested structures. The APIs for accessing the components of SPDCs are convenient, making it easier to write scripts and perform data mining directly on SPDCs. The toString() method of the major container classes outputs a nicely formatted text representation of complex data. SPDCs are portable (serializable and deserializable) in a human-friendly standard format (JSON is implemented), so that data processors on different platforms can parse an SPDC, access its internal components, or reconstruct it. Even a human with a web browser can understand the data. Most SPDC Products and components implement event sender and listener interfaces to facilitate scalable, data-driven processing pipelines. SPDC storage “pools” are provided 1) for data storage and 2) so that all persistent data can be referenced with URNs (Uniform Resource Names). References to SPDCs can become components of Context products, thereby enabling SPDCs to encapsulate rich, deep, sophisticated, and accessible contextual data while remaining lightweight. For data processors, a web server with RESTful APIs is implemented; it is suitable for deployment in Docker containers, so that legacy software, or software requiring incompatible environments, can be combined into an integrated data processing pipeline.
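The abstract names several cooperating mechanisms: annotated datasets and Parameters, JSON serialization and reconstruction, event listeners, URN-addressed pools, and Context products that hold references instead of data. The Python sketch below shows one way these ideas could fit together. It is an illustration under stated assumptions, not SPDC's implementation: every class and method name (`Parameter`, `ArrayDataset`, `Product`, `MemoryPool`, `Context`, `to_dict`, `from_dict`, `resolve`) and the `urn:<pool>:<class>:<n>` naming scheme are hypothetical, and an in-memory dictionary stands in for persistent pool storage.

```python
# HYPOTHETICAL classes illustrating the abstract's ideas; this is NOT the SPDC API.
import json
from dataclasses import asdict, dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Parameter:  # metadata item: a value plus its annotation
    value: Any
    description: str = ""
    unit: str = ""


@dataclass
class ArrayDataset:  # data array annotated with a description and a unit
    data: List[float]
    description: str = ""
    unit: str = ""


class Product:
    def __init__(self, description: str = "") -> None:
        self.description = description
        self.meta: Dict[str, Parameter] = {}
        self.sets: Dict[str, ArrayDataset] = {}
        self._listeners: List[Callable[[str, str], None]] = []

    def add_listener(self, fn: Callable[[str, str], None]) -> None:
        self._listeners.append(fn)

    def _fire(self, event: str, key: str) -> None:
        for fn in self._listeners:  # event sender: notify listeners synchronously
            fn(event, key)

    def set_meta(self, key: str, param: Parameter) -> None:
        self.meta[key] = param
        self._fire("meta_changed", key)

    def set_dataset(self, key: str, ds: ArrayDataset) -> None:
        self.sets[key] = ds
        self._fire("dataset_changed", key)

    def to_dict(self) -> Dict[str, Any]:  # plain dicts/lists, ready for json.dumps
        return {"description": self.description,
                "meta": {k: asdict(v) for k, v in self.meta.items()},
                "sets": {k: asdict(v) for k, v in self.sets.items()}}

    @classmethod
    def from_dict(cls, d: Dict[str, Any]) -> "Product":  # listeners are not serialized
        p = cls(d["description"])
        p.meta = {k: Parameter(**v) for k, v in d["meta"].items()}
        p.sets = {k: ArrayDataset(**v) for k, v in d["sets"].items()}
        return p

    def __str__(self) -> str:  # human-readable text view, in the spirit of toString()
        lines = [f"Product: {self.description}"]
        lines += [f"  meta {k} = {v.value} {v.unit} ({v.description})" for k, v in self.meta.items()]
        lines += [f"  set  {k} = {v.data} {v.unit} ({v.description})" for k, v in self.sets.items()]
        return "\n".join(lines)


class MemoryPool:  # in-memory stand-in for a persistent pool; stores JSON text
    def __init__(self, name: str) -> None:
        self.name, self._store, self._count = name, {}, 0

    def save(self, product: Product) -> str:
        # Hypothetical URN scheme: urn:<pool>:<class>:<serial number>
        urn = f"urn:{self.name}:{type(product).__name__}:{self._count}"
        self._count += 1
        self._store[urn] = json.dumps(product.to_dict())
        return urn

    def load(self, urn: str) -> Product:
        return Product.from_dict(json.loads(self._store[urn]))


class Context:  # holds only URN references, so it stays small
    def __init__(self) -> None:
        self.refs: Dict[str, str] = {}

    def add_ref(self, key: str, urn: str) -> None:
        self.refs[key] = urn

    def resolve(self, key: str, pool: MemoryPool) -> Product:
        return pool.load(self.refs[key])


# Usage: build and annotate a product, store it, reference it, reconstruct it.
events: List[tuple] = []
p = Product("raw frame")
p.add_listener(lambda ev, key: events.append((ev, key)))
p.set_meta("exposure", Parameter(30.0, "exposure time", "s"))
p.set_dataset("flux", ArrayDataset([1.0, 2.5, 3.0], "measured flux", "Jy"))
pool = MemoryPool("demo")
urn = pool.save(p)
ctx = Context()
ctx.add_ref("raw", urn)
q = ctx.resolve("raw", pool)
print(urn)  # urn:demo:Product:0
```

Because `Context` stores only URN strings, it stays small regardless of how large the referenced Products are; a full Product is rebuilt from its JSON text only when `resolve` is called. Calling `print(q)` shows the plain-text view produced by `__str__`.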