SciDFS: An In-situ Processing System for Scientific Array Data based on Distributed File System

The amount of array data generated by scientific observation instruments has been increasing rapidly. Array data is usually stored in standard formats such as HDF5 and NetCDF. To support high-level queries on array data, a number of array DBMSs such as SciDB have been proposed. However, they typically have two drawbacks: slow data loading and no direct support for standard formats. Slow data loading is especially critical because scientific data may be generated faster than it can be loaded. To address these drawbacks, we propose a distributed in-situ processing system called SciDFS that uses a distributed file system (DFS) to store and manage array data. SciDFS is a hybrid system that tightly integrates the query processing layer of an array DBMS with a DFS via an in-situ layer. It quickly stores raw array data as DFS blocks and processes queries in situ by accessing the relevant DFS blocks. Through experiments on NASA's real satellite array data, we demonstrate three major features of SciDFS: high-performance data loading (50X faster than SciDB), fast in-situ query processing, and the ability to run legacy applications written for the HDF5 format. © 2018 IEEE.
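
As a rough illustration of what "accessing the relevant DFS blocks" could involve, the sketch below maps a rectangular subarray query onto the grid of fixed-size chunks that overlap it. This is not taken from the paper: it assumes the array is split into a regular chunk grid with one chunk per DFS block, and the function name `chunks_for_query` and its parameters are hypothetical.

```python
from itertools import product


def chunks_for_query(chunk_shape, lo, hi):
    """Return grid indices of chunks overlapping the half-open box [lo, hi).

    Assumption (not from the paper): the array is partitioned into a regular
    grid of chunks of size chunk_shape, and each chunk is one DFS block.
    """
    per_dim = []
    for size, start, stop in zip(chunk_shape, lo, hi):
        if stop <= start:
            return []  # empty query box touches no chunk
        first = start // size        # chunk holding the first queried cell
        last = (stop - 1) // size    # chunk holding the last queried cell
        per_dim.append(range(first, last + 1))
    # Cartesian product of per-dimension chunk ranges gives every overlap.
    return list(product(*per_dim))


# Rows 150..249 and columns 0..99 of an array chunked into 100 x 100 tiles.
blocks = chunks_for_query((100, 100), (150, 0), (250, 100))
```

Under these assumptions, a query engine would read the listed blocks and skip the rest of the file, which is the general idea behind in-situ access to raw data.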
Publisher
IEEE
Issue Date
2018-01-17
Language
English
Citation

IEEE International Conference on Big Data and Smart Computing, BigComp 2018, pp.375 - 382

DOI
10.1109/BigComp.2018.00062
URI
http://hdl.handle.net/10203/274433
Appears in Collection
CS-Conference Papers(학술회의논문)