Parallelization of Multi-query Processing for Hierarchical Data Streams
Korean title: 계층 구조 스트림 데이터를 위한 다중 질의 병렬 처리 기법

As growing amounts of information are stored, exchanged, and presented in eXtensible Markup Language (XML), processing XML streams efficiently has become increasingly important. Meanwhile, multicore architectures, which provide CPU-level support for parallelism, have become the norm for computing systems. However, existing algorithms for processing XML streams do not take full advantage of this hardware because they were not designed to run in parallel. Their processing performance also degrades as the number of user queries increases. In this thesis, we propose several methods for efficiently parallelizing finite state automaton (FSA)-based XML stream processing. We transform a large collection of XPath expressions into multiple FSA-based query indexes and then process XML streams in parallel through index-level parallelism. Each core works only with its own query index, so no synchronization issues arise while filtering XML streams against the many path patterns given by users. Moreover, the proposed algorithm lets query processing share input scans and path solutions, which reduces redundant processing and saves computation and I/O. We also present an in-memory MapReduce model that makes it possible to process a large collection of twig pattern joins over XML streams simultaneously. In our approach, twig pattern joins are performed by multiple hardware threads in a shared and balanced way. In addition, we address performance issues in the in-memory MapReduce model with a run-time workload balancing scheme, which computes the cost of each twig pattern join operation before the join is actually performed. Extensive experiments with synthetic XML datasets show that our parallel algorithms are efficient and scalable, outperforming conventional algorithms by up to ten times on an 8-core CPU when processing 10 million XPath expressions over XML streams.
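The index-level parallelism described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation. It assumes queries that use only the child axis (for example `/a/b`), a stream given as `("start", tag)` and `("end", tag)` events, and a round-robin split of queries across workers. The names `build_index`, `filter_stream`, and `parallel_filter` are hypothetical. Each worker reads the same input and touches only its own index, so no locking is needed.

```python
from concurrent.futures import ThreadPoolExecutor


def build_index(queries):
    """Build a trie-shaped automaton from child-axis-only paths such as '/a/b'."""
    root = {"next": {}, "accept": []}
    for q in queries:
        node = root
        for tag in q.strip("/").split("/"):
            node = node["next"].setdefault(tag, {"next": {}, "accept": []})
        node["accept"].append(q)
    return root


def filter_stream(index, events):
    """Run one private index over the event stream and return matched queries."""
    matched = set()
    stack = [index]  # one automaton state per open element; None is a dead state
    for kind, tag in events:
        if kind == "start":
            state = stack[-1]
            nxt = state["next"].get(tag) if state is not None else None
            if nxt is not None:
                matched.update(nxt["accept"])
            stack.append(nxt)
        else:  # "end": leave the element and restore the parent's state
            stack.pop()
    return matched


def parallel_filter(queries, events, workers=4):
    """Split queries into per-worker indexes and filter the stream with each."""
    # Assumed partitioning: simple round-robin, one private index per worker.
    parts = [queries[i::workers] for i in range(workers)]
    indexes = [build_index(p) for p in parts if p]
    events = list(events)  # the input is shared read-only by all workers
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(filter_stream, indexes, [events] * len(indexes)))
    return set().union(*results)
```

In CPython, threads do not speed up CPU-bound work because of the global interpreter lock, so the sketch shows only the structure (private indexes, shared read-only input), not the performance reported in the thesis.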
Advisors
Lee, Yoon-Joon (이윤준)
Description
Korea Advanced Institute of Science and Technology (KAIST): School of Computing
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2017
Identifier
325007
Language
eng
Description

Doctoral thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST): School of Computing, 2017.2, [iv, 59 p.]

Keywords

data streams; XML; query processing; parallel processing; multicore architecture; intra-node parallelism

URI
http://hdl.handle.net/10203/242081
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=675850&flag=dissertation
Appears in Collection
CS-Theses_Ph.D.(박사논문)
Files in This Item
There are no files associated with this item.
