Histogram transformation methodology for data distribution in parallel joins (Korean title: 병렬 결합 연산의 데이타 분산을 위한 히스토그램 변환기법)

As parallel database computers have become more popular, the design of efficient parallel join algorithms has become one of the major issues in the database systems area. Conventional parallel join algorithms have two phases: partitioning the relations to be joined, and joining the partitioned relations locally in parallel. In real databases, certain values of a given attribute often occur much more frequently than others. This phenomenon is referred to as data skew. With a skewed data distribution, join algorithms have difficulty achieving good load balance among processors in the joining phase. This thesis addresses the data distribution problem of balancing load in parallel join algorithms so as to minimize total execution time.

We first propose a data distribution framework to resolve load imbalance and bucket overflow in parallel joins. Using the histogram transformation technique, the framework transforms the histogram of skewed data into a desired distribution that corresponds to the relative computing power of the node processors in the system.

Next, we propose an efficient parallel join algorithm for handling skewed data based on the proposed data distribution method. The main idea is to use the data distribution framework to allocate relations evenly across all nodes. The proposed join algorithm works in three phases: the histogram evaluation phase, the partitioning phase, and the joining phase. The histogram evaluation phase first obtains a cumulative histogram of the hashed values of the join attribute for the smaller relation. It then determines a histogram equalization transfer function and the boundary values for partitioning. In the subsequent partitioning phase, the relations are distributed among the nodes using the boundary values. Finally, the partitioned relations are joined locally on each node processor in parallel. We also present a parallel join algorithm for a heterogeneous system where the computing power of eac...
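The histogram evaluation phase can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: join-attribute values are hashed into a fixed number of buckets (`num_buckets`, a hypothetical parameter), each node's relative computing power is given as an integer weight (equal weights model the homogeneous case), and the rounding rule inside `transfer_function` (a bucket goes to the node whose cumulative share covers the bucket's midpoint) is an assumed discretization of the histogram equalization transfer function. All function names are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import Hashable, Iterable, List, Sequence


def cumulative_histogram(keys: Iterable[Hashable], num_buckets: int) -> List[int]:
    """Return H where H[v] counts the keys whose hash bucket is <= v."""
    counts = [0] * num_buckets
    for key in keys:
        # Python's hash() stands in for the join-attribute hash function (assumption).
        counts[hash(key) % num_buckets] += 1
    return list(accumulate(counts))


def transfer_function(cum_hist: Sequence[int], weights: Sequence[int]) -> List[int]:
    """Map each hash bucket to a node so that node i's share of tuples
    approximates weights[i] / sum(weights).

    Assumed rounding rule: a bucket goes to the first node whose cumulative
    weight share covers the bucket's midpoint in the cumulative histogram.
    Integer arithmetic avoids floating-point rounding.
    """
    total = cum_hist[-1]
    cum_w = list(accumulate(weights))
    w_total = cum_w[-1]
    mapping: List[int] = []
    node, prev = 0, 0
    for h in cum_hist:
        mid2 = prev + h  # twice the bucket's midpoint in the cumulative count
        # Advance while (mid2 / 2) / total > cum_w[node] / w_total.
        while node < len(weights) - 1 and mid2 * w_total > 2 * total * cum_w[node]:
            node += 1
        mapping.append(node)
        prev = h
    return mapping


def boundary_values(mapping: Sequence[int], num_nodes: int) -> List[int]:
    """boundaries[i] = last bucket mapped to nodes 0..i (-1 if none).
    Node i owns buckets in (boundaries[i-1], boundaries[i]], with an implicit -1 before node 0."""
    # mapping is nondecreasing, so bisect_right locates each node's last bucket.
    return [bisect_right(mapping, i) - 1 for i in range(num_nodes)]


# Usage: a skewed join attribute (key 0 is frequent), 8 buckets, 2 nodes.
H = cumulative_histogram([0] * 6 + [1] * 2 + [2] * 2 + [3] * 2 + [4, 5, 6, 7], 8)
# H == [6, 8, 10, 12, 13, 14, 15, 16]
equal = boundary_values(transfer_function(H, [1, 1]), 2)   # [1, 7]: 8 tuples per node
hetero = boundary_values(transfer_function(H, [1, 3]), 2)  # [0, 7]: node 1 is 3x faster
```

Because a whole bucket is assigned to one node, a single very frequent value cannot be split at this granularity: with weights `[1, 3]`, bucket 0 holds 6 of the 16 tuples and goes entirely to node 0, although that node's target share is 4. For small non-negative integers, CPython's `hash` returns the integer itself; string hashes are salted per process but stay consistent within one run, which is all that partitioning requires.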
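Once boundary values are known, the partitioning and joining phases can be sketched as below. The sketch makes several assumptions. `boundaries[i]` is the last hash bucket owned by node i; the list is nondecreasing and its final entry equals `num_buckets - 1`. Tuples are `(join_key, payload)` pairs. Threads stand in for node processors, and the local join is a simple build-and-probe hash join on the smaller relation's fragment. These are illustrative choices, and all names are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Hashable, List, Sequence, Tuple

# Hypothetical tuple layout: (join_key, payload).
Row = Tuple[Hashable, str]


def node_for(key: Hashable, boundaries: Sequence[int], num_buckets: int) -> int:
    """Return the node i whose bucket range (boundaries[i-1], boundaries[i]]
    contains the key's hash bucket (an implicit -1 precedes node 0)."""
    return bisect_left(boundaries, hash(key) % num_buckets)


def partition(rows: Sequence[Row], boundaries: Sequence[int], num_buckets: int) -> List[List[Row]]:
    """Partitioning phase: route every tuple to its node using the boundary values."""
    parts: List[List[Row]] = [[] for _ in boundaries]
    for row in rows:
        parts[node_for(row[0], boundaries, num_buckets)].append(row)
    return parts


def local_hash_join(r_part: Sequence[Row], s_part: Sequence[Row]) -> List[Tuple[Hashable, str, str]]:
    """Joining phase on one node: build a hash table on R's fragment, probe with S's."""
    table = defaultdict(list)
    for key, payload in r_part:
        table[key].append(payload)
    return [(key, r_pay, s_pay) for key, s_pay in s_part for r_pay in table.get(key, [])]


def parallel_join(r: Sequence[Row], s: Sequence[Row], boundaries: Sequence[int], num_buckets: int):
    """Partition both relations with the same boundaries, then join each node's fragments."""
    r_parts = partition(r, boundaries, num_buckets)
    s_parts = partition(s, boundaries, num_buckets)
    # Threads stand in for node processors; CPython's GIL limits real CPU parallelism.
    with ThreadPoolExecutor(max_workers=len(boundaries)) as pool:
        return list(pool.map(local_hash_join, r_parts, s_parts))
```

For instance, with `boundaries=[1, 7]` and `num_buckets=8`, integer keys 0 and 1 are routed to node 0 and keys 2 through 7 to node 1. Both relations must be partitioned with the same boundaries so that matching keys meet on the same node.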
Advisors
Kim, Tae-Gon (김태곤)
Description
Korea Advanced Institute of Science and Technology (KAIST): Department of Electrical and Electronic Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
1995
Identifier
99071/325007 / 000845119
Language
eng
Description

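Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST): Department of Electrical and Electronic Engineering, Feb. 1995, [viii, 109 p.]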

URI
http://hdl.handle.net/10203/36251
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=99071&flag=dissertation
Appears in Collection
EE-Theses_Ph.D. (doctoral theses)
Files in This Item
There are no files associated with this item.
