DSpace at KOASAS: Histogram transformation methodology for data distribution in parallel joins


Histogram transformation methodology for data distribution in parallel joins (Korean title: 병렬 결합 연산의 데이타 분산을 위한 히스토그램 변환기법)


Park, Ung-Kyu / 박웅규

As parallel database computers have become more popular, the design of efficient parallel join algorithms has become one of the major issues in the database systems area. Conventional parallel join algorithms have two phases: partitioning the relations to be joined, and locally joining the partitioned relations in parallel. In real databases, certain values of a given attribute often occur more frequently than others. This phenomenon is referred to as data skew. With a skewed data distribution, join algorithms have difficulty achieving good load balance among processors in the joining phase. This thesis addresses the data distribution problem of balancing load in parallel join algorithms so as to minimize total execution time. We first propose a data distribution framework that resolves load imbalance and bucket overflow in parallel joins. Using a histogram transformation technique, the framework transforms the histogram of skewed data into a desired distribution that corresponds to the relative computing power of the node processors in the system. Next, we propose an efficient parallel join algorithm for handling skewed data, based on the proposed data distribution method. The main idea is to use the data distribution framework to allocate relations evenly across all the nodes. The proposed join algorithm works in three phases: the histogram evaluation phase, the partitioning phase, and the joining phase. The histogram evaluation phase first obtains a cumulative histogram of the hashed values of the join attribute for the smaller relation. It then determines a histogram equalization transfer function and the boundary values for partitioning. In the subsequent partitioning phase, the relations are distributed among the nodes using the boundary values. Finally, the partitioned relations are joined locally on each node processor in parallel. We also present a parallel join algorithm for a heterogeneous system where computing power of eac...
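The three phases described in the abstract can be illustrated with a minimal single-process sketch. This is not the thesis's implementation: the bucket count `NUM_BUCKETS`, the modulo hash, the function names, and the assumption that each tuple carries an integer join key in its first field are all hypothetical choices made for illustration. The sketch builds a cumulative histogram over hashed join keys of the smaller relation, picks boundary buckets so that each node's share is roughly proportional to its relative computing power, partitions both relations with those boundaries, and joins each partition locally.

```python
from bisect import bisect_left
from itertools import accumulate

NUM_BUCKETS = 64  # hypothetical size of the hash range


def hash_bucket(key):
    """Map an integer join key to a bucket (stand-in for a real hash function)."""
    return key % NUM_BUCKETS


def cumulative_histogram(relation):
    """Histogram evaluation: cumulative tuple counts per hash bucket."""
    counts = [0] * NUM_BUCKETS
    for row in relation:
        counts[hash_bucket(row[0])] += 1
    return list(accumulate(counts))


def boundaries(cum_hist, powers):
    """Pick the last bucket of each node (except the final node) so that
    node i receives roughly powers[i] / sum(powers) of the tuples."""
    total = cum_hist[-1]
    cum_power = list(accumulate(powers))
    bounds = []
    for p in cum_power[:-1]:
        goal = total * p / cum_power[-1]
        # First bucket whose cumulative count reaches the goal.
        bounds.append(bisect_left(cum_hist, goal))
    return bounds


def node_for(key, bounds):
    """Node index for a key: number of boundary buckets strictly below its bucket."""
    return bisect_left(bounds, hash_bucket(key))


def partition(relation, bounds):
    """Partitioning phase: route each tuple to its node."""
    parts = [[] for _ in range(len(bounds) + 1)]
    for row in relation:
        parts[node_for(row[0], bounds)].append(row)
    return parts


def local_hash_join(r_part, s_part):
    """Joining phase on one node: build a table on r_part, probe with s_part."""
    table = {}
    for r in r_part:
        table.setdefault(r[0], []).append(r)
    return [(r, s) for s in s_part for r in table.get(s[0], [])]


def parallel_join(smaller, larger, powers):
    """Run the three phases sequentially; a real system would run the
    per-node joins on separate processors."""
    bounds = boundaries(cumulative_histogram(smaller), powers)
    r_parts = partition(smaller, bounds)
    s_parts = partition(larger, bounds)
    result = []
    for r_part, s_part in zip(r_parts, s_parts):
        result.extend(local_hash_join(r_part, s_part))
    return result
```

For instance, `parallel_join(R, S, [3, 1])` would aim to give the first of two nodes about three quarters of the tuples of `R`. One limitation of this sketch is that all tuples sharing a single join value fall into one bucket and therefore onto one node, so a single very frequent value cannot be spread across nodes here.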

Advisors: Kim, Tae-Gon; 김태곤

Description: Korea Advanced Institute of Science and Technology: Department of Electrical and Electronic Engineering

Publisher: Korea Advanced Institute of Science and Technology

Issue Date: 1995

Identifier: 99071/325007 / 000845119

Language: eng

Description: Doctoral thesis (Ph.D.) - Korea Advanced Institute of Science and Technology: Department of Electrical and Electronic Engineering, 1995.2, [viii, 109 p.]

URI: http://hdl.handle.net/10203/36251

Link: http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=99071&flag=dissertation

Appears in Collection: EE-Theses_Ph.D.(박사논문)

Files in This Item: There are no files associated with this item.


Knowledge Service Development Team, KAIST 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea. T. 82-42-350-4493 Email. koasas@kaist.ac.kr
Copyright © 2016. Korea Advanced Institute of Science and Technology. All Rights Reserved.
