DSpace at KOASAS: Efficient histograms for accurate selectivity estimation of multi-dimensional range queries


Efficient histograms for accurate selectivity estimation of multi-dimensional range queries (다차원 영역 질의의 정확한 선택도 추정을 위한 효율적인 히스토그램)


Roh, Yohan-J. / 노요한

The histogram, a simple representation of the distribution of a large data set, is widely used to estimate query result sizes, a task with many applications in various types of query processing. Estimates for the histogram buckets that partially overlap the query region are computed on the assumption that all the data objects in a bucket are uniformly distributed. However, it has been shown to be intractable to organize histogram buckets such that the objects in every bucket are uniformly distributed. In most heuristic histogram methods, the data distributions within the histogram buckets often contain skew or clusters of objects, which degrades the accuracy of the estimates. In this dissertation, we explore how to construct effective histograms for estimating the result sizes of multi-dimensional range queries. We propose three new histogram methods: the skew-tolerant histogram, the quad tree-based histogram, and the minimal-skew cover histogram.

In the first part of this dissertation, we propose a new histogram method, called the skew-tolerant histogram, for two- or three-dimensional geographic data objects, which are used in many real-world applications. When constructing a histogram, our method detects and utilizes the clusters of objects present in various parts of a data set. By directly utilizing these clusters in organizing buckets, the proposed method provides significantly enhanced accuracy in a robust manner, even for data sets with highly skewed distributions. Through extensive performance experiments, we show a considerable accuracy improvement for the proposed method.

In the second part of this dissertation, we propose a new histogram method, called the quad tree-based histogram, that is based on the use of the existing quad tree for multi-dimensional data sets. The compact representation of the t...
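The uniformity assumption described in the abstract can be illustrated with a minimal sketch. This is not any of the dissertation's three methods; it only shows the baseline estimate they aim to improve on. The `Bucket` class, the function names, and the sample buckets are hypothetical, and the code assumes two-dimensional, axis-aligned rectangular buckets with a query given as `(x_min, y_min, x_max, y_max)`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bucket:
    """Hypothetical 2-D histogram bucket: a rectangle plus an object count."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    count: int


def overlap_area(bucket, query):
    """Area of the intersection between a bucket and a query rectangle."""
    q_x_min, q_y_min, q_x_max, q_y_max = query
    width = min(bucket.x_max, q_x_max) - max(bucket.x_min, q_x_min)
    height = min(bucket.y_max, q_y_max) - max(bucket.y_min, q_y_min)
    return max(width, 0.0) * max(height, 0.0)


def estimate_result_size(buckets, query):
    """Estimate how many objects fall in the query region.

    Under the uniformity assumption, each bucket contributes its count
    scaled by the fraction of its area that the query covers.
    """
    total = 0.0
    for bucket in buckets:
        area = (bucket.x_max - bucket.x_min) * (bucket.y_max - bucket.y_min)
        if area > 0:
            total += bucket.count * overlap_area(bucket, query) / area
    return total


# Made-up data: two adjacent buckets, each half covered by the query.
buckets = [Bucket(0, 0, 10, 10, 100), Bucket(10, 0, 20, 10, 40)]
query = (5, 0, 15, 10)
print(estimate_result_size(buckets, query))  # 70.0
```

Each bucket here is half covered, so the estimate is 50 + 20 = 70. If the 100 objects of the first bucket were in fact clustered in the region x < 5, the true result would be much smaller than the estimate, which is the kind of within-bucket skew the abstract identifies as the source of error.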

Advisors: Kim, Myoung Ho (김명호)

Description: Korea Advanced Institute of Science and Technology (KAIST) : Department of Computer Science

Publisher: Korea Advanced Institute of Science and Technology (KAIST)

Issue Date: 2010

Identifier: 418775/325007 / 020045082

Language: eng

Description: Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST) : Department of Computer Science, 2010.2, [ viii, 102 p. ]

Keywords: Query Optimization; Database systems; Histograms

URI: http://hdl.handle.net/10203/33299

Link: http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=418775&flag=dissertation

Appears in Collection: CS-Theses_Ph.D.(박사논문)

Files in This Item: There are no files associated with this item.


Knowledge Service Development Team, KAIST 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea. T. 82-42-350-4493 Email. koasas@kaist.ac.kr
Copyright © 2016. Korea Advanced Institute of Science and Technology. All Rights Reserved.
