Efficient histograms for accurate selectivity estimation of multi-dimensional range queries

The histogram, a simple representation of the distribution of a large data set, is widely used to estimate query result sizes, a task with many applications in various types of query processing. Estimates for histogram buckets that partially overlap the query region are computed under the assumption that all the data objects in a bucket are uniformly distributed. However, organizing histogram buckets so that the objects in every bucket are uniformly distributed has been shown to be intractable. In most heuristic histogram methods, skews or clusters of objects often remain within the histogram buckets, which degrades the accuracy of the estimates.

In this dissertation, we explore how to construct effective histograms for estimating the result sizes of multi-dimensional range queries. We propose three new histogram methods: the skew-tolerant histogram, the quad tree-based histogram, and the minimal-skew cover histogram.

In the first part of this dissertation, we propose a new histogram method, called the skew-tolerant histogram, for the two- or three-dimensional geographic data objects that are used in many real-world applications. The proposed method provides significantly enhanced accuracy in a robust manner, even for data sets with highly skewed distributions. When constructing a histogram, our method detects and utilizes the clusters of objects present in various parts of a data set; by using these clusters directly in organizing buckets, it remains accurate and robust over skewed distributions. Through extensive performance experiments, we show that the proposed method improves accuracy considerably.

In the second part of this dissertation, we propose a new histogram method, called the quad tree-based histogram, that is based on the use of the existing quad tree for multi-dimensional data sets. The compact representation of the t...
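As a rough illustration of the uniformity assumption mentioned in the abstract, the sketch below estimates the result size of a two-dimensional range query from a list of rectangular buckets. It is not the dissertation's method: the `Bucket` class, the function names, and the sample data are hypothetical, and objects are assumed to be points inside buckets of positive area.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    """Hypothetical axis-aligned 2-D bucket holding `count` point objects."""
    x_lo: float
    y_lo: float
    x_hi: float
    y_hi: float
    count: int


def overlap_fraction(bucket, query):
    """Fraction of the bucket's area covered by query = (x_lo, y_lo, x_hi, y_hi)."""
    q_x_lo, q_y_lo, q_x_hi, q_y_hi = query
    width = min(bucket.x_hi, q_x_hi) - max(bucket.x_lo, q_x_lo)
    height = min(bucket.y_hi, q_y_hi) - max(bucket.y_lo, q_y_lo)
    if width <= 0 or height <= 0:
        return 0.0  # the query does not overlap this bucket
    # Assumes the bucket has positive area.
    bucket_area = (bucket.x_hi - bucket.x_lo) * (bucket.y_hi - bucket.y_lo)
    return (width * height) / bucket_area


def estimate_result_size(buckets, query):
    """Sum each bucket's count scaled by its overlap fraction.

    Scaling by area is exactly the uniformity assumption: objects are
    treated as evenly spread over the bucket.
    """
    return sum(b.count * overlap_fraction(b, query) for b in buckets)


# Made-up sample data: two adjacent buckets and a query covering half of each.
buckets = [Bucket(0, 0, 10, 10, 100), Bucket(10, 0, 20, 10, 40)]
estimate = estimate_result_size(buckets, (5, 0, 15, 10))
```

If the 100 objects of the first bucket were in fact clustered in its left half, the true count for this query would be far below the estimate, which is the kind of skew-induced error the abstract describes.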
Advisors
Kim, Myoung Ho (김명호)
Description
Korea Advanced Institute of Science and Technology (KAIST) : Department of Computer Science
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2010
Identifier
418775/325007 / 020045082
Language
eng
Description

Doctoral thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST) : Department of Computer Science, 2010.2, [ viii, 102 p. ]

Keywords

Query Optimization; Database systems; Histograms

URI
http://hdl.handle.net/10203/33299
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=418775&flag=dissertation
Appears in Collection
CS-Theses_Ph.D.(박사논문)
Files in This Item
There are no files associated with this item.
