Efficient signature file declustering methods for parallel processing (병렬처리를 위한 효율적인 요약화일 디클러스터링 방법)

A signature file is an abstraction of documents that has been studied as a storage structure for unformatted data. Because a signature file is much smaller than the data file it describes, it can work effectively as a filter that immediately discards most non-qualifying documents for a given query. Sequential organization of a signature file works well for a small data file, but its performance becomes a problem when the data file is large. Many signature file organizations based on tree or hashing techniques improve performance on single-processor systems, and there have also been many attempts to make these schemes run in parallel environments. The Hamming Filter shows good declustering performance for some partial match queries. It declusters a signature file using the Linear Code Decomposition Method (LCDM), which is used for detecting and correcting errors during data transmission. The LCDM yields practically no execution skew if the data is not skewed. However, because the LCDM allocates signatures with the same suffix to the same processing node, it cannot avoid data skew when many signatures share a suffix. In addition, it has problems that make it difficult to use for parallelism, such as non-scalability and non-determinism. In this dissertation we propose two signature file declustering methods, called MIN-entropy and Inner-product, that overcome these problems in the LCDM. They decluster a signature file dynamically based on the current status of signature allocation. Thus, MIN-entropy and Inner-product can cope with a variety of workloads and configurations. We have shown through a performance evaluation based on statistical modeling that MIN-entropy and Inner-product give better retrieval performance than the LCDM for data sets with various distributions such as uniform distribution, normal distribution and exponential distribu...
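The filtering role of a signature file described in the abstract can be illustrated with a minimal sketch of superimposed coding, a common way to build document signatures. This is not the dissertation's method, and it does not show the LCDM, MIN-entropy, or Inner-product declustering schemes. The signature width, the number of bits per word, the hash construction, the sample documents, and all function names are assumptions chosen for illustration.

```python
import hashlib

SIG_BITS = 64       # assumed signature width in bits
BITS_PER_WORD = 3   # assumed number of bit positions hashed per word


def word_signature(word: str) -> int:
    """Map a word to a bit pattern with at most BITS_PER_WORD bits set."""
    sig = 0
    for i in range(BITS_PER_WORD):
        digest = hashlib.sha256(f"{i}:{word.lower()}".encode()).digest()
        pos = int.from_bytes(digest[:4], "big") % SIG_BITS
        sig |= 1 << pos
    return sig


def document_signature(text: str) -> int:
    """Superimpose (bitwise OR) the signatures of all words in a document."""
    sig = 0
    for word in text.split():
        sig |= word_signature(word)
    return sig


def candidates(query_words, signatures):
    """Return indices of documents whose signature covers the query signature.

    Documents that fail this test cannot contain all query words, so they are
    discarded without reading the data file. Documents that pass may still be
    false drops and must be checked against the actual text.
    """
    query_sig = 0
    for word in query_words:
        query_sig |= word_signature(word)
    return [i for i, s in enumerate(signatures) if s & query_sig == query_sig]


# Hypothetical sample data, for illustration only.
docs = [
    "signature file declustering for parallel processing",
    "tree and hashing techniques for single processor systems",
    "error detection and correction in data transmission",
]
signatures = [document_signature(d) for d in docs]
hits = candidates(["signature", "declustering"], signatures)
```

The filter never misses a qualifying document, because every bit set by a query word is also set in the signature of any document containing that word. It may return extra candidates, which is why a final check against the data file is still needed.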
Advisors
Kim, Myoung Ho (김명호)
Description
Korea Advanced Institute of Science and Technology (KAIST): Department of Computer Science
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
1999
Identifier
156215/325007 / 000935297
Language
eng
Description

Doctoral dissertation (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST): Department of Computer Science, 1999.8, [ [ix], 102 p. ]

Keywords

Declustering; Parallel processing; Information retrieval; Signature file; Indexing method

URI
http://hdl.handle.net/10203/33145
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=156215&flag=dissertation
Appears in Collection
CS-Theses_Ph.D.(박사논문)
Files in This Item
There are no files associated with this item.
