In general, traditional database management systems are well suited to a variety of business-processing applications in the commercial world. These applications typically use formatted data. However, many new applications must support records (or documents) that contain free text as well as formatted fields. Such applications include library systems, medical information systems, office information systems, geographical database systems, CAD/CAM systems, and a variety of military applications.
In this thesis, we propose new signature-based multikey access methods, based on a term discrimination property, to support such applications. Using this property, we distinguish highly discriminatory terms (primary terms) from weakly discriminatory terms (secondary terms) and construct a new, efficient access structure for the primary terms, such as an inverted file or a hash-table file, so that we may achieve good retrieval performance.
Because the new multikey access methods have a two-path structure, we propose a multi-term query processing strategy that accesses primary terms in full and secondary terms only in part. This strategy makes it possible to access a constant number of blocks regardless of the number of secondary terms. To further improve retrieval performance, we cluster similar record signatures into a small number of blocks. To keep the clustering simple and effective, we form clusters based on the similarity of primary terms rather than of all terms; this still preserves most of the benefit of clustering.
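The two-path idea can be illustrated with a minimal Python sketch. It is not the access structure proposed in this thesis: the class `TwoPathIndex`, the function `term_signature`, and the constants `SIG_BITS` and `BITS_PER_TERM` are hypothetical names chosen for illustration, primary terms are assumed to be given in advance, and block layout, partial access to secondary terms, and clustering are not modeled. Primary terms go through an inverted file, while secondary terms are folded into one superimposed signature per record.

```python
import hashlib
from collections import defaultdict

SIG_BITS = 64        # assumed signature width (illustrative only)
BITS_PER_TERM = 3    # assumed number of bits set per term (illustrative only)


def term_signature(term):
    """Map a term to a bit pattern with at most BITS_PER_TERM bits set."""
    sig = 0
    for i in range(BITS_PER_TERM):
        digest = hashlib.sha256(f"{i}:{term}".encode()).digest()
        sig |= 1 << (int.from_bytes(digest[:4], "big") % SIG_BITS)
    return sig


class TwoPathIndex:
    """Hypothetical two-path index: inverted file plus record signatures."""

    def __init__(self, primary_terms):
        self.primary_terms = set(primary_terms)
        self.inverted = defaultdict(set)  # primary term -> record ids
        self.signatures = {}              # record id -> secondary-term signature
        self.records = {}                 # record id -> full term set

    def insert(self, rid, terms):
        self.records[rid] = set(terms)
        sig = 0
        for t in terms:
            if t in self.primary_terms:
                self.inverted[t].add(rid)   # path 1: exact inverted entry
            else:
                sig |= term_signature(t)    # path 2: superimposed coding
        self.signatures[rid] = sig

    def query(self, terms):
        """Return ids of records that contain every query term."""
        primary = [t for t in terms if t in self.primary_terms]
        secondary = [t for t in terms if t not in self.primary_terms]
        if primary:
            # Intersect the inverted lists of all primary query terms.
            candidates = set.intersection(
                *(self.inverted.get(t, set()) for t in primary)
            )
        else:
            candidates = set(self.signatures)
        # Filter candidates with the query signature of the secondary terms.
        qsig = 0
        for t in secondary:
            qsig |= term_signature(t)
        candidates = {r for r in candidates if self.signatures[r] & qsig == qsig}
        # Signatures can produce false drops, so verify against the records.
        return sorted(r for r in candidates if set(terms) <= self.records[r])
```

In this sketch the primary terms narrow the candidate set exactly, so the signature test and the final false-drop check run only on the records that survive the inverted-file lookup.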
To evaluate the space-time performance of the new signature-based multikey access methods, we provide an analytic model that estimates retrieval time, storage overhead, and insertion time. In addition, to verify the analytic model, we implement the new multikey access methods and obtain experimental results that agree with the theoretical results. We show from the performance results that new multikey access metho...