DC Field | Value | Language |
---|---|---|
dc.contributor.author | Kim, Min Soo | ko |
dc.contributor.author | Whang, Kyu-Young | ko |
dc.contributor.author | Lee, Jae-Gil | ko |
dc.date.accessioned | 2013-03-07T21:41:05Z | - |
dc.date.available | 2013-03-07T21:41:05Z | - |
dc.date.created | 2012-02-06 | - |
dc.date.issued | 2007-11 | - |
dc.identifier.citation | COMPUTER SYSTEMS SCIENCE AND ENGINEERING, v.22, pp.365 - 379 | - |
dc.identifier.issn | 0267-6192 | - |
dc.identifier.uri | http://hdl.handle.net/10203/91431 | - |
dc.description.abstract | Approximate string matching is the problem of finding all occurrences of a query string in a text database while allowing a specified number of errors. Approximate string matching based on the n-gram inverted index (simply, n-gram Matching) has been widely used. A major reason is that it scales to large databases since it is not a main-memory algorithm. Nevertheless, n-gram Matching also has drawbacks: query performance tends to be poor, and many false positives occur if a large number of errors are allowed. In this paper, we propose an inverted index structure, which we call the n-gram/2L-Approximation index, that mitigates these drawbacks, together with an approximate string matching algorithm based on it. The n-gram/2L-Approximation index is an adaptation of the n-gram/2L index [4], which the authors proposed earlier for exact matching. Inheriting the advantages of the n-gram/2L index, the n-gram/2L-Approximation index reduces the size of the index and improves the query performance compared with the n-gram inverted index. In addition, the n-gram/2L-Approximation index reduces false positives compared with the n-gram inverted index if a large number of errors are allowed. We perform extensive experiments using text and protein databases. Experimental results using databases of 1 GByte show that the n-gram/2L-Approximation index reduces the index size by up to 1.8 times and, at the same time, improves the query performance by up to 4.2 times compared with the n-gram inverted index. | - |
dc.language | English | - |
dc.publisher | C R L PUBLISHING LTD | - |
dc.title | n-Gram/2L-approximation: a two-level n-gram inverted index structure for approximate string matching | - |
dc.type | Article | - |
dc.identifier.wosid | 000252749200005 | - |
dc.identifier.scopusid | 2-s2.0-38949126548 | - |
dc.type.rims | ART | - |
dc.citation.volume | 22 | - |
dc.citation.beginningpage | 365 | - |
dc.citation.endingpage | 379 | - |
dc.citation.publicationname | COMPUTER SYSTEMS SCIENCE AND ENGINEERING | - |
dc.contributor.localauthor | Kim, Min Soo | - |
dc.contributor.localauthor | Whang, Kyu-Young | - |
dc.contributor.localauthor | Lee, Jae-Gil | - |
dc.description.isOpenAccess | N | - |
dc.type.journalArticle | Article | - |
dc.subject.keywordAuthor | approximate string matching | - |
dc.subject.keywordAuthor | n-gram | - |
dc.subject.keywordAuthor | inverted index | - |
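
For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of plain n-gram Matching, not the n-gram/2L-Approximation index proposed in the paper. It assumes a small in-memory list of strings as the "database", uses the standard count filter (a match within k errors must share at least m - n + 1 - k*n n-grams with a query of length m), and verifies candidates with a semi-global edit distance. All function names are hypothetical and chosen for illustration only.

```python
from collections import defaultdict


def ngrams(s, n):
    """Return all overlapping n-grams of s, in order."""
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def build_index(docs, n):
    """Map each n-gram to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for gram in ngrams(text, n):
            index[gram].add(doc_id)
    return index


def min_substring_edit_distance(query, text):
    """Smallest edit distance between query and any substring of text.

    Row 0 of the dynamic-programming table is all zeros, so a match
    may start at any position in text (Sellers' algorithm).
    """
    prev = [0] * (len(text) + 1)
    for i, qc in enumerate(query, 1):
        cur = [i]
        for j, tc in enumerate(text, 1):
            cur.append(min(prev[j] + 1,                # delete qc
                           cur[j - 1] + 1,             # insert tc
                           prev[j - 1] + (qc != tc)))  # match or substitute
        prev = cur
    return min(prev)


def approximate_search(docs, index, query, n, k):
    """Return sorted ids of documents containing query within k errors."""
    threshold = len(query) - n + 1 - k * n
    if threshold <= 0:
        # The count filter cannot prune anything: every document is a
        # candidate, which is where false positives pile up for large k.
        candidates = set(range(len(docs)))
    else:
        hits = defaultdict(int)
        for gram in ngrams(query, n):
            for doc_id in index.get(gram, ()):
                hits[doc_id] += 1
        candidates = {d for d, c in hits.items() if c >= threshold}
    return sorted(d for d in candidates
                  if min_substring_edit_distance(query, docs[d]) <= k)


docs = [
    "approximate string matching",
    "inverted index structure",
    "aproximate strng matching",
]
index = build_index(docs, n=3)
print(approximate_search(docs, index, "string", n=3, k=1))
```

In this sketch the second document survives the filter for k = 1 because it shares the n-gram "str" with the query, and is only rejected during verification. That is a small instance of the false-positive cost the abstract describes, and the filter stops pruning altogether once k*n reaches m - n + 1.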