Joins on encoded and partitioned data

DC Field | Value | Language
dc.contributor.author | Lee, Jae-Gil | ko
dc.contributor.author | Attaluri, Gopi | ko
dc.contributor.author | Barber, Ronald | ko
dc.contributor.author | Chainani, Naresh | ko
dc.contributor.author | Draese, Oliver | ko
dc.contributor.author | Ho, Frederick | ko
dc.contributor.author | Idreos, Stratos | ko
dc.contributor.author | Kim, Min-Soo | ko
dc.contributor.author | Lightstone, Sam S. | ko
dc.contributor.author | Lohman, Guy M. | ko
dc.contributor.author | Morfonios, Konstantinos | ko
dc.contributor.author | Murthy, Keshava | ko
dc.contributor.author | Pandis, Ippokratis | ko
dc.contributor.author | Qiao, Lin | ko
dc.contributor.author | Raman, Vijayshankar | ko
dc.contributor.author | Samy, Vincent Kulandai | ko
dc.contributor.author | Sidle, Richard | ko
dc.contributor.author | Stolze, Knut | ko
dc.contributor.author | Zhang, Liping | ko
dc.identifier.citation | Proceedings of the VLDB Endowment, v.7, no.13, pp.1355-1366 | -
dc.description.abstract | Compression has historically been used to reduce the cost of storage, I/Os from that storage, and buffer pool utilization, at the expense of the CPU required to decompress data every time it is queried. However, significant additional CPU efficiencies can be achieved by deferring decompression as late in query processing as possible and performing query processing operations directly on the still-compressed data. In this paper, we investigate the benefits and challenges of performing joins on compressed (or encoded) data. We demonstrate the benefit of independently optimizing the compression scheme of each join column, even though join predicates relating values from multiple columns may require translation of the encoding of one join column into the encoding of the other. We also show the benefit of compressing "payload" data other than the join columns "on the fly," to minimize the size of hash tables used in the join. By partitioning the domain of each column and defining separate dictionaries for each partition, we can achieve even better overall compression as well as increased flexibility in dealing with new values introduced by updates. Instead of decompressing both join columns participating in a join to resolve their different compression schemes, our system performs a lightweight mapping of only qualifying rows from one of the join columns to the encoding space of the other at run time. Consequently, join predicates can be applied directly on the compressed data. We call this procedure encoding translation. Two alternatives of encoding translation are developed and compared in the paper. We provide a comprehensive evaluation of these alternatives using product implementations of each on the TPC-H data set, and demonstrate that performing joins on encoded and partitioned data achieves both superior performance and excellent compression. | -
dc.publisher | Association for Computing Machinery | -
dc.title | Joins on encoded and partitioned data | -
dc.citation.publicationname | Proceedings of the VLDB Endowment | -
dc.contributor.localauthor | Lee, Jae-Gil | -
dc.contributor.localauthor | Kim, Min-Soo | -
dc.contributor.nonIdAuthor | Attaluri, Gopi | -
dc.contributor.nonIdAuthor | Barber, Ronald | -
dc.contributor.nonIdAuthor | Chainani, Naresh | -
dc.contributor.nonIdAuthor | Draese, Oliver | -
dc.contributor.nonIdAuthor | Ho, Frederick | -
dc.contributor.nonIdAuthor | Idreos, Stratos | -
dc.contributor.nonIdAuthor | Lightstone, Sam S. | -
dc.contributor.nonIdAuthor | Lohman, Guy M. | -
dc.contributor.nonIdAuthor | Morfonios, Konstantinos | -
dc.contributor.nonIdAuthor | Murthy, Keshava | -
dc.contributor.nonIdAuthor | Pandis, Ippokratis | -
dc.contributor.nonIdAuthor | Qiao, Lin | -
dc.contributor.nonIdAuthor | Raman, Vijayshankar | -
dc.contributor.nonIdAuthor | Samy, Vincent Kulandai | -
dc.contributor.nonIdAuthor | Sidle, Richard | -
dc.contributor.nonIdAuthor | Stolze, Knut | -
dc.contributor.nonIdAuthor | Zhang, Liping | -
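
The encoding translation described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes simple per-column dictionary encoding, and the helper names (build_dictionary, translation_table, join_on_codes) and the TPC-H-flavored sample columns are hypothetical. Each join column gets its own dictionary, the codes of one column are mapped into the code space of the other, and the join then compares codes only, without decoding any value.

```python
from collections import defaultdict

def build_dictionary(values):
    """Assign a dense integer code to each distinct value, in sorted order."""
    return {v: code for code, v in enumerate(sorted(set(values)))}

def encode(values, dictionary):
    """Replace each value with its code from the column's own dictionary."""
    return [dictionary[v] for v in values]

def translation_table(src_dict, dst_dict):
    """Map each source code to the destination code of the same value.

    Values missing from the destination dictionary get no entry,
    because they cannot match any row on the other side of the join.
    """
    return {code: dst_dict[v] for v, code in src_dict.items() if v in dst_dict}

def join_on_codes(probe_codes, build_codes, src_to_dst):
    """Hash join on codes: translate probe codes, then match without decoding."""
    build_index = defaultdict(list)
    for row, code in enumerate(build_codes):
        build_index[code].append(row)
    matches = []
    for row, code in enumerate(probe_codes):
        translated = src_to_dst.get(code)
        if translated is None:
            continue  # value does not exist in the other column's dictionary
        for build_row in build_index.get(translated, ()):
            matches.append((row, build_row))
    return matches

# Hypothetical sample data: two join columns encoded independently.
orders_custkey = [30, 10, 30, 40]
customer_custkey = [10, 20, 30]
orders_dict = build_dictionary(orders_custkey)
customer_dict = build_dictionary(customer_custkey)
orders_codes = encode(orders_custkey, orders_dict)
customer_codes = encode(customer_custkey, customer_dict)

src_to_dst = translation_table(orders_dict, customer_dict)
pairs = join_on_codes(orders_codes, customer_codes, src_to_dst)
```

In this sketch the same value can carry different codes in the two columns (30 is code 1 in one dictionary and code 2 in the other), which is why a translation step is needed before codes can be compared. The per-partition dictionaries and on-the-fly payload compression mentioned in the abstract are left out.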
Appears in Collection
IE-Journal Papers, CS-Journal Papers

