Distributed stream processing systems (DSPSs) have recently attracted considerable attention. In many DSPS applications, network communication is the main performance bottleneck, and hence it is crucial to minimize communication costs. In this dissertation, we address communication-efficient query processing in DSPSs.
An emerging challenge in large-scale DSPSs is to efficiently process multiple continuous aggregation queries, which are among the most common query types in streaming applications. Since a naive approach that executes each query separately can lead to scalability and efficiency problems, multiple aggregation queries must be processed collectively rather than separately. In the first part of this dissertation, we propose an efficient method for collectively processing multiple aggregation queries. Running at a local site, the proposed method finds the smallest set of aggregates that must be sent to the global site in order to answer all the queries correctly, and thus minimizes the number of required message transmissions. Since the proposed method operates on the fly, it can also efficiently handle the registration or deregistration of queries at any time. Using linear algebra, we prove that the proposed method is optimal in terms of communication cost.
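To illustrate the linear-algebraic intuition, the following is a minimal sketch and not the method proposed in this dissertation. It assumes that each query is a linear aggregate (e.g., a sum) over a fixed set of local partial values, so that a query can be written as a coefficient vector. Under that assumption, the local site only needs to transmit the aggregates of a maximal linearly independent subset of the query vectors; the global site can rebuild the remaining answers as linear combinations. The names `select_aggregates` and `reconstruct`, and the sample data, are hypothetical.

```python
from fractions import Fraction

def select_aggregates(queries):
    """Pick a maximal linearly independent subset of query vectors.

    Returns (chosen, recipes): `chosen` lists the queries whose aggregates
    are transmitted; `recipes[q]` expresses query q as a linear combination
    of the chosen queries. Queries are processed one at a time, so a newly
    registered query can be appended incrementally (deregistration is not
    covered by this sketch).
    """
    basis = []    # entries: (pivot column, reduced vector, combination of chosen queries)
    chosen = []
    recipes = {}
    for qi, q in enumerate(queries):
        vec = [Fraction(c) for c in q]
        expr = {}  # invariant: q - vec == sum(expr[k] * queries[k])
        for pivot, bvec, bcombo in basis:
            if vec[pivot] != 0:
                f = vec[pivot] / bvec[pivot]
                vec = [v - f * b for v, b in zip(vec, bvec)]
                for k, c in bcombo.items():
                    expr[k] = expr.get(k, Fraction(0)) + f * c
        if any(vec):
            # q is independent of the queries chosen so far: transmit it.
            pivot = next(i for i, v in enumerate(vec) if v != 0)
            bcombo = {k: -c for k, c in expr.items()}
            bcombo[qi] = Fraction(1)
            basis.append((pivot, vec, bcombo))
            chosen.append(qi)
            recipes[qi] = {qi: Fraction(1)}
        else:
            # q is a linear combination of chosen queries: nothing to send.
            recipes[qi] = expr
    return chosen, recipes

def reconstruct(recipes, sent):
    """Global site: rebuild every query answer from the transmitted aggregates."""
    return {qi: sum(c * sent[k] for k, c in recipe.items())
            for qi, recipe in recipes.items()}

# Hypothetical local partial values and five sum queries over them.
local_values = [3, 5, 2, 7]
queries = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 1, 1],  # equals query 0 + query 1
    [1, 0, 0, 0],
    [0, 1, 0, 0],  # equals query 0 - query 3
]
chosen, recipes = select_aggregates(queries)
sent = {k: sum(c * x for c, x in zip(queries[k], local_values)) for k in chosen}
answers = reconstruct(recipes, sent)
```

In this example only the aggregates of queries 0, 1, and 3 are transmitted, and the answers to queries 2 and 4 are recovered at the global site. The number of transmitted aggregates equals the rank of the query matrix, which is the smallest number from which all answers can be recovered by linear combination under the stated assumption.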
In sensor networks, which are one of the most popular types of DSPSs, the event detection process can be regarded as a join of two relations, a sensor table and a condition table, where the condition table is a set of tuples, each of which contains condition information about a certain event. When join operations are used for event detection, it is desirable, where possible, to perform ``in-network'' joins in order to reduce the communication cost. In the second part of this dissertation, we propose an in-network join algorithm called \emph{HIPaG}. In HIPaG, a condition table is partitioned into several fragments. Those fragments are stored either in paths from the base station (i.e., t...