Community detection, also known as graph clustering, has been extensively studied in the literature. The goal of community detection is to partition the vertices of a graph into densely connected components, called communities. In recent applications, however, an entity is often involved in multiple types of relationships and associated with multiple attributes drawn from multiple data sources, which poses new challenges for community detection. These multimodal data sources can be naturally modeled as a multi-layer graph composed of multiple interdependent layers and mapping functions, where each layer represents an intra-layer relationship and each mapping function represents an inter-layer relationship between two layers. Considerable effort has therefore been devoted to the problem of community detection in multi-layer graphs.
In this dissertation, we propose novel frameworks for community detection from multiple data sources based on the multi-layer graph model. Among the various combinations of multiple data sources, we deal with two representative cases: (i) multiple aspects of relationships and (ii) multiple attributes. The first case deals with multiple social graphs, which consist of a set of users connected by different types of relationships. The second case deals with attributed graphs, which consist of a set of users who are connected by social relationships and also associated with multiple attributes. In particular, we focus on geosocial graphs, which have attracted much attention thanks to the widespread use of location-aware mobile devices. Since the locations accessed by users can be regarded as their geographic preferences or interests, a geosocial graph is a representative case of an attributed graph.
In the first part of this dissertation, we propose a novel framework for differential flattening, which facilitates the analysis of pillar multi-layer graphs, and apply this framework to community detection. Differential flattening merges multiple graphs into a single graph such that the single graph exhibits the graph structure with the maximum clustering coefficient. It has two distinct features compared with existing approaches. First, the handling of multiple layers is independent of any specific community detection algorithm, whereas previous approaches rely on a specific algorithm. Thus, any algorithm designed for a single graph becomes applicable to multi-layer graphs. Second, the contribution of each layer to the single graph is determined automatically so as to maximize the clustering coefficient. Since differential flattening is formulated as an optimization problem, the optimal solution can be obtained with well-known algorithms such as interior point methods. Extensive experiments were conducted using the LFR benchmark networks as well as the DBLP, 20 Newsgroups, and MIT Reality Mining networks. The results show that differential flattening discovers higher-quality communities than baseline approaches and state-of-the-art algorithms.
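As a rough illustration of the idea of weighting layers by the clustering coefficient of the merged graph, the following sketch may help. It is not the formulation used in this dissertation: the weighted-sum merge, the particular global clustering coefficient, the rescaling by the largest edge weight, and the grid search (used here in place of an interior point method) are all assumptions made for the example, and the names `flatten`, `clustering_coefficient`, and `best_weights` are hypothetical.

```python
from itertools import product

import numpy as np


def adjacency(n, edges):
    """Build a symmetric 0/1 adjacency matrix for an undirected graph."""
    a = np.zeros((n, n))
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    return a


def flatten(layers, weights):
    """Assumed merge rule: weighted sum of layers over the same vertex set."""
    return sum(w * a for w, a in zip(weights, layers))


def clustering_coefficient(adj):
    """Global weighted clustering coefficient (assumed definition).

    For a 0/1 matrix this equals transitivity: closed triples / all triples.
    """
    peak = adj.max()
    if peak <= 0:
        return 0.0
    w = adj / peak  # rescale so the score ignores the overall weight scale
    closed = np.trace(w @ w @ w)
    strength = w.sum(axis=1)
    triples = (strength ** 2 - (w ** 2).sum(axis=1)).sum()
    return float(closed / triples) if triples > 0 else 0.0


def best_weights(layers, steps=10):
    """Grid search over non-negative layer weights that sum to one."""
    best, best_score = None, -1.0
    for combo in product(range(steps + 1), repeat=len(layers)):
        if sum(combo) != steps:
            continue
        weights = tuple(c / steps for c in combo)
        score = clustering_coefficient(flatten(layers, weights))
        if score > best_score:
            best, best_score = weights, score
    return best, best_score


# Toy example: one layer holds two triangles, the other only a bridge edge.
friends = adjacency(6, [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)])
calls = adjacency(6, [(2, 3)])
weights, score = best_weights([friends, calls])
print(weights, score)
```

In this toy case the search assigns all weight to the layer that contains the triangles, since the bridge-only layer adds open triples without closing any. The merged matrix can then be passed to any single-graph community detection algorithm.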
In the second part of this dissertation, we propose a novel framework for geosocial co-clustering, which facilitates the analysis of attributed graphs with a focus on geosocial graphs. Geosocial co-clustering is formulated as non-negative matrix tri-factorization with dual regularizers. Existing matrix tri-factorization algorithms, however, suffer from significant computational overhead when handling the large-scale data sets found in many real-world applications. Our proposed framework takes advantage of the intrinsic properties of geosocial networks to reduce the computational overhead without compromising accuracy. First, the numbers of users and locations are effectively reduced through the coarsening step of our framework. Then, we decompose the matrix tri-factorization of a single large matrix into a series of smaller sub-matrix tri-factorizations. To this end, the optimal split of the entire matrix is determined by crossing minimization and optimization of the minimum description length. Experiments conducted using four real-world geosocial networks show that our framework for geosocial co-clustering reduces the elapsed time by a factor of 19 to 69 while achieving an accuracy of up to 95.2%, compared with the state-of-the-art co-clustering algorithm.
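To make the underlying factorization concrete, the sketch below shows plain non-negative matrix tri-factorization, X ≈ F S Gᵀ, using standard multiplicative updates on a small user-by-location matrix. It is only a baseline illustration under stated assumptions: it omits the dual regularizers, the coarsening step, and the sub-matrix decomposition described above, and the function name `nmtf`, its parameters, and the toy check-in matrix are hypothetical.

```python
import numpy as np


def nmtf(x, k_rows, k_cols, iters=200, seed=0, eps=1e-9):
    """Unregularized tri-factorization x ~ f @ s @ g.T with non-negative factors.

    Returns the factors and the Frobenius reconstruction error recorded
    before the first update and after every iteration.
    """
    rng = np.random.default_rng(seed)
    n, m = x.shape
    f = rng.random((n, k_rows)) + eps  # row (user) cluster indicators
    s = rng.random((k_rows, k_cols)) + eps  # cluster-to-cluster associations
    g = rng.random((m, k_cols)) + eps  # column (location) cluster indicators
    history = [np.linalg.norm(x - f @ s @ g.T)]
    for _ in range(iters):
        # Multiplicative updates keep every entry non-negative.
        f *= (x @ g @ s.T) / (f @ s @ g.T @ g @ s.T + eps)
        g *= (x.T @ f @ s) / (g @ s.T @ f.T @ f @ s + eps)
        s *= (f.T @ x @ g) / (f.T @ f @ s @ g.T @ g + eps)
        history.append(np.linalg.norm(x - f @ s @ g.T))
    return f, s, g, history


# Toy check-in matrix: rows are users, columns are locations.
checkins = np.array(
    [
        [1, 1, 0, 0, 0],
        [1, 1, 0, 0, 0],
        [1, 1, 0, 0, 0],
        [0, 0, 1, 1, 1],
        [0, 0, 1, 1, 1],
        [0, 0, 1, 1, 1],
    ],
    dtype=float,
)
f, s, g, history = nmtf(checkins, k_rows=2, k_cols=2)

# Assign each user and location to the cluster with the largest factor entry.
user_clusters = f.argmax(axis=1)
location_clusters = g.argmax(axis=1)
print(user_clusters, location_clusters, history[-1])
```

Each iteration multiplies full-size matrices, so the cost grows with the numbers of users and locations, which is the overhead that coarsening and splitting are intended to reduce.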
The strength of this dissertation lies in its coverage of community detection across a wide variety of heterogeneous data sources. Since the two cases addressed are expected to cover a significant proportion of the scenarios involving multiple data sources, we believe that this work will enhance the quality of community detection in social networks that draw on multiple data sources.