Stochastic Blockmodel with Cluster Overlap, Relevance Selection, and Similarity-Based Smoothing

Cited 3 time in webofscience Cited 7 time in scopus
  • Hit : 182
  • Download : 0
Stochastic blockmodels provide a rich, probabilistic framework for modeling relational data by expressing the objects being modeled in terms of a latent vector representation. This representation can be a latent indicator vector denoting the cluster membership (hard clustering), a vector of cluster membership probabilities (soft clustering), or more generally a real-valued vector (latent space representation). Recently, a new class of overlapping stochastic blockmodels has been proposed where the idea is to allow the objects to have hard memberships in multiple clusters (in form of a latent binary vector). This aspect captures the properties of many real-world networks in domains such as biology and social networks where objects can simultaneously have memberships in multiple clusters owing to the multiple roles they may have. In this paper, we improve upon this model in three key ways: (1) we extend the overlapping stochastic blockmodel to the bipartite graph case which enables us to simultaneously learn the overlapping clustering of two different sets of objects in the graph; the unipartite graph is just a special case of our model, (2) we allow objects (in either set) to not have membership in any cluster by using a relevant object selection mechanism, and (3) we make use of additionally available object features (or a kernel matrix of pairwise object similarities) to further improve the overlapping clustering performance. We do this by explicitly encouraging similar objects to have similar cluster membership vectors. Moreover, using nonparametric Bayesian prior distributions on the key model parameters, we side-step the model selection issues such as selecting the number of clusters a priori. Our model is quite general and can be applied for both overlapping clustering and link prediction tasks in unipartite and bipartite networks (directed/undirected), or for overlapping co-clustering of general binary-valued data. Experiments on synthetic and real-world datasets from biology and social networks demonstrate that our model outperforms several state-of-the-art methods.
Publisher
IEEE
Issue Date
2013-12
Language
English
Citation

IEEE 13th International Conference on Data Mining (ICDM), pp.817 - 826

ISSN
1550-4786
DOI
10.1109/ICDM.2013.156
URI
http://hdl.handle.net/10203/275468
Appears in Collection
CS-Conference Papers(학술회의논문)
Files in This Item
There are no files associated with this item.
This item is cited by other documents in WoS
⊙ Detail Information in WoSⓡ Click to see webofscience_button
⊙ Cited 3 items in WoS Click to see citing articles in records_button

qr_code

  • mendeley

    citeulike


rss_1.0 rss_2.0 atom_1.0