Bayesian variable selection in clustering high-dimensional data via a mixture of finite mixtures

When clustering high-dimensional data, it is often important to identify the variables that discriminate between the clusters. A second common issue in clustering is determining the number of clusters. In this study, we propose a new method that simultaneously performs clustering and variable selection while inferring the number of clusters from the data. We formulate the clustering problem as a finite mixture model with a symmetric Dirichlet prior on the mixture weights, and we additionally place a prior on the number of components; that is, we use a mixture of finite mixtures. We handle variable selection by introducing a latent binary vector that indicates the inclusion or exclusion of each variable. We update this binary vector with a Metropolis algorithm and perform inference on the cluster structure with a split-merge Markov chain Monte Carlo technique. We demonstrate the advantages of our method on simulated data and on two real DNA microarray datasets.
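The Metropolis update of the latent binary inclusion vector can be illustrated with a minimal Python sketch. This record does not state the paper's proposal distribution or target density, so the sketch assumes the common single-flip proposal: pick one variable uniformly at random and toggle its indicator. Because that proposal is symmetric, the acceptance probability reduces to min(1, pi(gamma') / pi(gamma)). In the actual method, the target would be the posterior of the inclusion vector given the current cluster allocations. Here `toy_log_target` and its `SCORES` are hypothetical placeholders, as are the helper names `metropolis_flip_step` and `run_chain`.

```python
import math
import random
from typing import Callable, List

def metropolis_flip_step(gamma: List[int],
                         log_target: Callable[[List[int]], float],
                         rng: random.Random) -> List[int]:
    """One Metropolis update of a binary inclusion vector (assumed single-flip proposal)."""
    j = rng.randrange(len(gamma))          # choose one variable uniformly at random
    proposal = list(gamma)                 # copy so the current state is not mutated
    proposal[j] = 1 - proposal[j]          # toggle inclusion/exclusion of variable j
    log_ratio = log_target(proposal) - log_target(gamma)
    # Symmetric proposal: accept with probability min(1, exp(log_ratio)).
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        return proposal
    return gamma

# Hypothetical per-variable evidence scores (NOT from the paper): positive values
# favour inclusion, negative values favour exclusion. A real implementation would
# evaluate the marginal likelihood of the selected variables given the clustering.
SCORES = [3.0, 3.0, -3.0, -3.0]

def toy_log_target(gamma: List[int]) -> float:
    return sum(g * s for g, s in zip(gamma, SCORES))

def run_chain(n_vars: int, log_target: Callable[[List[int]], float],
              n_iter: int, seed: int = 0) -> List[float]:
    """Run the chain from the empty model and return per-variable inclusion frequencies."""
    rng = random.Random(seed)
    gamma = [0] * n_vars
    counts = [0] * n_vars
    for _ in range(n_iter):
        gamma = metropolis_flip_step(gamma, log_target, rng)
        for j, g in enumerate(gamma):
            counts[j] += g
    return [c / n_iter for c in counts]

if __name__ == "__main__":
    print(run_chain(len(SCORES), toy_log_target, n_iter=20000, seed=1))
```

Under this toy target, each indicator is independently Bernoulli with probability e^s / (1 + e^s), so the first two variables should be included in roughly 95% of iterations and the last two in roughly 5%. In the full method, such an update would alternate with the split-merge moves on the cluster allocations.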
Publisher
TAYLOR & FRANCIS LTD
Issue Date
2021-08
Language
English
Article Type
Article
Citation

JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, v.91, no.12, pp.2551 - 2568

ISSN
0094-9655
DOI
10.1080/00949655.2021.1902526
URI
http://hdl.handle.net/10203/287118
Appears in Collection
IE-Journal Papers (Journal Papers)
