Sparse clustering of mixed data with likelihood-based feature ranking

Mixed data are tabular data that include both numerical and categorical features, and they have become prevalent in fields such as finance and medical research. In this study, we propose a simple yet powerful sparse clustering technique for mixed data. Our approach combines a model-based Gaussian-multinomial mixture model with a partitioning method, leveraging the advantages of both. We also use the difference in log-likelihood between assigning and not assigning each feature to a cluster to induce sparsity and to perform feature selection tailored to the practitioner's needs. Owing to its straightforward structure and its capacity to induce sparsity, the proposed method performs well in high-dimensional settings where the number of features exceeds the number of observations. Furthermore, our model can select different features for each cluster and provides feature importance rankings, which greatly enhances the interpretability of the clustering result compared with other sparse clustering techniques for mixed data. We demonstrate the method's performance on synthetic and real data and observe that it is competitive with several state-of-the-art mixed data clustering methods.
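The abstract does not spell out the scoring rule, so the following is only a minimal sketch of the general idea of ranking features by a log-likelihood difference, not the thesis's algorithm. It assumes that, for a given cluster, a feature's score is the log-likelihood of the cluster's observations under cluster-specific parameters minus their log-likelihood under parameters pooled over all observations, with Gaussian densities for numerical features and smoothed category frequencies for categorical ones. The function names (`loglik_gain`, `category_probs`), the smoothing constant `alpha`, and the toy data are all hypothetical.

```python
import numpy as np

def gaussian_loglik(x, mean, var):
    """Sum of Gaussian log-densities of x under N(mean, var)."""
    var = max(float(var), 1e-8)  # guard against zero variance
    return float(np.sum(-0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)))

def category_probs(x, n_levels, alpha=1.0):
    """Smoothed category frequencies; alpha is an assumed Laplace constant."""
    counts = np.bincount(x, minlength=n_levels).astype(float)
    return (counts + alpha) / (counts.sum() + alpha * n_levels)

def categorical_loglik(x, probs):
    """Sum of log-probabilities of integer-coded x under category probabilities."""
    return float(np.sum(np.log(probs[x])))

def loglik_gain(X_num, X_cat, labels):
    """Map each cluster to per-feature log-likelihood gains.

    Gains are ordered with numerical features first, then categorical ones.
    """
    gains = {}
    for k in np.unique(labels):
        in_k = labels == k
        g = []
        for j in range(X_num.shape[1]):
            col = X_num[:, j]
            # Cluster-specific ("assigned") vs. pooled ("not assigned") fit.
            assigned = gaussian_loglik(col[in_k], col[in_k].mean(), col[in_k].var())
            pooled = gaussian_loglik(col[in_k], col.mean(), col.var())
            g.append(assigned - pooled)
        for j in range(X_cat.shape[1]):
            col = X_cat[:, j]
            n_levels = int(col.max()) + 1
            assigned = categorical_loglik(col[in_k], category_probs(col[in_k], n_levels))
            pooled = categorical_loglik(col[in_k], category_probs(col, n_levels))
            g.append(assigned - pooled)
        gains[int(k)] = np.array(g)
    return gains

# Toy data (hypothetical): two clusters of 50 observations each.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 50)
# Numerical feature 0 separates the clusters; numerical feature 1 is noise.
X_num = np.column_stack([rng.normal(labels * 4.0, 1.0), rng.normal(0.0, 1.0, 100)])
# Categorical feature 0 follows the cluster label; categorical feature 1 is noise.
X_cat = np.column_stack([labels, rng.integers(0, 2, 100)])

gains = loglik_gain(X_num, X_cat, labels)
# Per-cluster ranking: feature indices sorted by decreasing gain.
ranking = {k: np.argsort(-g) for k, g in gains.items()}
```

In this sketch, features whose gain falls below a practitioner-chosen threshold could be left unassigned for that cluster, which is one way a per-cluster sparse selection might be obtained; the thesis itself may define the criterion differently.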
Advisors
안정연
Description
KAIST (Korea Advanced Institute of Science and Technology): Department of Industrial and Systems Engineering
Publisher
KAIST (Korea Advanced Institute of Science and Technology)
Issue Date
2023
Identifier
325007
Language
eng
Description

Thesis (Master's) - KAIST: Department of Industrial and Systems Engineering, 2023.8, [iii, 20 p.]

Keywords

Mixed data; Clustering; Log-likelihood; Feature Ranking; Feature Selection

URI
http://hdl.handle.net/10203/320619
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1045807&flag=dissertation
Appears in Collection
IE-Theses_Master (Master's theses)
