DSpace at KOASAS: Hybrid imputation of cluster-based k-NN and maximum likelihood estimation in software project data


Hybrid imputation of cluster-based k-NN and maximum likelihood estimation in software project data (Korean title, translated: A hybrid imputation method for software project data combining cluster-based k-NN and maximum likelihood estimation)


Lee, Dong-Ho / 이동호

Missing data is a common problem that software practitioners face when they analyze software project data. In the empirical software engineering community, k-NN and maximum likelihood estimation are known to be effective imputation methods for software project data. However, each has a limitation when applied alone: (1) the imputation accuracy of k-NN is affected by the homogeneity of the data, and (2) maximum likelihood estimation is ineffective on data sets containing fewer than 100 project instances. To cope with these limitations, hybrid imputation techniques combining several methods have been developed. However, they can be applied only to software project data with fewer than 100 project instances. In this paper, we propose a hybrid imputation method that uses cluster-based k-NN and maximum likelihood estimation on software project data. Maximum likelihood estimation is applied first, and then hierarchical clustering partitions the software project data into clusters. The initial imputation by maximum likelihood estimation lets the k-NN search use the non-missing data of project instances that have missing data; partitioning the software project data into clusters increases the homogeneity of the data within each cluster. After the $k$ most similar project instances in the cluster are found, the average of the k-NN result and the maximum likelihood estimation result is taken as the imputed value. In the empirical study, we evaluated our approach and five other methods in experiments on 2,160 data sets, generated by injecting missing data into two industrial data sets: software project data measured at a bank in Korea and the ISBSG data set. The results of the Wilcoxon rank sum test confirm that our approach outperforms the other five methods with respect to data set size, number of missing attributes, missing data percentage, and missingness mechanism.
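The pipeline described in the abstract (initial maximum likelihood imputation, hierarchical clustering, k-NN inside each cluster, then averaging) can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the average-linkage criterion, the Euclidean distance, and the choice to take the maximum-likelihood-imputed matrix as a ready-made input (`mle_filled`) are all assumptions made for illustration, since the abstract does not specify them.

```python
import math

def distance(a, b, cols):
    """Euclidean distance between rows a and b over the given columns."""
    return math.sqrt(sum((a[j] - b[j]) ** 2 for j in cols))

def hierarchical_clusters(rows, n_clusters):
    """Agglomerative clustering (average linkage, an assumed choice).
    Returns a list of clusters, each a list of row indices."""
    all_cols = range(len(rows[0]))
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(distance(rows[i], rows[j], all_cols)
                            for i in clusters[a] for j in clusters[b])
                avg = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    return clusters

def hybrid_impute(data, mle_filled, k=2, n_clusters=2):
    """data: rows with None marking missing values.
    mle_filled: the same rows with every gap already filled by maximum
    likelihood estimation (assumed to be computed elsewhere)."""
    result = [row[:] for row in mle_filled]
    for members in hierarchical_clusters(mle_filled, n_clusters):
        for i in members:
            missing = [j for j, v in enumerate(data[i]) if v is None]
            others = [m for m in members if m != i]
            if not missing or not others:
                continue  # nothing to impute, or no neighbours: keep MLE value
            observed = [j for j, v in enumerate(data[i]) if v is not None]
            # Search within the cluster; neighbours may themselves have had
            # missing values, which the initial MLE pass has filled in.
            neighbours = sorted(
                others,
                key=lambda m: distance(mle_filled[i], mle_filled[m], observed),
            )[:k]
            for j in missing:
                knn_value = sum(mle_filled[m][j] for m in neighbours) / len(neighbours)
                # Final estimate: average of the k-NN and MLE results.
                result[i][j] = (knn_value + mle_filled[i][j]) / 2
    return result

# Toy data (hypothetical): two groups of projects, one missing value.
data = [[1.0, 2.0], [1.2, None], [0.9, 2.2], [10.0, 20.0], [10.5, 21.0]]
mle_filled = [[1.0, 2.0], [1.2, 3.0], [0.9, 2.2], [10.0, 20.0], [10.5, 21.0]]
imputed = hybrid_impute(data, mle_filled, k=2, n_clusters=2)
```

In this toy run the incomplete project falls into the cluster of small projects, its two neighbours give a k-NN estimate of 2.1, and the final value is roughly 2.55, the mean of 2.1 and the assumed maximum likelihood estimate of 3.0. Observed values are never modified.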

Advisors: Bae, Doo-Hwan (배두환)

Description: Korea Advanced Institute of Science and Technology (KAIST): Computer Science

Publisher: Korea Advanced Institute of Science and Technology (KAIST)

Issue Date: 2009

Identifier: 303642/325007 / 020073371

Language: eng

Description: Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST): Computer Science, 2009.2, [vi, 47 p.]

Keywords: imputation; k-NN; maximum likelihood estimation; software project data; cluster

URI: http://hdl.handle.net/10203/34840

Link: http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=303642&flag=dissertation

Appears in Collection: CS-Theses_Master (Master's theses)

Files in This Item: There are no files associated with this item.


Knowledge Service Development Team, KAIST 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea. T. 82-42-350-4493 Email. koasas@kaist.ac.kr
Copyright © 2016. Korea Advanced Institute of Science and Technology. All Rights Reserved.
