DSpace at KOASAS: (A) pattern-based approach to identifying and correcting outliers in software project data

(A) pattern-based approach to identifying and correcting outliers in software project data

Yoon, Kyung-A / 윤경아

Despite the importance of the quality of Software Project Data (SPD), problematic data inevitably occur during data collection. Such data are called outliers: SPD instances with abnormal values on certain attributes. We call these attributes the abnormal attributes of the outliers. To improve the quality of SPD instances, it is necessary to identify outliers and their abnormal attributes, and correcting the abnormal values should also be considered. Although a few existing approaches identify outliers and their abnormal attributes, they are not effective in (1) identifying the abnormal attributes when an outlier has abnormal values on more than a specific number of its attributes and (2) identifying outliers that contain abnormal values on attributes other than the specific attribute tied to the base algorithm. The existing approach to correcting abnormal values of outliers tends to generate many new outliers through improper correction. In this paper, we propose a pattern-based approach to identifying and correcting outliers in SPD instances: after discovering reliable frequent patterns that reflect the typical characteristics of the SPD instances, outliers and their abnormal attributes are detected by matching the SPD instances against those patterns. Then, the abnormal values of the outliers are corrected by replacing them with the weighted mean of k similar SPD instances, which completely match the patterns that are most similar and significant to the outliers. Empirical studies were performed on three industrial data sets and 64 artificial data sets with injected outliers. The detection accuracy results demonstrate that our approach outperforms five other approaches by an average of 35.27% and 107.5% in detecting the outliers and abnormal attributes, respectively, on the industrial data sets, and by an average of 61.51% and 110.93%, respectively, on the artificial data sets. In addition, the correction accura...
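
The three steps named in the abstract (discover frequent patterns, match instances against them, correct with a weighted mean of k similar instances) can be illustrated with a minimal Python sketch. This is not the thesis's algorithm: the function names, the fixed pattern size, the support threshold, the Euclidean distance, and the inverse-distance weights are all assumptions chosen for illustration, and the sketch ignores the thesis's notions of pattern reliability and significance.

```python
from collections import Counter
from itertools import combinations


def frequent_patterns(rows, min_support, size=2):
    """Return attribute=value combinations of `size` items whose relative
    frequency is at least `min_support` (hypothetical simplification)."""
    counts = Counter()
    for row in rows:
        items = sorted(row.items())  # canonical order so equal combos match
        for combo in combinations(items, size):
            counts[combo] += 1
    threshold = min_support * len(rows)
    return {combo for combo, n in counts.items() if n >= threshold}


def abnormal_attributes(row, patterns):
    """Flag attributes of `row` not covered by any frequent pattern that
    the row fully satisfies. An empty result means no outlier is flagged."""
    items = set(row.items())
    covered = set()
    for pattern in patterns:
        if set(pattern) <= items:
            covered.update(attr for attr, _ in pattern)
    return sorted(set(row) - covered)


def weighted_mean_correction(target, neighbors, attr, k=3):
    """Estimate a replacement for target[attr] as the weighted mean of the
    k nearest neighbors, measured on the other (numeric) attributes.
    Weights are 1 / (1 + distance), an assumed weighting scheme."""
    others = [a for a in target if a != attr]

    def dist(row):
        return sum((row[a] - target[a]) ** 2 for a in others) ** 0.5

    nearest = sorted(neighbors, key=dist)[:k]
    weights = [1.0 / (1.0 + dist(row)) for row in nearest]
    return sum(w * row[attr] for w, row in zip(weights, nearest)) / sum(weights)
```

In this sketch, detection works on discretized (categorical) attribute values, while correction works on numeric values; how the thesis bridges the two is not stated in the abstract.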

Advisor: Bae, Doo-Hwan (배두환)

Description: Korea Advanced Institute of Science and Technology (KAIST): Department of Computer Science

Publisher: Korea Advanced Institute of Science and Technology (KAIST)

Issue Date: 2010

Identifier: 418729/325007 / 020035189

Language: eng

Description: Doctoral thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST): Department of Computer Science, 2010.2, [ vii, 82 p. ]

Keywords: software data; data cleaning; data quality; outlier; noisy data

URI: http://hdl.handle.net/10203/33292

Link: http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=418729&flag=dissertation

Appears in Collection: CS-Theses_Ph.D. (Doctoral Theses)

Files in This Item: There are no files associated with this item.

Knowledge Service Development Team, KAIST 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea. T. 82-42-350-4493 Email. koasas@kaist.ac.kr
Copyright © 2016. Korea Advanced Institute of Science and Technology. All Rights Reserved.
