A term weighting approach exploiting external data for cancer clause classification from free-text radiology reports

Radiology reports are written by medical experts who analyse radiology images such as CT and MRI scans. A report consists of cancer clauses and non-cancer clauses. We focus on classifying clauses into cancer and non-cancer classes. This data has two distinctive characteristics. First, the number of cancer clauses is much smaller than the number of non-cancer clauses. Second, terms that are important for cancer also occur in the non-cancer class: since it is often difficult to determine the presence of cancer from radiology images, some clauses are labelled as non-cancer despite containing terms that are important for cancer. Recently, term weighting approaches have been proposed to address the data imbalance problem. However, we argue that they sometimes assign incorrect weights because of terms that occur in both classes. Consequently, we use cancer-related external data to calculate term weights. Since the external data is highly related to cancer, we can identify terms that are important for cancer and calculate their weights. Based on the weights calculated from the external data, term weights in the cancer class are increased and term weights in the non-cancer class are decreased. In our experiments, the proposed method outperformed term weighting methods that use the training data.
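The weighting idea in the abstract can be illustrated with a minimal sketch. This is not the formula used in the thesis, which the abstract does not give. The sketch assumes a simple document-frequency score over the external corpus and a linear adjustment controlled by a hypothetical parameter `alpha`; the function names and sample texts are also invented for illustration.

```python
from collections import Counter


def external_term_weights(external_docs):
    """Score each term by its document frequency in the external corpus.

    Assumption: document frequency scaled to [0, 1] stands in for
    "importance for cancer"; the thesis may use a different measure.
    """
    df = Counter()
    for doc in external_docs:
        # Count each term at most once per document.
        df.update(set(doc.lower().split()))
    n = len(external_docs)
    return {term: count / n for term, count in df.items()}


def adjust_clause_weights(clause, label, ext_weights, alpha=0.5):
    """Raise term weights in cancer clauses and lower them in non-cancer clauses.

    Assumption: the base weight is the raw term frequency, scaled by
    (1 + alpha * w) for the cancer class and (1 - alpha * w) otherwise,
    where w is the externally derived weight (0 for unseen terms).
    """
    tf = Counter(clause.lower().split())
    sign = 1.0 if label == "cancer" else -1.0
    return {
        term: freq * (1.0 + sign * alpha * ext_weights.get(term, 0.0))
        for term, freq in tf.items()
    }


# Hypothetical external, cancer-related texts.
external_docs = [
    "malignant mass in the lung",
    "malignant tumor with metastasis",
]
ext = external_term_weights(external_docs)

cancer = adjust_clause_weights("malignant mass noted", "cancer", ext)
non_cancer = adjust_clause_weights("no malignant mass", "non-cancer", ext)
```

With these sample texts, "malignant" appears in both external documents, so its weight moves furthest from the raw frequency in either direction, while a term absent from the external data keeps its original weight. A practical version would also need to filter common words, which a plain document-frequency score cannot tell apart from cancer-specific terms.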
Advisors
Myaeng, Sung-Hyon (맹성현)
Description
Korea Advanced Institute of Science and Technology (KAIST): Department of Computer Science
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2013
Identifier
515123/325007  / 020113190
Language
eng
Description

Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST): Department of Computer Science, 2013.2, [v, 39 p.]

Keywords

Text classification; Imbalanced data; Radiology report; Term weighting scheme; Document classification; External data

URI
http://hdl.handle.net/10203/180446
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=515123&flag=dissertation
Appears in Collection
CS-Theses_Master (Master's theses)
Files in This Item
There are no files associated with this item.
