Domain adaptation in sentiment classification based on probabilistic models (Korean title: 확률 모델에 기반한 의견 분류에서의 도메인 적응)

Sentiment classification is the task of determining the overall contextual polarity of a review document. Companies can use it to identify problems with their products or services from large volumes of data, and customers can use it to decide which products or services to buy. Sentiment classification poses two main difficulties. First, documents are usually represented with a bag-of-words model, and the dimensionality of such data is very large, so methods are needed to extract features or reduce the number of dimensions. Second, when the training data and the test data come from different domains, performance degrades severely, yet it is hard to obtain labeled data for every domain of interest. To extract features or reduce the dimensionality, we tried three methods: principal component analysis (PCA), conditional entropy (CE), and independent component analysis (ICA). With PCA we can reduce the dimensionality without any loss of information. By slightly changing how the probabilities are estimated, we obtain a more balanced estimate of CE, which gives recognition performance that is robust to the number of features selected. ICA makes the features independent, so it was expected to give better results when used together with CE; however, experiments suggest that ICA is not useful for CE. To address the domain difference problem, we propose a domain-adapting Boltzmann machine algorithm. The main difference between domains comes from the vocabulary used in each domain, so our approach generates target-domain words that do not appear in the source domain, and vice versa. In this thesis, we first apply this idea to a simple toy problem and then to a real-world problem. Our algorithm improves the classification accuracy.
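The conditional-entropy feature selection mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state how the probability estimate was changed, so the add-alpha (Laplace) smoothing used here is an assumption, and the function names `conditional_entropy_scores` and `select_features` are hypothetical. Each word is scored by H(Y | W), the entropy of the sentiment label given whether the word is present, and the words with the lowest scores are kept.

```python
import numpy as np

def conditional_entropy_scores(X, y, alpha=1.0):
    """Score each word by H(Y | W), where W is word presence/absence.

    X: (n_docs, n_words) bag-of-words matrix (counts or 0/1).
    y: (n_docs,) class labels.
    alpha: add-alpha smoothing constant, must be > 0 (an assumption of
           this sketch, not the thesis's stated estimator).
    """
    X = np.asarray(X) > 0                      # reduce counts to presence
    y = np.asarray(y)
    n_docs, n_words = X.shape
    classes = np.unique(y)
    scores = np.zeros(n_words)
    for present in (True, False):
        mask = (X == present)                  # docs where W has this state
        n_w = mask.sum(axis=0)                 # count per word
        p_w = n_w / n_docs                     # P(W = state)
        for c in classes:
            n_cw = (mask & (y == c)[:, None]).sum(axis=0)
            # Smoothed estimate of P(Y = c | W = state); never exactly 0.
            p_c_w = (n_cw + alpha) / (n_w + alpha * len(classes))
            scores -= p_w * p_c_w * np.log2(p_c_w)
    return scores

def select_features(X, y, k, alpha=1.0):
    """Return indices of the k words with the lowest conditional entropy."""
    scores = conditional_entropy_scores(X, y, alpha)
    return np.argsort(scores)[:k]

# Toy data: word 0 marks positive reviews, word 1 marks negative reviews,
# word 2 appears equally often in both classes.
X_toy = np.array([[1, 0, 1], [1, 0, 0], [0, 1, 1], [0, 1, 0]])
y_toy = np.array([1, 1, 0, 0])
print(conditional_entropy_scores(X_toy, y_toy))
print(select_features(X_toy, y_toy, k=2))
```

Smoothing keeps the estimate finite for words that occur in very few documents, which is one plausible way to get scores that behave consistently as the number of selected features changes.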
Advisors
Lee, Soo-Young (이수영)
Description
Korea Advanced Institute of Science and Technology (KAIST) : Department of Electrical and Electronic Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2013
Identifier
513315/325007  / 020113491
Language
eng
Description

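Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST) : Department of Electrical and Electronic Engineering, 2013.2, [ v, 57 p. ]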

Keywords

sentiment classification; domain adaptation; Boltzmann machine; conditional entropy; independent component analysis

URI
http://hdl.handle.net/10203/180995
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=513315&flag=dissertation
Appears in Collection
EE-Theses_Master(석사논문)
Files in This Item
There are no files associated with this item.
