DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Myaeng, Sung-Hyon | - |
dc.contributor.advisor | 맹성현 | - |
dc.contributor.author | Song, Sa-Kwang | - |
dc.contributor.author | 송사광 | - |
dc.date.accessioned | 2011-12-13T05:27:59Z | - |
dc.date.available | 2011-12-13T05:27:59Z | - |
dc.date.issued | 2011 | - |
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=466475&flag=dissertation | - |
dc.identifier.uri | http://hdl.handle.net/10203/33335 | - |
dc.description | Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST) : Department of Computer Science, 2011.2, [ 88 p. ] | - |
dc.description.abstract | Term weighting for document ranking and retrieval has been an important research topic in Information Retrieval for decades. We propose a novel term weighting method that uses past retrieval results: the queries that contain a particular term, the retrieved documents, and their relevance judgments. A term’s evidential weight, DP (Discrimination Power), which we propose in this paper, depends on the degree to which the mean weighting scores of the relevant and non-relevant document distributions differ in the relevance-judged past document collection. It also takes into account the rankings and similarity values of the relevant and non-relevant documents to compensate for incorrect positions or scores in the retrieved document list. The experiments were performed using two well-known open-source search engines, Terrier and Indri, and four ranking models: TFIDF, DFR (Divergence From Randomness) BM25, the Hiemstra Language Model, and the Indri Language Model. Our experimental results using a standard test collection (TREC-3, 4, and 5) show that a term weighting scheme incorporating evidential weights outperforms the four baseline schemes. Notably, we obtained the performance increase with only a small number of terms found in a relatively small number of past queries. An additional analysis of how effectiveness changes as the number of terms with a DP value increases shows that DP is well suited to large query sets, because the effect of DP is proportional to the number of DP terms. Further analysis shows that the notion of evidential weight, based not on the entire collection but on the relevance-judged documents, is clearly distinct from IDF. In addition, an experiment on the TREC Web Blogs collection yielded significant results, showing that the proposed method is feasible for general Web search. As a result, we designed a new te... | eng |
dc.language | eng | - |
dc.publisher | Korea Advanced Institute of Science and Technology (KAIST) | - |
dc.subject | Evidential Weight | - |
dc.subject | Language Model | - |
dc.subject | Information Retrieval | - |
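dc.subject | 가중치 부여방법 (Weighting Method) | - |
dc.subject | 어절 분별력 (Term Discrimination Power) | - |
dc.subject | 경험적 가중치 (Evidential Weight) | - |
dc.subject | 랭킹 모델 (Ranking Model) | - |
dc.subject | 정보 검색 (Information Retrieval) | - |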
dc.subject | Term Weighting | - |
dc.subject | Discrimination Power | - |
dc.title | (A) novel term weighting scheme based on discrimination power | - |
dc.title.alternative | 질의 어절의 고유한 분별력에 기반한 어절 가중치 부여방법 연구 (A study of a term weighting method based on the inherent discrimination power of query terms) | - |
dc.type | Thesis(Ph.D) | - |
dc.identifier.CNRN | 466475/325007 | - |
dc.description.department | Korea Advanced Institute of Science and Technology (KAIST) : Department of Computer Science | - |
dc.identifier.uid | 020035904 | - |
dc.contributor.localauthor | Myaeng, Sung-Hyon | - |
dc.contributor.localauthor | 맹성현 | - |