DSpace at KOASAS: Design and implementation of a community-based cluster crawler using the link structure and text information of hyperlinks

Design and implementation of a community-based cluster crawler using the link structure and text information of hyperlinks

DC Field	Value	Language
dc.contributor.advisor	Whang, Kyu-Young	-
dc.contributor.advisor	황규영	-
dc.contributor.author	Khamidov, Ravshan	-
dc.date.accessioned	2011-12-13T06:06:53Z	-
dc.date.available	2011-12-13T06:06:53Z	-
dc.date.issued	2007	-
dc.identifier.uri	http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=268875&flag=dissertation	-
dc.identifier.uri	http://hdl.handle.net/10203/34783	-
dc.description	Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST): Computer Science, August 2007, [vii, 39 p.]	-
dc.description.abstract	Community-limited search is a technique for improving the quality of search output by limiting the search to a specified community. In this thesis, a community is a collection of semantically related web pages. Few techniques have been proposed for finding such communities. The incremental cluster crawler proposed by Kim finds communities incrementally using the link structure of the crawled web pages. This crawler, however, has drawbacks: it does not consider text information, and its clustering quality depends on the seed URLs because one community is created for each seed URL. In this thesis, we propose a new method for finding communities incrementally. The key idea is to use both the link structure and the text information. Specifically, the method first computes a link-based similarity and a text-based similarity separately and then combines the two scores. For the text-based similarity, we use the text of the hyperlinks that point to a target web page rather than the text of the target page itself. By using both kinds of information, the proposed method improves the overall clustering quality. We also propose a method for merging communities to reduce the influence of seed URLs on the clustering quality: it computes the similarity between communities created from different seed URLs and merges them accordingly. Experimental results show that the proposed method improves the clustering quality by up to three times compared with Kim's incremental cluster crawler.	eng
dc.language	eng	-
dc.publisher	Korea Advanced Institute of Science and Technology (KAIST)	-
dc.subject	web crawling	-
dc.subject	web clustering	-
dc.subject	web community	-
dc.subject	웹 크롤링 (web crawling)	-
dc.subject	웹 클러스터링 (web clustering)	-
dc.subject	웹 커뮤니티 (web community)	-
dc.title	Design and implementation of a community-based cluster crawler using the link structure and text information of hyperlinks	-
dc.title.alternative	하이퍼링크의 링크 구조와 텍스트 정보를 이용한 커뮤니티 기반의 클러스터 크롤러의 설계 및 구현	-
dc.type	Thesis (Master's)	-
dc.identifier.CNRN	268875/325007	-
dc.description.department	Korea Advanced Institute of Science and Technology (KAIST): Computer Science	-
dc.identifier.uid	020044370	-
dc.contributor.localauthor	Whang, Kyu-Young	-
dc.contributor.localauthor	황규영	-
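The abstract describes two steps: computing a link-based and a text-based similarity separately and combining them, and then merging communities that came from different seed URLs. The Python sketch below illustrates that flow. It is not the thesis's actual method, because the abstract gives no formulas, and every concrete choice here is a hypothetical assumption: Jaccard overlap of link neighbourhoods for link similarity, cosine similarity of term counts from incoming hyperlink text for text similarity, a weighted sum with weight `alpha`, average pairwise similarity between communities, and greedy merging above a `threshold`. The `Page` class and all function names are illustrative only.

```python
from collections import Counter
from dataclasses import dataclass, field
from itertools import combinations
import math


@dataclass
class Page:
    url: str
    links: set = field(default_factory=set)         # hypothetical link neighbourhood of the page
    anchor_text: list = field(default_factory=list)  # text of hyperlinks pointing TO this page


def link_similarity(a: Page, b: Page) -> float:
    """Jaccard overlap of link neighbourhoods (assumed measure)."""
    union = a.links | b.links
    return len(a.links & b.links) / len(union) if union else 0.0


def text_similarity(a: Page, b: Page) -> float:
    """Cosine similarity of term counts from incoming hyperlink text (assumed measure)."""
    ta = Counter(w.lower() for t in a.anchor_text for w in t.split())
    tb = Counter(w.lower() for t in b.anchor_text for w in t.split())
    dot = sum(ta[w] * tb[w] for w in ta.keys() & tb.keys())
    norm = math.sqrt(sum(v * v for v in ta.values())) * math.sqrt(sum(v * v for v in tb.values()))
    return dot / norm if norm else 0.0


def combined_similarity(a: Page, b: Page, alpha: float = 0.5) -> float:
    """Weighted sum of the two scores; alpha is a hypothetical tuning weight."""
    return alpha * link_similarity(a, b) + (1 - alpha) * text_similarity(a, b)


def community_similarity(c1: list, c2: list, alpha: float = 0.5) -> float:
    """Average pairwise page similarity between two communities (assumed linkage)."""
    scores = [combined_similarity(p, q, alpha) for p in c1 for q in c2]
    return sum(scores) / len(scores) if scores else 0.0


def merge_communities(communities: list, threshold: float = 0.3, alpha: float = 0.5) -> list:
    """Repeatedly merge the most similar pair until no pair reaches the threshold."""
    communities = [list(c) for c in communities]
    while len(communities) > 1:
        best, pair = -1.0, None
        for i, j in combinations(range(len(communities)), 2):
            s = community_similarity(communities[i], communities[j], alpha)
            if s > best:
                best, pair = s, (i, j)
        if best < threshold:
            break
        i, j = pair                      # i < j, so popping j leaves index i valid
        communities[i].extend(communities.pop(j))
    return communities


# Usage: one starting community per seed URL, as in the baseline crawler.
p1 = Page("http://a.example/1", {"h1", "h2"}, ["python tutorial"])
p2 = Page("http://a.example/2", {"h1", "h2", "h3"}, ["python guide"])
p3 = Page("http://b.example/1", {"z"}, ["cooking recipes"])
merged = merge_communities([[p1], [p2], [p3]])
```

With these toy pages, the two Python-related seeds share links and anchor terms and end up in one community, while the cooking page stays separate. Average linkage is used here only because it is less sensitive to a single outlier page than single linkage; the thesis may use a different community similarity.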

Appears in Collection: CS-Theses_Master (Master's theses)

Files in This Item: There are no files associated with this item.

Knowledge Service Development Team, KAIST 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea. T. 82-42-350-4493 Email. koasas@kaist.ac.kr
Copyright © 2016. Korea Advanced Institute of Science and Technology. All Rights Reserved.
