Scalable Anti-TrustRank with Qualified Site-level Seeds for Link-based Web Spam Detection

Cited 0 times in Web of Science; cited 2 times in Scopus
DC Field | Value | Language
dc.contributor.author | Whang, Joyce Jiyoung | ko
dc.contributor.author | Jung, Yeonsung | ko
dc.contributor.author | Kang, Seonggoo | ko
dc.contributor.author | Yoo, Dongho | ko
dc.contributor.author | Dhillon, Inderjit | ko
dc.date.accessioned | 2020-11-25T01:50:21Z | -
dc.date.available | 2020-11-25T01:50:21Z | -
dc.date.created | 2020-11-25 | -
dc.date.issued | 2020-04-21 | -
dc.identifier.citation | 29th International World Wide Web Conference, WWW 2020, pp. 593-602 | -
dc.identifier.uri | http://hdl.handle.net/10203/277597 | -
dc.description.abstract | Web spam detection is one of the most important and challenging tasks in web search. Since web spam pages tend to have many spurious links, many web spam detection algorithms exploit the hyperlink structure between web pages to detect spam pages. In this paper, we conduct a comprehensive analysis of the link structure of web spam using real-world web graphs to systematically investigate the characteristics of link-based web spam. By exploring the structure of the page-level graph as well as the site-level graph, we propose a scalable site-level seeding methodology for the Anti-TrustRank (ATR) algorithm. The key idea is to map a website into a feature space where we learn a classifier to prioritize the websites so that we can effectively select a set of good seeds for the ATR algorithm. This seeding method enables the ATR algorithm to detect the largest number of spam pages among the competitive baseline methods. Furthermore, we design work-efficient asynchronous ATR algorithms that significantly reduce the computational cost of the traditional ATR algorithm, without degrading spam detection performance, while guaranteeing convergence. | -
dc.language | English | -
dc.publisher | Association for Computing Machinery | -
dc.title | Scalable Anti-TrustRank with Qualified Site-level Seeds for Link-based Web Spam Detection | -
dc.type | Conference | -
dc.identifier.scopusid | 2-s2.0-85091700703 | -
dc.type.rims | CONF | -
dc.citation.beginningpage | 593 | -
dc.citation.endingpage | 602 | -
dc.citation.publicationname | 29th International World Wide Web Conference, WWW 2020 | -
dc.identifier.conferencecountry | CH | -
dc.identifier.conferencelocation | Taipei | -
dc.identifier.doi | 10.1145/3366424.3385773 | -
dc.contributor.localauthor | Whang, Joyce Jiyoung | -
dc.contributor.nonIdAuthor | Jung, Yeonsung | -
dc.contributor.nonIdAuthor | Kang, Seonggoo | -
dc.contributor.nonIdAuthor | Yoo, Dongho | -
dc.contributor.nonIdAuthor | Dhillon, Inderjit | -
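
For readers unfamiliar with the Anti-TrustRank (ATR) algorithm named in the abstract, the following is a minimal sketch of the basic idea only: distrust starts at known spam seeds and propagates backwards along hyperlinks, so pages that link to spam accumulate a higher score. It is a plain synchronous power iteration, not the site-level seeding method or the asynchronous algorithms proposed in the paper. The function name `anti_trust_rank`, the parameters `alpha` and `iters`, and the handling of pages with no in-links are assumptions made for illustration.

```python
from collections import defaultdict


def anti_trust_rank(edges, spam_seeds, alpha=0.85, iters=100):
    """Illustrative Anti-TrustRank: personalized PageRank on the reversed graph.

    edges      -- iterable of (source, target) hyperlinks
    spam_seeds -- nodes already known to be spam
    alpha      -- damping factor (assumed value, not taken from the paper)
    Returns a dict mapping each node to its distrust score (scores sum to 1).
    """
    seeds = set(spam_seeds)
    if not seeds:
        raise ValueError("at least one spam seed is required")

    nodes = set(seeds)
    in_links = defaultdict(set)  # target -> set of pages that link to it
    for src, dst in edges:
        nodes.add(src)
        nodes.add(dst)
        in_links[dst].add(src)

    # Teleport distribution: uniform over the spam seeds, zero elsewhere.
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    score = dict(teleport)

    for _ in range(iters):
        pushed = {n: 0.0 for n in nodes}
        dangling = 0.0
        for v in nodes:
            sources = in_links.get(v)
            if sources:
                # Walk the link backwards: v passes its score to its linkers.
                share = score[v] / len(sources)
                for u in sources:
                    pushed[u] += share
            else:
                # No in-links: this mass is returned to the seeds (assumption).
                dangling += score[v]
        score = {
            n: alpha * pushed[n] + (alpha * dangling + 1.0 - alpha) * teleport[n]
            for n in nodes
        }
    return score


if __name__ == "__main__":
    links = [("a", "spam"), ("b", "a"), ("c", "good")]
    for node, value in sorted(anti_trust_rank(links, ["spam"]).items()):
        print(node, round(value, 4))
```

In this toy graph, `a` links directly to the seed and `b` links to `a`, so both receive distrust, while `c` and `good` have no path to the seed and stay at zero. Ranking pages by this score and thresholding is one simple way to flag spam candidates; the quality of the result depends heavily on which seeds are chosen, which is the problem the paper's seeding method addresses.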
Appears in Collection
CS-Conference Papers(학술회의논문)
Files in This Item
There are no files associated with this item.