Understanding multilingualism in Switzerland using text mining algorithms
(Korean title, translated: Understanding Switzerland, a Multilingual Society, Using Text Mining Algorithms)

Many of today's societies are made up of multiple language groups, including monolingual speakers of different languages and multilingual speakers of several languages. We can ask many interesting questions about such societies: how widely each language is used, what topics are discussed in each language, whether information reaches some language groups earlier than others, and whether and how members of one language group communicate with members of another. We tackle these questions by studying Switzerland, a highly multilingual society, through a large corpus of geotagged Twitter data. Specifically, we crawled 47 million tweets from 97,577 users, identified the language of each tweet, and analyzed the tweets using topic and language analysis tools. Using the hierarchical Dirichlet scaling process (HDSP), a nonparametric topic model for labeled data, we discover which topics are most popular among English, German, and French monolingual users, as well as among English-German, English-French, and German-French bilingual users. We analyze hashtags for major world events to understand whether certain groups have earlier access to information, and we examine general language use to compare the language variety of monolingual and bilingual users. By applying these computational methods to a large corpus of tweets from Switzerland, we show that many interesting linguistic and sociolinguistic phenomena can be uncovered.
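The abstract does not say how users are assigned to monolingual or bilingual groups, or how hashtag timing is compared across groups. The Python sketch below shows one plausible approach, assuming per-tweet language labels are already available (the thesis reports identifying a language for every tweet). The 20% share threshold `MIN_SHARE`, the helpers `language_group` and `first_hashtag_use`, the tokenization, and all data are hypothetical illustrations, not the author's method.

```python
from collections import Counter
from datetime import datetime

# Assumption (not from the thesis): a language counts toward a user's profile
# only if it makes up at least this share of the user's tweets in LANGS.
MIN_SHARE = 0.2
LANGS = {"en", "de", "fr"}


def language_group(tweet_langs):
    """Map one user's per-tweet language labels to a group label:
    'de' (monolingual), 'de-fr' (bilingual), or None if neither applies."""
    counts = Counter(lang for lang in tweet_langs if lang in LANGS)
    total = sum(counts.values())
    if total == 0:
        return None  # no English, German, or French tweets
    major = sorted(lang for lang, c in counts.items() if c / total >= MIN_SHARE)
    if len(major) in (1, 2):
        return "-".join(major)
    return None  # all three languages above threshold: excluded in this sketch


def first_hashtag_use(tweets, groups, hashtag):
    """Return {group: earliest timestamp} at which any user in that group
    used `hashtag`. `tweets` yields (user_id, timestamp, text) tuples."""
    tag = hashtag.lower()
    first = {}
    for user, ts, text in tweets:
        group = groups.get(user)
        if group is None:
            continue
        # Crude tokenization: split on whitespace, strip trailing punctuation.
        tokens = {tok.lower().rstrip(".,!?:;") for tok in text.split()}
        if tag in tokens and (group not in first or ts < first[group]):
            first[group] = ts
    return first


# Invented toy data for illustration only.
user_langs = {
    "u1": ["de"] * 9 + ["en"],      # 90% German -> monolingual 'de'
    "u2": ["fr"] * 5 + ["en"] * 5,  # 50/50 -> bilingual 'en-fr'
    "u3": ["en"] * 10,              # monolingual 'en'
}
groups = {user: language_group(langs) for user, langs in user_langs.items()}

tweets = [
    ("u3", datetime(2014, 3, 8, 10, 0), "Thoughts with everyone #worldevent"),
    ("u1", datetime(2014, 3, 8, 12, 30), "Schrecklich. #worldevent"),
    ("u2", datetime(2014, 3, 8, 11, 15), "Terrible nouvelle #WorldEvent!"),
]
first = first_hashtag_use(tweets, groups, "#worldevent")

# Print groups in order of their first use of the hashtag.
for group, ts in sorted(first.items(), key=lambda kv: kv[1]):
    print(group, ts.isoformat())
```

With this toy data the groups print in the order `en`, `en-fr`, `de`. Taking the earliest single use per group is sensitive to group size; a fuller comparison would likely use a more robust statistic, such as the median first-use time across each group's users.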
Advisors
Oh, Hae-Yun (오혜연)
Description
Korea Advanced Institute of Science and Technology (KAIST): Department of Computer Science
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2014
Identifier
592443/325007  / 020113144
Language
eng
Description

Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST): Department of Computer Science, 2014.8, [iv, 23 p.]

Keywords

Text Mining; Topic Modelling; Twitter; Social Media; Multilingualism

URI
http://hdl.handle.net/10203/196857
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=592443&flag=dissertation
Appears in Collection
CS-Theses_Master (Master's theses)
Files in This Item
There are no files associated with this item.
