KRIT: a Korean readability index with a hybrid transformer (KRIT: Transformer-based Korean readability index model)

DC Field | Value | Language
dc.contributor.advisor | Moon, Sue Bok | -
dc.contributor.advisor | 문수복 | -
dc.contributor.author | Wi, Hee Ju | -
dc.date.accessioned | 2023-06-26T19:31:25Z | -
dc.date.available | 2023-06-26T19:31:25Z | -
dc.date.issued | 2022 | -
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=997585&flag=dissertation | en_US
dc.identifier.uri | http://hdl.handle.net/10203/309526 | -
dc.description | Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST) : School of Computing, 2022.2, [iv, 26 p. :] | -
dc.description.abstract | A readability index indicates the difficulty level of a text. It can be used in various fields, such as book recommendation, writing ability evaluation, personalized recommendation, online bot detection, and even fake news analysis. Traditional readability models apply simple regression models to simple linguistic features. In recent years, readability research using deep learning models has also been conducted. However, readability research for Korean is very scarce: unlike for English, there are no public datasets or automated baseline models. The existing Korean readability indexes were developed using simple regression models, evaluated on very small datasets, and in some cases not evaluated with standard evaluation metrics at all. Therefore, we propose a novel Korean readability index model, KRIT, that considers both grammatical structure and lexical meaning by combining a transformer encoder with BERT, a transformer-based pretrained language model, for Korean. For the dataset, we used 25,449 sentences from Korean textbooks written for ages 8-16, grouped into 4 grade-level classes. We compared the performance of KRIT with existing Korean and English readability models and demonstrated that the proposed model outperforms the baselines, with an accuracy of 0.746 and an MAE of 0.327. To our knowledge, this is the first attempt to use deep learning NLP techniques, namely pretrained word embeddings and a transformer encoder architecture, for Korean readability assessment, and the first such model to be evaluated with sufficient data. | -
dc.language | eng | -
dc.publisher | Korea Advanced Institute of Science and Technology (KAIST) | -
dc.title | KRIT: a Korean readability index with a hybrid transformer | -
dc.title.alternative | KRIT: Transformer-based Korean readability index model | -
dc.type | Thesis(Master) | -
dc.identifier.CNRN | 325007 | -
dc.description.department | Korea Advanced Institute of Science and Technology (KAIST) : School of Computing | -
dc.contributor.alternativeauthor | 위희주 | -
Appears in Collection
CS-Theses_Master (Master's theses)
Files in This Item
There are no files associated with this item.
