KRIT: a Korean readability index with a hybrid transformer (Korean title: KRIT: a Transformer-based Korean readability index model)

A readability index indicates the difficulty level of a text. It can be used in various fields, such as book recommendation, writing-ability evaluation, personalized recommendation, online bot detection, and even fake news analysis. Traditional readability models combine simple linguistic features with simple regression models; more recently, readability research has begun to use deep learning models. Korean readability research, however, remains scarce: whereas English readability research has public datasets and automated baseline models, Korean has neither. Existing Korean readability indexes were developed with simple regression models, evaluated on very small datasets, and in some cases were not assessed with standard evaluation metrics at all. We therefore propose KRIT, a novel Korean readability index model that considers both grammatical structure and lexical meaning, built on a transformer encoder combined with a transformer-based pretrained language model (a BERT model for Korean). Our dataset consists of 25,449 sentences from Korean textbooks written for ages 8-16, grouped into 4 grade-level classes. We compared KRIT with existing Korean and English readability models and showed that it outperforms the other baselines, with an accuracy of 0.746 and an MAE of 0.327. To our knowledge, this is the first attempt to apply deep learning NLP techniques (pretrained word embeddings and a transformer encoder architecture) to Korean readability assessment and to evaluate the result on a sufficiently large dataset.
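The abstract reports both accuracy and MAE over the four grade-level classes. The sketch below shows how the two metrics can complement each other, assuming (this is not stated in the record) that the classes are encoded as ordinal integers 0-3 and that MAE is computed over those class indices. The labels and predictions are made-up illustration data, not results from the thesis.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of sentences whose predicted grade-level class equals the label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Average distance between predicted and true class indices.

    Assumes the grade-level classes are ordinal, so predicting a class two
    groups away costs twice as much as predicting an adjacent class.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical data: 0-3 stand for the four grade-level groups (youngest to oldest).
labels = [0, 1, 2, 3, 2, 1, 0, 3]
preds = [0, 1, 3, 3, 2, 0, 0, 1]

print(accuracy(labels, preds))             # 0.625
print(mean_absolute_error(labels, preds))  # 0.5
```

Accuracy treats every wrong prediction the same, while MAE penalizes a prediction three grade groups off more heavily than one that misses by a single group, which is why reporting both gives a fuller picture for ordinal readability levels.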
Advisors
Moon, Sue Bok (문수복)
Description
Korea Advanced Institute of Science and Technology (KAIST): School of Computing
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2022
Identifier
325007
Language
eng
Description

Thesis (Master's) - KAIST: School of Computing, 2022.2, [iv, 26 p.]

URI
http://hdl.handle.net/10203/309526
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=997585&flag=dissertation
Appears in Collection
CS-Theses_Master (Master's theses)
Files in This Item
There are no files associated with this item.
