FP2VEC: A new molecular featurizer inspired by natural language processing

Quantitative structure-activity relationship (QSAR) models are regression or classification models that predict the chemical properties of compounds. Accurate QSAR predictions can save time and cost compared with actual experiments. Prediction quality also depends on the molecular featurizer, the numerical representation of a chemical compound. Recently, machine learning and deep learning techniques have been widely used to develop new molecular featurizers that improve the prediction accuracy of QSAR models. Here we introduce FP2VEC, a new molecular featurizer inspired by natural language processing (NLP) techniques. Our method expresses a chemical compound as a vector representation that is trained by supervised learning. To evaluate the prediction performance of FP2VEC, we built a QSAR model using a simple convolutional neural network (CNN). We evaluated the model on four datasets for classification tasks and five datasets for regression tasks, and compared it with other molecular featurizer models. On the classification tasks, our model showed the best prediction accuracy among the benchmark models on three out of four datasets, and our model implemented with multi-task learning outperformed the other benchmark models. On the regression tasks, our model showed the best performance on two out of five datasets. Lastly, we tested the effect of the hyperparameters of our model and found that some of them significantly influenced prediction accuracy. In summary, our new molecular featurizer based on NLP techniques provides more useful information and improves QSAR prediction accuracy compared with previous methods.
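The abstract does not spell out the architecture, so the following is only a minimal sketch of one plausible reading of the idea: the on-bits of a binary fingerprint are treated like word indices, looked up in an embedding table, and passed through a single convolution with max pooling. Everything concrete here is an assumption made for illustration, including the toy 8-bit fingerprint, the helper names `fingerprint_to_tokens`, `embed`, and `conv1d_maxpool`, all array sizes, and the random (untrained) weights. In a supervised setting the embedding table and kernels would be learned from labeled data rather than sampled.

```python
import numpy as np

def fingerprint_to_tokens(bits, max_len):
    """Map a binary fingerprint to a fixed-length list of 'word' indices.

    Index 0 is reserved for padding, so bit position i becomes token i + 1.
    """
    on_bits = [i + 1 for i, b in enumerate(bits) if b][:max_len]
    return on_bits + [0] * (max_len - len(on_bits))

def embed(tokens, table):
    """Look up one embedding row per token, as in an NLP word embedding."""
    return table[np.asarray(tokens)]

def conv1d_maxpool(x, kernels):
    """Slide each kernel over the token axis, apply ReLU, then max-pool.

    x has shape (length, dim); kernels has shape (n_kernels, width, dim).
    """
    width = kernels.shape[1]
    windows = np.stack([x[i:i + width] for i in range(x.shape[0] - width + 1)])
    feats = np.einsum("nwd,kwd->nk", windows, kernels)
    return np.maximum(feats, 0.0).max(axis=0)

rng = np.random.default_rng(0)

# Hypothetical 8-bit fingerprint; real fingerprints are much longer.
bits = [0, 1, 0, 0, 1, 0, 0, 1]
tokens = fingerprint_to_tokens(bits, max_len=5)

# Assumed sizes: embedding dimension 4, three kernels of width 2.
table = rng.normal(size=(len(bits) + 1, 4))
table[0] = 0.0  # padding token maps to the zero vector
kernels = rng.normal(size=(3, 2, 4))
w = rng.normal(size=3)

x = embed(tokens, table)                      # shape (5, 4)
features = conv1d_maxpool(x, kernels)         # shape (3,)
prob = 1.0 / (1.0 + np.exp(-(features @ w)))  # toy classification score
print(tokens, features.shape, float(prob))
```

With random weights the score carries no meaning; the sketch only shows how data could flow from fingerprint bits to a prediction.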
Advisors
Kim, Dong Sup
Description
Korea Advanced Institute of Science and Technology (KAIST): Department of Bio and Brain Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2019
Identifier
325007
Language
eng
Description

Master's thesis - Korea Advanced Institute of Science and Technology (KAIST): Department of Bio and Brain Engineering, 2019.2, [iv, 28 p.]

Keywords

Molecular featurizer; quantitative structure-activity relationship; QSAR; natural language processing; NLP; convolutional neural network; CNN; multi-task learning; QSAR prediction

URI
http://hdl.handle.net/10203/266145
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=843182&flag=dissertation
Appears in Collection
BiS-Theses_Master (Master's Theses)
Files in This Item
There are no files associated with this item.
