PPI-BERT: Pretraining transformers with masked sequence-structure of protein fragments for learning protein-protein interactions

DC Field                    Value
dc.contributor.advisor      김호민 (Kim, Homin)
dc.contributor.advisor      차미영 (Cha, Meeyoung)
dc.contributor.author       정현규 (Jung, Hyunkyu)
dc.date.accessioned         2024-07-30T19:31:43Z
dc.date.available           2024-07-30T19:31:43Z
dc.date.issued              2024
dc.identifier.uri           http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1097251&flag=dissertation
dc.identifier.uri           http://hdl.handle.net/10203/321671
dc.description              Master's thesis - Korea Advanced Institute of Science and Technology (KAIST), School of Computing, 2024.2, [iv, 29 p.]
dc.description.abstract     The ability to handle 3D data, or point clouds, has had a tremendous impact on a wide range of applications. Proteins are the functional components of biological processes and are composed of amino acid residues linked by peptide bonds. Linear polypeptides fold into specific 3D structures and form complexes with other proteins or biomolecules to carry out their cellular functions. Predicting whether two proteins interact, known as protein-protein interaction (PPI) prediction, is a fundamental challenge in the biomedical field. Here, we propose PPI-BERT, a pre-trained Transformer that learns PPI from protein sequences and structures represented as heterogeneous point clouds. Our model uses a rotation-invariant method to obtain a canonical representation of protein structures and segments them into fragments of fixed amino acid length while retaining atom positions and amino acid classes. This "sequence-structure" representation is used to train a tokenizer that learns discrete token IDs optimized for sequence and structure reconstruction. Masked modeling is then used to train the Transformer encoder on the tokenized fragments. Our self-supervised model was trained on protein complex structures (N=85,885) from the Protein Data Bank. Evaluation shows that our model outperforms existing methods on two critical PPI downstream tasks: binding prediction and interface region prediction. These results are an important step toward computational models for PPI applications such as drug discovery.
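
The abstract describes a concrete pipeline: canonicalize fragment geometry with a rotation-invariant transform, cut chains into fixed-length fragments that pair atom positions with amino-acid classes, discretize fragments with a learned tokenizer, and pretrain a Transformer encoder with masked modeling. The minimal NumPy sketch below illustrates those steps under stated assumptions; the PCA canonicalization, the fragment length of 8, the nearest-codebook (VQ-style) tokenizer, the vocabulary size of 256, and the 15% masking rate are all illustrative choices, not the thesis implementation (the Transformer itself is omitted).

import numpy as np

FRAGMENT_LEN = 8  # assumed fixed fragment length in residues; not from the thesis

def canonicalize(coords):
    """Rotation-invariant canonical pose for one fragment: center at the
    centroid, then express the atoms in their principal-axis (PCA) frame.
    Any rigidly rotated copy of the fragment maps to the same pose, up to
    principal-axis sign. One possible canonicalization, not the thesis's."""
    centered = coords - coords.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt.T

def fragment(ca_coords, residue_ids):
    """Segment a chain into fixed-length fragments, keeping atom positions
    and amino-acid classes together (the 'sequence-structure' view)."""
    out = []
    for s in range(0, len(residue_ids) - FRAGMENT_LEN + 1, FRAGMENT_LEN):
        sl = slice(s, s + FRAGMENT_LEN)
        out.append((canonicalize(ca_coords[sl]), residue_ids[sl]))
    return out

def tokenize(frag_features, codebook):
    """VQ-style discretization: each fragment feature vector gets the ID of
    its nearest codebook entry (one way to realize a learned tokenizer)."""
    dists = np.linalg.norm(frag_features[:, None, :] - codebook[None, :, :], axis=-1)
    return dists.argmin(axis=-1)

def mask_tokens(token_ids, mask_id, p=0.15, rng=None):
    """BERT-style masking: hide a random ~15% of tokens; the encoder would
    be trained to reconstruct the original IDs at the masked positions."""
    rng = rng or np.random.default_rng(0)
    masked = token_ids.copy()
    mask = rng.random(token_ids.shape[0]) < p
    masked[mask] = mask_id
    return masked, mask

# Toy usage: a random 64-residue chain with 3-D C-alpha coordinates
# standing in for full atom positions.
rng = np.random.default_rng(0)
coords = rng.normal(size=(64, 3))
residues = rng.integers(0, 20, size=64)
frags = fragment(coords, residues)
features = np.stack([c.reshape(-1) for c, _ in frags])  # flatten each fragment
codebook = rng.normal(size=(256, features.shape[1]))    # assumed vocabulary of 256
ids = tokenize(features, codebook)
masked_ids, mask = mask_tokens(ids, mask_id=256)

The property the sketch preserves is that canonicalize() returns the same coordinates for any rigidly rotated copy of a fragment, which is what makes the downstream tokenization rotation-invariant.
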
dc.language                 eng
dc.publisher                한국과학기술원 (Korea Advanced Institute of Science and Technology)
dc.subject                  단백질 구조; 기하적 심층 학습; 비지도 학습; 사전 훈련된 모델; 마스크 모델
dc.subject                  Protein structure; Geometric deep learning; Unsupervised learning; Pre-trained model; Masked model
dc.title                    PPI-BERT: Pretraining transformers with masked sequence-structure of protein fragments for learning protein-protein interactions
dc.title.alternative        PPI-BERT: 단백질-단백질 상호작용 학습을 위한 마스크된 서열-구조의 단백질 단편 구성의 사전 학습된 트랜스포머
dc.type                     Thesis (Master)
dc.identifier.CNRN          325007
dc.description.department   한국과학기술원, 전산학부 (KAIST, School of Computing)
Appears in Collection: CS-Theses_Master (석사논문)
Files in This Item: There are no files associated with this item.
