DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Choi, Jung Kyoon | - |
dc.contributor.advisor | 최정균 | - |
dc.contributor.author | Yang, Woojin | - |
dc.contributor.author | 양우진 | - |
dc.date.accessioned | 2018-05-23T19:34:07Z | - |
dc.date.available | 2018-05-23T19:34:07Z | - |
dc.date.issued | 2017 | - |
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=718828&flag=dissertation | en_US |
dc.identifier.uri | http://hdl.handle.net/10203/241811 | - |
dc.description | Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST) : Department of Bio and Brain Engineering, 2017.8, [iv, 83 p.] | - |
dc.description.abstract | One of the greatest challenges in cancer genomics is distinguishing driver mutations from passenger mutations. Whereas recurrence is a hallmark of driver mutations, recurring noncoding mutations are difficult to observe because few whole-genome sequenced samples are available. A method for predicting potentially recurrent mutations is therefore needed. In this work, I developed a random forest classifier that predicts regulatory mutations that may recur, based on the features of mutations that appear repeatedly in a given cohort. Recurrent mutations can arise at the same site or affect the same gene from different sites. Here I identified a set of mutations that arise in individual samples and alter different cis-regulatory elements converging on a common gene via chromatin interactions. Using breast cancer and lung cancer as models, I profiled up to 50 quantitative features describing genetic and epigenetic signals at the mutation site, transcription factors whose binding motifs were disrupted by the mutation, and genes targeted by long-range chromatin interactions. A true set of mutations for the random forest was generated by interrogating publicly available pan-cancer genomes based on our statistical model of mutation recurrence. The performance of the random forest classifier was evaluated by cross-validation. My methods make it possible to characterize recurrent regulatory mutations using a limited number of whole-genome samples and, based on that characterization, to predict potential driver mutations whose recurrence is not found in the given samples but is likely to be observed with additional samples. The mutations and genes identified in this fashion showed strong relevance to cancer, in contrast to those with site-specific recurrence. My methods accurately predicted mutations recurring at the target-gene level but not those recurring at the same site. In conclusion, I propose a novel approach to discovering potential cancer-driving mutations in noncoding regions. | - |
dc.language | eng | - |
dc.publisher | Korea Advanced Institute of Science and Technology (KAIST) | - |
dc.subject | machine learning; epigenome; cancer somatic mutation; distal chromatin interaction; transcriptome | - |
dc.title | Machine learning for the identification of noncoding driver mutations in cancer | - |
dc.title.alternative | A study of machine learning algorithms for identifying the function of mutations arising in cancer cells | - |
dc.type | Thesis(Ph.D) | - |
dc.identifier.CNRN | 325007 | - |
dc.description.department | Korea Advanced Institute of Science and Technology (KAIST) : Department of Bio and Brain Engineering | - |
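
The abstract describes a random forest classifier evaluated by cross-validation on up to 50 quantitative features per mutation. The sketch below illustrates that general workflow only. It is not the thesis implementation: the scikit-learn calls, the synthetic feature matrix, the labeling rule, the hyperparameters, and the helper name `evaluate_recurrence_classifier` are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score


def evaluate_recurrence_classifier(features, labels, n_splits=5, seed=0):
    """Return per-fold ROC AUC of a random forest (hypothetical helper)."""
    clf = RandomForestClassifier(n_estimators=200, random_state=seed)
    # Stratified folds keep the recurrent/non-recurrent ratio similar per fold.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return cross_val_score(clf, features, labels, cv=cv, scoring="roc_auc")


# Synthetic stand-in data: 400 mutations x 50 features (assumed sizes).
rng = np.random.default_rng(0)
n_mutations, n_features = 400, 50
X = rng.normal(size=(n_mutations, n_features))

# Assumed labeling rule: "recurrent" (1) depends on three features plus noise.
signal = X[:, 0] + X[:, 1] - X[:, 2]
y = (signal + rng.normal(scale=0.5, size=n_mutations) > 0).astype(int)

auc_scores = evaluate_recurrence_classifier(X, y)
print("mean cross-validated ROC AUC:", auc_scores.mean())
```

In a real analysis, the rows of `X` would hold the genetic, epigenetic, motif-disruption, and target-gene features described in the abstract, and `y` would come from the recurrence-based true set rather than a synthetic rule.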