Bio marker derivation for disease screening using genetic mutation subsets and characteristics based on selective searching algorithm = 선별 탐색 알고리즘 기반 유전변이 집합 및 특성을 이용한 질병 판별 마커 도출

In this thesis, we extract a disease screening marker based on disease-related genetic mutations from whole genome or whole exome sequencing data. Although many studies have sought disease-related genetic characteristics in genomic data whose size is tens to hundreds of gigabytes, the bio markers actually used in clinical medicine account for only a small part of the total information. This is because existing methods consider only partial genetic information, such as a few genes. In addition, the mutual relationships among mutations have rarely been studied. Therefore, in this thesis, we propose a selective searching algorithm that examines the relationship between genetic characteristics and a disease across the whole genome or exome data by considering combinations of genetic mutations.

First, we propose a searching algorithm for a combination of disease-related mutations based on whole exome sequencing data. Here, we consider point mutations such as SNVs and InDels. In the extraction algorithm, we filter candidate mutations by applying a learning concept. All samples are divided into training and test samples, and marker extraction samples and validation samples are randomly selected from the training samples. From the marker extraction samples, we extract disease-related mutations that are altered in many disease samples and in few normal samples. Then, we apply the extracted disease-related mutations to the validation samples and keep only those mutations whose accuracy is maintained there. The random selection of marker extraction and validation samples is repeated until the number of selected mutations converges. Then, we propose an objective function-based searching algorithm to find a combination of disease-related mutations.
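The candidate filtering step described above can be sketched as follows. This is only an illustrative sketch, not the thesis implementation: the binary sample-by-mutation matrix `X`, the labels `y` (1 = disease, 0 = normal), the frequency cutoffs `min_disease` and `max_normal`, the half-and-half stratified split, and the `patience` convergence rule are all assumptions, since the abstract does not state them.

```python
import numpy as np

def passes(X, y, min_disease=0.3, max_normal=0.05):
    """Mask of mutations frequent in disease and rare in normal samples.

    The two cutoffs are hypothetical; the abstract gives no values.
    """
    disease_freq = X[y == 1].mean(axis=0)
    normal_freq = X[y == 0].mean(axis=0)
    return (disease_freq >= min_disease) & (normal_freq <= max_normal)

def stratified_halves(y, rng):
    """Randomly split sample indices in half within each class.

    Assumes at least two samples per class so neither half is empty.
    """
    ext, val = [], []
    for label in (0, 1):
        idx = rng.permutation(np.flatnonzero(y == label))
        half = len(idx) // 2
        ext.extend(idx[:half])
        val.extend(idx[half:])
    return np.array(ext), np.array(val)

def filter_candidates(X, y, patience=5, max_rounds=200, seed=0):
    """Repeat random extraction/validation splits until the kept set is stable."""
    rng = np.random.default_rng(seed)
    keep = np.ones(X.shape[1], dtype=bool)
    stable = 0
    for _ in range(max_rounds):
        ext, val = stratified_halves(y, rng)
        # A mutation survives only if it passes on the extraction samples
        # and still passes on the validation samples.
        new_keep = keep & passes(X[ext], y[ext]) & passes(X[val], y[val])
        # new_keep is a subset of keep, so equal counts mean an unchanged set.
        stable = stable + 1 if new_keep.sum() == keep.sum() else 0
        keep = new_keep
        if stable >= patience:
            break
    return np.flatnonzero(keep)
```

Because each round can only remove mutations, the number of kept mutations is non-increasing, which is one simple way to make "repeat until the number of selected mutations converges" well defined.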
The combination of disease-related mutations is obtained by applying the objective function-based searching algorithm to the extracted candidate mutations. Finally, we apply the proposed searching algorithms to whole exome sequencing data of acute myeloid leukemia (AML) and analyze the validity of the proposed marker and the extracted genes. To check the validity of the proposed marker, we use the proposed threshold-based classification, a support vector machine (SVM), and a convolutional neural network (CNN).

Second, we propose a searching algorithm for a combination of disease-related mutations based on whole genome sequencing data, which includes exome, intron, and inter-genic regions. The extraction process for candidate mutations is the same as in the whole exome data-based method, but we propose a new objective function for the searching algorithm. For whole genome sequencing data, the number of candidate mutations is much larger than for whole exome sequencing data. Thus, the objective function is redefined to take into account the classification accuracy, difference, and variance of the disease and normal groups in the training samples. In addition, we extract the disease screening marker from major genes and their inter-genic regions. To confirm the performance of the disease screening marker based on whole genome sequencing data, we observe the classification results for the test samples using the proposed threshold, SVM, and CNN methods. Finally, we compare the whole exome data-based marker with the whole genome data-based marker.
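A minimal sketch of an objective function-based search combined with a threshold-based classifier is given below. The abstract names the ingredients of the objective (classification accuracy, difference, and variance of the disease and normal groups) but not its formula or the search strategy, so everything concrete here is an assumption: the marker score (count of selected mutations a sample carries), the weights `w_diff` and `w_var`, and the greedy forward selection in `greedy_search`.

```python
import numpy as np

def score(X, subset):
    """Per-sample marker score: how many selected mutations the sample carries."""
    return X[:, subset].sum(axis=1)

def best_threshold(s, y):
    """Threshold t that maximizes accuracy of the rule 'disease if score >= t'."""
    candidates = np.unique(s)
    accs = [np.mean((s >= t) == (y == 1)) for t in candidates]
    i = int(np.argmax(accs))
    return candidates[i], accs[i]

def objective(X, y, subset, w_diff=0.1, w_var=0.1):
    """Hypothetical objective: accuracy + group-mean gap - within-group variance."""
    s = score(X, subset)
    _, acc = best_threshold(s, y)
    d, n = s[y == 1], s[y == 0]
    return acc + w_diff * (d.mean() - n.mean()) - w_var * (d.var() + n.var())

def greedy_search(X, y, candidates, max_size=10):
    """Add one mutation at a time while the objective strictly improves."""
    chosen, best = [], -np.inf
    remaining = list(candidates)
    while remaining and len(chosen) < max_size:
        values = [objective(X, y, chosen + [m]) for m in remaining]
        i = int(np.argmax(values))
        if values[i] <= best:
            break
        best = values[i]
        chosen.append(remaining.pop(i))
    return chosen
```

Greedy selection is used here only because it is short to write; with the much larger candidate sets of whole genome data, any search of this kind trades optimality for tractability. The selected combination and its threshold would then be applied to held-out test samples.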
Cho, Dong-Ho (조동호)
Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering
Issue Date

Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering, 2019.2, [vi, 78 p.]


Molecular diagnosis; bio marker; disease screening marker; combination of genetic mutations; selective searching algorithm; learning algorithm; whole exome sequencing data; whole genome sequencing data; acute myeloid leukemia
