Multi-task vision transformer for multi-modal medical image processing

In this dissertation, we propose various Vision Transformer-based models for different medical imaging modalities. Leveraging the intrinsic properties of the Vision Transformer, we apply the proposed methods to classification, segmentation, and regression tasks on optical coherence tomography and radiograph images, to verify the benefits of the Vision Transformer over conventional convolutional neural networks.

Optical coherence tomography is a non-invasive medical imaging modality that uses light to obtain high-resolution images at micrometer scale without harming living tissue. Its fast acquisition speed also makes it suitable for real-time imaging during medical procedures. Owing to these properties, optical coherence tomography has been widely used in cardiology to evaluate the underlying pathology in patients with acute coronary syndrome. An optical coherence tomography acquisition has a three-dimensional structure, with cross-sectional frames stacked along the direction of the vessel. Leveraging this volumetric structure, we devise an algorithm that uses the Transformer to process the sequential optical coherence tomography frames in a way similar to natural language processing.

Owing to its convenience and cost-effectiveness, radiography has been widely used to screen for a variety of pathologic conditions. Because a radiograph has a two-dimensional structure similar to a natural image, algorithms devised for natural images can readily be applied to this modality. We therefore introduce algorithms based on the properties of the Vision Transformer, a recently introduced attention-based architecture without convolutions, to improve generalization capacity as well as model performance given limited data and labels. In addition, noting that the Vision Transformer is well suited to multi-task learning and distributed learning, we introduce multi-task distributed learning methods tailored to the Vision Transformer. Finally, as Vision Transformer-based models benefit more from self-supervised and semi-supervised learning than convolutional neural networks do, we propose a self-evolving framework that amalgamates the strengths of the two methods under the common ground of knowledge distillation.
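To make the frame-as-token analogy concrete, below is a minimal PyTorch sketch. It is an illustration, not the dissertation's implementation: the class name OCTPullbackTransformer, the small CNN frame encoder, and all dimensions are assumptions. Each cross-sectional frame is embedded into one token, and a Transformer encoder attends across the frames of a pullback the way a language model attends across words.

    import torch
    import torch.nn as nn

    class OCTPullbackTransformer(nn.Module):
        """Hypothetical sketch: treat the stacked OCT frames of one pullback
        as a token sequence, analogous to words in a sentence."""
        def __init__(self, dim=256, depth=4, heads=8, num_classes=2, max_frames=512):
            super().__init__()
            # Per-frame CNN encoder: one cross-sectional frame -> one token.
            self.frame_encoder = nn.Sequential(
                nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, dim))
            # Learned positional embedding encodes frame order along the vessel.
            self.pos = nn.Parameter(torch.zeros(1, max_frames, dim))
            layer = nn.TransformerEncoderLayer(dim, heads, dim * 4,
                                               batch_first=True, norm_first=True)
            self.temporal = nn.TransformerEncoder(layer, depth)  # attends across frames
            self.head = nn.Linear(dim, num_classes)              # per-frame prediction

        def forward(self, x):                  # x: (B, T, 1, H, W) pullback volume
            b, t = x.shape[:2]
            tokens = self.frame_encoder(x.flatten(0, 1)).view(b, t, -1)
            tokens = tokens + self.pos[:, :t]  # assumes t <= max_frames
            return self.head(self.temporal(tokens))  # (B, T, num_classes)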
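The multi-task setting can be sketched in the same spirit: a shared Vision Transformer encoder feeds lightweight task-specific heads for classification, regression, and segmentation. Again, the module names and sizes below are illustrative assumptions built from plain torch.nn components, not the thesis architecture.

    import torch
    import torch.nn as nn

    class MultiTaskViT(nn.Module):
        """Hypothetical sketch: one ViT encoder shared by three task heads."""
        def __init__(self, img_size=224, patch=16, dim=384, depth=6, heads=6,
                     num_classes=4, seg_classes=2):
            super().__init__()
            self.n = img_size // patch                      # patches per side
            self.embed = nn.Conv2d(1, dim, patch, patch)    # patchify grayscale input
            self.cls = nn.Parameter(torch.zeros(1, 1, dim))
            self.pos = nn.Parameter(torch.zeros(1, self.n * self.n + 1, dim))
            layer = nn.TransformerEncoderLayer(dim, heads, dim * 4,
                                               batch_first=True, norm_first=True)
            self.encoder = nn.TransformerEncoder(layer, depth)
            # Task-specific heads share the same token representations.
            self.cls_head = nn.Linear(dim, num_classes)     # classification logits
            self.reg_head = nn.Linear(dim, 1)               # scalar regression target
            self.seg_head = nn.Conv2d(dim, seg_classes, 1)  # per-patch segmentation

        def forward(self, x):                               # x: (B, 1, H, W)
            b = x.size(0)
            tokens = self.embed(x).flatten(2).transpose(1, 2)          # (B, N, D)
            tokens = torch.cat([self.cls.expand(b, -1, -1), tokens], 1) + self.pos
            feats = self.encoder(tokens)
            cls_tok, patch_tok = feats[:, 0], feats[:, 1:]
            seg = patch_tok.transpose(1, 2).reshape(b, -1, self.n, self.n)
            # Segmentation logits are at patch resolution; a real model would upsample.
            return self.cls_head(cls_tok), self.reg_head(cls_tok), self.seg_head(seg)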
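Finally, the knowledge-distillation common ground can be illustrated with the standard soft-label distillation loss of Hinton et al., one common way to let a teacher network supervise a student on labeled and unlabeled images alike; the temperature and mixing weight below are illustrative defaults, not values from the thesis.

    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels=None,
                          T=2.0, alpha=0.5):
        """Soft-label KD: the teacher may be a pretrained model or, in a
        self-training setup, an EMA copy of the student (illustrative)."""
        soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                        F.softmax(teacher_logits.detach() / T, dim=-1),
                        reduction="batchmean") * (T * T)
        if labels is None:               # unlabeled image: teacher supervision only
            return soft
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1 - alpha) * hard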
Advisors
Ye, Jong Chul (예종철)
Description
Korea Advanced Institute of Science and Technology: Department of Bio and Brain Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2023
Identifier
325007
Language
eng
Description

Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology: Department of Bio and Brain Engineering, 2023.2, [xviii, 154 p.]

Keywords

Deep learning; Vision transformer; Optical coherence tomography; Radiograph; Multi-task learning; Distributed learning; Knowledge distillation; Self-supervised learning; Self-training

URI
http://hdl.handle.net/10203/308034
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1030407&flag=dissertation
Appears in Collection
BiS-Theses_Ph.D. (Doctoral Theses)
Files in This Item
There are no files associated with this item.
