DSpace at KOASAS: From unimodal to multimodal learning with adaptive alignment for 2D-3D visual recognition


From unimodal to multimodal learning with adaptive alignment for 2D-3D visual recognition

DC Field	Value	Language
dc.contributor.advisor	Yoo, Chang Dong	-
dc.contributor.advisor	유창동	-
dc.contributor.author	Vu, Thang	-
dc.date.accessioned	2023-06-23T19:34:00Z	-
dc.date.available	2023-06-23T19:34:00Z	-
dc.date.issued	2023	-
dc.identifier.uri	http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1030573&flag=dissertation	en_US
dc.identifier.uri	http://hdl.handle.net/10203/309154	-
dc.description	Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology: School of Electrical Engineering, 2023.2, [vi, 88 p.]	-
dc.description.abstract	This dissertation studies unimodal and multimodal learning with adaptive alignment for 2D-3D visual recognition on images and point clouds. For the unimodal 2D image setting, we investigate object detection and instance segmentation, which are commonly formulated as a two-stage pipeline consisting of a region proposal network (RPN) and R-CNN. For the RPN, we propose Cascade RPN with Adaptive Convolution to ensure the alignment between features and reference boxes that progressive refinement requires. For the R-CNN, we revisit Cascade Mask R-CNN and propose SCNet to align the sample distribution between training and inference in existing cascade architectures. For the unimodal 3D point cloud setting, we propose SoftGroup, which performs grouping on soft scores to avoid propagating errors from hard semantic predictions into instance segmentation. SoftGroup is further extended to SoftGroup++ for scalable 3D instance segmentation, with an adaptive strategy that reduces time complexity and search space. Finally, we propose Bird's Eye View (BEV) fusion for multimodal object detection, which aligns image and point features via BEV projection followed by weighted fusion to address the limitation of sparse points for distant objects. Extensive experiments on various standard benchmark datasets demonstrate the superiority and generality of the proposed methods.	-
dc.language	eng	-
dc.publisher	Korea Advanced Institute of Science and Technology (한국과학기술원)	-
dc.subject	Unimodal; Multimodal; Adaptive alignment; 2D-3D visual recognition; Deep neural network	-
dc.subject	유니모달; 멀티모달; 적응적 정렬; 비주얼 인식 2D-3D; 딥 뉴럴 네트워크	-
dc.title	From unimodal to multimodal learning with adaptive alignment for 2D-3D visual recognition	-
dc.title.alternative	2D-3D 시각 인식을 위한 적응적 정렬을 활용한 유니모달부터 멀티모달 학습 기법 연구	-
dc.type	Thesis(Ph.D)	-
dc.identifier.CNRN	325007	-
dc.description.department	Korea Advanced Institute of Science and Technology: School of Electrical Engineering	-
dc.contributor.alternativeauthor	부탕	-

Appears in Collection: EE-Theses_Ph.D. (Doctoral Theses)

Files in This Item: There are no files associated with this item.


Knowledge Service Development Team, KAIST 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea. T. 82-42-350-4493 Email. koasas@kaist.ac.kr
Copyright © 2016. Korea Advanced Institute of Science and Technology. All Rights Reserved.
