Learning audio-visual relationships and correspondences in the visual scenes

Abstract
If we think about how we, as human beings, experience the world around us, we realize that we continuously use all of our senses, and it is through these different sensory signals that we learn about and understand scenes. Whether a person sees an image of a lion, hears a lion roaring, or hears someone say the word "lion", the same response is triggered in the human brain. Although human perception uses multimodal information, most existing models for understanding the scene around us deal with only a single modality, such as vision. Developing machine perception that uses multimodal data is therefore essential. Among these sensory signals, the most dominant are inarguably vision and audition. Sound is not only complementary to visual information but also correlated with visual events: when we see a car moving, we hear its engine at the same time. In this thesis, I introduce computational models that find the correspondence and complementary information between audio and visual signals. I present several tasks that benefit from this correspondence information, such as sound source localization, audio-visual cross-modal retrieval, and audio-visual-driven important-moment selection in videos. I propose effective self-supervised, semi-supervised, and weakly-supervised methods for learning audio-visual correspondence. I also discuss the different relationships between audio and visual signals, since they do not follow a single type of relationship, and leverage the two signals as complementary information to each other in video understanding tasks through different audio-visual formulations.
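The thesis specifies its own architectures and objectives; the sketch below is only a generic illustration of the self-supervised audio-visual correspondence idea mentioned in the abstract, not the author's method. It encodes video frames and audio spectrograms into a shared embedding space and trains with a contrastive objective so that each frame is matched with its own audio. All names (VisionEncoder, AudioEncoder, contrastive_av_loss), the toy CNN shapes, and the InfoNCE-style loss with a temperature of 0.07 are assumptions made for this example.

```python
# Minimal sketch of self-supervised audio-visual correspondence learning
# (a generic contrastive recipe; NOT the specific models proposed in the thesis).
# Assumed inputs: RGB video frames and 1-channel log-mel audio spectrograms.
import torch
import torch.nn as nn
import torch.nn.functional as F


class VisionEncoder(nn.Module):
    """Toy CNN mapping an RGB frame to a D-dim unit-norm embedding (assumed architecture)."""
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, dim),
        )

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)


class AudioEncoder(nn.Module):
    """Toy CNN mapping a log-mel spectrogram into the same D-dim embedding space."""
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, dim),
        )

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)


def contrastive_av_loss(v, a, temperature=0.07):
    """InfoNCE-style loss: the i-th frame should match the i-th audio clip in the batch."""
    logits = v @ a.t() / temperature                      # (B, B) similarity matrix
    targets = torch.arange(v.size(0), device=v.device)    # positives on the diagonal
    # Symmetric loss: vision-to-audio and audio-to-vision retrieval.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))


if __name__ == "__main__":
    frames = torch.randn(8, 3, 96, 96)    # batch of video frames
    spects = torch.randn(8, 1, 64, 100)   # corresponding audio spectrograms
    v_enc, a_enc = VisionEncoder(), AudioEncoder()
    loss = contrastive_av_loss(v_enc(frames), a_enc(spects))
    loss.backward()
    print(f"contrastive audio-visual loss: {loss.item():.4f}")
```

In this kind of setup, sound source localization can be read off by correlating the audio embedding with spatial (unpooled) visual features instead of the globally pooled vector; that is shown only as a pointer here, not as a claim about how the thesis implements it.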
Advisors
Kweon, In So (권인소)
Description
Korea Advanced Institute of Science and Technology (KAIST), School of Electrical Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2022
Identifier
325007
Language
eng
Description

Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST), School of Electrical Engineering, 2022.8, [ix, 73 p.]

Keywords

Audio-visual learning; Self-supervision; Multimodal learning

URI
http://hdl.handle.net/10203/309064
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1007849&flag=dissertation
Appears in Collection
EE-Theses_Ph.D.(박사논문)
Files in This Item
There are no files associated with this item.
