Enabling Visual Object Detection With Object Sounds via Visual Modality Recalling Memory

DC Field | Value | Language
dc.contributor.author | Kim, Jung Uk | ko
dc.contributor.author | Ro, Yong Man | ko
dc.date.accessioned | 2024-07-29T13:00:06Z | -
dc.date.available | 2024-07-29T13:00:06Z | -
dc.date.created | 2023-11-02 | -
dc.date.issued | 2023-10 | -
dc.identifier.citation | IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, pp.1 - 13 | -
dc.identifier.issn | 2162-237X | -
dc.identifier.uri | http://hdl.handle.net/10203/321180 | -
dc.description.abstract | When humans hear the sound of an object, they recall associated visual information and integrate the sound with recalled visual modality to detect the object. In this article, we present a novel sound-based object detector that mimics this process. We design a visual modality recalling (VMR) memory to recall information of a visual modality based on an audio modal input (i.e., sound). To achieve this goal, we propose a VMR loss and an audio-visual association loss to guide the VMR memory to memorize visual modal information by establishing associations between audio and visual modalities. With the visual modal information recalled through the VMR memory along with the original audio input, we perform audio-visual integration. In this step, we introduce an integrated feature contrastive loss that allows the integrated feature to be embedded as if it were encoded using both audio and visual modal inputs. This guidance enables our sound-based object detector to effectively perform visual object detection even when only sound is provided. We believe that our work is a cornerstone study that offers a new perspective to conventional object detection studies that solely rely on the visual modality. Comprehensive experimental results demonstrate the effectiveness of the proposed method with the VMR memory. | -
dc.language | English | -
dc.publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC | -
dc.title | Enabling Visual Object Detection With Object Sounds via Visual Modality Recalling Memory | -
dc.type | Article | -
dc.identifier.wosid | 001092377100001 | -
dc.identifier.scopusid | 2-s2.0-85174831658 | -
dc.type.rims | ART | -
dc.citation.beginningpage | 1 | -
dc.citation.endingpage | 13 | -
dc.citation.publicationname | IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS | -
dc.identifier.doi | 10.1109/TNNLS.2023.3323560 | -
dc.contributor.localauthor | Ro, Yong Man | -
dc.contributor.nonIdAuthor | Kim, Jung Uk | -
dc.description.isOpenAccess | N | -
dc.type.journalArticle | Article; Early Access | -
dc.subject.keywordAuthor | Memory network | -
dc.subject.keywordAuthor | modality recalling | -
dc.subject.keywordAuthor | object sound | -
dc.subject.keywordAuthor | visual object detection | -
Appears in Collection
EE-Journal Papers
Files in This Item
There are no files associated with this item.
