InstaFormer++: Multi-Domain Instance-Aware Image-to-Image Translation with Transformer

DC Field | Value | Language
dc.contributor.author | Kim, Soohyun | ko
dc.contributor.author | Baek, Jongbeom | ko
dc.contributor.author | Park, Jihye | ko
dc.contributor.author | Ha, Eunjae | ko
dc.contributor.author | Jung, Homin | ko
dc.contributor.author | Lee, Taeyoung | ko
dc.contributor.author | Kim, Seungryong | ko
dc.date.accessioned | 2024-08-16T02:00:05Z | -
dc.date.available | 2024-08-16T02:00:05Z | -
dc.date.created | 2024-08-16 | -
dc.date.issued | 2024-04 | -
dc.identifier.citation | INTERNATIONAL JOURNAL OF COMPUTER VISION, v.132, no.4, pp.1167 - 1186 | -
dc.identifier.issn | 0920-5691 | -
dc.identifier.uri | http://hdl.handle.net/10203/322305 | -
dc.description.abstract | We present a novel Transformer-based network architecture for instance-aware image-to-image translation, dubbed InstaFormer, to effectively integrate global- and instance-level information. By treating content features extracted from an image as visual tokens, our model discovers a global consensus among content features by considering context information through the self-attention modules of Transformers. By augmenting such tokens with instance-level features extracted from the content features with respect to bounding box information, our framework learns an interaction between object instances and the global image, thus boosting instance-awareness. We replace layer normalization (LayerNorm) in standard Transformers with adaptive instance normalization (AdaIN) to enable multi-modal translation with style codes. In addition, to improve instance-awareness and translation quality at object regions, we present an instance-level content contrastive loss defined between the input and translated images. Although InstaFormer attains competitive performance, it faces some limitations, namely limited scalability in handling multiple domains and reliance on domain annotations. To overcome these, we propose InstaFormer++, an extension of InstaFormer that enables multi-domain, instance-aware image translation for the first time. We propose to obtain pseudo domain labels by leveraging a list of candidate domain labels in text format together with a pretrained vision-language model. We conduct experiments to demonstrate the effectiveness of our methods over the latest methods and provide extensive ablation studies. | -
dc.language | English | -
dc.publisher | SPRINGER | -
dc.title | InstaFormer++: Multi-Domain Instance-Aware Image-to-Image Translation with Transformer | -
dc.type | Article | -
dc.identifier.wosid | 001091935600001 | -
dc.identifier.scopusid | 2-s2.0-85175203558 | -
dc.type.rims | ART | -
dc.citation.volume | 132 | -
dc.citation.issue | 4 | -
dc.citation.beginningpage | 1167 | -
dc.citation.endingpage | 1186 | -
dc.citation.publicationname | INTERNATIONAL JOURNAL OF COMPUTER VISION | -
dc.identifier.doi | 10.1007/s11263-023-01866-y | -
dc.contributor.localauthor | Kim, Seungryong | -
dc.contributor.nonIdAuthor | Kim, Soohyun | -
dc.contributor.nonIdAuthor | Baek, Jongbeom | -
dc.contributor.nonIdAuthor | Park, Jihye | -
dc.contributor.nonIdAuthor | Ha, Eunjae | -
dc.contributor.nonIdAuthor | Jung, Homin | -
dc.contributor.nonIdAuthor | Lee, Taeyoung | -
dc.description.isOpenAccess | N | -
dc.type.journalArticle | Article | -
dc.subject.keywordAuthor | GANs | -
dc.subject.keywordAuthor | Instance-aware image-to-image translation | -
dc.subject.keywordAuthor | Vision and language | -
dc.subject.keywordAuthor | Image-to-image translation | -
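Two of the mechanisms described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: in the paper, `gamma` and `beta` are produced by an MLP from a sampled style code (here they are passed in directly), and the image/text embeddings for pseudo-labeling are assumed to come from a pretrained vision-language model such as CLIP (not computed here).

```python
import numpy as np

def adain(tokens, gamma, beta, eps=1e-5):
    """AdaIN over visual tokens, replacing LayerNorm's fixed affine.

    tokens: (N, C) visual tokens; gamma, beta: (C,) style-derived
    scale and shift. Because gamma/beta vary with the style code
    (unlike LayerNorm's learned constants), different style codes
    yield different outputs, enabling multi-modal translation.
    """
    mu = tokens.mean(axis=0, keepdims=True)     # per-channel mean over tokens
    sigma = tokens.std(axis=0, keepdims=True)   # per-channel std over tokens
    return gamma * (tokens - mu) / (sigma + eps) + beta

def pseudo_domain_label(image_emb, text_embs, labels):
    """Zero-shot pseudo domain labeling via cosine similarity.

    Picks the candidate domain label (given as text) whose text
    embedding is most similar to the image embedding.
    """
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = text_embs @ image_emb                # cosine similarities
    return labels[int(np.argmax(sims))]
```

After `adain`, each channel of the tokens has (approximately) mean `beta` and standard deviation `gamma`, so the style code directly controls the token statistics; `pseudo_domain_label` removes the need for ground-truth domain annotations by scoring a text list of candidate domains.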
Appears in Collection
AI-Journal Papers (Journal Papers)
Files in This Item
There are no files associated with this item.
