Dealing with Missing Modalities in the Visual Question Answer-Difference Prediction Task through Knowledge Distillation

Cited 6 times in Web of Science · Cited 0 times in Scopus
In this work, we address the issue of missing modalities in the Visual Question Answer-Difference prediction task and propose a novel method to solve it. Specifically, we target the missing modality, the ground-truth answers, which is unavailable at test time, and use a privileged knowledge distillation scheme to compensate for its absence. To do so efficiently, we first introduce a model, the "Big" Teacher, that takes the image/question/answer triplet as its input and outperforms the baseline, and then use a combination of models to distill knowledge into a target network (the student) that takes only the image/question pair as its input. We evaluate our models on the VizWiz and VQA-V2 Answer Difference datasets and demonstrate, through extensive experiments and ablations, the performance of our method and diverse possibilities for future research.
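Below is a minimal PyTorch sketch of the privileged knowledge distillation scheme the abstract describes. All names, feature dimensions, and the fusion and loss choices are illustrative assumptions rather than the authors' actual architecture: a frozen "Big" Teacher sees pre-extracted image/question/answer features, while the student sees only image/question features and learns from the teacher's softened predictions alongside the hard labels.

import torch
import torch.nn as nn
import torch.nn.functional as F

FEAT_DIM, HIDDEN, NUM_CLASSES = 512, 256, 10  # hypothetical sizes

class BigTeacher(nn.Module):
    """Privileged teacher: sees image, question, AND ground-truth answer features."""
    def __init__(self):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(3 * FEAT_DIM, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, NUM_CLASSES))

    def forward(self, img, ques, ans):
        return self.fuse(torch.cat([img, ques, ans], dim=-1))

class Student(nn.Module):
    """Target network: image/question only, since answers are missing at test time."""
    def __init__(self):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(2 * FEAT_DIM, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, NUM_CLASSES))

    def forward(self, img, ques):
        return self.fuse(torch.cat([img, ques], dim=-1))

def distill_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Standard KD objective: soft teacher targets plus hard-label cross-entropy."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * T * T
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# One training step: the teacher (assumed pre-trained with answer access) is
# frozen; the student learns from its soft predictions without seeing answers.
teacher, student = BigTeacher().eval(), Student()
opt = torch.optim.Adam(student.parameters(), lr=1e-4)
img, ques, ans = (torch.randn(8, FEAT_DIM) for _ in range(3))
labels = torch.randint(0, NUM_CLASSES, (8,))
with torch.no_grad():
    t_logits = teacher(img, ques, ans)
loss = distill_loss(student(img, ques), t_logits, labels)
opt.zero_grad()
loss.backward()
opt.step()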
Publisher
IEEE
Issue Date
2021-06
Language
English
Citation
2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
ISSN
2160-7508
DOI
10.1109/CVPRW53098.2021.00175
URI
http://hdl.handle.net/10203/312241
Appears in Collection
EE-Conference Papers (Conference Papers)
Files in This Item
There are no files associated with this item.