Learning to Generate Inversion-Resistant Model Explanations

The wide adoption of deep neural networks (DNNs) in mission-critical applications has spurred the need for interpretable models that provide explanations of the model's decisions. Unfortunately, previous studies have demonstrated that model explanations facilitate information leakage, rendering DNN models vulnerable to model inversion attacks. These attacks enable an adversary to reconstruct original images from model explanations, thus leaking privacy-sensitive features. To counter this threat, we present Generative Noise Injector for Model Explanations (GNIME), a novel defense framework that perturbs model explanations to minimize the risk of model inversion attacks while preserving the interpretability of the generated explanations. Specifically, we formulate the defense training as a two-player minimax game between an inversion attack network, which aims to invert model explanations, and a noise generator network, which aims to inject perturbations that thwart model inversion attacks. We demonstrate that GNIME significantly reduces the information leakage in model explanations, decreasing transferable classification accuracy in facial recognition models by up to 84.8% while preserving the original functionality of model explanations.
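
The abstract frames the defense as alternating adversarial training between an inversion attack network and a noise generator. The following is a minimal PyTorch-style sketch of such a minimax loop; the network architectures, the utility weight `lam`, and the explanation pipeline are illustrative assumptions for exposition, not the authors' implementation.

```python
# Hypothetical sketch of the two-player minimax training described above.
# Shapes, losses, and hyperparameters are assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoiseGenerator(nn.Module):
    """Maps an explanation map to an additive perturbation (the defender)."""
    def __init__(self, channels=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, channels, 3, padding=1), nn.Tanh(),
        )
    def forward(self, expl):
        return self.net(expl)

class InversionNet(nn.Module):
    """Surrogate attacker: reconstructs the input image from an explanation."""
    def __init__(self, in_ch=1, out_ch=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, out_ch, 3, padding=1), nn.Sigmoid(),
        )
    def forward(self, expl):
        return self.net(expl)

def train_step(gen, inv, opt_gen, opt_inv, images, explanations, lam=1.0):
    # 1) Attacker step: minimize reconstruction error on perturbed explanations.
    with torch.no_grad():
        perturbed = explanations + gen(explanations)
    loss_inv = F.mse_loss(inv(perturbed), images)
    opt_inv.zero_grad(); loss_inv.backward(); opt_inv.step()

    # 2) Defender step: maximize the attacker's reconstruction error while
    #    keeping the perturbed explanation close to the original (utility term).
    perturbed = explanations + gen(explanations)
    loss_gen = -F.mse_loss(inv(perturbed), images) \
               + lam * F.mse_loss(perturbed, explanations)
    opt_gen.zero_grad(); loss_gen.backward(); opt_gen.step()
    return loss_inv.item(), loss_gen.item()
```

In this reading, the generator and the surrogate attacker are updated in alternation, so the learned perturbation is tuned against an adaptive inversion adversary rather than a fixed one, while the utility term keeps the published explanation faithful to the original.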
Publisher
NeurIPS
Issue Date
2022-11-30
Language
English
Citation
36th Annual Conference on Neural Information Processing Systems (NeurIPS 2022)
URI
http://hdl.handle.net/10203/299618
Appears in Collection
AI-Conference Papers (Conference Papers); CS-Conference Papers (Conference Papers)
Files in This Item
There are no files associated with this item.
