DSpace at KOASAS: An Empirical Study of Utility and Disclosure Risk for Tabular Data Synthesis Models: In-Depth Analysis and Interesting Findings



DC Field	Value	Language
dc.contributor.author	Park, Dae-Young	ko
dc.contributor.author	Ko, In-Young	ko
dc.date.accessioned	2024-06-18T08:19:38Z	-
dc.date.available	2024-06-18T08:19:38Z	-
dc.date.created	2024-06-18	-
dc.date.issued	2024-02-20	-
dc.identifier.citation	2024 IEEE International Conference on Big Data and Smart Computing, BigComp 2024, pp.67 - 74	-
dc.identifier.uri	http://hdl.handle.net/10203/319842	-
dc.description.abstract	The ever-growing accumulation of data in various applications has spurred research into privacy-enhancing technologies. Synthetic data, in particular, has gained significant attention for enhancing machine learning model performance while preserving personal information. Although synthetic data studies have been on the rise, there are no clear criteria for how to measure the utility and disclosure risk of synthetic data. Furthermore, although many existing studies have primarily concentrated on image data synthesis models, there is a notable scarcity of research on tabular data synthesis models, particularly concerning disclosure risk. This is crucial in domains such as finance, which rely heavily on tabular datasets containing sensitive information. In this paper, we perform an in-depth analysis of the utility and disclosure risk of tabular data synthesis models, from classical to state-of-the-art, across different metrics and various types of datasets. Our interesting findings can be summarized as follows: (1) Synthetic data's utility tends to increase as the proportion of continuous attributes in the original data decreases, (2) Conversely, disclosure risk rises with a lower proportion of continuous attributes in the original data, (3) As the volume of synthetic data grows, both utility and disclosure risk metrics generally increase, (4) An inverse relationship is observed between the sparsity of original data and a specific utility metric, and (5) Notably, we discover that Targeted Correct Attribution Probability (TCAP), a widely used disclosure risk metric, fails to measure certain outlier records that are potential vulnerabilities for malicious attacks.	-
dc.language	English	-
dc.publisher	Institute of Electrical and Electronics Engineers Inc.	-
dc.title	An Empirical Study of Utility and Disclosure Risk for Tabular Data Synthesis Models: In-Depth Analysis and Interesting Findings	-
dc.type	Conference	-
dc.type.rims	CONF	-
dc.citation.beginningpage	67	-
dc.citation.endingpage	74	-
dc.citation.publicationname	2024 IEEE International Conference on Big Data and Smart Computing, BigComp 2024	-
dc.identifier.conferencecountry	TH	-
dc.identifier.conferencelocation	Bangkok, Thailand	-
dc.identifier.doi	10.1109/BigComp60711.2024.00020	-
dc.contributor.localauthor	Ko, In-Young	-
dc.contributor.nonIdAuthor	Park, Dae-Young	-
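
The abstract names Targeted Correct Attribution Probability (TCAP) but this record does not define it. As a rough illustration only, the sketch below follows one commonly described formulation: an intruder knows some key attributes of an original record, looks up the synthetic records with the same key values, and guesses the target attribute from them. Everything here is an assumption rather than the paper's method: the function name `tcap_scores`, the dict-based row format, the `tau` threshold on the within-class attribution probability, the toy data, and the choice to return `None` for original records that have no usable synthetic match.

```python
from collections import Counter, defaultdict


def tcap_scores(original, synthetic, keys, target, tau=1.0):
    """Per-record TCAP sketch (hypothetical helper, not the paper's code).

    Rows are dicts. `keys` are the attributes the intruder is assumed to
    know; `target` is the sensitive attribute the intruder tries to infer.
    """
    # Group synthetic rows into equivalence classes by their key values,
    # counting how often each target value occurs in each class.
    classes = defaultdict(Counter)
    for row in synthetic:
        classes[tuple(row[k] for k in keys)][row[target]] += 1

    # Keep only synthetic rows whose target value makes up at least `tau`
    # of their class (the within-class attribution probability filter).
    retained = {}
    for key, counts in classes.items():
        total = sum(counts.values())
        kept = Counter({v: c for v, c in counts.items() if c / total >= tau})
        if kept:
            retained[key] = kept

    scores = []
    for row in original:
        kept = retained.get(tuple(row[k] for k in keys))
        if kept is None:
            # No retained synthetic row shares this key: left unscored here.
            scores.append(None)
        else:
            # Share of matching synthetic rows with the correct target.
            scores.append(kept[row[target]] / sum(kept.values()))
    return scores


# Toy data, invented purely for illustration.
original = [
    {"age": "30s", "zip": "A", "disease": "flu"},
    {"age": "30s", "zip": "B", "disease": "cold"},
    {"age": "40s", "zip": "A", "disease": "flu"},
    {"age": "90s", "zip": "Z", "disease": "rare"},
]
synthetic = [
    {"age": "30s", "zip": "A", "disease": "flu"},
    {"age": "30s", "zip": "A", "disease": "flu"},
    {"age": "30s", "zip": "B", "disease": "flu"},
    {"age": "40s", "zip": "A", "disease": "flu"},
    {"age": "40s", "zip": "A", "disease": "cold"},
]

strict = tcap_scores(original, synthetic, ["age", "zip"], "disease")
relaxed = tcap_scores(original, synthetic, ["age", "zip"], "disease", tau=0.5)
```

In this sketch, a record whose key combination never appears among the retained synthetic rows (such as the last toy record) simply receives no score. Other conventions assign such records a score of 0; the record does not say which convention the paper uses, nor whether this is related to the outlier behaviour reported in finding (5).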

Appears in Collection: CS-Conference Papers
