An Empirical Study of Utility and Disclosure Risk for Tabular Data Synthesis Models: In-Depth Analysis and Interesting Findings

The ever-growing accumulation of data across applications has spurred research into privacy-enhancing technologies. Synthetic data, in particular, has gained significant attention as a way to improve machine learning model performance while protecting personal information. Although studies of synthetic data are on the rise, there are no clear criteria for measuring its utility and disclosure risk. Furthermore, most existing studies concentrate on image data synthesis models, and research on tabular data synthesis models remains scarce, particularly with respect to disclosure risk. This gap matters in domains such as finance, which rely heavily on tabular datasets containing sensitive information. In this paper, we perform an in-depth analysis of the utility and disclosure risk of tabular data synthesis models, from classical to state-of-the-art, across different metrics and various types of datasets. Our findings can be summarized as follows: (1) the utility of synthetic data tends to increase as the proportion of continuous attributes in the original data decreases; (2) at the same time, disclosure risk rises as the proportion of continuous attributes in the original data decreases; (3) as the volume of synthetic data grows, both utility and disclosure risk metrics generally increase; (4) an inverse relationship is observed between the sparsity of the original data and a specific utility metric; and (5) notably, Targeted Correct Attribution Probability (TCAP), a widely used disclosure risk metric, fails to measure certain outlier records that are potential vulnerabilities for malicious attacks.
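For readers unfamiliar with TCAP, the following is a minimal sketch of one simplified reading of the metric, not the paper's implementation. It assumes an attacker who knows a set of key attributes of an original record and guesses the target attribute from the synthetic records that share those key values. The function name `tcap_scores`, the column names, and the toy records are all hypothetical. Published formulations typically also restrict scoring to synthetic groups whose target value is unanimous or nearly so; that filtering step is omitted here, and scoring unmatched records as 0 is an assumed convention.

```python
from collections import Counter, defaultdict


def tcap_scores(original, synthetic, key_cols, target_col):
    """Return one simplified TCAP-style score per original record.

    For each original record, the score is the fraction of synthetic
    records with the same key values that also carry the same target
    value. Records whose key never appears in the synthetic data get 0.0
    (an assumed convention for this sketch).
    """
    # Count target values among synthetic records for each key combination.
    by_key = defaultdict(Counter)
    for row in synthetic:
        key = tuple(row[c] for c in key_cols)
        by_key[key][row[target_col]] += 1

    scores = []
    for row in original:
        key = tuple(row[c] for c in key_cols)
        counts = by_key.get(key)
        if not counts:
            scores.append(0.0)  # no synthetic record shares this key
        else:
            scores.append(counts[row[target_col]] / sum(counts.values()))
    return scores


# Hypothetical toy data: two key attributes and one sensitive target.
original = [
    {"age": "30-39", "zip": "A", "income": "high"},
    {"age": "30-39", "zip": "A", "income": "low"},
    {"age": "80-89", "zip": "Z", "income": "high"},  # rare key combination
]
synthetic = [
    {"age": "30-39", "zip": "A", "income": "high"},
    {"age": "30-39", "zip": "A", "income": "high"},
    {"age": "30-39", "zip": "A", "income": "low"},
    {"age": "40-49", "zip": "B", "income": "low"},
]

scores = tcap_scores(original, synthetic, ["age", "zip"], "income")
```

In this toy setup, the third original record has a key combination that no synthetic record shares, so the sketch assigns it a score of 0 regardless of how exposed that record might be by other means.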
Publisher
Institute of Electrical and Electronics Engineers Inc.
Issue Date
2024-02-20
Language
English
Citation

2024 IEEE International Conference on Big Data and Smart Computing, BigComp 2024, pp. 67-74

DOI
10.1109/BigComp60711.2024.00020
URI
http://hdl.handle.net/10203/319842
Appears in Collection
CS-Conference Papers
