DC Field | Value | Language |
---|---|---|
dc.contributor.author | Park, Dae-Young | ko |
dc.contributor.author | Ko, In-Young | ko |
dc.date.accessioned | 2024-06-18T08:19:38Z | - |
dc.date.available | 2024-06-18T08:19:38Z | - |
dc.date.created | 2024-06-18 | - |
dc.date.issued | 2024-02-20 | - |
dc.identifier.citation | 2024 IEEE International Conference on Big Data and Smart Computing, BigComp 2024, pp. 67-74 | - |
dc.identifier.uri | http://hdl.handle.net/10203/319842 | - |
dc.description.abstract | The ever-growing accumulation of data in various applications has spurred research into privacy-enhancing technologies. Synthetic data, in particular, has gained significant attention for enhancing machine learning model performance while preserving personal information. Although synthetic data studies have been on the rise, there are no clear criteria for how to measure the utility and disclosure risk of synthetic data. Furthermore, although many existing studies have primarily concentrated on image data synthesis models, there is a notable scarcity of research on tabular data synthesis models, particularly concerning disclosure risk. This is crucial in domains such as finance, which heavily rely on tabular datasets containing sensitive information. In this paper, we perform an in-depth analysis of the utility and disclosure risk of tabular data synthesis models, from classical to state-of-the-art, in terms of different metrics and various types of datasets. Our interesting findings can be summarized as follows: (1) synthetic data's utility tends to increase as the proportion of continuous attributes in the original data decreases; (2) conversely, disclosure risk rises with a lower proportion of continuous attributes in the original data; (3) as the volume of synthetic data grows, both utility and disclosure risk metrics generally increase; (4) an inverse relationship is observed between the sparsity of original data and a specific utility metric; and (5) notably, we discover that Targeted Correct Attribution Probability (TCAP), a widely used disclosure risk metric, fails to measure certain outlier records that are potential vulnerabilities for malicious attacks. | - |
dc.language | English | - |
dc.publisher | Institute of Electrical and Electronics Engineers Inc. | - |
dc.title | An Empirical Study of Utility and Disclosure Risk for Tabular Data Synthesis Models: In-Depth Analysis and Interesting Findings | - |
dc.type | Conference | - |
dc.type.rims | CONF | - |
dc.citation.beginningpage | 67 | - |
dc.citation.endingpage | 74 | - |
dc.citation.publicationname | 2024 IEEE International Conference on Big Data and Smart Computing, BigComp 2024 | - |
dc.identifier.conferencecountry | TH | - |
dc.identifier.conferencelocation | Bangkok, Thailand | - |
dc.identifier.doi | 10.1109/BigComp60711.2024.00020 | - |
dc.contributor.localauthor | Ko, In-Young | - |
dc.contributor.nonIdAuthor | Park, Dae-Young | - |