Prototype cell key perturbation to enhance disclosure control against difference attacks #251

Open
tombisho opened this issue Nov 25, 2021 · 0 comments

@tombisho
Contributor

Stefan has illustrated two ways of retrieving individual-level data from DataSHIELD using difference attacks. In short:

(1) by comparing the mean of a column computed over all rows with the mean computed with one row removed
(2) by comparing the mean of a column computed over all rows with the mean computed with one row duplicated

This is hard to protect against because the attack only needs two subsets that differ by a single row, and both subsets generally have large numbers of rows, so neither looks disclosive on its own.
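To make the arithmetic behind the two attacks concrete, here is a minimal sketch in plain Python (not DataSHIELD code). The column values, the target index and the helper names `recover_by_removal` and `recover_by_duplication` are all made up for illustration; the only inputs the attacker is assumed to see are the means and the row count.

```python
def mean(values):
    """Arithmetic mean, standing in for an aggregate returned by the server."""
    return sum(values) / len(values)


def recover_by_removal(mean_all, mean_without, n):
    """Attack (1): n * mean_all is the full sum, (n - 1) * mean_without is the
    sum without the target row, so the difference is the target value."""
    return n * mean_all - (n - 1) * mean_without


def recover_by_duplication(mean_all, mean_dup, n):
    """Attack (2): (n + 1) * mean_dup is the sum with the target row counted
    twice, so subtracting the full sum leaves the target value."""
    return (n + 1) * mean_dup - n * mean_all


# Hypothetical server-side column; the attacker never sees these values directly.
column = [52.0, 61.5, 47.0, 70.25, 58.0]
target = 3  # index of the row being attacked (illustrative)
n = len(column)

# The three aggregates the attacker requests.
mean_all = mean(column)
mean_without = mean(column[:target] + column[target + 1:])
mean_dup = mean(column + [column[target]])

print(recover_by_removal(mean_all, mean_without, n))
print(recover_by_duplication(mean_all, mean_dup, n))
```

Both calls reconstruct the target row's value up to floating-point rounding, even though every mean was computed over several rows.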

Research indicates that the best protection against difference attacks is to add noise. The R package cellKey provides the ability to add noise to a table using the cell key method. This could be repackaged for DataSHIELD use.
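For readers unfamiliar with the cell key idea, a rough sketch of the general technique follows, written in plain Python. It does not use or reproduce the cellKey package's API. The function names, the seed and the perturbation table `PTABLE` are assumptions invented for this example; a real perturbation table would be designed and evaluated for the data in question.

```python
import random


def assign_record_keys(n_records, seed=0):
    """Give every record a fixed random key in [0, 1), assigned once."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n_records)]


def cell_key(record_keys, rows):
    """Cell key = fractional part of the sum of the record keys in the cell.
    Rows are sorted so the same set of records always yields the same key."""
    return sum(record_keys[i] for i in sorted(rows)) % 1.0


# Hypothetical perturbation table: (upper bound on cell key, noise to add).
PTABLE = [(0.10, -2), (0.30, -1), (0.70, 0), (0.90, 1), (1.00, 2)]


def perturbed_count(record_keys, rows):
    """Return the cell count plus the noise selected by the cell key."""
    ck = cell_key(record_keys, rows)
    for upper, noise in PTABLE:
        if ck < upper:
            return max(0, len(rows) + noise)
    return len(rows)  # not reached, since ck is always below 1.0


keys = assign_record_keys(100)
cell = list(range(40))            # one cell containing 40 records
print(perturbed_count(keys, cell))
print(perturbed_count(keys, cell[::-1]))  # same records, same result
```

The point of the design is that the noise depends only on which records are in the cell, so repeating a query returns the same perturbed value and the noise cannot be averaged away, while two cells that differ by one row get unrelated noise.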

The issues to address are:

  • When should the noise be applied: on import of data into the session, or when the data are split into subsets?
  • The cell key process has been used for census data, and tends to be evaluated on a particular dataset to see whether it is appropriate. How would that work for DataSHIELD, with its diverse datasets?
@StuartWheater StuartWheater added this to the v6.4 milestone Sep 30, 2024
Labels
Development
