De-classify and re-classify data?


Some of the data I work with contains sensitive information (names of people, dates, locations, etc.). But I sometimes need to share "the numbers" with other people to get help with statistical analysis, or to process the data on more powerful machines where I can't control who looks at it.

Ideally I would like to work like this:

  1. Read the data into R (look at it, clean it, etc.)
  2. Select a data frame that I want to de-classify, run it through a package and receive two "files": the de-classified data and a translation-file. The latter I will keep myself.
  3. The de-classified data can be shared, manipulated and processed without worries.
  4. I re-classify the processed data together with the translation-file.
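
As a minimal sketch of steps 2 and 4, assuming the sensitive columns are plain character columns: `declassify()` and `reclassify()` are hypothetical helper names made up for illustration, not functions from an existing package. Each sensitive value is replaced by a random code, and the lookup table (the "translation-file") is returned separately.

```r
# Hypothetical helpers for illustration only; not part of any package.

# Replace the values in `cols` with random codes and return both the
# de-classified data and the key needed to reverse the mapping.
declassify <- function(df, cols) {
  key <- list()
  out <- df
  for (col in cols) {
    originals <- unique(as.character(df[[col]]))
    # Shuffle the code numbers so they do not reveal order of appearance
    codes <- sprintf("%s_%04d", col, sample(length(originals)))
    key[[col]] <- data.frame(original = originals, code = codes,
                             stringsAsFactors = FALSE)
    out[[col]] <- codes[match(as.character(df[[col]]), originals)]
  }
  list(data = out, key = key)
}

# Map the codes back to the original values using the key.
# Note: restored columns come back as character, even if they were factors.
reclassify <- function(df, key) {
  out <- df
  for (col in names(key)) {
    k <- key[[col]]
    out[[col]] <- k$original[match(out[[col]], k$code)]
  }
  out
}

# Made-up example data
patients <- data.frame(name  = c("Ann", "Bob", "Ann"),
                       city  = c("Oslo", "Lund", "Oslo"),
                       value = c(1.2, 3.4, 5.6),
                       stringsAsFactors = FALSE)

res      <- declassify(patients, c("name", "city"))
shared   <- res$data            # safe to hand over
shared$value <- shared$value * 2  # stand-in for the external processing
restored <- reclassify(shared, res$key)  # res$key never leaves my machine
```

The key could be stored with `saveRDS(res$key, "key.rds")` and the shared data written with `write.csv()`. This only hides the labels; rare combinations of the remaining columns may still identify someone.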

I suppose that this can also be useful when uploading data for processing "in the cloud" (Amazon etc.).

Have you been in this situation? I first thought about writing a "randomize" function myself, but then I realized there is no end to how sophisticated this can get (for example, offsetting timestamps without losing their order). Maybe there is already an established method or tool?
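
For the timestamp case, one simple option is to shift every timestamp by the same secret random offset, which keeps the order intact. This is a sketch under that assumption; `shift_times()` and `unshift_times()` are hypothetical names, and the ten-year default range is arbitrary.

```r
# Hypothetical helpers for illustration only.

# Shift all timestamps by one random whole-day offset (in seconds).
# A constant shift preserves both the order and the gaps between events.
shift_times <- function(x, max_days = 3650) {
  offset <- sample(-max_days:max_days, 1) * 86400
  list(shifted = x + offset, offset = offset)
}

# Undo the shift with the offset that was kept private.
unshift_times <- function(x, offset) x - offset

# Made-up example data
times <- as.POSIXct(c("2020-01-01 10:00:00",
                      "2020-03-15 08:30:00",
                      "2020-02-01 12:00:00"), tz = "UTC")

s        <- shift_times(times)
restored <- unshift_times(s$shifted, s$offset)
```

Because the gaps between timestamps are unchanged, anyone who learns a single true timestamp can recover all the others, so this is weaker than it may look.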

Thanks to everyone who contributes to the [r] tag here at Stack Overflow!