Ben Langhinrichs




Genii Weblog

Civility in critiquing the ideas of others is no vice. Rudeness in defending your own ideas is no virtue.

Wed 4 Apr 2018, 02:39 PM
There's a lot of talk about GDPR and its implications for Domino and other software environments. One frequent reminder is that backups and archives are covered by the right to be forgotten. Now, if you are a small shop with infrequent GDPR requests, anonymizing backup and archive copies might be tedious but doable. But what if you are faced with many of these requests?
One possible solution is the idea of pre-anonymizing. Imagine that every named person is given a unique id code, and a lookup table mapping names to ids is maintained. The backup or archive process itself could then include a translation step in which each named person is replaced with their unique id code. If a named person later needed to be forgotten, the archive itself would be left untouched, and only the name would be removed from the lookup table. Thereafter, the id would return "not found" when anyone tried to retrieve the name.
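To make the idea concrete, here is a minimal Python sketch of a pre-anonymizing lookup table, assuming plain-text content and an exact list of known names. The class `PseudonymIndex`, the function `pseudonymize`, and the `ID-` plus random hex id format are all hypothetical illustrations, not part of Domino or any existing tool. Forgetting touches only the index; the archived text keeps its ids.

```python
import re
import secrets


class PseudonymIndex:
    """Hypothetical name-to-id lookup table, stored separately from the archives."""

    def __init__(self):
        self._name_to_id = {}
        self._id_to_name = {}

    def id_for(self, name):
        # Assign a random, non-derivable id the first time a name is seen
        if name not in self._name_to_id:
            pid = "ID-" + secrets.token_hex(6)
            while pid in self._id_to_name:  # guard against a (very unlikely) collision
                pid = "ID-" + secrets.token_hex(6)
            self._name_to_id[name] = pid
            self._id_to_name[pid] = name
        return self._name_to_id[name]

    def name_for(self, pid):
        # Once a name has been forgotten, its id resolves to "not found"
        return self._id_to_name.get(pid, "not found")

    def forget(self, name):
        # Remove the name from the index only; archived text is never modified
        pid = self._name_to_id.pop(name, None)
        if pid is not None:
            del self._id_to_name[pid]


def pseudonymize(text, names, index):
    """Replace each known name with its id as part of the backup/archive step."""
    for name in sorted(names, key=len, reverse=True):  # longest names first
        pid = index.id_for(name)
        # Whole-word match so "Alan" does not also rewrite "Alanna"
        text = re.sub(r"\b" + re.escape(name) + r"\b", lambda m: pid, text)
    return text
```

The ids are random rather than a hash of the name on purpose: anyone who knows a name could recompute an unsalted hash and re-identify the person, which would defeat the "forget" step.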
This isn't a perfect solution, and there might need to be a periodic garbage collection in which all unknown ids are converted to a single UNKNOWN id, but particularly for difficult-to-access backups kept in long-term storage, it would provide a way to "forget" without altering the storage. It might also require too much effort per backup or archive, though it would allow a person's various names, nicknames, email addresses, etc. to be converted to a single code, which would also make it easier to retrieve that person's information when they make a request.
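The alias handling and the periodic garbage collection described above could be sketched in Python roughly as follows. This assumes each person's aliases are already known and listed by hand, and that ids follow a hypothetical `ID-` plus lowercase hex format; `build_alias_map`, `pseudonymize_aliases`, and `collect_garbage` are illustrative names only. It does nothing to find indirect references such as a job title.

```python
import re

UNKNOWN = "UNKNOWN"
# Assumed id format: "ID-" followed by lowercase hex digits
ID_PATTERN = re.compile(r"\bID-[0-9a-f]+\b")


def build_alias_map(people):
    """Map every alias (name, nickname, email) of a person to that person's single id."""
    return {alias.lower(): pid for pid, aliases in people.items() for alias in aliases}


def pseudonymize_aliases(text, alias_map):
    """Replace each alias with its id, case-insensitively and as whole words.

    Longest aliases go first so that, for example, "Alan Smith" becomes one id
    rather than "Alan" being replaced and leaving " Smith" behind.
    """
    for alias in sorted(alias_map, key=len, reverse=True):
        pid = alias_map[alias]
        pattern = re.compile(r"\b" + re.escape(alias) + r"\b", re.IGNORECASE)
        text = pattern.sub(lambda m: pid, text)
    return text


def collect_garbage(text, known_ids):
    """Periodic pass: collapse every id no longer in the index into one UNKNOWN id."""
    return ID_PATTERN.sub(lambda m: m.group(0) if m.group(0) in known_ids else UNKNOWN, text)
```

Because all aliases collapse to one id, answering a request for a person's data becomes a search for a single code rather than for every spelling of their name.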
I do wonder how referenced but non-specific names in rich text would be handled. If the rich text says "Alan told me we could bill Krangdon and CC it to Krangdon's VP of Operations," would Alan and Krangdon's VP of Operations need to be identified from context, for the purposes of both notification and anonymization? I imagine there is some level of vagueness beyond which you could not be expected to identify a person (e.g., "Jim's wife"), but if I am wrong, Watson is going to be needed just to find the references. Interesting times.
Note: For what it is worth, it would be possible to pre-anonymize working Notes databases, but it would take more effort than it seems worth. For example, in ACLs and similar places, you could use the id in place of the name and create, for each internal user, a group named after that user's id with the user as its only member. But other places would be worse, and frequent lookups would be needed, so I doubt it is a good idea.

Copyright © 2018 Genii Software Ltd.