Genii Weblog

Pondering pre-anonymization and GDPR

Wed 4 Apr 2018, 02:39 PM

by Ben Langhinrichs
There's a lot of talk about GDPR and its implications for Domino and other software environments. One frequent reminder is that backups and archives are included in the right to be forgotten. Now, if you are a small shop and have infrequent GDPR requests, anonymizing backup and archive copies might be a tedious but doable task. But what if you are faced with many of these?
One possible solution is the idea of pre-anonymizing. Imagine that every named person is given a unique id code, and a lookup table is maintained. Then the backup or archive process itself could include a translation process where the named person was replaced with the unique id code. If there were some later point where the named person needed to be forgotten, the archive itself would be left untouched and the index containing the name to id would have the name removed. Thereafter, the id would return "not found" when anyone tried to retrieve the name.
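A minimal sketch of the idea, not a real implementation. The class name PseudonymIndex, the helper archive_record, the "P-" id format, and the sample record are all hypothetical, and the sketch assumes names live in known fields rather than in free text:

```python
import uuid


class PseudonymIndex:
    """Hypothetical lookup table mapping each name to an opaque id code."""

    def __init__(self):
        self._id_by_name = {}
        self._name_by_id = {}

    def id_for(self, name):
        # Assign a new id code the first time a name is seen.
        if name not in self._id_by_name:
            code = "P-" + uuid.uuid4().hex[:12]
            self._id_by_name[name] = code
            self._name_by_id[code] = name
        return self._id_by_name[name]

    def resolve(self, code):
        # After a name is forgotten, its id simply stops resolving.
        return self._name_by_id.get(code, "not found")

    def forget(self, name):
        # Remove the name from the index; archived copies are left untouched.
        code = self._id_by_name.pop(name, None)
        if code is not None:
            del self._name_by_id[code]


def archive_record(record, name_fields, index):
    """Return a copy of record with the named fields replaced by id codes."""
    archived = dict(record)
    for field in name_fields:
        if field in archived:
            archived[field] = index.id_for(archived[field])
    return archived


index = PseudonymIndex()
archived = archive_record({"From": "Alan", "Subject": "Billing"}, ["From"], index)
print(index.resolve(archived["From"]))  # the name is still retrievable
index.forget("Alan")
print(index.resolve(archived["From"]))  # the archived copy itself is unchanged
```

The archived copy only ever holds the id code, so forgetting someone is a single change to the index rather than a change to every backup.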
This isn't a perfect solution, and there might need to be a periodic garbage collection where all unknown ids were converted to a single UNKNOWN id, but particularly for difficult-to-access backups stored in long-term storage, it would provide a way to "forget" without altering the storage. It also might require too much effort per backup/archive, though it would also allow conversion of various names, nicknames, email addresses, etc. to the single code, which would make retrieval of information on a request easier.
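The two side points here, folding aliases into one code and the periodic garbage collection, could be sketched roughly as follows. Everything named in it (build_alias_map, collect_garbage, the "P-UNKNOWN" value, the sample ids and addresses) is made up for illustration:

```python
UNKNOWN = "P-UNKNOWN"  # hypothetical single id shared by all forgotten people


def build_alias_map(people):
    """Map every alias (lowercased) to the person's single id code."""
    alias_to_id = {}
    for code, aliases in people.items():
        for alias in aliases:
            alias_to_id[alias.lower()] = code
    return alias_to_id


def collect_garbage(archived_records, id_fields, known_ids):
    """Return copies of the records with unknown ids rewritten to UNKNOWN."""
    cleaned = []
    for record in archived_records:
        copy = dict(record)
        for field in id_fields:
            if field in copy and copy[field] not in known_ids:
                copy[field] = UNKNOWN
        cleaned.append(copy)
    return cleaned


# Names, nicknames, and email addresses all collapse to one code per person.
people = {"P-001": ["Alan", "Al", "alan@example.com"], "P-002": ["Jim"]}
alias_to_id = build_alias_map(people)
records = [{"From": alias_to_id["al"]}, {"From": alias_to_id["jim"]}]

# Once "P-002" has been forgotten, a later pass folds its id into UNKNOWN.
del people["P-002"]
cleaned = collect_garbage(records, ["From"], set(people))
print(cleaned)
```

Unlike the forgetting step, the garbage collection pass does rewrite stored records, so it would only make sense on the occasions when the archive is being touched anyway.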
I do wonder how referenced but non-specific names in rich text would be handled. If the rich text says "Alan told me we could bill Krangdon and CC it to Krangdon's VP of Operations", would Alan and Krangdon's VP of Operations need to be identified by context for the purposes of both notification and anonymization? I imagine there is some level of specificity beyond which you could not be expected to identify a person (e.g., Jim's wife), but if I am wrong, Watson is going to be needed just to find the references. Interesting times.
Note: For what it is worth, it would be possible to pre-anonymize working Notes databases, but it would take more effort than seems worth it. For example, in ACLs and such, you could use the id and then have a group named after the id with a single member for each internal user. But other places would be worse, and there would need to be lookups frequently, so I doubt it is a good idea.

Copyright 2018 Genii Software Ltd.

What has been said:

1094.1. Russell Maher
(04/05/2018 09:25 AM)

FWIW, despite what some others appear to have proclaimed in a webinar, it is not entirely clear that, for example, fully removing someone from an ACL is required by GDPR. Even your own example referring to "Alan" is likely not to be subject to GDPR on its own. Backups/archives are similarly not a clear-cut case.

GDPR provides for continued processing and/or refusal to "forget" for technical and/or legal reasons. Maintaining an audit trail is a valid legal reason for processing. The inability to remove one person's data from a backup without destroying other backups is very likely to be proven to be a reasonable technical hurdle preventing full execution of a right to be forgotten request.

It is far more likely that businesses will find legal ways of keeping that data justifiably under GDPR than that they will invest in technology to find the word "Alan" throughout their entire environment and either remove it or anonymize it/prepare to anonymize it.

My belief is that there will be several court cases where the real impact will be fleshed out. It is simply untenable that, for example, I would need your permission to enter your information into my CRM system or my address book after you have given me your business card, and yet there are many claiming that that is exactly what will have to be done.

There are also GDPR-related challenges in your own solution. We tag "Alan" so we can anonymize "Alan" in the future, but that requires some sort of ID Tag which would then be used to perform the anonymization. How do you prove it has been completed unless you actually keep the ID Tag, which, by its very nature, would need to still have "Alan" connected with it? That Name-ID connection is clearly subject to GDPR. Very circular, and some large businesses are going to have to prove that very thing in court over time.

1094.2. Ben Langhinrichs
(04/05/2018 10:01 AM)


I agree to some degree. There will certainly be a tension between a strict legal interpretation and reasonable implementation, but we don't know where that balance will be struck. If I had to hazard a guess, the courts will be unsympathetic to overly broad claims that it is too technically difficult, but will be more sympathetic to non-specific incidental use of first names and such. But that is just a guess.

As for audit trails and that sort of thing, my understanding is that they would be a weak defense when it came to customers/partners, but a stronger defense with regard to internal people, even ex-employees. Again, just an educated guess. I disagree about backups/archives, as those topics have been addressed by regulatory people in the EU. What point is there in "forgetting" somebody if you have a searchable archive available with all their information? You could simply declare every Notes database or JSON repository an archive. Similarly, you will need to be able to search/retrieve information from backups and archives. That means that even if pre-anonymizing were viable, it doesn't avoid the need to access archives and backups, just the need to modify them.

There will certainly be some interesting fights over how to balance GDPR and other compliance/regulatory requirements. The problem is going to come when the balance is even a little bit different than a specific company may have anticipated. If a developer or consultant says, "No worries, this would all fall under the exception for being too technically difficult" and then the courts decide otherwise, the threat of fines will hang over the development process. As a software vendor, my job is to anticipate such need for rapid remediation, even when it may or may not be necessary.

1094.3. Russell Maher
(04/05/2018 12:10 PM)

Since I've been immersed in negotiating and drafting data security contracts for over 12 years now, I have maybe a more experienced perspective on this, especially since we are exclusively in the business of collecting personal information. That means I have had the experience of working directly with the data security counsel at many of the largest companies in the world, which provides a rather unique opportunity to discuss these things at length.

You are a tool vendor and you have to anticipate future customer needs. If I was advising you, I would say "wait", which I took a very long way to say above.