Event Sourcing and GDPR Compliance

Mateusz Joniak
June 29, 2018

Since coming into force in May 2018, the European Union’s General Data Protection Regulation (GDPR) has revolutionised the way we think about personal data. Companies can no longer process and store sensitive information in any way they like. Instead, they have to obtain the user’s explicit consent. One might say the regulation put control of the data back in the hands of its rightful owners: the people.

However, much as we welcome the positive changes in data protection brought by the new law, the regulation comes with its own share of pitfalls and technical challenges. Article 17 in particular, which introduces the right to be forgotten, causes problems for systems built on permanent, tamper-proof storage such as blockchains or immutable event stores.

GDPR in short #

GDPR aims to unify data protection law across the European Union. Some states outside the EU, such as Japan and New Zealand, that wish to improve the way they handle personal information are also working on compatible regulations.

Without delving too much into the nitty-gritty of legal details, if a company wants to store and process any data that can be linked to an individual user, it needs a lawful basis for doing so, and in many cases that means getting the user’s consent for each use of the data. That consent now has to be explicit, and the conditions must be presented in plain, clear language. This finally puts a stop to pre-ticked checkboxes and pages upon pages of legalese.

Another change is that the data subject can now ask you for a copy of all the data you hold on them. Alongside this right of access, the right to data portability requires that such data be provided in a structured, commonly used and machine-readable format (JSON is a common choice).
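As a rough illustration of such an export, here is a minimal Python sketch. The `USER_RECORDS` dictionary and the `export_subject_data` function are hypothetical stand-ins; a real system would gather the data from its own stores.

```python
import json

# Hypothetical in-memory records; a real system would query its own databases.
USER_RECORDS = {
    "user-1": {
        "profile": {"name": "Jan Kowalski", "email": "jan@example.com"},
        "orders": [{"id": 1, "total": "19.99"}],
    }
}


def export_subject_data(subject_id: str) -> str:
    """Return everything held about one data subject as indented JSON."""
    data = USER_RECORDS.get(subject_id, {})
    return json.dumps(
        {"subject_id": subject_id, "data": data},
        indent=2,            # readable for humans
        sort_keys=True,      # stable output for machines
        ensure_ascii=False,  # keep names with non-ASCII letters intact
    )
```

Indented JSON is only one possible choice here; the point is that the same document can be opened by a person and parsed by a program.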

If the user withdraws consent, they can invoke the right to erasure (also known as the right to be forgotten) and demand that all their personal data be deleted. As we will see, this causes trouble for techniques based on immutability, i.e., data stores that don’t support deletes or updates.

There are, of course, exceptions to these rules. For example, a company may be obliged by the law of the country it operates in to preserve some data, such as invoices and contracts.

GDPR and Event Sourcing #

In event-sourced systems, the current state is calculated from a log of past actions. In banking, for example, instead of updating both accounts’ balances after a money transfer, you would save the transaction to the DB and then, when needed, derive the current balance from all past operations.
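The banking example can be sketched in a few lines of Python. The `TransferRecorded` event and the `balance_of` function are names made up for this illustration, and the plain list stands in for a real event store.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)  # frozen: an event cannot be changed once created
class TransferRecorded:
    source: str
    target: str
    amount: Decimal


def balance_of(account: str, events: list[TransferRecorded]) -> Decimal:
    """Derive the current balance by replaying every past transfer."""
    balance = Decimal("0")
    for event in events:
        if event.source == account:
            balance -= event.amount
        if event.target == account:
            balance += event.amount
    return balance


# The log only ever grows; no balance is stored anywhere.
event_log = [
    TransferRecorded("alice", "bob", Decimal("100")),
    TransferRecorded("bob", "carol", Decimal("30")),
]
```

Calling `balance_of("bob", event_log)` replays both transfers and yields 70, without any stored balance having been updated.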

Usually, those events are persisted once and never modified afterwards. Some storage media, like blockchains and WORM disks, don’t support any alteration of saved data at all.

Pseudonymous data #

How can we reconcile the erasure requirements of GDPR with the technical constraints of immutable persistence? The solution to this conundrum might lie in pseudonymisation.

The technique is a simple one. What turns a set of facts into personal data is the presence of attributes that can be linked to a particular person. Examples include names, addresses (both email and physical), identification numbers and biometrics. If we extract such information from our immutable store and put it in an external DB that supports updates and deletes, we can erase it on request while leaving the event log untouched.

[Figure: Before pseudonymisation]

[Figure: Safe pseudonymised data]

Data in the two stores can be linked with an identifier called a pseudonym. It should be impossible to single out the data subject from the pseudonym and the non-sensitive data fields alone. Randomly generated identifiers, such as GUIDs, work well in this scenario.
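A minimal sketch of this split, in Python, might look as follows. The two in-memory containers stand in for the immutable event store and the mutable personal-data DB, and all function and field names are assumptions made for the example.

```python
import uuid

event_log = []      # stand-in for the immutable, append-only event store
personal_data = {}  # stand-in for the external store that allows deletes


def record_registration(name: str, email: str, plan: str) -> str:
    """Store personal fields separately and log only the pseudonym."""
    pseudonym = str(uuid.uuid4())  # random, reveals nothing about the person
    personal_data[pseudonym] = {"name": name, "email": email}
    event_log.append({"type": "UserRegistered", "subject": pseudonym, "plan": plan})
    return pseudonym


def hydrate(event: dict) -> dict:
    """Join an event with its personal data, if that data still exists."""
    erased = {"name": "[erased]", "email": "[erased]"}
    return {**event, **personal_data.get(event["subject"], erased)}


def forget(pseudonym: str) -> None:
    """Honour an erasure request without touching the event log."""
    personal_data.pop(pseudonym, None)
```

After `forget` is called, the event is still in the log with its pseudonym and non-sensitive fields, but `hydrate` can no longer tie it to a person.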

The obvious downside is that you now have to join data from two separate stores. Also, if you were using a blockchain, you lose some of the benefits of its cryptographic security and peer-to-peer architecture: there is now a single, centralised DB that holds all the sensitive information and that everyone has to trust.

However, storing sensitive information on a globally accessible blockchain was always a bad idea, since it exposes that data to the public. Pseudonymisation should be considered best practice in any system that makes use of distributed ledgers.

Mutable events #

Of course, if the architecture in question isn’t based on an immutable store (for example, if you persist all your domain events in MongoDB), you can simply alter past events and remove all the protected information.
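To illustrate, here is a small Python sketch that scrubs personal fields from stored events in place. A plain list of dictionaries stands in for the event collection, and the field names in `PERSONAL_FIELDS` are assumptions; with MongoDB the same idea would presumably be expressed as an update using the `$unset` operator.

```python
# Assumed names of the fields that count as personal data in our events.
PERSONAL_FIELDS = {"name", "email", "address"}


def scrub_subject(events: list[dict], subject_id: str) -> int:
    """Remove personal fields from every event of one subject, in place.

    Returns the number of events that were actually modified.
    """
    changed = 0
    for event in events:
        if event.get("subject_id") != subject_id:
            continue
        present = PERSONAL_FIELDS.intersection(event)  # fields found in this event
        for field in present:
            del event[field]
        if present:
            changed += 1
    return changed
```

The events themselves stay in the store, so anything that depends only on the remaining fields keeps working.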

However, if some business rules depend on personally identifiable data, you will lose replayability, a significant benefit usually associated with event sourcing. If you rebuild your current state from events after the data has been removed, you may get different results than you had before.
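The effect is easy to reproduce in a toy Python example. The business rule below (a 10% discount for customers born before 1960) and the event fields are invented purely for illustration.

```python
def replay_revenue(events: list[dict]) -> int:
    """Rebuild total revenue from order events.

    Assumed rule: customers born before 1960 get a 10% discount.
    """
    total = 0
    for event in events:
        birth_year = event.get("birth_year")  # personal data used by the rule
        discount = event["price"] // 10 if birth_year and birth_year < 1960 else 0
        total += event["price"] - discount
    return total


events = [
    {"order": 1, "price": 100, "birth_year": 1950},
    {"order": 2, "price": 50, "birth_year": 1990},
]

# The same log after the personal field has been removed from every event.
scrubbed = [{k: v for k, v in e.items() if k != "birth_year"} for e in events]
```

Replaying `events` gives 140, while replaying `scrubbed` gives 150: the discount on the first order can no longer be derived, so the rebuilt state no longer matches the original one.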

We advise this solution for systems that don’t usually handle large amounts of data and where full replayability is not crucial.

Conclusion #

While some may consider this dilemma a pick-your-poison situation, we believe it’s an excellent opportunity to think about how we handle personal data. After all, our customers place a great deal of trust in our company by handing over their sensitive data for safekeeping. We cannot let them down with rash decisions about security measures and storage techniques. It might be a tough challenge to overcome, but it’s definitely worth the effort.
