Data governance - anonymization of PII

Conversation from Confluent Community Slack. Copied here to make it available to all.

Philip Schmitt @philipschmitt
Governance / PII
Maybe someone can point me in the right direction:
Have there been any notable Summit talks, blog posts, or articles around data governance/anonymization/PII?

We want to ensure (somehow) that not all developers can extract customer data from production Kafka topics.

Neil Buesing @nbuesing:
I have not watched this one myself, but it does cover the topic. However, I don't think most talks go as far as your point about restricting developers; they focus more on building governance into the use of Kafka as a whole.

Trying to build a system where developers have different levels of access to production could be quite challenging, and it is a fairly unique angle on this common problem. Typically I would expect some level of replication in place that copies data from the topics to another environment and “redacts” the data along the way.

  1. Create ACLs that prevent developers who have access to prod from reading the topics that contain PII data.
  2. Copy that data (ideally to non-prod) through a replication process that redacts the PII data.
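The redaction step in (2) boils down to a transform applied to each message between reading from the prod topic and writing to the non-prod copy. Below is a minimal sketch of that transform only, assuming message values are JSON objects and that PII is identified purely by field name. `PII_FIELDS`, `redact`, and `redact_message` are hypothetical names for illustration; the Kafka consumer/producer wiring and the choice of which fields count as PII are left out.

```python
import json
from typing import Any

# Hypothetical set of PII field names. In practice this would come from your
# schemas or data catalog rather than being hard-coded here.
PII_FIELDS = {"name", "email", "phone", "address"}

REDACTED = "[REDACTED]"


def redact(value: Any, pii_fields: set[str] = PII_FIELDS) -> Any:
    """Return a copy of a decoded JSON value with PII fields masked.

    Nested objects and lists are walked recursively, so PII inside
    sub-documents (e.g. an address on each order) is masked as well.
    A PII key holding a nested object has its whole value replaced.
    """
    if isinstance(value, dict):
        return {
            key: REDACTED if key in pii_fields else redact(val, pii_fields)
            for key, val in value.items()
        }
    if isinstance(value, list):
        return [redact(item, pii_fields) for item in value]
    return value


def redact_message(raw: bytes) -> bytes:
    """Redact one JSON-encoded message value (bytes in, bytes out)."""
    return json.dumps(redact(json.loads(raw))).encode("utf-8")


if __name__ == "__main__":
    original = (
        b'{"customer_id": 42, "email": "a@example.com", '
        b'"orders": [{"sku": "X1", "address": "1 Main St"}]}'
    )
    print(redact_message(original).decode())
    # {"customer_id": 42, "email": "[REDACTED]", "orders": [{"sku": "X1", "address": "[REDACTED]"}]}
```

Masking rather than dropping the fields keeps the record shape intact, so downstream consumers in non-prod don't break on missing keys. For flat records, Kafka Connect's built-in `MaskField` transform may cover the same need without custom code.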

In any case, I hope that video gives you some ideas. Field-level encryption gets you what you want, but it complicates things: every existing piece of your infrastructure has to be on board for it to work, so it is not easy to add to a system that is already in place.

Philip Schmitt:
Thanks, @nbuesing