With the rise of data-intensive applications, privacy and personal integrity have become a focus topic. Although companies may have an incentive to collect all available data and keep it forever, privacy regulations act as a counterbalance. Regulations limit what data may be stored, how long it may be stored, and how access is granted, and they give users the right to have their data deleted and to get information about the data that companies store about them. These regulations put constraints on technical solutions and make it challenging to architect and implement systems that allow engineers to make efficient, beneficial use of sensitive data. Unfortunately, failures to properly protect privacy can be very expensive, since the work required to rework core data models and wash tainted data can be massive.
This talk provides an engineering perspective on privacy protection. The intended audience is architects, developers, data scientists, and engineering managers who build applications handling user data. We highlight topics that require attention at an early design stage, and go through pitfalls and potentially expensive architectural mistakes. We describe a number of technical patterns for complying with privacy regulations without sacrificing the ability to use data for product features. The content of the talk is based on real-world experience from handling privacy protection in large-scale data processing environments.