As software engineers, we take pride in our code quality. As data scientists, in the quality of our models and analyses. As a data engineer, I take pride in the quality of the datasets I provide access to.

Everyone in IT works with data one way or another, be it producing, managing or using it. Yet, like fish in water, we often fail to notice data because it is all around us. And just as fish suffer from poor water quality, we suffer when our data quality declines. Unlike the fish, though, we can all contribute to addressing data quality issues.

In my talk, I want to encourage you to become more aware of data quality concerns. I will discuss what data quality is, how we can identify data quality issues and some strategies for addressing them. As a practical example, I will share our experiences with monitoring data quality using Amazon Research's deequ framework.


Berlin Buzzwords
09.06.2020 20:10 – 20:40