Schedule
11:30 am
Return to Data’s Inferno: Are the 7 Layers of Data Testing Hell Still Relevant?
Back in 2018, a blog post titled "Data's Inferno: 7 circles of data testing hell with Airflow" presented a layered approach to data quality checks in data applications and pipelines. Five years later, this talk looks back at Data's Inferno and surveys what has changed, and what hasn't, in ensuring high data quality.
Five years ago, a blog post called "Data's Inferno" (https://medium.com/wbaa/datas-inferno-7-circles-of-data-testing-hell-with-airflow-cef4adff58d8) described how to ensure high data quality with Apache Airflow. It suggested using different types of tests as layers to catch issues lurking within the data. These layers included tests for Airflow DAG integrity, mock data pipelines, production data tests, and more. Combined, the layers provided a reliable way to filter out incorrect data. Despite the blog post's age, its ideas are still relevant today. Since then, new tools, applications, and best practices have emerged to help improve data quality. In this talk, we'll review the layers of Data's Inferno and how they contributed to improving data quality. We'll also look at how new tools address the same concerns. Finally, we'll discuss how we expect, and hope, the data quality landscape will evolve in the future.
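To give a feel for the first layer mentioned above, DAG integrity, here is a minimal sketch in plain Python. It is not taken from the blog post and does not use Airflow's API: the function name `find_cycle` and the dictionary-based pipeline layout are hypothetical, chosen only to illustrate the idea of rejecting a pipeline definition whose tasks depend on each other in a loop before any data is processed.

```python
from typing import Dict, List


def find_cycle(dependencies: Dict[str, List[str]]) -> List[str]:
    """Return one dependency cycle as a list of task names, or [] if none exists.

    `dependencies` maps each task name to the tasks that run after it.
    This is a hypothetical stand-in for a pipeline definition, not an Airflow DAG.
    """
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = {task: UNVISITED for task in dependencies}
    path: List[str] = []  # tasks on the current depth-first search path

    def visit(task: str) -> List[str]:
        state[task] = IN_PROGRESS
        path.append(task)
        for downstream in dependencies.get(task, []):
            downstream_state = state.get(downstream, UNVISITED)
            if downstream_state == IN_PROGRESS:
                # We reached a task that is already on the current path: a cycle.
                return path[path.index(downstream):] + [downstream]
            if downstream_state == UNVISITED:
                cycle = visit(downstream)
                if cycle:
                    return cycle
        path.pop()
        state[task] = DONE
        return []

    for task in dependencies:
        if state[task] == UNVISITED:
            cycle = visit(task)
            if cycle:
                return cycle
    return []


# A valid linear pipeline has no cycle.
valid_pipeline = {"extract": ["transform"], "transform": ["load"], "load": []}
# A broken pipeline where two tasks wait on each other.
broken_pipeline = {"a": ["b"], "b": ["a"]}

print(find_cycle(valid_pipeline))
print(find_cycle(broken_pipeline))
```

A check like this would typically run in a test suite before deployment, so that a malformed pipeline definition fails fast instead of surfacing as bad or missing data later on.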
Host
Manolis Manousogiannis
Data Engineer
Xebia | Data
Guests
Daniel van der Ende
Data Engineer
Xebia | Data