Schedule
3:30 pm
Streamlining Data Science Workflows with a Feature Catalog
With the democratization of data via data lakes, data science teams increasingly rely on custom model pipelines for data preprocessing and feature engineering. As a result, it becomes difficult to reuse features or even to compare similar features across teams. This is a significant challenge, as it leads to duplicated work and to ambiguous definitions that cause confusion and a risk of wrong conclusions. For example, two data science teams might both report an average click rate to the marketing team, where one team excludes clicks made by robots and the other doesn’t. Unaware that the two numbers rest on different definitions, the marketing team could make seriously wrong decisions.
Host

Roel Bertens
Data Scientist
Xebia