Data Science Team & Tech Lead

    Data versioning

    “Data versioning is like flossing. Everyone agrees it’s a good thing to do, but few do it.” ~ Chip Huyen, Designing Machine Learning Systems

    Data versioning is considerably harder to implement than code versioning in data science / machine learning projects.

    There are three main reasons:

    ➡️ Data is often much larger than code.

    ➡️ There is no single agreed definition of what constitutes a difference between two data versions, or of how to resolve merge conflicts.

    ➡️ Regulations on data protection and privacy make keeping historical data difficult.
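
    To make the second point concrete, here is a minimal sketch of two possible definitions of "difference" between dataset versions. The function names (byte_fingerprint, content_fingerprint) and the toy records are hypothetical and only for illustration; this is not how any particular versioning tool works.

```python
import hashlib


def byte_fingerprint(rows):
    """Hash rows in the order given, so any reordering counts as a change."""
    h = hashlib.sha256()
    for row in rows:
        h.update(repr(row).encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()


def content_fingerprint(rows):
    """Hash rows after sorting, so reordering alone is not a change."""
    return byte_fingerprint(sorted(rows))


# Hypothetical toy dataset versions.
v1 = [("alice", 31), ("bob", 27)]
v2 = [("bob", 27), ("alice", 31)]  # same records, different order
v3 = [("alice", 32), ("bob", 27)]  # one value edited

print(byte_fingerprint(v1) == byte_fingerprint(v2))        # False
print(content_fingerprint(v1) == content_fingerprint(v2))  # True
print(content_fingerprint(v1) == content_fingerprint(v3))  # False
```

    Under the first definition v1 and v2 are different versions; under the second they are the same. Neither is wrong, which is part of why teams struggle to agree on one.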

    Do you floss… erm version your data often?