BIG DATA CONFERENCE
CEO and Co-Founder of Data Lake Management platform
Einat Orr is the CEO and Co-founder of Treeverse, the company behind lakeFS, an open source project that empowers data lakes with ACID guarantees. She received her PhD in Mathematics from Tel Aviv University, in the field of optimization in graph theory. Einat previously led several engineering organizations, most recently as CTO at SimilarWeb.
Data Versioning – What Does it Mean?
The demand for better versioning of data is growing. There is a plethora of open source projects providing “data versioning”, “Git for data” and “manage data like code” capabilities (e.g., Hudi, DoltHub, Delta Lake, DVC, Pachyderm, and lakeFS). So how do you know you are choosing the right one?
In this talk we will go over the differences between these solutions by clustering them according to four main use cases:
1. Collaboration over data: enabling teams to work together on shared data over time, with each team contributing to how the data evolves.
2. Managing ML pipelines: allowing pipeline management of ML projects, from model creation to production.
3. The need for mutability: data formats that support insert, update, and delete operations over immutable object storage.
4. The need for ACID guarantees over an object storage data lake: using branching logic to manage a data lake built on object storage.
By the end of the talk, you should have a good understanding of how these solutions compare and which you should choose for different types of use cases.