Hybrid Edition

September 28-30

Vilnius and Online

Sonal Goyal


India, Nube Technologies


Sonal is the founder of Nube Technologies, building AI-powered data mastering and integration products. She was on the Program Committee for the O'Reilly Strata Data and AI conferences and is a repeat speaker at major data and AI events like Spark Summit, Strata, GIDs, etc. At Nube, she applies ML to build distributed data pipelines for integrating data silos.


Scalable ML Pipelines for Enterprise Data Mastering

Enterprise data is often siloed and stored in a variety of application databases, ERPs, CRMs, flat files, etc. Integrating these data silos to build a clean and consistent view of customers, suppliers, products and parts is critical for analytics, compliance, risk, operations, customer experience and personalization. This is easier said than done. The sheer number of attributes, data systems, formats, entities and variations makes it a daunting task. Each source system generally has a different schema representation of the same entity. Even after the schemas are aligned, records still differ through missing fields, typos, errors, abbreviations, etc. Mastering this data to achieve a 360-degree view is important but tough.

Rule-based systems fail to scale to enterprise data volumes and variety. In this talk, Sonal will cover some approaches for scaling data mastering within an enterprise using AI and distributed technologies like Spark. She will discuss:
a) The need for data mastering
b) The challenges in data mastering with rule-based approaches
c) Leveraging AI to master data at scale

Session Keywords

🔑 Master Data Management
🔑 Entity resolution
🔑 Schema Mapping
🔑 ML
