27-29 November, Vilnius

Conference about Big Data, High Load, Data Science, Machine Learning & AI



AbundanceAI, The Netherlands


Subhojit Banerjee is a data engineer and data scientist, currently CTO of AbundanceAI. He has worked in core distributed systems, high-frequency finance and e-commerce, and loves solving problems with data at scale. He measures himself by the number of successful ML models he pushes into production.


Deploying Large Spark Models to Production and Model Scoring in Near Real Time

1. How does one build a PySpark model and deploy it in a Scala pipeline with no code rewrite? This settles the biggest fight between data scientists, who want to code in Python, and data engineers, who prefer the tried-and-tested type safety of the JVM.
2. How does one beat SparkContext latency to serve Spark models in milliseconds and meet near-real-time business needs?
3. How does one build an ML model, package it as a zip file, and deploy it across platforms in a completely vendor-neutral way, i.e. build the model on AWS and deploy it on GCP, or vice versa?
4. How does one apply years of software engineering practice directly to data science pipelines, without reinventing the wheel (and the pain)?
5. How does one build a fully GDPR-compliant machine learning model that reaches an area under the ROC curve (AUC) of 0.88?
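The abstract does not say which tooling the talk uses. As a minimal, hypothetical sketch of the idea behind points 2 and 3, the Python below exports a trained logistic regression's parameters to a plain JSON bundle and then scores single records in-process, with no SparkContext to start. The bundle format and the names `export_model` and `LogisticScorer` are illustrative assumptions, not the speaker's method; in PySpark the parameter values could be read from `LogisticRegressionModel.coefficients` and `LogisticRegressionModel.intercept`.

```python
import json
import math
from typing import Dict, List


# Hypothetical, vendor-neutral bundle format: plain JSON holding the parameters
# of a trained logistic regression. Any platform or language can read it.
def export_model(coefficients: List[float], intercept: float, features: List[str]) -> str:
    """Serialize model parameters to a JSON string."""
    if len(coefficients) != len(features):
        raise ValueError("one coefficient per feature is required")
    return json.dumps({
        "type": "logistic_regression",
        "features": features,
        "coefficients": coefficients,
        "intercept": intercept,
    })


class LogisticScorer:
    """Scores single records in-process, without starting a SparkContext."""

    def __init__(self, bundle: str) -> None:
        spec = json.loads(bundle)
        if spec["type"] != "logistic_regression":
            raise ValueError(f"unsupported model type: {spec['type']}")
        self.features = spec["features"]
        self.coefficients = spec["coefficients"]
        self.intercept = spec["intercept"]

    def score(self, record: Dict[str, float]) -> float:
        """Return P(label = 1) for one record given as feature name -> value."""
        margin = self.intercept + sum(
            w * record[name] for name, w in zip(self.features, self.coefficients)
        )
        # Numerically stable sigmoid: avoids overflow in math.exp for large |margin|.
        if margin >= 0:
            return 1.0 / (1.0 + math.exp(-margin))
        z = math.exp(margin)
        return z / (1.0 + z)


if __name__ == "__main__":
    # Illustrative parameters only; a real bundle would come from a trained model.
    bundle = export_model([0.8, -1.2], 0.1, ["clicks", "days_since_visit"])
    scorer = LogisticScorer(bundle)
    print(scorer.score({"clicks": 3.0, "days_since_visit": 1.0}))
```

Because scoring is a few arithmetic operations on an already-loaded bundle, per-request latency is dominated by the caller, not by cluster startup. Pipelines with feature transformers (indexers, encoders, assemblers) need a richer serialization format than this single-model sketch.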