BIG DATA CONFERENCE
Best Practices for ETL with Apache NiFi on Kubernetes
The talk covers in detail how to migrate pipelines from the legacy Hadoop platform to Kubernetes, manage everything as code, monitor NiFi's corner cases, and turn it into a robust solution that is user-friendly even for non-programmers.
The State of MLOps – Machine Learning in Production at Enterprise Scale
In this session, we’ll explore this relatively new field. Bas will explain the need for MLOps (and the related AIOps and ModelOps), dive into its tools and techniques, and present examples of real-world solutions.
Choosing the Right Abstraction Level for Your Kafka Project
What kinds of operations need to be applied to the events? Do we need to interact with external systems? In this presentation, the speaker will walk through several scenarios to highlight the key factors to consider when deciding which Kafka API to use for a given project.
Keyword Search is Dead! And so are Solr and Elasticsearch?
How can AI combined with Vector Similarity Search efficiently deliver more relevant search results than conventional methods?
In which cases does applying them bring an economic gain?
To answer these and other questions, the speaker will give an overview of the current state of these technologies and an outlook on their future possibilities, and show how AI can boost search applications.
NLP & Machine Learning Applied to the Analysis of Advertising Data and User Behaviour on Websites for Marketing Purposes
This talk shows how to use the events collected by behavioural tracking tools on websites to build high-performing audiences for advertising and marketing campaigns.
A Code-Driven Introduction to Reinforcement Learning
This presentation investigates the state of the art in cyber-security, focussing specifically on how reinforcement learning is helping to beat hackers.
Rethinking Ingestion: CI/CD for Data Lakes
In this talk, the speakers propose a new strategy for data lake ingestion, one in which new data is added in isolation, then tested and validated before “going live” in a production table. They will also show how git-for-data tools such as lakeFS and Nessie make this ingestion paradigm seamless.
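The ingestion flow described above (write in isolation, validate, then publish) can be illustrated with a toy, in-memory Python sketch. The `Catalog` class, its branch methods and the `validate` rule are hypothetical stand-ins chosen for illustration; they do not reflect the lakeFS or Nessie APIs, which operate on real object storage and table metadata.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Hypothetical in-memory catalog: each branch maps table names to lists of rows."""
    branches: dict = field(default_factory=lambda: {"main": {}})

    def create_branch(self, name, source="main"):
        # Copy the source branch's tables so writes on the new branch stay isolated.
        self.branches[name] = {t: list(rows) for t, rows in self.branches[source].items()}

    def append(self, branch, table, rows):
        self.branches[branch].setdefault(table, []).extend(rows)

    def merge(self, branch, target="main"):
        # Publish by replacing the target's tables with the branch's tables.
        # Assumes the target has not changed since the branch was created (a fast-forward).
        self.branches[target] = {t: list(rows) for t, rows in self.branches[branch].items()}
        del self.branches[branch]


def validate(rows):
    """Assumed example check: every row has a non-negative integer 'amount'."""
    return all(isinstance(r.get("amount"), int) and r["amount"] >= 0 for r in rows)


def ingest(catalog, table, new_rows, branch="ingest-tmp"):
    """Write new data on an isolated branch, validate it, and publish only if it passes."""
    catalog.create_branch(branch)
    catalog.append(branch, table, new_rows)
    if validate(catalog.branches[branch][table]):
        catalog.merge(branch)
        return True
    del catalog.branches[branch]  # discard the failed ingestion; main is untouched
    return False


if __name__ == "__main__":
    catalog = Catalog()
    print(ingest(catalog, "sales", [{"amount": 10}]))  # True: published to main
    print(ingest(catalog, "sales", [{"amount": -5}]))  # False: rejected before going live
    print(catalog.branches["main"]["sales"])           # [{'amount': 10}]
```

The key design point is that consumers only ever read `main`, so a batch that fails validation never becomes visible in the production table.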
Data observability is a growing area of data engineering. In this session, the speaker will explain to an audience of data engineers what data observability means for both development and operational processes.
Management of a Cloud Data Lake in Practice: How to Manage 1000s of ETLs Using Apache Spark
The talk will outline the business rationale, the key design principles and the technical solution. Expect some (but not too many) nerdy details about the Apache Spark implementation.
Machine Learning Security
Many companies want to adopt machine learning models but overlook the potential security issues. In this presentation, the speaker will discuss recent security issues affecting machine learning models, such as adversarial attacks.