27-29 November, Vilnius

Conference about Big Data, High Load, Data Science, Machine Learning & AI


Gareth Rogers

Metail, UK


Gareth Rogers is a Data Engineer at Metail, where he has worked for six years. For the last four years he has been part of the team that first built Metail’s data analytics pipeline and has since kept it up to date and able to meet the company’s changing demands. This has meant deciding where to keep up with a rapidly changing field and where to enjoy some stability. He came to Metail after graduating with a PhD in high energy physics based on the LHCb experiment at CERN, where he spent too much time working on the control system and monitoring software but still managed to code up and version control his analysis.


Putting the Spark in Functional Fashion Tech Analytics

Metail is a fashion tech startup whose goal is to reduce the cost and improve the efficiency of a retailer’s garment photography process, and to give consumers confidence in the clothes they buy online. By allowing customers to try on clothes online against their own body shape, Metail has collected a unique data set of clothes shopping habits. Metail’s analytics platform, now four years old, drives its data science products and its internal and external dashboards, which give a summarised view of key business metrics.

The pipeline is based on the ideas in Nathan Marz’s lambda architecture (http://lambda-architecture.net/) and uses the Snowplow Analytics (https://snowplowanalytics.com/) pipeline as a foundation for event tracking, collection and first-pass processing. From the start the pipeline has been implemented in Clojure (https://clojure.org/), which connects the pipeline stages, and Clojure’s big data libraries are the workhorse of the raw event processing and aggregation.

This talk will show how Gareth and his team used Clojure to provide a solid platform for connecting and managing their AWS-hosted analytics pipeline, and the pitfalls they encountered along the way. Gareth will also talk about some of the difficulties they are currently experiencing and how these are being resolved. He will cover their use of Spark jobs implemented in Clojure, and how these feed a Redshift cluster that takes advantage of AWS’s new Redshift Spectrum technology to build an S3-based data lake.