Jakub Hava
22 Sep 2019
Big Data and Data Analytics

Enhancing Spark Pipeline API with H2O Models, Random Grid Search and Automatic Machine Learning using Sparkling Water

Learn more about how you can integrate large scale data preprocessing with Machine Learning using Sparkling Water. Sparkling Water enables training H2O-3 models leveraging Apache Spark clusters in a distributed manner. It also allows for using trained H2O-3 and Driverless AI models inside Apache Spark. We will demonstrate model training together with hyper-parameter tuning (Cartesian and Random GridSearch with time constraint) of various algorithms, using AutoML – training meta model combining different algorithms, hyper-parameter search and stacking (Ensemble method) all using Spark Pipeline API. We will also demonstrate how target encoding can be used with the Sparkling Water API.