Hands-on Lab on Sparkling Water
Sparkling Water provides access to H2O algorithms and exposes an API for integrating them into regular Spark pipelines. This allows H2O algorithms to be trained and deployed seamlessly in the Spark environment. Furthermore, trained pipelines no longer require the H2O runtime (thanks to the MOJO representation of trained H2O models), which enables a variety of deployment scenarios. Moreover, by supporting both Python and Scala environments, we enable a simple transfer of modeling results between data scientists (Python land) and production (JVM land). The goal of this hands-on lab is to show how to integrate H2O models into Spark pipelines using PySpark and PySparkling, and to demonstrate deployment of the trained pipeline in the context of the JVM and Spark Streaming.