Its speed, quality, ease of use, and model deployment support for cutting-edge supervised and unsupervised algorithms such as Deep Learning, Tree Ensembles, and GLRM make H2O a highly sought-after API for data science on big data.

H2O is a fast, scalable, distributed, open-source platform for in-memory machine learning and predictive analytics. It enables the creation of machine learning models based on big data and makes it easy to put those models into production in enterprise environments.

H2O is licensed under the Apache License, Version 2.0.


Digital platforms: Windows 7 or later, OS X 10.9 or later, Ubuntu 12.04, RHEL/CentOS 6 or later.

Versions: Cloud/On-Premise 


H2O’s core code is written in Java. Inside H2O, a distributed key/value store is used to access and reference data, models, objects, etc., across all nodes and machines.
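To make the idea of a distributed key/value store more concrete, the following is a minimal in-process sketch, not H2O's actual Java implementation. It assumes that each key is assigned to a "home" node by a stable hash; the class name `ToyDistributedKV`, the method names, and the example keys are all hypothetical and chosen only for illustration.

```python
import hashlib


class ToyDistributedKV:
    """In-process stand-in for a key/value store spread over several nodes.

    Hypothetical illustration only: real nodes would be separate JVMs
    talking over the network, not dicts in one process.
    """

    def __init__(self, num_nodes):
        if num_nodes < 1:
            raise ValueError("num_nodes must be at least 1")
        # One plain dict per simulated node.
        self.nodes = [{} for _ in range(num_nodes)]

    def home_node(self, key):
        # Use a stable hash so every caller computes the same home node
        # for a given key (Python's built-in hash() is randomised per run).
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.nodes)

    def put(self, key, value):
        # Store the value on the node that owns this key.
        self.nodes[self.home_node(key)][key] = value

    def get(self, key):
        # Look the key up on its home node; None if it was never stored.
        return self.nodes[self.home_node(key)].get(key)


store = ToyDistributedKV(num_nodes=3)
store.put("frame:airlines", {"rows": 1000})   # hypothetical data frame entry
store.put("model:gbm_1", {"ntrees": 50})      # hypothetical model entry
```

Because any caller can compute a key's home node locally, data, models, and other objects can all be referenced by key from anywhere in the cluster without a central lookup table, which is the property the paragraph above describes.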

The algorithms are implemented on top of H2O’s distributed Map/Reduce framework and utilise the Java Fork/Join framework for multi-threading. The data is read in parallel, distributed across the cluster, and stored in memory in a compressed columnar format. H2O’s data parser has built-in intelligence to guess the schema of the incoming dataset and supports ingesting data from multiple sources in various formats.
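The map/reduce pattern over column chunks can be sketched as follows. This is a simplified illustration under stated assumptions, not H2O's code: a Python thread pool stands in for the Java Fork/Join framework, a plain list stands in for a compressed column, and the function names (`split_into_chunks`, `map_chunk`, `reduce_partials`, `column_mean`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_chunks(column, chunk_size):
    # A column is stored as a sequence of fixed-size chunks.
    return [column[i:i + chunk_size] for i in range(0, len(column), chunk_size)]


def map_chunk(chunk):
    # Map step: compute a partial (sum, count) for a single chunk.
    return (sum(chunk), len(chunk))


def reduce_partials(a, b):
    # Reduce step: combine two partial results into one.
    return (a[0] + b[0], a[1] + b[1])


def column_mean(column, chunk_size=4, workers=2):
    chunks = split_into_chunks(column, chunk_size)
    # Each chunk is mapped independently, so the work can run in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    total, count = reduce(reduce_partials, partials, (0, 0))
    return total / count if count else float("nan")
```

For example, `column_mean([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])` splits the column into three chunks, maps each to a partial sum and count, and reduces them to a mean of 5.5. The same shape applies when the chunks live on different machines: only the small partial results need to travel, not the data itself.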

H2O’s REST API allows access to all the capabilities of H2O from an external program or script via JSON over HTTP. The REST API is used by H2O’s web interface (Flow UI), R binding (H2O-R), and Python binding (H2O-Python).
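As a rough sketch of what "JSON over HTTP" looks like from an external script, the helpers below issue a GET request and decode the JSON reply using only the Python standard library. The helper names `build_url` and `fetch_json` are hypothetical, and the base address and endpoint path used in the usage note are assumptions about a locally running instance that should be checked against the REST API documentation for the installed H2O version.

```python
import json
import urllib.request


def build_url(base, path):
    # Join a base address and an endpoint path with exactly one slash.
    return base.rstrip("/") + "/" + path.lstrip("/")


def fetch_json(url, opener=urllib.request.urlopen, timeout=10):
    # Send a GET request and decode the JSON body into Python objects.
    # `opener` is injectable so the function can be exercised without
    # a running server.
    with opener(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Assuming an instance is listening on `http://localhost:54321` and exposes a cluster status endpoint at `/3/Cloud`, a call such as `fetch_json(build_url("http://localhost:54321", "/3/Cloud"))` would return the reply as a Python dictionary. The Flow UI and the R and Python bindings communicate with H2O over this same interface.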