CatBoost is an open-source software library developed by Yandex. It implements the company's own algorithm for building machine learning models, based on an original gradient boosting scheme. The main API for working with the library is implemented in Python, and there is also an implementation for the R programming language.
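To make the general idea of gradient boosting concrete, here is a minimal sketch in plain Python. It is not CatBoost's algorithm: it is generic boosting for regression with squared loss, using single-feature decision stumps, and all function names and the toy data are hypothetical, chosen only for illustration. Each round fits a stump to the current residuals (the negative gradient of the squared loss) and adds a damped version of it to the ensemble.

```python
# Generic gradient boosting sketch (squared loss, one feature, decision stumps).
# Illustrative only; CatBoost's actual algorithm differs.

def fit_stump(xs, residuals):
    """Pick the threshold whose two-sided means best fit the residuals."""
    best = None
    # Assumes xs contains at least two distinct values.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_mean, right_mean)


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_boosting(xs, ys, n_rounds=50, learning_rate=0.1):
    base = sum(ys) / len(ys)  # start from the mean target
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared loss, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
    return base, stumps, learning_rate


def predict_boosting(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def training_mse(model, xs, ys):
    return sum((y - predict_boosting(model, x)) ** 2
               for x, y in zip(xs, ys)) / len(ys)


# Hypothetical toy data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 1.9, 3.1, 3.9, 5.2, 5.8]

model = fit_boosting(xs, ys)
baseline = (sum(ys) / len(ys), [], 0.1)  # mean-only predictor, no stumps
```

On the training data, the boosted ensemble should have a lower mean squared error than the mean-only baseline, which is the basic effect every boosting scheme relies on.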

On July 18, 2017, Yandex released the library with the CatBoost algorithm as open source under the Apache 2.0 license. CatBoost is a continuation and further development of Yandex's earlier MatrixNet project.

Clients: European Organization for Nuclear Research (CERN), Yandex internet services: Zen, Weather, recommendation feed.


Digital platforms: Cross-platform software

Versions: Cloud/On-Premise 

Use cases

  • Enhancing results of the Yandex search engine, ranking personal recommendation feeds (e.g. in Yandex.Zen), weather forecast calculations, and other Yandex internet services.
  • Solutions for the manufacturing industry: optimisation of raw material consumption and prediction of production defects.
  • European Organization for Nuclear Research (CERN): in research at the Large Hadron Collider (LHC), CatBoost is used to combine information from different parts of the LHC detector into the most accurate aggregated knowledge about a particle. Combining the data with CatBoost allowed scientists to improve the quality of the final solution, and the CatBoost results were better than those obtained using other methods.


CatBoost is often compared with machine learning systems from Google (TensorFlow) and Microsoft (LightGBM). Google's TensorFlow solves a different class of problems, efficiently analyzing homogeneous data such as images. CatBoost works with data of a different nature, such as tables that mix numerical and categorical features, and can be used in conjunction with TensorFlow and other machine learning algorithms, depending on the task at hand. According to the test table of common machine-learning comparisons, CatBoost beats Microsoft's LightGBM in terms of quality.