Scikit-learn is an open-source Python library for machine learning that provides simple and efficient tools for predictive data analysis.

Scikit-learn is built on NumPy, SciPy, and Matplotlib.



Digital platforms: Cross-platform software

Versions: Cloud/On-Premise 

Use cases


  • Classification: determining which category an object belongs to. Applications: spam detection, image recognition, predicting customer churn;
  • Regression: predicting a continuous attribute associated with an object. Application: predicting responses to medication;
  • Clustering: automatic grouping of similar objects into sets. Applications: customer segmentation, clustering of experimental results;
  • Dimensionality reduction: reducing the number of random variables under consideration. Applications: visualisation, efficiency improvement;
  • Model selection: comparing, validating, and choosing parameters and models. Application: improving accuracy through parameter tuning;
  • Preprocessing: feature extraction and normalisation. Application: transforming input data (such as text) for use with machine learning algorithms.
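
Several of the use cases above can be combined in a few lines. The following is a minimal sketch, not a recommended configuration: it assumes the bundled iris dataset, a logistic regression classifier, and an arbitrary grid of three values for the regularisation parameter C, all chosen purely for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Small bundled dataset, used here only as an illustration.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Preprocessing (normalisation) and classification chained into one estimator.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Model selection: cross-validated search over the regularisation strength C.
# The candidate values are an arbitrary assumption for this sketch.
search = GridSearchCV(
    pipeline,
    param_grid={"logisticregression__C": [0.1, 1.0, 10.0]},
    cv=5,
)
search.fit(X_train, y_train)

best_params = search.best_params_
test_accuracy = search.score(X_test, y_test)
print(best_params, round(test_accuracy, 3))
```

Chaining the scaler and the classifier in a pipeline means the scaling is re-fitted inside every cross-validation fold, so the held-out fold does not influence the preprocessing.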


Scikit-learn is written mostly in Python and makes extensive use of NumPy for high-performance linear algebra and array operations. In addition, some core algorithms are written in Cython to improve performance. Support vector machines are implemented by a Cython wrapper around LIBSVM; logistic regression and linear support vector machines are implemented by a similar wrapper around LIBLINEAR.

Because these methods run in wrapped compiled code, it may not be possible to extend them in Python.
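
From the user's side, the wrapped backends are reached through ordinary estimator classes. The sketch below assumes a small synthetic binary dataset generated with make_classification; the dictionary labels are only annotations for this example, and the scores are training accuracies meant to show that the calls run, not to compare the models.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC, LinearSVC

# Synthetic binary classification data (an assumption for illustration).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

models = {
    # Kernel SVM, backed by LIBSVM.
    "SVC (LIBSVM)": SVC(kernel="rbf"),
    # Linear SVM, backed by LIBLINEAR.
    "LinearSVC (LIBLINEAR)": LinearSVC(max_iter=10000),
    # Logistic regression with the LIBLINEAR solver selected explicitly.
    "LogisticRegression (LIBLINEAR)": LogisticRegression(solver="liblinear"),
}

# Fit each model and record its accuracy on the training data.
scores = {name: model.fit(X, y).score(X, y) for name, model in models.items()}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

All three estimators expose the same fit and score interface, so the compiled backend stays hidden behind the Python API.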

Scikit-learn integrates well with many other Python libraries, including Matplotlib and plotly for plotting, NumPy for array vectorisation, pandas for data frames, and SciPy.
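
As a rough illustration of this interoperability, the sketch below builds a pandas data frame from NumPy arrays, scales it, and clusters it with KMeans. The data is synthetic, and the column names annual_spend, visits_per_month, and segment are hypothetical, invented for this example.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Two synthetic, well-separated groups of 50 customers each (made-up data).
customers = pd.DataFrame({
    "annual_spend": np.concatenate(
        [rng.normal(200, 20, 50), rng.normal(1000, 50, 50)]
    ),
    "visits_per_month": np.concatenate(
        [rng.normal(2, 0.5, 50), rng.normal(10, 1, 50)]
    ),
})

# The scaler accepts the data frame directly and returns a NumPy array.
scaled = StandardScaler().fit_transform(customers)

# Cluster the scaled features and store the labels back in the data frame.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
customers["segment"] = kmeans.fit_predict(scaled)

segment_sizes = customers["segment"].value_counts().sort_index().tolist()
print(segment_sizes)
```

Writing the cluster labels back into the data frame keeps the result next to the original columns, which is convenient for later plotting or grouping.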