Scikit-learn is a machine learning library that provides simple and efficient tools for predictive data analysis.
It is built on NumPy, SciPy, and Matplotlib.
The library's functionality covers six areas:
- Classification: identifying which category an object belongs to. Applications: spam detection, image recognition.
- Regression: predicting a continuous-valued attribute associated with an object. Applications: predicting drug response or stock prices.
- Clustering: automatically grouping similar objects into sets. Applications: customer segmentation, grouping of experimental results.
- Dimensionality reduction: reducing the number of random variables to consider. Applications: visualisation, increased efficiency.
- Model selection: comparing, validating, and choosing parameters and models. Application: improving accuracy through parameter tuning.
- Preprocessing: feature extraction and normalisation. Application: transforming input data (such as text) for use with machine learning algorithms.
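Several of the task areas listed above can be combined in a few lines. The following is a minimal sketch, not a recommended workflow: the choice of the built-in iris dataset, the logistic regression classifier, the parameter grid, and the number of clusters are all illustrative assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Small built-in dataset, chosen only for illustration.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Preprocessing (scaling) and classification chained into one estimator.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Model selection: cross-validated search over the regularisation strength C.
search = GridSearchCV(
    pipeline,
    param_grid={"logisticregression__C": [0.1, 1.0, 10.0]},
    cv=5,
)
search.fit(X_train, y_train)
best_c = search.best_params_["logisticregression__C"]
test_accuracy = search.score(X_test, y_test)
print(f"best C: {best_c}, held-out accuracy: {test_accuracy:.2f}")

# Dimensionality reduction followed by clustering (labels are not used here).
X_2d = PCA(n_components=2).fit_transform(X)
cluster_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_2d)
print("points per cluster:", [int((cluster_labels == k).sum()) for k in range(3)])
```

Every step uses the same `fit` / `predict` / `transform` estimator interface, which is what allows the pieces to be chained and searched over as a single object.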
Scikit-learn is written mostly in Python and makes extensive use of NumPy for high-performance linear algebra and array operations. In addition, some core algorithms are written in Cython to improve performance. Support vector machines are implemented by a Cython wrapper around LIBSVM; logistic regression and linear support vector machines are implemented by a similar wrapper around LIBLINEAR.
In such cases, it may not be possible to extend these methods from Python.
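From the user's side, the compiled back ends are hidden behind the ordinary estimator classes. The sketch below fits three estimators that the scikit-learn documentation describes as backed by LIBSVM (`SVC`) and LIBLINEAR (`LinearSVC`, and `LogisticRegression` with `solver="liblinear"`). The synthetic dataset and the parameter values are assumptions made for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC, LinearSVC

# Synthetic binary classification data; sizes are arbitrary.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

models = {
    "SVC (LIBSVM)": SVC(kernel="rbf"),
    "LinearSVC (LIBLINEAR)": LinearSVC(dual=False),
    "LogisticRegression (LIBLINEAR)": LogisticRegression(solver="liblinear"),
}

# Fit each model and record its accuracy on the training data.
scores = {}
for name, model in models.items():
    scores[name] = model.fit(X, y).score(X, y)
    print(f"{name}: training accuracy {scores[name]:.2f}")
```

The optimisation itself runs inside the compiled library, so customisation is limited to the parameters each class exposes, such as `C` or `kernel`.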
Scikit-learn integrates well with many other Python libraries, such as Matplotlib and Plotly for plotting, NumPy for array vectorisation, pandas for data frames, and SciPy.
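As a small illustration of this interoperability, the sketch below fits the same estimator on a dense NumPy array and on a SciPy sparse matrix, and reads the learned coefficients back as a NumPy array. The generated data, the coefficient values in `true_coef`, and the choice of `Ridge` are hypothetical and serve only as an example.

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Dense NumPy input with a known (illustrative) linear relationship plus noise.
X_dense = rng.normal(size=(100, 3))
true_coef = np.array([1.5, -2.0, 0.5])
y = X_dense @ true_coef + rng.normal(scale=0.1, size=100)

# The same data stored as a SciPy compressed sparse row matrix.
X_sparse = sparse.csr_matrix(X_dense)

dense_model = Ridge(alpha=1.0).fit(X_dense, y)
sparse_model = Ridge(alpha=1.0).fit(X_sparse, y)

# Fitted attributes come back as plain NumPy arrays.
print(type(dense_model.coef_).__name__)
print("dense input: ", np.round(dense_model.coef_, 2))
print("sparse input:", np.round(sparse_model.coef_, 2))
```

Because fitted attributes are ordinary arrays, they can be passed directly to plotting or numerical routines from the other libraries mentioned above.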