Pandas-based solutions

Pandas

Pandas is a software library written for the Python programming language for data manipulation and analysis.

Pandas is built on top of the NumPy library, which is a lower-level tool. It offers data structures and operations for manipulating numerical tables and time series.
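As a rough illustration of the data structures mentioned above, the following minimal sketch builds a Series and a DataFrame and applies one time-series operation. The column names and values are invented for the example and are not taken from any real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical daily index and values, used only for illustration.
dates = pd.date_range("2024-01-01", periods=4, freq="D")

# A Series is a labelled one-dimensional array.
prices = pd.Series([10.0, 10.5, 11.0, 10.8], index=dates, name="price")

# The underlying data can be obtained as a NumPy array.
values = prices.to_numpy()

# A DataFrame is a table with labelled rows and columns.
table = pd.DataFrame(
    {"price": [10.0, 10.5, 11.0, 10.8], "volume": [100, 120, 90, 110]},
    index=dates,
)

# Time-series operation: aggregate the daily prices into 2-day means.
two_day_mean = prices.resample("2D").mean()

print(type(values))
print(table)
print(two_day_mean)
```

Here the Series and DataFrame share the same date index, so selections and aggregations can be expressed by label rather than by position.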

The name is derived from “panel data”, an econometrics term for multidimensional structured datasets. Pandas is released under the three-clause BSD license.

Its main purpose is to support data collection and cleansing, as well as data analysis and modelling tasks, within the Python environment, without switching to more specialised statistical processing languages (such as R and Octave).

Pandas supports native categorical data types. It is mainly designed for cleansing data and for preliminary estimation using common metrics such as the mean, quantiles and so on.
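The categorical type and the common metrics mentioned above can be sketched as follows. The "grade" and "score" columns are hypothetical and chosen only to keep the example small.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "grade": ["low", "high", "medium", "low", "high"],
    "score": [1.0, 4.0, 2.0, 3.0, 5.0],
})

# Convert the text column to an ordered categorical type.
df["grade"] = pd.Categorical(
    df["grade"], categories=["low", "medium", "high"], ordered=True
)

# Common summary metrics on a numeric column.
mean_score = df["score"].mean()
quartiles = df["score"].quantile([0.25, 0.5, 0.75])

# Mean score per category; observed=True keeps only categories present in the data.
mean_by_grade = df.groupby("grade", observed=True)["score"].mean()

print(df["grade"].dtype)
print(mean_score)
print(quartiles)
print(mean_by_grade)
```

Declaring the category order explicitly means that sorting and grouping follow low, medium, high rather than alphabetical order.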

It is not a statistical package in the full sense; rather, DataFrame and Series objects are used as input in most data analysis and machine learning modules (SciPy, scikit-learn, etc.).
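A minimal sketch of this hand-off is shown below: the data is prepared in a DataFrame and its columns are then passed to a numerical routine. To keep the example dependent only on pandas and NumPy, NumPy's polyfit stands in here for a SciPy or scikit-learn estimator; the column names "x" and "y" and the values are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical measurements that lie exactly on the line y = 2x + 1.
df = pd.DataFrame({"x": [0.0, 1.0, 2.0, 3.0], "y": [1.0, 3.0, 5.0, 7.0]})

# Series columns can be passed directly to array-based routines.
# A degree-1 fit returns the slope first, then the intercept.
slope, intercept = np.polyfit(df["x"], df["y"], 1)

# Libraries that expect plain arrays can also be given explicit NumPy arrays.
features = df[["x"]].to_numpy()   # two-dimensional, shape (4, 1)
target = df["y"].to_numpy()       # one-dimensional, shape (4,)

print(slope, intercept)
print(features.shape, target.shape)
```

The two-dimensional feature array and one-dimensional target array follow the input layout that many machine learning libraries expect.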

Site 

Digital platforms: Cross-platform software

Versions: Cloud/On-Premise 

Benefits

The library is optimised for high performance, with the most important parts of the code written in Cython and C.