NumPy, SciPy, Pandas, Scikit-learn

NumPy and SciPy turned Python from a general-purpose programming language into a powerful array- and matrix-oriented one. Pandas brought data frames, the tabular data structure familiar from R and one of the core concepts in modern data analysis, to Python. Building on these data structures, Scikit-learn provides solid implementations of widely used machine learning algorithms behind one consistent interface. Python is now a language of choice among data scientists.

Preprocessing with Pandas

  • Reading data
  • Selecting columns and rows
  • Filtering
  • Vectorized string operations
  • Missing values
  • Handling time
  • Time series
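
As a rough illustration of the Pandas topics listed above, the sketch below walks through each one on a tiny table. The CSV contents and the column names (timestamp, city, temp_c) are made up for the example and stand in for a real data file.

```python
import io

import pandas as pd

# Hypothetical CSV contents, used here in place of a real file on disk.
csv_text = """timestamp,city,temp_c
2024-01-01 00:00,  oslo ,1.5
2024-01-01 12:00,  oslo ,
2024-01-02 00:00,bergen,4.0
2024-01-02 12:00,bergen,6.0
"""

# Reading data: parse the timestamp column as datetimes while loading.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["timestamp"])

# Vectorized string operations: clean the whole column in one expression.
df["city"] = df["city"].str.strip().str.title()

# Missing values: count them, then forward-fill from the previous row.
n_missing = int(df["temp_c"].isna().sum())
df["temp_c"] = df["temp_c"].ffill()

# Selecting columns and rows: .loc is label-based and includes both ends.
first_two = df.loc[0:1, ["city", "temp_c"]]

# Filtering: keep the rows where a boolean mask is True.
warm = df[df["temp_c"] > 3.0]

# Handling time: the .dt accessor exposes datetime fields per row.
df["day"] = df["timestamp"].dt.day_name()

# Time series: index by timestamp, then resample to daily means.
daily = df.set_index("timestamp")["temp_c"].resample("D").mean()
```

Forward-filling is only one way to treat a missing value; dropping the row with dropna() or filling with a column statistic may suit other data better.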

NumPy, SciPy

  • Arrays
  • Indexing, slicing, and iterating
  • Reshaping
  • Shallow vs deep copy
  • Broadcasting
  • Indexing (advanced)
  • Matrices
  • Matrix decompositions
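
The NumPy and SciPy topics above can be sketched in a few lines each. The arrays below are arbitrary small examples chosen so the results are easy to check by hand; they are not taken from any course material.

```python
import numpy as np
from scipy import linalg

# Arrays and reshaping: 12 integers viewed as a 3x4 grid.
a = np.arange(12)
m = a.reshape(3, 4)

# Indexing, slicing, and iterating.
row = m[1]                             # second row
col = m[:, 2]                          # third column
row_sums = [int(r.sum()) for r in m]   # iterating a 2-D array yields rows

# Broadcasting: the (4,) vector of column means is stretched over 3 rows.
centered = m - m.mean(axis=0)

# Indexing (advanced): boolean masks and integer index arrays.
evens = m[m % 2 == 0]
picked = m[[0, 2], [1, 3]]             # elements at (0, 1) and (2, 3)

# Shallow vs deep copy: a slice is a view, .copy() owns its data.
view = m[:, :2]
deep = m.copy()
view[0, 0] = 100                       # also changes m, but not deep

# Matrices: solve the linear system A x = b.
A = np.array([[4.0, 2.0], [2.0, 3.0]])
b = np.array([2.0, 1.0])
x = linalg.solve(A, b)

# Matrix decompositions: Cholesky (A = L L^T) and SVD (A = U diag(s) Vt).
L = linalg.cholesky(A, lower=True)
U, s, Vt = linalg.svd(A)
```

The Cholesky factorization applies here because A is symmetric positive definite; a general matrix would need a different decomposition, such as LU or QR.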

Scikit-learn

  • Feature extraction
  • Classification
  • Regression
  • Clustering
  • Dimension reduction
  • Model selection
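
The machine learning topics above all follow the same fit / predict / transform pattern in scikit-learn. The sketch below touches each one briefly; the toy documents, the synthetic regression data, and the parameter grid are assumptions made for illustration, not recommended settings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Feature extraction: turn raw text into a sparse word-count matrix.
docs = ["the cat sat", "the dog sat", "the cat ran"]
counts = CountVectorizer().fit_transform(docs)

# Classification: scale the features, then fit a logistic regression.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)

# Regression: recover a known line (slope 3, intercept 2) from noisy samples.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(50, 1))
target = 3.0 * x[:, 0] + 2.0 + rng.normal(0, 0.1, size=50)
reg = LinearRegression().fit(x, target)

# Clustering: group the iris samples without using their labels.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Dimension reduction: project the 4 iris features onto 2 components.
X_2d = PCA(n_components=2).fit_transform(X)

# Model selection: cross-validated search over the regularization strength.
grid = {"logisticregression__C": [0.1, 1.0, 10.0]}
search = GridSearchCV(clf, grid, cv=5).fit(X_train, y_train)
best_C = search.best_params_["logisticregression__C"]
```

Wrapping the scaler and the classifier in one pipeline means the grid search refits the scaler inside each cross-validation fold, which avoids leaking information from the held-out fold.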