# CURRICULUM

# COURSES

Data science is the intersection of engineering, analytics and business. Below is our teaching curriculum grouped by these three dimensions:

## ENGINEERING

- Data science tools – text editors, development environment setup
- Programming practices – test driven development, reproducibility, packaging
- Python – Pandas, NumPy, Scikit-Learn, Matplotlib
- SQL
- Using a Bash shell
- Git & GitHub
- Data visualisation – D3
- Deploying models with Flask and Docker
- Distributed machine learning with Spark

## ANALYTICS

- Probability & Statistics
- Foundations of Machine Learning
- Practical Machine Learning
- Working with Small Samples
- Backpropagation & Deep Learning
- Computer Vision with PyTorch
- Sequential Models with TensorFlow
- Natural Language Processing (NLP)
- Unsupervised Learning
- Interpreting Machine Learning models
- Reinforcement Learning

## BUSINESS

- Technical communication and presentation skills
- Interview question practice & preparation

# REQUIREMENTS

## BEFORE THE INTERVIEW

There are no strict requirements on your level before the interview. Most participants have already taken their first steps learning Python or machine learning before the interview.

We recommend that anyone considering studying at Data Science Retreat book an interview; we are happy to give advice on what you can study to get up to speed.

## BEFORE THE BOOTCAMP

Below we outline the knowledge we expect participants to have before they study with us:

## Python

For Python, we expect students to be familiar with the following concepts outlined in the Python Tutorial:

- Variables, Strings, Floats, Integers (Section 3)
- Conditionals (Section 4.1 – 4.5)
- Functions (Section 4.6, 4.7.1, 4.7.2)
- Lists (Section 3.1.3, 5.1)
- Tuples, Sets, Dictionaries (Section 5.3 – 5.5)
- Reading & Writing Files (Section 7.2)
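As a rough self-check, the concepts listed above can be sketched in a few lines. This is only an illustrative example, not a test we set; every name in it (`describe`, `scores`, `notes.txt` and so on) is made up for the illustration.

```python
import os
import tempfile

# Variables holding a string, an integer and a float
name = "Ada"
age = 36
height_m = 1.7


def describe(age):
    """Return a label for an age using a conditional."""
    if age < 18:
        return "minor"
    elif age < 65:
        return "adult"
    return "senior"


# Lists are ordered and mutable
scores = [3, 1, 2]
scores.append(5)
scores.sort()

# Tuples are immutable, sets drop duplicates, dictionaries map keys to values
point = (2, 3)
unique = set([1, 1, 2])
person = {"name": name, "age": age}

# Write a line to a file in a temporary directory, then read it back
path = os.path.join(tempfile.mkdtemp(), "notes.txt")
with open(path, "w") as f:
    f.write(f"{name} is an {describe(age)}\n")
with open(path) as f:
    first_line = f.readline().strip()

print(first_line)
```

If each line of this sketch reads naturally to you, you are probably comfortable with the sections listed above.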

## Linear Algebra & Probability

For linear algebra, participants are expected to understand:

- the difference between a scalar, matrix & tensor
- element-wise matrix multiplication & dot products
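The distinction between these objects and operations can be sketched with nested lists. Plain Python is used here only to keep the example dependency-free; the helper names (`elementwise`, `dot`, `matmul`) are our own for this illustration and not part of any library.

```python
# A scalar is a single number, a vector is a list of numbers,
# a matrix is a list of rows, and a tensor generalises this to more axes.
scalar = 2.0
vector = [1.0, 2.0]
matrix = [[1.0, 2.0], [3.0, 4.0]]
tensor = [matrix, matrix]  # two stacked 2x2 matrices, shape (2, 2, 2)


def elementwise(a, b):
    """Multiply two same-shaped matrices entry by entry."""
    return [[x * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def dot(u, v):
    """Dot product of two vectors: the sum of pairwise products."""
    return sum(x * y for x, y in zip(u, v))


def matmul(a, b):
    """Matrix product: each entry is the dot product of a row of a and a column of b."""
    columns = list(zip(*b))
    return [[dot(row, col) for col in columns] for row in a]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]

print(elementwise(a, b))  # [[5, 12], [21, 32]]
print(matmul(a, b))       # [[19, 22], [43, 50]]
print(dot([1, 2], [3, 4]))  # 11
```

With NumPy arrays, the same two operations are written `a * b` (element-wise) and `a @ b` (matrix product); the two give different results for the same inputs, which is the point to understand.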

For probability, we expect participants to be familiar with:

- independent, marginal and conditional probabilities
- expectation & variance
- the Bernoulli & Gaussian distributions
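The probability concepts above can be illustrated with a small worked example. The joint probabilities below are invented numbers for two hypothetical events (rain and traffic), and the function names are ours; treat this as a sketch using only the Python standard library.

```python
import random
import statistics

# Joint distribution P(rain, traffic) over two binary events (made-up values)
joint = {
    (True, True): 0.24,
    (True, False): 0.06,
    (False, True): 0.14,
    (False, False): 0.56,
}

# Marginal probabilities: sum the joint over the other variable
p_rain = joint[(True, True)] + joint[(True, False)]
p_traffic = joint[(True, True)] + joint[(False, True)]

# Conditional probability: P(traffic | rain) = P(rain, traffic) / P(rain)
p_traffic_given_rain = joint[(True, True)] / p_rain

# Independence would require P(rain, traffic) == P(rain) * P(traffic)
independent = abs(joint[(True, True)] - p_rain * p_traffic) < 1e-9


def expectation(dist):
    """Expected value of a discrete distribution given as {value: probability}."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """Variance: the expected squared deviation from the mean."""
    mean = expectation(dist)
    return sum(p * (x - mean) ** 2 for x, p in dist.items())


# Bernoulli(p): takes value 1 with probability p, else 0
p = 0.3
bernoulli = {1: p, 0: 1 - p}
bern_mean = expectation(bernoulli)  # equals p
bern_var = variance(bernoulli)      # equals p * (1 - p)

# Gaussian: draw samples and compare sample statistics with the parameters
random.seed(0)
samples = [random.gauss(5.0, 2.0) for _ in range(10_000)]
sample_mean = statistics.fmean(samples)
sample_std = statistics.stdev(samples)

print(p_rain, p_traffic, p_traffic_given_rain, independent)
print(bern_mean, bern_var)
print(sample_mean, sample_std)
```

Here the events are not independent, because knowing that it rains changes the probability of traffic; the Gaussian sample mean and standard deviation land close to the parameters 5 and 2 used to generate the samples.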

## Machine Learning

For machine learning, we expect students to have:

# ADDITIONAL RECOMMENDED RESOURCES

One of the most beautiful features of data science is the abundance of resources to learn from; below are some of our favourites.

## PROGRAMMING

- Think Python as an introductory textbook
- Fluent Python as a more advanced look at the language
- HackerRank for practical coding challenges
- The DataCamp Introduction to Shell for Data Science for an excellent introduction to the Bash shell

## DATA SCIENCE

- Kaggle offers access to many interesting datasets, along with communities that share their work
- Python for Data Analysis
- Automate the Boring Stuff with Python
- Agile Data Science is recommended as a more advanced textbook that covers application development using a distinct software engineering philosophy

## MACHINE LEARNING

- Andrew Ng’s Stanford Machine Learning is a classic course that is something of a rite of passage for machine learning practitioners
- Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow textbook
- Andrew Ng’s Machine Learning Yearning is a useful resource for getting insight into the practicalities of model training
- The Elements of Statistical Learning, and Pattern Recognition and Machine Learning, as classic machine learning textbooks
- Deep Learning for neural networks