Course 1 - The Data Scientist’s Toolbox

Week 1

  • Overview of courses

Week 2

  • Introduction to Git and GitHub
  • Markdown
  • Rtools

Week 3

  • Big data
  • Experimental design

Course 2 - R Programming

Week 1

  • Data types
  • Reading in data
  • Subsetting

Week 2

  • Control structures
  • Functions
  • Scoping
  • Dates and times

Week 3

  • Loop functions
  • Debugging tools

Week 4

  • Simulation
  • R Profiler

Course 3 - Getting and Cleaning Data

Week 1

  • Components of tidy data
  • Reading Excel files
  • Reading XML
  • Reading JSON
  • The data.table package

Week 2

  • Reading from MySQL
  • Reading from HDF5
  • Reading from the web
  • Reading from APIs
  • Reading from other sources

Week 3

  • Subsetting and sorting
  • Reshaping data
  • Managing data with dplyr

Week 4

  • Regular expressions
  • Working with dates
  • Data resources

Course 4 - Exploratory Data Analysis

Week 1

  • Principles of analytic graphics
  • Exploratory graphs
  • Base plotting system

Week 2

  • Lattice plotting system
  • ggplot2

Week 3

  • Hierarchical clustering
  • K-means clustering
  • Dimension reduction
  • Working with colour in R plots

Week 4

  • Clustering case study
  • Air pollution case study

Course 5 - Reproducible Research

Week 1

  • Concepts and ideas
  • Structuring a data analysis

Week 2

  • R Markdown
  • knitr

Week 3

  • RPubs
  • Reproducible research checklist
  • Evidence-based data analysis

Week 4

  • Caching computations
  • Case study: air pollution
  • Case study: high-throughput biology
  • Commentaries on data analysis

Course 6 - Statistical Inference

Week 1

  • Probability
  • PMFs
  • PDFs
  • Bayes’ rule
  • Expected values

Week 2

  • Variability
  • Variance simulation example
  • Standard error of the mean
  • Binomial distribution
  • Normal distribution
  • Poisson distribution
  • Asymptotics and the LLN
  • Asymptotics and the CLT
  • Asymptotics and confidence intervals

Week 3

  • Confidence intervals
  • t-tests
  • Hypothesis testing
  • P values

Week 4

  • Power
  • Multiple comparisons
  • Bootstrapping

Course 7 - Regression Models

Week 1

  • Regression
  • Least squares

Week 2

  • Linear regression
  • Residuals
  • Multivariable regression

Week 3

  • Multivariable regression
  • Residuals and diagnostics
  • Model selection

Week 4

  • Logistic regression
  • Poisson regression

Course 8 - Practical Machine Learning

Week 1

  • Prediction
  • Types of errors
  • Cross-validation

Week 2

  • Training
  • Preprocessing
  • Prediction with regression
  • Prediction with regression: multiple covariates

Week 3

  • Prediction with trees
  • Bagging
  • Random forests
  • Boosting
  • Model-based prediction

Week 4

  • Regularised regression
  • Combining predictors
  • Forecasting
  • Unsupervised prediction

Course 9 - Developing Data Products

Week 1

  • Shiny
  • Manipulate
  • rCharts
  • googleVis

Week 2

  • Writing a data report
  • Slidify
  • RStudio Presenter

Week 4

  • R Packages