April 28, 2016

Plan for Today

  • A broad overview of the model-building process
    • Train-test split
    • Preprocessing
  • Explore model validation with cross-validation methods
    • Hyperparameter tuning
    • Alleviate overfitting

Quick Overview of Caret

  • Creates a unified interface for modeling and prediction
    • Over 200 models and algorithms
  • Pipeline-ready modeling
  • Parallel computing support for faster training

The Model Building Process

  • Gain domain knowledge (if possible)
    • Gain intuition of the problem or data
  • Estimate model parameters - learned parameters
    • In linear regression, intercept and slope(s)
  • Determine tuning parameters (also called hyperparameters or structural parameters)
    • In KNN, K cannot be estimated directly from the training data; it is chosen by comparing candidate values, e.g. with resampling
  • Calculate the performance of the model, and how it will generalize to new data
  • Repeat
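
The distinction between learned parameters and tuning parameters can be sketched in base R. This is an illustration only, not the code used in the talk: the mtcars data, the 5-fold split, the candidate grid for K, and the helper knn_predict are all assumptions made for the example.

```r
set.seed(42)

# Learned parameters: lm() estimates the intercept and slope from the data.
fit <- lm(mpg ~ wt, data = mtcars)
coef(fit)

# Tuning parameter: K is fixed before fitting, so candidate values are
# compared on held-out folds. knn_predict is a hypothetical helper that
# averages the outcome of the k nearest training points (one predictor).
knn_predict <- function(train_x, train_y, new_x, k) {
  sapply(new_x, function(x0) {
    nearest <- order(abs(train_x - x0))[seq_len(k)]
    mean(train_y[nearest])
  })
}

# Assign each row to one of 5 folds at random.
folds <- sample(rep(1:5, length.out = nrow(mtcars)))
k_grid <- c(1, 3, 5, 7)

# Mean held-out RMSE for each candidate K.
cv_rmse <- sapply(k_grid, function(k) {
  fold_rmse <- sapply(1:5, function(f) {
    held_out <- folds == f
    pred <- knn_predict(mtcars$wt[!held_out], mtcars$mpg[!held_out],
                        mtcars$wt[held_out], k)
    sqrt(mean((mtcars$mpg[held_out] - pred)^2))
  })
  mean(fold_rmse)
})

# The K with the lowest cross-validated error is the one carried forward.
best_k <- k_grid[which.min(cv_rmse)]
best_k
```

The "Repeat" step corresponds to rerunning this comparison with a different grid, different predictors, or a different model.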

Basic Caret Syntax

Caret implements a unified interface (syntax) for all of R's most popular models. The basic syntax is as follows:

library(caret)

model <- train(y ~ x,                       # formula: outcome ~ predictors
               data = dataset,
               method = 'lm',               # which model to fit
               trControl = trainControl())  # resampling options go here
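
As a concrete sketch of this syntax, the call below fits two different models through the same interface. The mtcars data, the predictors wt and hp, the 5-fold setting, and the tuneLength value are assumptions chosen for illustration, not taken from the talk.

```r
library(caret)

set.seed(123)

# Resampling scheme shared by both models: 5-fold cross-validation.
ctrl <- trainControl(method = "cv", number = 5)

# Linear regression: no tuning parameters, only learned coefficients.
lm_model <- train(mpg ~ wt + hp,
                  data = mtcars,
                  method = "lm",
                  trControl = ctrl)

# K-nearest neighbours: same call, different method string.
# tuneLength asks caret to try 5 candidate values of k.
knn_model <- train(mpg ~ wt + hp,
                   data = mtcars,
                   method = "knn",
                   preProcess = c("center", "scale"),
                   trControl = ctrl,
                   tuneLength = 5)

print(lm_model)        # resampled RMSE, R-squared, MAE
knn_model$bestTune     # k selected by cross-validation

# Prediction uses the same predict() call for either model.
predict(knn_model, newdata = head(mtcars))
```

Only the method argument (and any method-specific options) changes between models, which is what makes swapping algorithms cheap.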

Initial Visualizations I

Initial Visualizations II

Add smoother to scatterplots
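
The plots from these slides are not reproduced here. As a minimal sketch of the idea, assuming the mtcars data and base R graphics rather than whatever plotting tool the talk used, a lowess smoother can be overlaid on a scatterplot like this:

```r
# Scatterplot of fuel economy against weight.
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per gallon",
     main = "mpg vs. weight with lowess smoother")

# lowess() returns the smoothed curve as sorted x values and fitted y values.
smooth <- lowess(mtcars$wt, mtcars$mpg)
lines(smooth, col = "red", lwd = 2)
```

The smoother makes any curvature visible before a model is chosen, which helps judge whether a straight-line fit is reasonable.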