June 5, 2018

Outline

  1. Preprocessing and Exploratory Data Analysis
  2. What was used and how?
    • R, dplyr, caret, pls, randomForest, glmnet, ggplot2
  3. Results
    • Comparison of MSPE for Test Sets
    • Identify notable features found

Preprocessing

  • 13741 observations with 182 Features and 2 Responses
| Feature (F) | Issue |
|---|---|
| F2, F84, F85 | Same info (False) or NA's |
| F167 (Dates) | Uncertainty about coding of dates |
| F151-F153, F155-F164 | Almost all 0's |
| F180, F181 | Obs has same info except F181 and Target2 |
  • Eliminating F180 removes ~3,000 observations
  • Remove ~90 actual duplicates and ~300 observations with NAs
  • Final dataset is 10203 x 165 with 2 responses
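
The cleaning steps above can be sketched in base R. This is a minimal illustration on a toy data frame, not the code used for the project: the data frame `raw`, its values, and the `drop_cols` vector are hypothetical stand-ins for the real 13741 x 182 data.

```r
# Toy data frame standing in for the real data (names and values are hypothetical).
raw <- data.frame(
  F1      = c(1, 2, 2, 4, NA, 6),
  F2      = rep(FALSE, 6),                  # constant column, carries no information
  F3      = c(0.5, 1.5, 1.5, 2.5, 3.5, 4.5),
  Target1 = c(10, 20, 20, 40, 50, 60)
)

# 1. Drop features flagged as uninformative or problematic.
drop_cols <- c("F2")
dat <- raw[, !(names(raw) %in% drop_cols), drop = FALSE]

# 2. Remove exact duplicate rows, keeping the first occurrence.
dat <- dat[!duplicated(dat), , drop = FALSE]

# 3. Remove rows that still contain missing values.
dat <- dat[complete.cases(dat), , drop = FALSE]

# Rows and columns that remain after cleaning.
dim(dat)
```

Dropping the flagged columns before de-duplicating matters: rows that differ only in a dropped column become exact duplicates and are caught by `duplicated()`.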

Exploratory Data Analysis

2. Machine Learning Algorithms

| Algorithm | How it works |
|---|---|
| Partial Least Squares Regression | Construct \(M\) components, each a linear combination of the \(p\) original variables, where \(M \ll p\). Then use least squares regression on the components. |
| Lasso/Elastic Net | \(\hat{\beta} = \underset{\beta}{\operatorname{argmin}}\left(\lVert y - X\beta \rVert^2 + \lambda_2 \lVert\beta\rVert^2 + \lambda_1 \lVert\beta\rVert_1\right)\) |
| Random Forest | Builds many decision trees and averages their predictions to get a more stable prediction. |
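
To make the elastic net criterion concrete, the sketch below evaluates the penalized objective for a fixed coefficient vector in base R. The function name `enet_objective` and the toy `X`, `y`, and `beta` are hypothetical; glmnet minimizes a rescaled version of this criterion internally, so the numbers here only illustrate how the two penalties combine.

```r
# Elastic net objective: ||y - X b||^2 + lambda2 * ||b||_2^2 + lambda1 * ||b||_1
enet_objective <- function(beta, X, y, lambda1, lambda2) {
  resid <- y - X %*% beta
  sum(resid^2) + lambda2 * sum(beta^2) + lambda1 * sum(abs(beta))
}

# Toy design matrix whose rows are (1,0), (0,1), (1,1); beta fits y exactly.
X    <- matrix(c(1, 0, 0, 1, 1, 1), nrow = 3, byrow = TRUE)
y    <- c(1, 2, 3)
beta <- c(1, 2)

# With a perfect fit, only the penalty terms contribute.
full  <- enet_objective(beta, X, y, lambda1 = 0.5, lambda2 = 0.1)  # both penalties
lasso <- enet_objective(beta, X, y, lambda1 = 0.5, lambda2 = 0)    # L1 penalty only
ridge <- enet_objective(beta, X, y, lambda1 = 0,   lambda2 = 0.1)  # L2 penalty only
```

Setting \(\lambda_2 = 0\) recovers the lasso and setting \(\lambda_1 = 0\) recovers ridge regression, which is why a single mixing parameter can move between the two.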

Implementation

  • Before implementing each algorithm, the data set is randomly divided into a 70 percent Train set and a 30 percent Test set.
  • The package \(\textit{caret}\) randomly selects observations for the Train and Test sets.
  • Each algorithm, when run on the Train set, uses 10-fold cross-validation to select the best model to use on the Test set.
  • Mean Squared Prediction Error (MSPE) is reported on the Test set for Target1 only. Target2 produced a high MSPE on its own, and it did not improve when the algorithms were fit with both targets as a multi-response model.
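
The workflow above (70/30 split, 10-fold cross-validation on the Train set, MSPE on the Test set) can be sketched in base R. This is an illustration under assumptions, not the project code: the simulated data, the two candidate `lm` models, and the helper names `mspe` and `cv_mspe` are hypothetical. caret offers `createDataPartition` for the split; the sketch uses `sample()` instead so that it runs without extra packages.

```r
set.seed(42)

# Simulated data standing in for the cleaned data set (hypothetical).
n   <- 200
x   <- rnorm(n)
dat <- data.frame(x = x, Target1 = 2 * x + rnorm(n, sd = 0.5))

# Random 70/30 split into Train and Test sets.
train_idx <- sample(seq_len(n), size = round(0.7 * n))
train     <- dat[train_idx, ]
test      <- dat[-train_idx, ]

# Mean Squared Prediction Error.
mspe <- function(actual, predicted) mean((actual - predicted)^2)

# Assign each Train row to one of 10 folds.
k     <- 10
folds <- sample(rep(seq_len(k), length.out = nrow(train)))

# Cross-validated MSPE of one candidate model on the Train set.
cv_mspe <- function(formula) {
  errs <- sapply(seq_len(k), function(i) {
    fit <- lm(formula, data = train[folds != i, ])
    mspe(train$Target1[folds == i], predict(fit, newdata = train[folds == i, ]))
  })
  mean(errs)
}

# Pick the candidate with the lowest cross-validated MSPE.
candidates <- list(linear = Target1 ~ x, cubic = Target1 ~ poly(x, 3))
cv_scores  <- sapply(candidates, cv_mspe)
best       <- names(which.min(cv_scores))

# Refit the chosen model on the full Train set and score it once on the Test set.
final_fit <- lm(candidates[[best]], data = train)
test_mspe <- mspe(test$Target1, predict(final_fit, newdata = test))
```

The Test set is touched only once, after model selection, so `test_mspe` is not biased by the tuning step.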

3. Results

  • Partial Least Squares Regression:
    • 10, 20, 25, 30, 40, 50, 60 components
    • MSPE: 39.16, 6.38, 3.21, 2.19, 1.27, 0.93, 0.67
  • Lasso, Elastic Net:
    • MSPE: 8.85 with alpha = 0.4
    • 3,4,6,11,12,16,17,31,82,91,93
    • 94,138,168,170,173,174,175
  • Random Forest:
    • MSPE: 24.13
    • 3,6,7,8,9,11,12,16,17,18,19,21,22,23,86,94
    • 143,145,146,150,168,169,170,174,175,177
    • 95,101,138,139
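
For a side-by-side view, the reported Test-set MSPE values can be collected in a small R table. The numbers are copied from the results above; the object names are hypothetical.

```r
# PLS results as reported: number of components and Test-set MSPE.
pls <- data.frame(
  components = c(10, 20, 25, 30, 40, 50, 60),
  mspe       = c(39.16, 6.38, 3.21, 2.19, 1.27, 0.93, 0.67)
)

# PLS setting with the lowest reported MSPE.
best_pls <- pls[which.min(pls$mspe), ]

# One row per algorithm, sorted from lowest to highest MSPE.
comparison <- data.frame(
  model = c("PLS (best)", "Elastic net (alpha = 0.4)", "Random forest"),
  mspe  = c(best_pls$mspe, 8.85, 24.13)
)
comparison[order(comparison$mspe), ]
```

The PLS MSPE decreases at every step of the grid, so the minimum falls at the largest value tried (60 components).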