- Preprocessing and Exploratory Data Analysis
- What was used and how?
- R, dplyr, caret, pls, randomForest, glmnet, ggplot2
- Results
- Comparison of MSPE for Test Sets
- Identify notable features found
June 5, 2018
| Feature (F) | Issue |
|---|---|
| F2, F84, F85 | Same value (False) in every observation, or NAs |
| F167 (Dates) | Uncertainty about coding of dates |
| F151-F153, F155-F164 | Almost all 0s |
| F180, F181 | Observations carry the same information except in F181 and Target2 |
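
The kinds of issues listed above (constant columns, missing values, columns that are almost all zeros) can be screened for mechanically. The following is a minimal sketch in plain Python, not the R code used for this analysis; the helper name `screen_features`, the 0.9 zero-share threshold, and the toy columns are all assumptions made for illustration.

```python
def screen_features(columns, zero_threshold=0.9):
    """Flag problem columns.

    columns: dict mapping feature name -> list of values (None marks a missing value).
    zero_threshold: assumed cutoff for the share of zeros that counts as "almost all zeros".
    Returns a dict mapping feature name -> list of issue labels (clean features are omitted).
    """
    issues = {}
    for name, values in columns.items():
        found = []
        observed = [v for v in values if v is not None]
        distinct = set(observed)
        if len(observed) < len(values):
            found.append("has NAs")
        if len(distinct) == 1:
            found.append("constant")
        if len(distinct) > 1:
            zero_share = sum(1 for v in observed if v == 0) / len(observed)
            if zero_share >= zero_threshold:
                found.append("almost all zeros")
        if found:
            issues[name] = found
    return issues


# Toy columns, invented for illustration only (not the real features).
toy = {
    "toy_constant": [False, False, False, False],
    "toy_missing": [1.0, None, 2.5, 3.0],
    "toy_sparse": [0] * 19 + [7],
    "toy_clean": [0.3, 1.2, 0.8, 2.1],
}
report = screen_features(toy)
```

Whether a flagged column should be dropped, imputed, or kept is a modeling decision that a screen like this does not settle.
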
| Algorithm | How It Works |
|---|---|
| Partial Least Squares Regression | Constructs \(M\) components, each a linear combination of the \(p\) original variables, with \(M \ll p\), then fits a least squares regression on those components. |
| Lasso/Elastic Net | \(\hat{\beta} = \underset{\beta}{\operatorname{argmin}} \left( \lVert y - X\beta \rVert_2^2 + \lambda_2 \lVert \beta \rVert_2^2 + \lambda_1 \lVert \beta \rVert_1 \right)\) |
| Random Forest | Builds many decision trees on bootstrap samples and averages their predictions to get more stable results. |
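
To make the elastic net objective in the table concrete, here is a minimal sketch for the special case of a single predictor with no intercept, where the minimizer has a closed form: \(\hat{\beta} = S(\sum x_i y_i, \lambda_1/2) / (\sum x_i^2 + \lambda_2)\), with \(S\) the soft-thresholding operator. It is written in plain Python for illustration and is not the glmnet fit used in this analysis. The function names and toy data are assumptions, and the MSPE helper simply averages squared prediction errors on held-out points.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; return 0.0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def elastic_net_1d(x, y, lam1, lam2):
    """Minimize sum((y - x*b)^2) + lam2*b^2 + lam1*|b| over a single coefficient b."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return soft_threshold(sxy, lam1 / 2) / (sxx + lam2)


def objective(beta, x, y, lam1, lam2):
    """Value of the penalized least squares objective at beta."""
    rss = sum((yi - xi * beta) ** 2 for xi, yi in zip(x, y))
    return rss + lam2 * beta ** 2 + lam1 * abs(beta)


def mspe(y_true, y_pred):
    """Mean squared prediction error."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


# Toy data, invented for illustration only.
x_train, y_train = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
x_test, y_test = [4.0, 5.0], [8.5, 9.5]

beta_ols = elastic_net_1d(x_train, y_train, lam1=0.0, lam2=0.0)     # no penalty
beta_ridge = elastic_net_1d(x_train, y_train, lam1=0.0, lam2=14.0)  # L2 penalty only
beta_lasso = elastic_net_1d(x_train, y_train, lam1=60.0, lam2=0.0)  # L1 penalty only
beta_enet = elastic_net_1d(x_train, y_train, lam1=4.0, lam2=2.0)    # both penalties

mspe_ols = mspe(y_test, [beta_ols * xi for xi in x_test])
```

In this toy case the L2 penalty shrinks the coefficient without zeroing it, while a large enough L1 penalty sets it exactly to zero, which is why the lasso can drop variables.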