Practical Machine Learning - Final Project

1-Project Introduction

1.1-Background

Using devices such as Jawbone Up, Nike FuelBand, and Fitbit, it is now possible to collect a large amount of data about personal activity relatively inexpensively. These types of devices are part of the quantified self movement – a group of enthusiasts who take measurements about themselves regularly to improve their health, to find patterns in their behavior, or because they are tech geeks. One thing that people regularly do is quantify how much of a particular activity they do, but they rarely quantify how well they do it. In this project, your goal will be to use data from accelerometers on the belt, forearm, arm, and dumbbell of 6 participants. They were asked to perform barbell lifts correctly and incorrectly in 5 different ways. More information is available from the website here: http://groupware.les.inf.puc-rio.br/har (see the section on the Weight Lifting Exercise Dataset).

1.2-Data Source

The training data for this project are available here:

https://d396qusza40orc.cloudfront.net/predmachlearn/pml-training.csv

The test data are available here:

https://d396qusza40orc.cloudfront.net/predmachlearn/pml-testing.csv

The data for this project come from this source: http://groupware.les.inf.puc-rio.br/har. If you use the document you create for this class for any purpose please cite them as they have been very generous in allowing their data to be used for this kind of assignment.

1.3-Goal of Project

The goal of this course project is to predict the manner in which they did the exercise. This is the “classe” variable in the training set. You may use any of the other variables to predict with.

You should create a report describing how you built your model, how you used cross validation, what you think the expected out of sample error is, and rationalize why you made the choices you did. You will also use your prediction model to predict 20 different test cases.

# Load required packages
library(caret)

## Loading required package: lattice

## Loading required package: ggplot2

library(rpart)
library(rpart.plot)
library(rattle)

## Rattle: A free graphical interface for data mining with R.
## Version 4.1.0 Copyright (c) 2006-2015 Togaware Pty Ltd.
## Type 'rattle()' to shake, rattle, and roll your data.

library(RColorBrewer)
library(randomForest)

## randomForest 4.6-12

## Type rfNews() to see new features/changes/bug fixes.

## 
## Attaching package: 'randomForest'

## The following object is masked from 'package:ggplot2':
## 
##     margin

library(knitr)

2-Getting and cleaning data

TrainUrl <- "https://d396qusza40orc.cloudfront.net/predmachlearn/pml-training.csv"
TestUrl <- "https://d396qusza40orc.cloudfront.net/predmachlearn/pml-testing.csv"

# 2.1-Load and read data
TrainData <- read.csv(url(TrainUrl), na.strings = 'NA')
TestData <- read.csv(url(TestUrl), na.strings = 'NA')

# 2.2-Partition the training data (TrainData only) into 70% training and 30% validation sets
inTrain <- createDataPartition(TrainData$classe, p = 0.7, list = FALSE)
TrainSet <- TrainData[inTrain, ]
TestSet  <- TrainData[-inTrain, ] # validation split of TrainData, not the 20-case TestData
dim(TrainSet)

## [1] 13737   160

dim(TestSet)

## [1] 5885  160
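
createDataPartition draws a random sample within each level of classe, so without a seed every run produces a different split. The following is a minimal base-R sketch of that idea on toy data. The `toy` data frame, the seed value, and the variable names are assumptions for illustration, and this is not caret's own implementation.

```r
set.seed(12000)  # fix the random number generator so the split is reproducible

# Toy data: 100 rows, 5 balanced classes (hypothetical values)
toy <- data.frame(x = rnorm(100),
                  classe = factor(rep(c("A", "B", "C", "D", "E"), each = 20)))

# Sample 70% of the row indices within each class
rows_by_class <- split(seq_len(nrow(toy)), toy$classe)
in_train <- unlist(lapply(rows_by_class,
                          function(i) sample(i, size = round(0.7 * length(i)))))

toy_train <- toy[in_train, ]
toy_valid <- toy[-in_train, ]

table(toy_train$classe)  # each class contributes 14 rows
```

Calling set.seed() before createDataPartition in the same way would make the 70/30 split repeatable between runs.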

Cleaning data

# Viewing training data set 
names(TrainSet)

##   [1] "X"                        "user_name"               
##   [3] "raw_timestamp_part_1"     "raw_timestamp_part_2"    
##   [5] "cvtd_timestamp"           "new_window"              
##   [7] "num_window"               "roll_belt"               
##   [9] "pitch_belt"               "yaw_belt"                
##  [11] "total_accel_belt"         "kurtosis_roll_belt"      
##  [13] "kurtosis_picth_belt"      "kurtosis_yaw_belt"       
##  [15] "skewness_roll_belt"       "skewness_roll_belt.1"    
##  [17] "skewness_yaw_belt"        "max_roll_belt"           
##  [19] "max_picth_belt"           "max_yaw_belt"            
##  [21] "min_roll_belt"            "min_pitch_belt"          
##  [23] "min_yaw_belt"             "amplitude_roll_belt"     
##  [25] "amplitude_pitch_belt"     "amplitude_yaw_belt"      
##  [27] "var_total_accel_belt"     "avg_roll_belt"           
##  [29] "stddev_roll_belt"         "var_roll_belt"           
##  [31] "avg_pitch_belt"           "stddev_pitch_belt"       
##  [33] "var_pitch_belt"           "avg_yaw_belt"            
##  [35] "stddev_yaw_belt"          "var_yaw_belt"            
##  [37] "gyros_belt_x"             "gyros_belt_y"            
##  [39] "gyros_belt_z"             "accel_belt_x"            
##  [41] "accel_belt_y"             "accel_belt_z"            
##  [43] "magnet_belt_x"            "magnet_belt_y"           
##  [45] "magnet_belt_z"            "roll_arm"                
##  [47] "pitch_arm"                "yaw_arm"                 
##  [49] "total_accel_arm"          "var_accel_arm"           
##  [51] "avg_roll_arm"             "stddev_roll_arm"         
##  [53] "var_roll_arm"             "avg_pitch_arm"           
##  [55] "stddev_pitch_arm"         "var_pitch_arm"           
##  [57] "avg_yaw_arm"              "stddev_yaw_arm"          
##  [59] "var_yaw_arm"              "gyros_arm_x"             
##  [61] "gyros_arm_y"              "gyros_arm_z"             
##  [63] "accel_arm_x"              "accel_arm_y"             
##  [65] "accel_arm_z"              "magnet_arm_x"            
##  [67] "magnet_arm_y"             "magnet_arm_z"            
##  [69] "kurtosis_roll_arm"        "kurtosis_picth_arm"      
##  [71] "kurtosis_yaw_arm"         "skewness_roll_arm"       
##  [73] "skewness_pitch_arm"       "skewness_yaw_arm"        
##  [75] "max_roll_arm"             "max_picth_arm"           
##  [77] "max_yaw_arm"              "min_roll_arm"            
##  [79] "min_pitch_arm"            "min_yaw_arm"             
##  [81] "amplitude_roll_arm"       "amplitude_pitch_arm"     
##  [83] "amplitude_yaw_arm"        "roll_dumbbell"           
##  [85] "pitch_dumbbell"           "yaw_dumbbell"            
##  [87] "kurtosis_roll_dumbbell"   "kurtosis_picth_dumbbell" 
##  [89] "kurtosis_yaw_dumbbell"    "skewness_roll_dumbbell"  
##  [91] "skewness_pitch_dumbbell"  "skewness_yaw_dumbbell"   
##  [93] "max_roll_dumbbell"        "max_picth_dumbbell"      
##  [95] "max_yaw_dumbbell"         "min_roll_dumbbell"       
##  [97] "min_pitch_dumbbell"       "min_yaw_dumbbell"        
##  [99] "amplitude_roll_dumbbell"  "amplitude_pitch_dumbbell"
## [101] "amplitude_yaw_dumbbell"   "total_accel_dumbbell"    
## [103] "var_accel_dumbbell"       "avg_roll_dumbbell"       
## [105] "stddev_roll_dumbbell"     "var_roll_dumbbell"       
## [107] "avg_pitch_dumbbell"       "stddev_pitch_dumbbell"   
## [109] "var_pitch_dumbbell"       "avg_yaw_dumbbell"        
## [111] "stddev_yaw_dumbbell"      "var_yaw_dumbbell"        
## [113] "gyros_dumbbell_x"         "gyros_dumbbell_y"        
## [115] "gyros_dumbbell_z"         "accel_dumbbell_x"        
## [117] "accel_dumbbell_y"         "accel_dumbbell_z"        
## [119] "magnet_dumbbell_x"        "magnet_dumbbell_y"       
## [121] "magnet_dumbbell_z"        "roll_forearm"            
## [123] "pitch_forearm"            "yaw_forearm"             
## [125] "kurtosis_roll_forearm"    "kurtosis_picth_forearm"  
## [127] "kurtosis_yaw_forearm"     "skewness_roll_forearm"   
## [129] "skewness_pitch_forearm"   "skewness_yaw_forearm"    
## [131] "max_roll_forearm"         "max_picth_forearm"       
## [133] "max_yaw_forearm"          "min_roll_forearm"        
## [135] "min_pitch_forearm"        "min_yaw_forearm"         
## [137] "amplitude_roll_forearm"   "amplitude_pitch_forearm" 
## [139] "amplitude_yaw_forearm"    "total_accel_forearm"     
## [141] "var_accel_forearm"        "avg_roll_forearm"        
## [143] "stddev_roll_forearm"      "var_roll_forearm"        
## [145] "avg_pitch_forearm"        "stddev_pitch_forearm"    
## [147] "var_pitch_forearm"        "avg_yaw_forearm"         
## [149] "stddev_yaw_forearm"       "var_yaw_forearm"         
## [151] "gyros_forearm_x"          "gyros_forearm_y"         
## [153] "gyros_forearm_z"          "accel_forearm_x"         
## [155] "accel_forearm_y"          "accel_forearm_z"         
## [157] "magnet_forearm_x"         "magnet_forearm_y"        
## [159] "magnet_forearm_z"         "classe"

head(TrainSet)

##     X user_name raw_timestamp_part_1 raw_timestamp_part_2   cvtd_timestamp
## 1   1  carlitos           1323084231               788290 05/12/2011 11:23
## 2   2  carlitos           1323084231               808298 05/12/2011 11:23
## 3   3  carlitos           1323084231               820366 05/12/2011 11:23
## 4   4  carlitos           1323084232               120339 05/12/2011 11:23
## 7   7  carlitos           1323084232               368296 05/12/2011 11:23
## 11 11  carlitos           1323084232               500302 05/12/2011 11:23
##    new_window num_window roll_belt pitch_belt yaw_belt total_accel_belt
## 1          no         11      1.41       8.07    -94.4                3
## 2          no         11      1.41       8.07    -94.4                3
## 3          no         11      1.42       8.07    -94.4                3
## 4          no         12      1.48       8.05    -94.4                3
## 7          no         12      1.42       8.09    -94.4                3
## 11         no         12      1.45       8.18    -94.4                3
##    kurtosis_roll_belt kurtosis_picth_belt kurtosis_yaw_belt
## 1                                                          
## 2                                                          
## 3                                                          
## 4                                                          
## 7                                                          
## 11                                                         
##    skewness_roll_belt skewness_roll_belt.1 skewness_yaw_belt max_roll_belt
## 1                                                                       NA
## 2                                                                       NA
## 3                                                                       NA
## 4                                                                       NA
## 7                                                                       NA
## 11                                                                      NA
##    max_picth_belt max_yaw_belt min_roll_belt min_pitch_belt min_yaw_belt
## 1              NA                         NA             NA             
## 2              NA                         NA             NA             
## 3              NA                         NA             NA             
## 4              NA                         NA             NA             
## 7              NA                         NA             NA             
## 11             NA                         NA             NA             
##    amplitude_roll_belt amplitude_pitch_belt amplitude_yaw_belt
## 1                   NA                   NA                   
## 2                   NA                   NA                   
## 3                   NA                   NA                   
## 4                   NA                   NA                   
## 7                   NA                   NA                   
## 11                  NA                   NA                   
##    var_total_accel_belt avg_roll_belt stddev_roll_belt var_roll_belt
## 1                    NA            NA               NA            NA
## 2                    NA            NA               NA            NA
## 3                    NA            NA               NA            NA
## 4                    NA            NA               NA            NA
## 7                    NA            NA               NA            NA
## 11                   NA            NA               NA            NA
##    avg_pitch_belt stddev_pitch_belt var_pitch_belt avg_yaw_belt
## 1              NA                NA             NA           NA
## 2              NA                NA             NA           NA
## 3              NA                NA             NA           NA
## 4              NA                NA             NA           NA
## 7              NA                NA             NA           NA
## 11             NA                NA             NA           NA
##    stddev_yaw_belt var_yaw_belt gyros_belt_x gyros_belt_y gyros_belt_z
## 1               NA           NA         0.00            0        -0.02
## 2               NA           NA         0.02            0        -0.02
## 3               NA           NA         0.00            0        -0.02
## 4               NA           NA         0.02            0        -0.03
## 7               NA           NA         0.02            0        -0.02
## 11              NA           NA         0.03            0        -0.02
##    accel_belt_x accel_belt_y accel_belt_z magnet_belt_x magnet_belt_y
## 1           -21            4           22            -3           599
## 2           -22            4           22            -7           608
## 3           -20            5           23            -2           600
## 4           -22            3           21            -6           604
## 7           -22            3           21            -4           599
## 11          -21            2           23            -5           596
##    magnet_belt_z roll_arm pitch_arm yaw_arm total_accel_arm var_accel_arm
## 1           -313     -128      22.5    -161              34            NA
## 2           -311     -128      22.5    -161              34            NA
## 3           -305     -128      22.5    -161              34            NA
## 4           -310     -128      22.1    -161              34            NA
## 7           -311     -128      21.9    -161              34            NA
## 11          -317     -128      21.5    -161              34            NA
##    avg_roll_arm stddev_roll_arm var_roll_arm avg_pitch_arm
## 1            NA              NA           NA            NA
## 2            NA              NA           NA            NA
## 3            NA              NA           NA            NA
## 4            NA              NA           NA            NA
## 7            NA              NA           NA            NA
## 11           NA              NA           NA            NA
##    stddev_pitch_arm var_pitch_arm avg_yaw_arm stddev_yaw_arm var_yaw_arm
## 1                NA            NA          NA             NA          NA
## 2                NA            NA          NA             NA          NA
## 3                NA            NA          NA             NA          NA
## 4                NA            NA          NA             NA          NA
## 7                NA            NA          NA             NA          NA
## 11               NA            NA          NA             NA          NA
##    gyros_arm_x gyros_arm_y gyros_arm_z accel_arm_x accel_arm_y accel_arm_z
## 1         0.00        0.00       -0.02        -288         109        -123
## 2         0.02       -0.02       -0.02        -290         110        -125
## 3         0.02       -0.02       -0.02        -289         110        -126
## 4         0.02       -0.03        0.02        -289         111        -123
## 7         0.00       -0.03        0.00        -289         111        -125
## 11        0.02       -0.03        0.00        -290         110        -123
##    magnet_arm_x magnet_arm_y magnet_arm_z kurtosis_roll_arm
## 1          -368          337          516                  
## 2          -369          337          513                  
## 3          -368          344          513                  
## 4          -372          344          512                  
## 7          -373          336          509                  
## 11         -366          339          509                  
##    kurtosis_picth_arm kurtosis_yaw_arm skewness_roll_arm
## 1                                                       
## 2                                                       
## 3                                                       
## 4                                                       
## 7                                                       
## 11                                                      
##    skewness_pitch_arm skewness_yaw_arm max_roll_arm max_picth_arm
## 1                                                NA            NA
## 2                                                NA            NA
## 3                                                NA            NA
## 4                                                NA            NA
## 7                                                NA            NA
## 11                                               NA            NA
##    max_yaw_arm min_roll_arm min_pitch_arm min_yaw_arm amplitude_roll_arm
## 1           NA           NA            NA          NA                 NA
## 2           NA           NA            NA          NA                 NA
## 3           NA           NA            NA          NA                 NA
## 4           NA           NA            NA          NA                 NA
## 7           NA           NA            NA          NA                 NA
## 11          NA           NA            NA          NA                 NA
##    amplitude_pitch_arm amplitude_yaw_arm roll_dumbbell pitch_dumbbell
## 1                   NA                NA      13.05217      -70.49400
## 2                   NA                NA      13.13074      -70.63751
## 3                   NA                NA      12.85075      -70.27812
## 4                   NA                NA      13.43120      -70.39379
## 7                   NA                NA      13.12695      -70.24757
## 11                  NA                NA      13.13074      -70.63751
##    yaw_dumbbell kurtosis_roll_dumbbell kurtosis_picth_dumbbell
## 1     -84.87394                                               
## 2     -84.71065                                               
## 3     -85.14078                                               
## 4     -84.87363                                               
## 7     -85.09961                                               
## 11    -84.71065                                               
##    kurtosis_yaw_dumbbell skewness_roll_dumbbell skewness_pitch_dumbbell
## 1                                                                      
## 2                                                                      
## 3                                                                      
## 4                                                                      
## 7                                                                      
## 11                                                                     
##    skewness_yaw_dumbbell max_roll_dumbbell max_picth_dumbbell
## 1                                       NA                 NA
## 2                                       NA                 NA
## 3                                       NA                 NA
## 4                                       NA                 NA
## 7                                       NA                 NA
## 11                                      NA                 NA
##    max_yaw_dumbbell min_roll_dumbbell min_pitch_dumbbell min_yaw_dumbbell
## 1                                  NA                 NA                 
## 2                                  NA                 NA                 
## 3                                  NA                 NA                 
## 4                                  NA                 NA                 
## 7                                  NA                 NA                 
## 11                                 NA                 NA                 
##    amplitude_roll_dumbbell amplitude_pitch_dumbbell amplitude_yaw_dumbbell
## 1                       NA                       NA                       
## 2                       NA                       NA                       
## 3                       NA                       NA                       
## 4                       NA                       NA                       
## 7                       NA                       NA                       
## 11                      NA                       NA                       
##    total_accel_dumbbell var_accel_dumbbell avg_roll_dumbbell
## 1                    37                 NA                NA
## 2                    37                 NA                NA
## 3                    37                 NA                NA
## 4                    37                 NA                NA
## 7                    37                 NA                NA
## 11                   37                 NA                NA
##    stddev_roll_dumbbell var_roll_dumbbell avg_pitch_dumbbell
## 1                    NA                NA                 NA
## 2                    NA                NA                 NA
## 3                    NA                NA                 NA
## 4                    NA                NA                 NA
## 7                    NA                NA                 NA
## 11                   NA                NA                 NA
##    stddev_pitch_dumbbell var_pitch_dumbbell avg_yaw_dumbbell
## 1                     NA                 NA               NA
## 2                     NA                 NA               NA
## 3                     NA                 NA               NA
## 4                     NA                 NA               NA
## 7                     NA                 NA               NA
## 11                    NA                 NA               NA
##    stddev_yaw_dumbbell var_yaw_dumbbell gyros_dumbbell_x gyros_dumbbell_y
## 1                   NA               NA                0            -0.02
## 2                   NA               NA                0            -0.02
## 3                   NA               NA                0            -0.02
## 4                   NA               NA                0            -0.02
## 7                   NA               NA                0            -0.02
## 11                  NA               NA                0            -0.02
##    gyros_dumbbell_z accel_dumbbell_x accel_dumbbell_y accel_dumbbell_z
## 1              0.00             -234               47             -271
## 2              0.00             -233               47             -269
## 3              0.00             -232               46             -270
## 4             -0.02             -232               48             -269
## 7              0.00             -232               47             -270
## 11             0.00             -233               47             -269
##    magnet_dumbbell_x magnet_dumbbell_y magnet_dumbbell_z roll_forearm
## 1               -559               293               -65         28.4
## 2               -555               296               -64         28.3
## 3               -561               298               -63         28.3
## 4               -552               303               -60         28.1
## 7               -551               295               -70         27.9
## 11              -564               299               -64         27.6
##    pitch_forearm yaw_forearm kurtosis_roll_forearm kurtosis_picth_forearm
## 1          -63.9        -153                                             
## 2          -63.9        -153                                             
## 3          -63.9        -152                                             
## 4          -63.9        -152                                             
## 7          -63.9        -152                                             
## 11         -63.8        -152                                             
##    kurtosis_yaw_forearm skewness_roll_forearm skewness_pitch_forearm
## 1                                                                   
## 2                                                                   
## 3                                                                   
## 4                                                                   
## 7                                                                   
## 11                                                                  
##    skewness_yaw_forearm max_roll_forearm max_picth_forearm max_yaw_forearm
## 1                                     NA                NA                
## 2                                     NA                NA                
## 3                                     NA                NA                
## 4                                     NA                NA                
## 7                                     NA                NA                
## 11                                    NA                NA                
##    min_roll_forearm min_pitch_forearm min_yaw_forearm
## 1                NA                NA                
## 2                NA                NA                
## 3                NA                NA                
## 4                NA                NA                
## 7                NA                NA                
## 11               NA                NA                
##    amplitude_roll_forearm amplitude_pitch_forearm amplitude_yaw_forearm
## 1                      NA                      NA                      
## 2                      NA                      NA                      
## 3                      NA                      NA                      
## 4                      NA                      NA                      
## 7                      NA                      NA                      
## 11                     NA                      NA                      
##    total_accel_forearm var_accel_forearm avg_roll_forearm
## 1                   36                NA               NA
## 2                   36                NA               NA
## 3                   36                NA               NA
## 4                   36                NA               NA
## 7                   36                NA               NA
## 11                  36                NA               NA
##    stddev_roll_forearm var_roll_forearm avg_pitch_forearm
## 1                   NA               NA                NA
## 2                   NA               NA                NA
## 3                   NA               NA                NA
## 4                   NA               NA                NA
## 7                   NA               NA                NA
## 11                  NA               NA                NA
##    stddev_pitch_forearm var_pitch_forearm avg_yaw_forearm
## 1                    NA                NA              NA
## 2                    NA                NA              NA
## 3                    NA                NA              NA
## 4                    NA                NA              NA
## 7                    NA                NA              NA
## 11                   NA                NA              NA
##    stddev_yaw_forearm var_yaw_forearm gyros_forearm_x gyros_forearm_y
## 1                  NA              NA            0.03            0.00
## 2                  NA              NA            0.02            0.00
## 3                  NA              NA            0.03           -0.02
## 4                  NA              NA            0.02           -0.02
## 7                  NA              NA            0.02            0.00
## 11                 NA              NA            0.02           -0.02
##    gyros_forearm_z accel_forearm_x accel_forearm_y accel_forearm_z
## 1            -0.02             192             203            -215
## 2            -0.02             192             203            -216
## 3             0.00             196             204            -213
## 4             0.00             189             206            -214
## 7            -0.02             195             205            -215
## 11           -0.02             193             205            -214
##    magnet_forearm_x magnet_forearm_y magnet_forearm_z classe
## 1               -17              654              476      A
## 2               -18              661              473      A
## 3               -18              658              469      A
## 4               -16              658              469      A
## 7               -18              659              470      A
## 11              -17              657              465      A
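
The head() output above shows that several summary columns (for example kurtosis_roll_belt) hold blank strings rather than NA, because only 'NA' was passed to na.strings. A minimal sketch, on a made-up inline CSV, of how read.csv can map such entries to NA at load time. The "#DIV/0!" marker and the toy values are assumptions for illustration and do not appear in the output above; applying this to the real files would change the column counts reported in the later steps.

```r
# Toy CSV (hypothetical values) with a blank and a "#DIV/0!" entry
csv_text <- 'roll_belt,kurtosis_roll_belt,classe
1.41,,A
1.42,#DIV/0!,A
1.48,-1.2,B'

# Treat "NA", empty strings, and "#DIV/0!" as missing values
toy <- read.csv(text = csv_text, na.strings = c("NA", "", "#DIV/0!"))

# The summary column is now numeric, with NA for the two unusable entries
str(toy)
colSums(is.na(toy))
```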

We observe that several variables are irrelevant for our analysis: the row index “X” and “user_name”, variables that consist mostly of NA values, Near Zero Variance (NZV) variables, and ID variables such as the timestamps and window number.

TrainSet <- TrainSet[, -(1:2)]
TestSet <- TestSet[, -(1:2)]

# 2.3-Remove Near Zero Variance (NZV) variables
NZV <- nearZeroVar(TrainSet)
TrainSet <- TrainSet[, -NZV]
TestSet <- TestSet[, -NZV]
dim(TrainSet)

## [1] 13737   103

dim(TestSet)

## [1] 5885  103

# 2.4-Remove variables in which more than 70% of the values are NA

NA_value <- sapply(TrainSet, function(x) mean(is.na(x)))
TrainSet <- TrainSet[, NA_value <= 0.7]
TestSet  <- TestSet[, NA_value <= 0.7]
dim(TrainSet)

## [1] 13737    57

dim(TestSet)

## [1] 5885   57
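
To make the NA filter concrete, here is a small sketch on a toy data frame. The `toy` data and the names `na_share` and `keep` are hypothetical; the 70% threshold is the one used above.

```r
# Toy data frame (hypothetical): column b is 80% NA, column c is 20% NA
toy <- data.frame(a = 1:5,
                  b = c(NA, NA, NA, NA, 2),
                  c = c(1, NA, 3, 4, 5))

# mean(is.na(x)) is the share of missing values in a column
na_share <- sapply(toy, function(x) mean(is.na(x)))
na_share

# Keep only the columns with at most 70% missing values
keep <- na_share <= 0.7
toy_clean <- toy[, keep, drop = FALSE]
names(toy_clean)  # "a" "c"
```

The shares are computed on the training split only and the same mask is then applied to the validation split, so both keep identical columns.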

# 2.5-Remove the ID variables
TrainSet <- TrainSet[, -(1:5)]
TestSet  <- TestSet[, -(1:5)]
dim(TrainSet)

## [1] 13737    52

dim(TestSet)

## [1] 5885   52
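
Removing columns by position depends on what the earlier steps left in place. X and user_name are already gone at this point, and if nearZeroVar also dropped new_window, only four ID-like columns remain at the front, in which case -(1:5) would remove the first sensor column as well. It is worth checking names(TrainSet)[1:5] before this step. A hedged alternative is to drop the ID columns by name, sketched here on a toy frame (the `toy` values and the `id_vars` vector are assumptions for illustration):

```r
# Toy frame (hypothetical values) using column names from the dataset
toy <- data.frame(raw_timestamp_part_1 = 1:3,
                  raw_timestamp_part_2 = 4:6,
                  cvtd_timestamp = c("t1", "t2", "t3"),
                  num_window = 11:13,
                  roll_belt = c(1.41, 1.42, 1.48),
                  classe = factor(c("A", "A", "B")))

# ID-like columns to drop; names absent from the data are simply ignored
id_vars <- c("X", "user_name", "raw_timestamp_part_1", "raw_timestamp_part_2",
             "cvtd_timestamp", "new_window", "num_window")

toy_clean <- toy[, !(names(toy) %in% id_vars), drop = FALSE]
names(toy_clean)  # "roll_belt" "classe"
```

Selecting by name gives the same result regardless of the order in which the cleaning steps run.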

After removing the irrelevant variables, 52 columns remain for analysis: 51 predictors plus the outcome “classe”.

3-Correlation among the variables

Let’s run a correlation analysis on the predictors to see whether some of the variables are highly correlated with each other.

library(corrplot)
corelationMatrix <- cor(TrainSet[, -52])
corrplot(corelationMatrix, order = "hclust", method = "color", type = "lower", tl.cex = 0.7, tl.col = rgb(0, 0, 0))

# order = "FPC", refers to  the first principal component order
# order = "hclust", refers to hierarchical clustering order.
# type should only be one of “full”, “lower”, “upper” value.
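
The plot gives a visual impression; the strongly correlated pairs can also be listed directly from the correlation matrix. A minimal sketch on toy data follows. The `toy` predictors and the 0.8 cutoff are assumptions for illustration, not values from this analysis.

```r
set.seed(1)
# Toy predictors (hypothetical): x2 is almost a copy of x1, x3 is independent
x1 <- rnorm(200)
toy <- data.frame(x1 = x1,
                  x2 = x1 + rnorm(200, sd = 0.05),
                  x3 = rnorm(200))

cm <- cor(toy)

# Look only at the upper triangle so each pair is reported once
high <- which(abs(cm) > 0.8 & upper.tri(cm), arr.ind = TRUE)
data.frame(var1 = rownames(cm)[high[, "row"]],
           var2 = colnames(cm)[high[, "col"]],
           r    = cm[high])
```

Applied to the real data, the same pattern would use cor(TrainSet[, -52]) in place of cor(toy).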

4-Prediction Model

In this part, we fit three prediction models on TrainSet, compare their results and accuracy on TestSet, and then select the most accurate method for the quiz prediction.

The prediction methods used are:

- Decision Trees
- Random Forests
- Generalized Boosted Regression

4.1-Decision Trees Method

set.seed(12000)
modelFit_DecisionTree <- rpart(classe ~ ., data = TrainSet, method = "class")
fancyRpartPlot(modelFit_DecisionTree)

## Warning: labs do not fit even at cex 0.15, there may be some overplotting

# Decision Tree prediction on the validation set (TestSet)
Pred_DecisionTree  <- predict(modelFit_DecisionTree, newdata = TestSet, type = "class")
ConfusionMatrix_DecisionTree <- confusionMatrix(Pred_DecisionTree, TestSet$classe)
ConfusionMatrix_DecisionTree

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction    A    B    C    D    E
##          A 1460  185   19   36   51
##          B   62  647   59   80  156
##          C   26   79  779  137  155
##          D  106  177  143  654  128
##          E   20   51   26   57  592
## 
## Overall Statistics
##                                           
##                Accuracy : 0.7021          
##                  95% CI : (0.6903, 0.7138)
##     No Information Rate : 0.2845          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.6232          
##  Mcnemar's Test P-Value : < 2.2e-16       
## 
## Statistics by Class:
## 
##                      Class: A Class: B Class: C Class: D Class: E
## Sensitivity            0.8722   0.5680   0.7593   0.6784   0.5471
## Specificity            0.9309   0.9248   0.9183   0.8874   0.9679
## Pos Pred Value         0.8338   0.6444   0.6624   0.5414   0.7936
## Neg Pred Value         0.9482   0.8992   0.9475   0.9337   0.9047
## Prevalence             0.2845   0.1935   0.1743   0.1638   0.1839
## Detection Rate         0.2481   0.1099   0.1324   0.1111   0.1006
## Detection Prevalence   0.2975   0.1706   0.1998   0.2053   0.1268
## Balanced Accuracy      0.9015   0.7464   0.8388   0.7829   0.7575
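
The headline statistics in this output can be recomputed from the printed table, which is a useful sanity check. The sketch below retypes the Decision Tree confusion matrix by hand and derives the accuracy and Cohen's kappa in base R. The names cm_tree, expected, and kappa_tree are illustrative and not part of the original analysis.

```r
# Decision Tree confusion matrix, copied from the output above
# (rows = predictions, columns = reference classes)
classes <- c("A", "B", "C", "D", "E")
cm_tree <- matrix(c(1460, 185,  19,  36,  51,
                      62, 647,  59,  80, 156,
                      26,  79, 779, 137, 155,
                     106, 177, 143, 654, 128,
                      20,  51,  26,  57, 592),
                  nrow = 5, byrow = TRUE,
                  dimnames = list(Prediction = classes, Reference = classes))

n <- sum(cm_tree)

# Accuracy: share of cases on the diagonal (prediction equals reference)
accuracy <- sum(diag(cm_tree)) / n

# Agreement expected by chance, from the row and column totals
expected <- sum(rowSums(cm_tree) * colSums(cm_tree)) / n^2

# Cohen's kappa: accuracy corrected for chance agreement
kappa_tree <- (accuracy - expected) / (1 - expected)

round(c(Accuracy = accuracy, Kappa = kappa_tree), 4)
```

Rounded to four decimals, these values agree with the Accuracy (0.7021) and Kappa (0.6232) reported by confusionMatrix above.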

# Plot Decision Tree Confusion Matrix
plot(ConfusionMatrix_DecisionTree$table, col = ConfusionMatrix_DecisionTree$byClass, 
     main = paste("Decision Tree Confusion Matrix: Accuracy =",
                  round(ConfusionMatrix_DecisionTree$overall['Accuracy'], 4)))

4.2-Random Forests Method

# Random Forests Prediction on Test dataset
set.seed(12000)
ModelFit_RanForest <- randomForest(classe~., data = TrainSet)
Pred_RanForest <- predict(ModelFit_RanForest, TestSet, type = "class")
ConfusionMatrix_RanForest <- confusionMatrix(Pred_RanForest, TestSet$classe)
ConfusionMatrix_RanForest

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction    A    B    C    D    E
##          A 1674    7    0    0    0
##          B    0 1131    4    0    0
##          C    0    1 1022   11    2
##          D    0    0    0  953    4
##          E    0    0    0    0 1076
## 
## Overall Statistics
##                                           
##                Accuracy : 0.9951          
##                  95% CI : (0.9929, 0.9967)
##     No Information Rate : 0.2845          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.9938          
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: A Class: B Class: C Class: D Class: E
## Sensitivity            1.0000   0.9930   0.9961   0.9886   0.9945
## Specificity            0.9983   0.9992   0.9971   0.9992   1.0000
## Pos Pred Value         0.9958   0.9965   0.9865   0.9958   1.0000
## Neg Pred Value         1.0000   0.9983   0.9992   0.9978   0.9988
## Prevalence             0.2845   0.1935   0.1743   0.1638   0.1839
## Detection Rate         0.2845   0.1922   0.1737   0.1619   0.1828
## Detection Prevalence   0.2856   0.1929   0.1760   0.1626   0.1828
## Balanced Accuracy      0.9992   0.9961   0.9966   0.9939   0.9972

# Plot ModelFit Random Forest
plot(ModelFit_RanForest)
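
Calling plot() on a randomForest object draws the error rate as trees are added, and the same numbers are stored in the model's err.rate matrix. As a minimal sketch, assuming the randomForest package is installed and using the built-in iris data as a stand-in for TrainSet, the final out-of-bag (OOB) error could be read off as follows. The names rf_demo and oob_error are illustrative.

```r
library(randomForest)

set.seed(12000)
# Stand-in model: iris replaces TrainSet so the example runs on its own
rf_demo <- randomForest(Species ~ ., data = iris, ntree = 200)

# err.rate has one row per tree: row i holds the OOB error of the forest
# built from the first i trees, followed by one column per class
oob_error <- rf_demo$err.rate[rf_demo$ntree, "OOB"]
oob_error
```

The OOB error is an internal estimate computed from the training data alone. The report relies on the separate TestSet for its accuracy figure, so the two numbers need not coincide.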

# Plot Random Forest Confusion Matrix
plot(ConfusionMatrix_RanForest$table, col = ConfusionMatrix_RanForest$byClass, 
     main = paste("Random Forest Confusion Matrix: Accuracy =",      
                   round(ConfusionMatrix_RanForest$overall['Accuracy'], 4)))

4.3-Generalized Boosted Regression Method (GBM)

ModelFit_Control <- trainControl(method = "repeatedcv", number = 5, repeats = 1)
ModelFit_GBM <- train(classe ~ ., data = TrainSet, 
                      method = "gbm", 
                      trControl = ModelFit_Control, 
                      verbose = FALSE)

## Loading required package: gbm

## Loading required package: survival

## 
## Attaching package: 'survival'

## The following object is masked from 'package:caret':
## 
##     cluster

## Loading required package: splines

## Loading required package: parallel

## Loaded gbm 2.1.1

## Loading required package: plyr

GBM_FinalModel <- ModelFit_GBM$finalModel
GBM_FinalModel

## A gradient boosted model with multinomial loss function.
## 150 iterations were performed.
## There were 51 predictors of which 43 had non-zero influence.
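
The trainControl() call above requests 5-fold cross-validation, repeated once: the training data is split into five parts, and each part is held out in turn while the model is fit on the other four. The sketch below shows that idea by hand, under clearly stated assumptions. It uses the built-in iris data and an rpart tree instead of TrainSet and GBM to keep the run short, and it assigns folds purely at random, whereas caret samples its folds within the levels of the outcome. The names fold_id, fold_accuracy, and cv_accuracy are illustrative.

```r
library(rpart)  # recommended package that ships with R

set.seed(12000)
k <- 5

# Randomly assign each row of the stand-in data set to one of k folds
fold_id <- sample(rep(seq_len(k), length.out = nrow(iris)))

# For each fold: fit on the other k - 1 folds, score on the held-out fold
fold_accuracy <- sapply(seq_len(k), function(i) {
  train_rows <- iris[fold_id != i, ]
  held_out   <- iris[fold_id == i, ]
  fit  <- rpart(Species ~ ., data = train_rows, method = "class")
  pred <- predict(fit, newdata = held_out, type = "class")
  mean(pred == held_out$Species)
})

# The cross-validated accuracy is the average over the k held-out folds
cv_accuracy <- mean(fold_accuracy)
cv_accuracy
```

In caret, this resampled accuracy is what train() uses to choose among the candidate tuning parameters before refitting the final model on all of the training data.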

# Generalized Boosted Regression Method (GBM) Prediction on Test dataset
Pred_GBM <- predict(ModelFit_GBM, newdata = TestSet)
ConfusionMatrix_GBM <- confusionMatrix(Pred_GBM, TestSet$classe)
ConfusionMatrix_GBM

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction    A    B    C    D    E
##          A 1645   47    0    1    0
##          B   20 1056   40    3   11
##          C    7   29  972   35   11
##          D    2    1   12  917   17
##          E    0    6    2    8 1043
## 
## Overall Statistics
##                                           
##                Accuracy : 0.9572          
##                  95% CI : (0.9517, 0.9622)
##     No Information Rate : 0.2845          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.9458          
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: A Class: B Class: C Class: D Class: E
## Sensitivity            0.9827   0.9271   0.9474   0.9512   0.9640
## Specificity            0.9886   0.9844   0.9831   0.9935   0.9967
## Pos Pred Value         0.9716   0.9345   0.9222   0.9663   0.9849
## Neg Pred Value         0.9931   0.9825   0.9888   0.9905   0.9919
## Prevalence             0.2845   0.1935   0.1743   0.1638   0.1839
## Detection Rate         0.2795   0.1794   0.1652   0.1558   0.1772
## Detection Prevalence   0.2877   0.1920   0.1791   0.1613   0.1799
## Balanced Accuracy      0.9856   0.9558   0.9652   0.9724   0.9803
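
The per-class rows in this output follow from the column and row totals of the confusion matrix. As a sketch, the GBM table is retyped below and the Sensitivity and Pos Pred Value rows are recomputed in base R. The names cm_gbm, sensitivity, and pos_pred_value are illustrative.

```r
# GBM confusion matrix, copied from the output above
# (rows = predictions, columns = reference classes)
classes <- c("A", "B", "C", "D", "E")
cm_gbm <- matrix(c(1645,   47,   0,   1,    0,
                     20, 1056,  40,   3,   11,
                      7,   29, 972,  35,   11,
                      2,    1,  12, 917,   17,
                      0,    6,   2,   8, 1043),
                 nrow = 5, byrow = TRUE,
                 dimnames = list(Prediction = classes, Reference = classes))

# Sensitivity: correct predictions divided by the true count of each class
sensitivity <- diag(cm_gbm) / colSums(cm_gbm)

# Positive predictive value: correct predictions divided by the number of
# times each class was predicted
pos_pred_value <- diag(cm_gbm) / rowSums(cm_gbm)

round(rbind(Sensitivity = sensitivity, `Pos Pred Value` = pos_pred_value), 4)
```

Class B has the lowest sensitivity (0.9271) and class C the lowest positive predictive value (0.9222), which matches the off-diagonal counts between B and C in the table.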

# Plot Generalized Boosted Regression Method (GBM) Confusion Matrix
plot(ConfusionMatrix_GBM$table, col = ConfusionMatrix_GBM$byClass,
     main = paste("GBM Confusion Matrix: Accuracy =",
                   round(ConfusionMatrix_GBM$overall['Accuracy'], 4)))

4.4-Comparing the Accuracy of the Three Prediction Methods

# Comparison of the accuracy of the three prediction methods
print(paste("Decision Tree Confusion Matrix: Accuracy =",
                  round(ConfusionMatrix_DecisionTree$overall['Accuracy'], 4)))

## [1] "Decision Tree Confusion Matrix: Accuracy = 0.7021"

print(paste("Random Forest Confusion Matrix: Accuracy =",      
                   round(ConfusionMatrix_RanForest$overall['Accuracy'], 4)))

## [1] "Random Forest Confusion Matrix: Accuracy = 0.9951"

print(paste("GBM Confusion Matrix: Accuracy =",
                   round(ConfusionMatrix_GBM$overall['Accuracy'], 4)))

## [1] "GBM Confusion Matrix: Accuracy = 0.9572"
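
For a side-by-side view, the three accuracies printed above can be collected into one named vector and ranked. The values here are typed in from the output. In the original session they would come from the overall['Accuracy'] entry of each confusionMatrix object. The names accuracy, ranked, and best_model are illustrative.

```r
# Test-set accuracies, copied from the printed results above
accuracy <- c(DecisionTree = 0.7021, RandomForest = 0.9951, GBM = 0.9572)

# Rank the models from most to least accurate
ranked <- sort(accuracy, decreasing = TRUE)
ranked

# Name of the most accurate model
best_model <- names(ranked)[1]
best_model
```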

As we can see from the accuracy comparison above, the Random Forest model has the highest accuracy, 0.9951. The expected out-of-sample error is therefore 100% - 99.51% = 0.49%.
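
The out-of-sample error estimate is simply the share of TestSet cases that the Random Forest misclassified. As a check, the sketch below retypes the Random Forest confusion matrix from section 4.2 and computes that share in base R. The names cm_rf, rf_accuracy, and oos_error are illustrative.

```r
# Random Forest confusion matrix, copied from the output in section 4.2
# (rows = predictions, columns = reference classes)
classes <- c("A", "B", "C", "D", "E")
cm_rf <- matrix(c(1674,    7,    0,   0,    0,
                     0, 1131,    4,   0,    0,
                     0,    1, 1022,  11,    2,
                     0,    0,    0, 953,    4,
                     0,    0,    0,   0, 1076),
                nrow = 5, byrow = TRUE,
                dimnames = list(Prediction = classes, Reference = classes))

# Accuracy on the held-out TestSet
rf_accuracy <- sum(diag(cm_rf)) / sum(cm_rf)

# Estimated out-of-sample error: the misclassified share of the TestSet
oos_error <- 1 - rf_accuracy

round(100 * c(Accuracy = rf_accuracy, Error = oos_error), 2)
```

With 29 of the 5885 test cases misclassified, the accuracy is 99.51% and the estimated out-of-sample error is 0.49%.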

Therefore, we choose the Random Forest model to predict the 20 different test cases.

5-Applying the Random Forest Model to Predict the 20 Test Cases (TestData)

The predicted results are shown below. Only the first 20 entries are extracted (Pred_Test holds more than 8000 entries in total).

# Show an extract of the first 20 results (more than 8000 entries in total)
Pred_Test[1:20]

##  5  6  8  9 10 18 20 27 29 32 34 39 43 44 50 54 56 57 58 59 
##  A  A  A  A  A  A  A  A  A  A  A  A  A  A  A  A  A  A  A  A 
## Levels: A B C D E

Practical Machine Learning-Project

Spark-lin

12/3/2016