This document is the final report for the peer-assessed project of Coursera’s Machine Learning course, part of the Data Science Specialization offered by Johns Hopkins University. The report was built in RStudio using knitr. The goal of the project is to predict the manner in which 6 participants performed the exercises described in the next section. The variable “classe” in the training set is the outcome to be predicted.
Using devices such as Jawbone Up, Nike FuelBand, and Fitbit, it is now possible to collect a large amount of data about personal activity relatively inexpensively. These types of devices are part of the quantified self movement, a group of enthusiasts who take measurements about themselves regularly to improve their health, to find patterns in their behavior, or because they are tech geeks. One thing that people regularly do is quantify how much of a particular activity they do, but they rarely quantify how well they do it. In this project, the goal is to use data from accelerometers on the belt, forearm, arm, and dumbbell of 6 participants. They were asked to perform barbell lifts correctly and incorrectly in 5 different ways. More information is available from the website here: http://groupware.les.inf.puc-rio.br/har (see the section on the Weight Lifting Exercise Dataset).
The training data for this project are available here:
https://d396qusza40orc.cloudfront.net/predmachlearn/pml-training.csv
The test data are available here:
https://d396qusza40orc.cloudfront.net/predmachlearn/pml-testing.csv
The data for this project come from this source: http://groupware.les.inf.puc-rio.br/har.
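# Packages used by the code shown in this report
library(caret)   # createDataPartition, train, trainControl, confusionMatrix
library(rpart)   # rpart
library(rattle)  # fancyRpartPlot

traindata_url <- "https://d396qusza40orc.cloudfront.net/predmachlearn/pml-training.csv"
testdata_url <- "https://d396qusza40orc.cloudfront.net/predmachlearn/pml-testing.csv"
Trainfile <- "pml-training.csv"
Testfile <- "pml-testing.csv"
# Download each file only if it is not already present locally
if (!file.exists(Trainfile)) {
  download.file(traindata_url, destfile = Trainfile)
}
training_pml <- read.csv(Trainfile)
if (!file.exists(Testfile)) {
  download.file(testdata_url, destfile = Testfile)
}
test_pml <- read.csv(Testfile)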
# Partition the labelled data: 70% for model training (p = 0.7) and the remaining 30% as a held-out set for validation.
inTrain <- createDataPartition(training_pml$classe,p=0.7,list = FALSE)
Trainset <- training_pml[inTrain, ]
Testset <- training_pml[-inTrain, ]
head(Trainset)
## X user_name raw_timestamp_part_1 raw_timestamp_part_2 cvtd_timestamp
## 2 2 carlitos 1323084231 808298 05/12/2011 11:23
## 3 3 carlitos 1323084231 820366 05/12/2011 11:23
## 4 4 carlitos 1323084232 120339 05/12/2011 11:23
## 5 5 carlitos 1323084232 196328 05/12/2011 11:23
## 6 6 carlitos 1323084232 304277 05/12/2011 11:23
## 7 7 carlitos 1323084232 368296 05/12/2011 11:23
## new_window num_window roll_belt pitch_belt yaw_belt total_accel_belt
## 2 no 11 1.41 8.07 -94.4 3
## 3 no 11 1.42 8.07 -94.4 3
## 4 no 12 1.48 8.05 -94.4 3
## 5 no 12 1.48 8.07 -94.4 3
## 6 no 12 1.45 8.06 -94.4 3
## 7 no 12 1.42 8.09 -94.4 3
## kurtosis_roll_belt kurtosis_picth_belt kurtosis_yaw_belt
## 2
## 3
## 4
## 5
## 6
## 7
## skewness_roll_belt skewness_roll_belt.1 skewness_yaw_belt max_roll_belt
## 2 NA
## 3 NA
## 4 NA
## 5 NA
## 6 NA
## 7 NA
## max_picth_belt max_yaw_belt min_roll_belt min_pitch_belt min_yaw_belt
## 2 NA NA NA
## 3 NA NA NA
## 4 NA NA NA
## 5 NA NA NA
## 6 NA NA NA
## 7 NA NA NA
## amplitude_roll_belt amplitude_pitch_belt amplitude_yaw_belt
## 2 NA NA
## 3 NA NA
## 4 NA NA
## 5 NA NA
## 6 NA NA
## 7 NA NA
## var_total_accel_belt avg_roll_belt stddev_roll_belt var_roll_belt
## 2 NA NA NA NA
## 3 NA NA NA NA
## 4 NA NA NA NA
## 5 NA NA NA NA
## 6 NA NA NA NA
## 7 NA NA NA NA
## avg_pitch_belt stddev_pitch_belt var_pitch_belt avg_yaw_belt
## 2 NA NA NA NA
## 3 NA NA NA NA
## 4 NA NA NA NA
## 5 NA NA NA NA
## 6 NA NA NA NA
## 7 NA NA NA NA
## stddev_yaw_belt var_yaw_belt gyros_belt_x gyros_belt_y gyros_belt_z
## 2 NA NA 0.02 0.00 -0.02
## 3 NA NA 0.00 0.00 -0.02
## 4 NA NA 0.02 0.00 -0.03
## 5 NA NA 0.02 0.02 -0.02
## 6 NA NA 0.02 0.00 -0.02
## 7 NA NA 0.02 0.00 -0.02
## accel_belt_x accel_belt_y accel_belt_z magnet_belt_x magnet_belt_y
## 2 -22 4 22 -7 608
## 3 -20 5 23 -2 600
## 4 -22 3 21 -6 604
## 5 -21 2 24 -6 600
## 6 -21 4 21 0 603
## 7 -22 3 21 -4 599
## magnet_belt_z roll_arm pitch_arm yaw_arm total_accel_arm var_accel_arm
## 2 -311 -128 22.5 -161 34 NA
## 3 -305 -128 22.5 -161 34 NA
## 4 -310 -128 22.1 -161 34 NA
## 5 -302 -128 22.1 -161 34 NA
## 6 -312 -128 22.0 -161 34 NA
## 7 -311 -128 21.9 -161 34 NA
## avg_roll_arm stddev_roll_arm var_roll_arm avg_pitch_arm stddev_pitch_arm
## 2 NA NA NA NA NA
## 3 NA NA NA NA NA
## 4 NA NA NA NA NA
## 5 NA NA NA NA NA
## 6 NA NA NA NA NA
## 7 NA NA NA NA NA
## var_pitch_arm avg_yaw_arm stddev_yaw_arm var_yaw_arm gyros_arm_x
## 2 NA NA NA NA 0.02
## 3 NA NA NA NA 0.02
## 4 NA NA NA NA 0.02
## 5 NA NA NA NA 0.00
## 6 NA NA NA NA 0.02
## 7 NA NA NA NA 0.00
## gyros_arm_y gyros_arm_z accel_arm_x accel_arm_y accel_arm_z magnet_arm_x
## 2 -0.02 -0.02 -290 110 -125 -369
## 3 -0.02 -0.02 -289 110 -126 -368
## 4 -0.03 0.02 -289 111 -123 -372
## 5 -0.03 0.00 -289 111 -123 -374
## 6 -0.03 0.00 -289 111 -122 -369
## 7 -0.03 0.00 -289 111 -125 -373
## magnet_arm_y magnet_arm_z kurtosis_roll_arm kurtosis_picth_arm
## 2 337 513
## 3 344 513
## 4 344 512
## 5 337 506
## 6 342 513
## 7 336 509
## kurtosis_yaw_arm skewness_roll_arm skewness_pitch_arm skewness_yaw_arm
## 2
## 3
## 4
## 5
## 6
## 7
## max_roll_arm max_picth_arm max_yaw_arm min_roll_arm min_pitch_arm
## 2 NA NA NA NA NA
## 3 NA NA NA NA NA
## 4 NA NA NA NA NA
## 5 NA NA NA NA NA
## 6 NA NA NA NA NA
## 7 NA NA NA NA NA
## min_yaw_arm amplitude_roll_arm amplitude_pitch_arm amplitude_yaw_arm
## 2 NA NA NA NA
## 3 NA NA NA NA
## 4 NA NA NA NA
## 5 NA NA NA NA
## 6 NA NA NA NA
## 7 NA NA NA NA
## roll_dumbbell pitch_dumbbell yaw_dumbbell kurtosis_roll_dumbbell
## 2 13.13074 -70.63751 -84.71065
## 3 12.85075 -70.27812 -85.14078
## 4 13.43120 -70.39379 -84.87363
## 5 13.37872 -70.42856 -84.85306
## 6 13.38246 -70.81759 -84.46500
## 7 13.12695 -70.24757 -85.09961
## kurtosis_picth_dumbbell kurtosis_yaw_dumbbell skewness_roll_dumbbell
## 2
## 3
## 4
## 5
## 6
## 7
## skewness_pitch_dumbbell skewness_yaw_dumbbell max_roll_dumbbell
## 2 NA
## 3 NA
## 4 NA
## 5 NA
## 6 NA
## 7 NA
## max_picth_dumbbell max_yaw_dumbbell min_roll_dumbbell min_pitch_dumbbell
## 2 NA NA NA
## 3 NA NA NA
## 4 NA NA NA
## 5 NA NA NA
## 6 NA NA NA
## 7 NA NA NA
## min_yaw_dumbbell amplitude_roll_dumbbell amplitude_pitch_dumbbell
## 2 NA NA
## 3 NA NA
## 4 NA NA
## 5 NA NA
## 6 NA NA
## 7 NA NA
## amplitude_yaw_dumbbell total_accel_dumbbell var_accel_dumbbell
## 2 37 NA
## 3 37 NA
## 4 37 NA
## 5 37 NA
## 6 37 NA
## 7 37 NA
## avg_roll_dumbbell stddev_roll_dumbbell var_roll_dumbbell
## 2 NA NA NA
## 3 NA NA NA
## 4 NA NA NA
## 5 NA NA NA
## 6 NA NA NA
## 7 NA NA NA
## avg_pitch_dumbbell stddev_pitch_dumbbell var_pitch_dumbbell
## 2 NA NA NA
## 3 NA NA NA
## 4 NA NA NA
## 5 NA NA NA
## 6 NA NA NA
## 7 NA NA NA
## avg_yaw_dumbbell stddev_yaw_dumbbell var_yaw_dumbbell gyros_dumbbell_x
## 2 NA NA NA 0
## 3 NA NA NA 0
## 4 NA NA NA 0
## 5 NA NA NA 0
## 6 NA NA NA 0
## 7 NA NA NA 0
## gyros_dumbbell_y gyros_dumbbell_z accel_dumbbell_x accel_dumbbell_y
## 2 -0.02 0.00 -233 47
## 3 -0.02 0.00 -232 46
## 4 -0.02 -0.02 -232 48
## 5 -0.02 0.00 -233 48
## 6 -0.02 0.00 -234 48
## 7 -0.02 0.00 -232 47
## accel_dumbbell_z magnet_dumbbell_x magnet_dumbbell_y magnet_dumbbell_z
## 2 -269 -555 296 -64
## 3 -270 -561 298 -63
## 4 -269 -552 303 -60
## 5 -270 -554 292 -68
## 6 -269 -558 294 -66
## 7 -270 -551 295 -70
## roll_forearm pitch_forearm yaw_forearm kurtosis_roll_forearm
## 2 28.3 -63.9 -153
## 3 28.3 -63.9 -152
## 4 28.1 -63.9 -152
## 5 28.0 -63.9 -152
## 6 27.9 -63.9 -152
## 7 27.9 -63.9 -152
## kurtosis_picth_forearm kurtosis_yaw_forearm skewness_roll_forearm
## 2
## 3
## 4
## 5
## 6
## 7
## skewness_pitch_forearm skewness_yaw_forearm max_roll_forearm
## 2 NA
## 3 NA
## 4 NA
## 5 NA
## 6 NA
## 7 NA
## max_picth_forearm max_yaw_forearm min_roll_forearm min_pitch_forearm
## 2 NA NA NA
## 3 NA NA NA
## 4 NA NA NA
## 5 NA NA NA
## 6 NA NA NA
## 7 NA NA NA
## min_yaw_forearm amplitude_roll_forearm amplitude_pitch_forearm
## 2 NA NA
## 3 NA NA
## 4 NA NA
## 5 NA NA
## 6 NA NA
## 7 NA NA
## amplitude_yaw_forearm total_accel_forearm var_accel_forearm
## 2 36 NA
## 3 36 NA
## 4 36 NA
## 5 36 NA
## 6 36 NA
## 7 36 NA
## avg_roll_forearm stddev_roll_forearm var_roll_forearm avg_pitch_forearm
## 2 NA NA NA NA
## 3 NA NA NA NA
## 4 NA NA NA NA
## 5 NA NA NA NA
## 6 NA NA NA NA
## 7 NA NA NA NA
## stddev_pitch_forearm var_pitch_forearm avg_yaw_forearm
## 2 NA NA NA
## 3 NA NA NA
## 4 NA NA NA
## 5 NA NA NA
## 6 NA NA NA
## 7 NA NA NA
## stddev_yaw_forearm var_yaw_forearm gyros_forearm_x gyros_forearm_y
## 2 NA NA 0.02 0.00
## 3 NA NA 0.03 -0.02
## 4 NA NA 0.02 -0.02
## 5 NA NA 0.02 0.00
## 6 NA NA 0.02 -0.02
## 7 NA NA 0.02 0.00
## gyros_forearm_z accel_forearm_x accel_forearm_y accel_forearm_z
## 2 -0.02 192 203 -216
## 3 0.00 196 204 -213
## 4 0.00 189 206 -214
## 5 -0.02 189 206 -214
## 6 -0.03 193 203 -215
## 7 -0.02 195 205 -215
## magnet_forearm_x magnet_forearm_y magnet_forearm_z classe
## 2 -18 661 473 A
## 3 -18 658 469 A
## 4 -16 658 469 A
## 5 -17 655 473 A
## 6 -9 660 478 A
## 7 -18 659 470 A
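The partition step can be illustrated without caret. The following is a minimal base-R sketch of a stratified 70/30 split on a small made-up data frame; the `toy` data, the seed, and the variable names are assumptions for illustration and are not part of the original analysis. `createDataPartition` performs its own stratified sampling, so this only conveys the idea.

```r
# Toy data standing in for the labelled training data (hypothetical values).
set.seed(42)  # fixed seed so the illustrative split is repeatable
toy <- data.frame(
  x = rnorm(100),
  classe = factor(rep(c("A", "B", "C", "D", "E"), each = 20))
)

# Sample 70% of the row indices within each class, so both parts keep
# roughly the same class proportions (the idea behind a stratified split).
idx_by_class <- split(seq_len(nrow(toy)), toy$classe)
in_train <- unlist(
  lapply(idx_by_class, function(idx) sample(idx, size = round(0.7 * length(idx)))),
  use.names = FALSE
)

train_part <- toy[in_train, ]
valid_part <- toy[-in_train, ]

dim(train_part)
dim(valid_part)
table(train_part$classe)
```

Setting a seed before the split is what makes a partition repeatable from run to run.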
As part of the data cleansing process, near-zero-variance variables and variables with mostly NA values are removed from the training and test sets. The dimensions of the two sets before and after cleaning are shown below.
## [1] 13737 160
## [1] 5885 160
## [1] 13737 54
## [1] 5885 54
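The cleaning code itself is not shown in the report. As a rough sketch of the kind of filtering described, the base-R example below drops columns that are mostly NA or that have only one distinct value. The `toy` data frame, the 0.5 NA cut-off, and the single-distinct-value rule are assumptions for illustration; the latter is a simplified stand-in for a near-zero-variance check, not the report's actual method.

```r
# Toy data: one constant column, one mostly-NA column, one informative column.
toy <- data.frame(
  constant  = rep(1, 10),
  mostly_na = c(3.2, 1.1, rep(NA, 8)),
  signal    = c(1.4, 2.1, 0.7, 3.3, 2.8, 1.9, 0.2, 4.1, 2.2, 3.0),
  classe    = factor(rep(c("A", "B"), 5))
)

# Fraction of missing values in each column.
na_fraction <- colMeans(is.na(toy))

# Number of distinct non-missing values in each column; a column with a
# single distinct value carries no information for a classifier.
n_distinct <- sapply(toy, function(col) length(unique(col[!is.na(col)])))

na_threshold <- 0.5  # assumed cut-off, not taken from the report
keep <- na_fraction <= na_threshold & n_distinct > 1

cleaned <- toy[, keep]
names(cleaned)
```

In a real pipeline the column selection would be derived from the training set only and then applied unchanged to the validation and test sets, so that all sets end up with the same columns.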
As part of the initial analysis, a few plots are drawn to check the relationships between variables in the training dataset.
## corrplot 0.84 loaded
## Loading required package: xts
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
##
## Attaching package: 'PerformanceAnalytics'
## The following object is masked from 'package:graphics':
##
## legend
##
## Attaching package: 'gplots'
## The following object is masked from 'package:PerformanceAnalytics':
##
## textplot
## The following object is masked from 'package:stats':
##
## lowess
The correlation plots and heatmap show that strongly correlated variable pairs, which appear in dark colors, are few in number, so there is no need to apply PCA (Principal Component Analysis) as a further preprocessing step.
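The visual judgement can be complemented by listing the strongly correlated pairs directly. The sketch below does this in base R on simulated data; the `toy` columns and the 0.8 cut-off are assumptions for illustration and do not come from the report.

```r
# Simulated predictors: `b` is nearly a copy of `a`, `c` and `d` are independent.
set.seed(1)
n <- 200
a <- rnorm(n)
toy <- data.frame(
  a = a,
  b = a + rnorm(n, sd = 0.05),
  c = rnorm(n),
  d = rnorm(n)
)

cor_matrix <- cor(toy)

# Use only the upper triangle so that each pair is counted once
# and the diagonal (always 1) is ignored.
cutoff <- 0.8  # assumed threshold for "strongly correlated"
high <- which(abs(cor_matrix) > cutoff & upper.tri(cor_matrix), arr.ind = TRUE)

high_pairs <- data.frame(
  var1 = rownames(cor_matrix)[high[, "row"]],
  var2 = colnames(cor_matrix)[high[, "col"]],
  r    = cor_matrix[high],
  stringsAsFactors = FALSE
)
high_pairs
```

If such a table holds only a handful of pairs relative to the number of predictors, that supports skipping a dimensionality-reduction step.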
In this exercise, three popular methods are used to build a classification model on the training set, and the model with the highest accuracy on the validation set is applied to the test dataset for the quiz predictions. The three methods are: a) Random Forest b) Decision Tree c) Generalized Boosted Model
To measure performance, a confusion matrix is computed for each model on the validation set, and the model with the highest accuracy is selected.
#train the model using random forest method
controlRF <- trainControl(method = "cv",number=3, verboseIter=FALSE)
modrandomforest <- train(classe ~ .,data=Trainset,method="rf",trControl=controlRF)
modrandomforest
## Random Forest
##
## 13737 samples
## 53 predictor
## 5 classes: 'A', 'B', 'C', 'D', 'E'
##
## No pre-processing
## Resampling: Cross-Validated (3 fold)
## Summary of sample sizes: 9158, 9157, 9159
## Resampling results across tuning parameters:
##
## mtry Accuracy Kappa
## 2 0.9914097 0.9891326
## 27 0.9959233 0.9948431
## 53 0.9936665 0.9919882
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 27.
# Predict on the held-out validation set (Testset)
predictrf <- predict(modrandomforest,newdata = Testset)
#Find accuracy using confusion matrix
confrandomforest <- confusionMatrix(predictrf,Testset$classe)
confrandomforest
## Confusion Matrix and Statistics
##
## Reference
## Prediction A B C D E
## A 1674 5 0 0 0
## B 0 1133 2 0 0
## C 0 1 1024 3 0
## D 0 0 0 961 3
## E 0 0 0 0 1079
##
## Overall Statistics
##
## Accuracy : 0.9976
## 95% CI : (0.996, 0.9987)
## No Information Rate : 0.2845
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.997
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: A Class: B Class: C Class: D Class: E
## Sensitivity 1.0000 0.9947 0.9981 0.9969 0.9972
## Specificity 0.9988 0.9996 0.9992 0.9994 1.0000
## Pos Pred Value 0.9970 0.9982 0.9961 0.9969 1.0000
## Neg Pred Value 1.0000 0.9987 0.9996 0.9994 0.9994
## Prevalence 0.2845 0.1935 0.1743 0.1638 0.1839
## Detection Rate 0.2845 0.1925 0.1740 0.1633 0.1833
## Detection Prevalence 0.2853 0.1929 0.1747 0.1638 0.1833
## Balanced Accuracy 0.9994 0.9972 0.9986 0.9981 0.9986
confrandomforest$overall['Accuracy']
## Accuracy
## 0.9976211
Random Forest model accuracy: 0.9976.
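As a check on how these statistics are derived, the accuracy and per-class sensitivity can be recomputed by hand from the counts in the random forest confusion matrix printed above. The sketch below uses base R only; the object names are chosen for illustration.

```r
# Counts copied from the random forest confusion matrix printed above
# (rows = predicted class, columns = reference class).
cm <- matrix(
  c(1674,    5,    0,   0,    0,
       0, 1133,    2,   0,    0,
       0,    1, 1024,   3,    0,
       0,    0,    0, 961,    3,
       0,    0,    0,   0, 1079),
  nrow = 5, byrow = TRUE,
  dimnames = list(Prediction = LETTERS[1:5], Reference = LETTERS[1:5])
)

# Accuracy: correctly classified cases (the diagonal) over all cases.
accuracy <- sum(diag(cm)) / sum(cm)

# Sensitivity per class: correct predictions over all true members of the class.
sensitivity <- diag(cm) / colSums(cm)

# Estimated out-of-sample error on the held-out set.
out_of_sample_error <- 1 - accuracy

round(accuracy, 4)
round(sensitivity, 4)
round(out_of_sample_error, 4)
```

The same calculation applies to the decision tree and boosted model matrices further down.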
moddecisiontree <- rpart(classe ~ .,data=Trainset,method="class")
fancyRpartPlot(moddecisiontree)
# Predict on the held-out validation set (Testset)
predictdc <- predict(moddecisiontree,newdata = Testset,type = "class")
#Find accuracy using confusion matrix
confdc <- confusionMatrix(predictdc,Testset$classe)
confdc
## Confusion Matrix and Statistics
##
## Reference
## Prediction A B C D E
## A 1499 117 55 26 16
## B 38 709 51 24 50
## C 59 106 797 177 95
## D 75 200 122 675 100
## E 3 7 1 62 821
##
## Overall Statistics
##
## Accuracy : 0.7648
## 95% CI : (0.7538, 0.7756)
## No Information Rate : 0.2845
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.7028
## Mcnemar's Test P-Value : < 2.2e-16
##
## Statistics by Class:
##
## Class: A Class: B Class: C Class: D Class: E
## Sensitivity 0.8955 0.6225 0.7768 0.7002 0.7588
## Specificity 0.9492 0.9657 0.9101 0.8990 0.9848
## Pos Pred Value 0.8751 0.8131 0.6459 0.5759 0.9183
## Neg Pred Value 0.9581 0.9142 0.9508 0.9387 0.9477
## Prevalence 0.2845 0.1935 0.1743 0.1638 0.1839
## Detection Rate 0.2547 0.1205 0.1354 0.1147 0.1395
## Detection Prevalence 0.2911 0.1482 0.2097 0.1992 0.1519
## Balanced Accuracy 0.9223 0.7941 0.8434 0.7996 0.8718
confdc$overall['Accuracy']
## Accuracy
## 0.7648258
Decision Tree model accuracy: 0.7648.
controlGBM <- trainControl(method = "repeatedcv",number=3, verboseIter=FALSE)
modgbm <- train(classe ~ .,data=Trainset,method="gbm",trControl=controlGBM)
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 1.6094 nan 0.1000 0.1271
## 2 1.5242 nan 0.1000 0.0872
## 3 1.4672 nan 0.1000 0.0636
## 4 1.4245 nan 0.1000 0.0543
## 5 1.3888 nan 0.1000 0.0497
## 6 1.3561 nan 0.1000 0.0396
## 7 1.3305 nan 0.1000 0.0408
## 8 1.3044 nan 0.1000 0.0408
## 9 1.2772 nan 0.1000 0.0330
## 10 1.2557 nan 0.1000 0.0287
## 20 1.0890 nan 0.1000 0.0178
## 40 0.9078 nan 0.1000 0.0089
## 60 0.7976 nan 0.1000 0.0076
## 80 0.7116 nan 0.1000 0.0053
## 100 0.6472 nan 0.1000 0.0036
## 120 0.5917 nan 0.1000 0.0025
## 140 0.5460 nan 0.1000 0.0030
## 150 0.5248 nan 0.1000 0.0026
##
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 1.6094 nan 0.1000 0.1949
## 2 1.4857 nan 0.1000 0.1349
## 3 1.3998 nan 0.1000 0.1011
## 4 1.3342 nan 0.1000 0.0926
## 5 1.2752 nan 0.1000 0.0758
## 6 1.2264 nan 0.1000 0.0638
## 7 1.1852 nan 0.1000 0.0653
## 8 1.1443 nan 0.1000 0.0509
## 9 1.1112 nan 0.1000 0.0463
## 10 1.0818 nan 0.1000 0.0470
## 20 0.8524 nan 0.1000 0.0236
## 40 0.6275 nan 0.1000 0.0219
## 60 0.4839 nan 0.1000 0.0164
## 80 0.3901 nan 0.1000 0.0065
## 100 0.3226 nan 0.1000 0.0066
## 120 0.2684 nan 0.1000 0.0036
## 140 0.2304 nan 0.1000 0.0023
## 150 0.2098 nan 0.1000 0.0017
##
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 1.6094 nan 0.1000 0.2421
## 2 1.4570 nan 0.1000 0.1639
## 3 1.3530 nan 0.1000 0.1380
## 4 1.2673 nan 0.1000 0.1025
## 5 1.2016 nan 0.1000 0.0987
## 6 1.1388 nan 0.1000 0.0715
## 7 1.0916 nan 0.1000 0.0690
## 8 1.0481 nan 0.1000 0.0678
## 9 1.0047 nan 0.1000 0.0643
## 10 0.9652 nan 0.1000 0.0697
## 20 0.6978 nan 0.1000 0.0273
## 40 0.4618 nan 0.1000 0.0093
## 60 0.3346 nan 0.1000 0.0070
## 80 0.2580 nan 0.1000 0.0036
## 100 0.1981 nan 0.1000 0.0038
## 120 0.1553 nan 0.1000 0.0029
## 140 0.1238 nan 0.1000 0.0008
## 150 0.1109 nan 0.1000 0.0011
##
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 1.6094 nan 0.1000 0.1256
## 2 1.5215 nan 0.1000 0.0860
## 3 1.4633 nan 0.1000 0.0689
## 4 1.4176 nan 0.1000 0.0531
## 5 1.3826 nan 0.1000 0.0444
## 6 1.3527 nan 0.1000 0.0457
## 7 1.3228 nan 0.1000 0.0398
## 8 1.2971 nan 0.1000 0.0391
## 9 1.2690 nan 0.1000 0.0316
## 10 1.2479 nan 0.1000 0.0301
## 20 1.0842 nan 0.1000 0.0192
## 40 0.8969 nan 0.1000 0.0088
## 60 0.7816 nan 0.1000 0.0059
## 80 0.6997 nan 0.1000 0.0054
## 100 0.6324 nan 0.1000 0.0027
## 120 0.5782 nan 0.1000 0.0042
## 140 0.5308 nan 0.1000 0.0032
## 150 0.5106 nan 0.1000 0.0019
##
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 1.6094 nan 0.1000 0.1961
## 2 1.4830 nan 0.1000 0.1302
## 3 1.3961 nan 0.1000 0.1054
## 4 1.3292 nan 0.1000 0.0882
## 5 1.2721 nan 0.1000 0.0787
## 6 1.2213 nan 0.1000 0.0756
## 7 1.1724 nan 0.1000 0.0523
## 8 1.1377 nan 0.1000 0.0513
## 9 1.1042 nan 0.1000 0.0408
## 10 1.0764 nan 0.1000 0.0433
## 20 0.8455 nan 0.1000 0.0214
## 40 0.6094 nan 0.1000 0.0173
## 60 0.4700 nan 0.1000 0.0066
## 80 0.3800 nan 0.1000 0.0055
## 100 0.3152 nan 0.1000 0.0027
## 120 0.2653 nan 0.1000 0.0031
## 140 0.2231 nan 0.1000 0.0015
## 150 0.2079 nan 0.1000 0.0014
##
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 1.6094 nan 0.1000 0.2455
## 2 1.4543 nan 0.1000 0.1671
## 3 1.3518 nan 0.1000 0.1267
## 4 1.2718 nan 0.1000 0.1111
## 5 1.2008 nan 0.1000 0.0924
## 6 1.1417 nan 0.1000 0.0779
## 7 1.0912 nan 0.1000 0.0830
## 8 1.0400 nan 0.1000 0.0663
## 9 0.9989 nan 0.1000 0.0725
## 10 0.9547 nan 0.1000 0.0706
## 20 0.6910 nan 0.1000 0.0273
## 40 0.4392 nan 0.1000 0.0089
## 60 0.3177 nan 0.1000 0.0063
## 80 0.2332 nan 0.1000 0.0035
## 100 0.1834 nan 0.1000 0.0028
## 120 0.1449 nan 0.1000 0.0023
## 140 0.1161 nan 0.1000 0.0015
## 150 0.1040 nan 0.1000 0.0023
##
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 1.6094 nan 0.1000 0.1311
## 2 1.5230 nan 0.1000 0.0849
## 3 1.4658 nan 0.1000 0.0691
## 4 1.4197 nan 0.1000 0.0508
## 5 1.3853 nan 0.1000 0.0494
## 6 1.3534 nan 0.1000 0.0426
## 7 1.3260 nan 0.1000 0.0417
## 8 1.3001 nan 0.1000 0.0320
## 9 1.2796 nan 0.1000 0.0366
## 10 1.2543 nan 0.1000 0.0346
## 20 1.0948 nan 0.1000 0.0205
## 40 0.9116 nan 0.1000 0.0070
## 60 0.8015 nan 0.1000 0.0050
## 80 0.7176 nan 0.1000 0.0055
## 100 0.6512 nan 0.1000 0.0017
## 120 0.5966 nan 0.1000 0.0034
## 140 0.5498 nan 0.1000 0.0031
## 150 0.5276 nan 0.1000 0.0029
##
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 1.6094 nan 0.1000 0.1926
## 2 1.4866 nan 0.1000 0.1293
## 3 1.4031 nan 0.1000 0.1068
## 4 1.3345 nan 0.1000 0.0871
## 5 1.2783 nan 0.1000 0.0706
## 6 1.2306 nan 0.1000 0.0769
## 7 1.1825 nan 0.1000 0.0521
## 8 1.1484 nan 0.1000 0.0569
## 9 1.1127 nan 0.1000 0.0412
## 10 1.0870 nan 0.1000 0.0448
## 20 0.8691 nan 0.1000 0.0179
## 40 0.6242 nan 0.1000 0.0107
## 60 0.4890 nan 0.1000 0.0060
## 80 0.3966 nan 0.1000 0.0056
## 100 0.3297 nan 0.1000 0.0038
## 120 0.2768 nan 0.1000 0.0015
## 140 0.2331 nan 0.1000 0.0016
## 150 0.2157 nan 0.1000 0.0016
##
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 1.6094 nan 0.1000 0.2404
## 2 1.4581 nan 0.1000 0.1619
## 3 1.3555 nan 0.1000 0.1284
## 4 1.2748 nan 0.1000 0.1021
## 5 1.2099 nan 0.1000 0.0977
## 6 1.1482 nan 0.1000 0.0739
## 7 1.1006 nan 0.1000 0.0799
## 8 1.0509 nan 0.1000 0.0652
## 9 1.0108 nan 0.1000 0.0601
## 10 0.9734 nan 0.1000 0.0657
## 20 0.7130 nan 0.1000 0.0377
## 40 0.4585 nan 0.1000 0.0084
## 60 0.3321 nan 0.1000 0.0057
## 80 0.2524 nan 0.1000 0.0038
## 100 0.1964 nan 0.1000 0.0033
## 120 0.1543 nan 0.1000 0.0016
## 140 0.1230 nan 0.1000 0.0014
## 150 0.1109 nan 0.1000 0.0010
##
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 1.6094 nan 0.1000 0.2436
## 2 1.4571 nan 0.1000 0.1653
## 3 1.3516 nan 0.1000 0.1310
## 4 1.2695 nan 0.1000 0.1187
## 5 1.1954 nan 0.1000 0.0844
## 6 1.1413 nan 0.1000 0.0758
## 7 1.0915 nan 0.1000 0.0796
## 8 1.0419 nan 0.1000 0.0617
## 9 1.0029 nan 0.1000 0.0631
## 10 0.9642 nan 0.1000 0.0609
## 20 0.6912 nan 0.1000 0.0266
## 40 0.4534 nan 0.1000 0.0107
## 60 0.3178 nan 0.1000 0.0044
## 80 0.2389 nan 0.1000 0.0058
## 100 0.1845 nan 0.1000 0.0020
## 120 0.1492 nan 0.1000 0.0021
## 140 0.1221 nan 0.1000 0.0014
## 150 0.1088 nan 0.1000 0.0013
# Predict on the held-out validation set (Testset)
predictgbm <- predict(modgbm,newdata = Testset)
#Find accuracy using confusion matrix
confgbm <- confusionMatrix(predictgbm,Testset$classe)
confgbm
## Confusion Matrix and Statistics
##
## Reference
## Prediction A B C D E
## A 1673 15 0 1 0
## B 1 1120 13 5 4
## C 0 4 1008 16 5
## D 0 0 4 942 11
## E 0 0 1 0 1062
##
## Overall Statistics
##
## Accuracy : 0.9864
## 95% CI : (0.9831, 0.9892)
## No Information Rate : 0.2845
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.9828
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: A Class: B Class: C Class: D Class: E
## Sensitivity 0.9994 0.9833 0.9825 0.9772 0.9815
## Specificity 0.9962 0.9952 0.9949 0.9970 0.9998
## Pos Pred Value 0.9905 0.9799 0.9758 0.9843 0.9991
## Neg Pred Value 0.9998 0.9960 0.9963 0.9955 0.9959
## Prevalence 0.2845 0.1935 0.1743 0.1638 0.1839
## Detection Rate 0.2843 0.1903 0.1713 0.1601 0.1805
## Detection Prevalence 0.2870 0.1942 0.1755 0.1626 0.1806
## Balanced Accuracy 0.9978 0.9892 0.9887 0.9871 0.9907
confgbm$overall['Accuracy']
## Accuracy
## 0.9864061
Generalized Boosted Model accuracy: 0.9864.
Based on the confusion matrix results, the Random Forest method produced the most accurate predictions of the “classe” variable on the validation set. The Random Forest model is therefore applied to the original test dataset for the final predictions.
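The comparison can be summarised in a small table. The sketch below collects the three validation accuracies exactly as printed in the `confusionMatrix` outputs and ranks the models; the data frame layout and column names are illustrative assumptions.

```r
# Validation-set accuracies as printed in the confusionMatrix outputs.
accuracy <- c(
  "Random Forest"             = 0.9976211,
  "Decision Tree"             = 0.7648258,
  "Generalized Boosted Model" = 0.9864061
)

comparison <- data.frame(
  model = names(accuracy),
  accuracy = unname(accuracy),
  est_out_of_sample_error = 1 - unname(accuracy),
  stringsAsFactors = FALSE
)

# Rank the models from most to least accurate.
comparison <- comparison[order(-comparison$accuracy), ]
best_model <- comparison$model[1]

comparison
best_model
```

The estimated out-of-sample error here is simply one minus the accuracy on the held-out 30% of the labelled data.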
finalpredict <- predict(modrandomforest,newdata = test_pml)
finalpredict
## [1] B A B A A E D B A A B C B A E E A B B B
## Levels: A B C D E