Prediction Assignment Writeup

Overview

This report analyzes how well 6 participants performed a weight-lifting exercise and builds models to predict the manner in which each repetition was carried out.

Using devices such as Jawbone Up, Nike FuelBand, and Fitbit, it is now possible to collect a large amount of data about personal activity relatively inexpensively. These types of devices are part of the quantified self movement, a group of enthusiasts who take measurements about themselves regularly to improve their health, to find patterns in their behavior, or because they are tech geeks. One thing that people regularly do is quantify how much of a particular activity they do, but they rarely quantify how well they do it. In this project, the goal is to use data from accelerometers on the belt, forearm, arm, and dumbbell of 6 participants, who were asked to perform barbell lifts correctly and incorrectly in 5 different ways. More information is available from the website here: http://groupware.les.inf.puc-rio.br/har (see the section on the Weight Lifting Exercise Dataset).

Data

The data for this project come from this source: http://groupware.les.inf.puc-rio.br/har.

The training and test data for this project are available here:

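*Training Data: https://d396qusza40orc.cloudfront.net/predmachlearn/pml-training.csv

*Test Data: https://d396qusza40orc.cloudfront.net/predmachlearn/pml-testing.csv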

Goal of the Project.

The goal of this project is to predict the manner in which the participants did the exercise. This is the “classe” variable in the training set. Any of the other variables may be used as predictors. The report describes how the model was built, how cross-validation was used, what the expected out-of-sample error is, and why each choice was made. The prediction model is also used to predict 20 different test cases.

Packages

library(data.table)
library(caret)
library(dplyr)
library(ggthemes)
library(corrplot)
library(RColorBrewer)
library(rpart)
library(rattle)
library(gbm)
library(MASS)

Download and Store data.

The files are downloaded into a Data folder inside the specified working directory; the folder is created first if it does not already exist.

setwd("C:/Users/aleja/Documents/Cursos/Coursera R pratices/Prediction Assignment Writeup")

train_url<-"https://d396qusza40orc.cloudfront.net/predmachlearn/pml-training.csv"
test_url<-"https://d396qusza40orc.cloudfront.net/predmachlearn/pml-testing.csv"

# Create the Data folder only if it is missing
if (!dir.exists(file.path(getwd(), "Data"))) {
     dir.create(file.path(getwd(), "Data"))
}

Download the datasets and save them in the Data folder.

download.file(url = train_url, destfile = file.path("./Data", "self_movement_train_data.csv"), 
              method = "curl")

download.file(url = test_url, destfile = file.path("./Data", "self_movement_test_data.csv"), 
              method = "curl")

Verify that the files were downloaded correctly.

list.files("./Data")

## [1] "self_movement_test_data.csv"  "self_movement_train_data.csv"

Data Processing

Before any modeling, the data need to be loaded and cleaned.

train_df <- fread("./Data/self_movement_train_data.csv")
test_df <- fread("./Data/self_movement_test_data.csv")

The train and test files have many columns that are entirely or almost entirely NA’s, so every column that contains an NA will be dropped.

str(train_df)

## Classes 'data.table' and 'data.frame':   19622 obs. of  160 variables:
##  $ V1                      : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ user_name               : chr  "carlitos" "carlitos" "carlitos" "carlitos" ...
##  $ raw_timestamp_part_1    : int  1323084231 1323084231 1323084231 1323084232 1323084232 1323084232 1323084232 1323084232 1323084232 1323084232 ...
##  $ raw_timestamp_part_2    : int  788290 808298 820366 120339 196328 304277 368296 440390 484323 484434 ...
##  $ cvtd_timestamp          : chr  "05/12/2011 11:23" "05/12/2011 11:23" "05/12/2011 11:23" "05/12/2011 11:23" ...
##  $ new_window              : chr  "no" "no" "no" "no" ...
##  $ num_window              : int  11 11 11 12 12 12 12 12 12 12 ...
##  $ roll_belt               : num  1.41 1.41 1.42 1.48 1.48 1.45 1.42 1.42 1.43 1.45 ...
##  $ pitch_belt              : num  8.07 8.07 8.07 8.05 8.07 8.06 8.09 8.13 8.16 8.17 ...
##  $ yaw_belt                : num  -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 ...
##  $ total_accel_belt        : int  3 3 3 3 3 3 3 3 3 3 ...
##  $ kurtosis_roll_belt      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ kurtosis_picth_belt     : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ kurtosis_yaw_belt       : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ skewness_roll_belt      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ skewness_roll_belt.1    : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ skewness_yaw_belt       : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ max_roll_belt           : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ max_picth_belt          : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ max_yaw_belt            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ min_roll_belt           : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ min_pitch_belt          : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ min_yaw_belt            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ amplitude_roll_belt     : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ amplitude_pitch_belt    : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ amplitude_yaw_belt      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ var_total_accel_belt    : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ avg_roll_belt           : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ stddev_roll_belt        : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ var_roll_belt           : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ avg_pitch_belt          : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ stddev_pitch_belt       : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ var_pitch_belt          : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ avg_yaw_belt            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ stddev_yaw_belt         : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ var_yaw_belt            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ gyros_belt_x            : num  0 0.02 0 0.02 0.02 0.02 0.02 0.02 0.02 0.03 ...
##  $ gyros_belt_y            : num  0 0 0 0 0.02 0 0 0 0 0 ...
##  $ gyros_belt_z            : num  -0.02 -0.02 -0.02 -0.03 -0.02 -0.02 -0.02 -0.02 -0.02 0 ...
##  $ accel_belt_x            : int  -21 -22 -20 -22 -21 -21 -22 -22 -20 -21 ...
##  $ accel_belt_y            : int  4 4 5 3 2 4 3 4 2 4 ...
##  $ accel_belt_z            : int  22 22 23 21 24 21 21 21 24 22 ...
##  $ magnet_belt_x           : int  -3 -7 -2 -6 -6 0 -4 -2 1 -3 ...
##  $ magnet_belt_y           : int  599 608 600 604 600 603 599 603 602 609 ...
##  $ magnet_belt_z           : int  -313 -311 -305 -310 -302 -312 -311 -313 -312 -308 ...
##  $ roll_arm                : num  -128 -128 -128 -128 -128 -128 -128 -128 -128 -128 ...
##  $ pitch_arm               : num  22.5 22.5 22.5 22.1 22.1 22 21.9 21.8 21.7 21.6 ...
##  $ yaw_arm                 : num  -161 -161 -161 -161 -161 -161 -161 -161 -161 -161 ...
##  $ total_accel_arm         : int  34 34 34 34 34 34 34 34 34 34 ...
##  $ var_accel_arm           : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ avg_roll_arm            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ stddev_roll_arm         : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ var_roll_arm            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ avg_pitch_arm           : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ stddev_pitch_arm        : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ var_pitch_arm           : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ avg_yaw_arm             : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ stddev_yaw_arm          : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ var_yaw_arm             : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ gyros_arm_x             : num  0 0.02 0.02 0.02 0 0.02 0 0.02 0.02 0.02 ...
##  $ gyros_arm_y             : num  0 -0.02 -0.02 -0.03 -0.03 -0.03 -0.03 -0.02 -0.03 -0.03 ...
##  $ gyros_arm_z             : num  -0.02 -0.02 -0.02 0.02 0 0 0 0 -0.02 -0.02 ...
##  $ accel_arm_x             : int  -288 -290 -289 -289 -289 -289 -289 -289 -288 -288 ...
##  $ accel_arm_y             : int  109 110 110 111 111 111 111 111 109 110 ...
##  $ accel_arm_z             : int  -123 -125 -126 -123 -123 -122 -125 -124 -122 -124 ...
##  $ magnet_arm_x            : int  -368 -369 -368 -372 -374 -369 -373 -372 -369 -376 ...
##  $ magnet_arm_y            : int  337 337 344 344 337 342 336 338 341 334 ...
##  $ magnet_arm_z            : int  516 513 513 512 506 513 509 510 518 516 ...
##  $ kurtosis_roll_arm       : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ kurtosis_picth_arm      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ kurtosis_yaw_arm        : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ skewness_roll_arm       : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ skewness_pitch_arm      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ skewness_yaw_arm        : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ max_roll_arm            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ max_picth_arm           : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ max_yaw_arm             : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ min_roll_arm            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ min_pitch_arm           : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ min_yaw_arm             : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ amplitude_roll_arm      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ amplitude_pitch_arm     : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ amplitude_yaw_arm       : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ roll_dumbbell           : num  13.1 13.1 12.9 13.4 13.4 ...
##  $ pitch_dumbbell          : num  -70.5 -70.6 -70.3 -70.4 -70.4 ...
##  $ yaw_dumbbell            : num  -84.9 -84.7 -85.1 -84.9 -84.9 ...
##  $ kurtosis_roll_dumbbell  : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ kurtosis_picth_dumbbell : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ kurtosis_yaw_dumbbell   : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ skewness_roll_dumbbell  : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ skewness_pitch_dumbbell : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ skewness_yaw_dumbbell   : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ max_roll_dumbbell       : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ max_picth_dumbbell      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ max_yaw_dumbbell        : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ min_roll_dumbbell       : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ min_pitch_dumbbell      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ min_yaw_dumbbell        : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ amplitude_roll_dumbbell : num  NA NA NA NA NA NA NA NA NA NA ...
##   [list output truncated]
##  - attr(*, ".internal.selfref")=<externalptr>

cat("Amount of NA's in train set", sum(is.na(train_df)==TRUE), sep = "\n")

## Amount of NA's in train set
## 1925102

cat("Amount of NA's in test set",sum(is.na(test_df)==TRUE), sep = "\n")

## Amount of NA's in test set
## 2000
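The totals above do not show how the NA's are spread across columns. A minimal sketch on a small made-up data frame (the `toy` object and its values are hypothetical, not project data) shows how the share of NA's per column could be inspected before deciding what to drop:

```r
# Hypothetical toy data: one complete column, one mostly-NA column, one all-NA column
toy <- data.frame(
  roll_belt     = c(1.41, 1.42, 1.48, 1.45),
  avg_roll_belt = c(NA, NA, NA, 1.5),
  var_roll_belt = c(NA_real_, NA_real_, NA_real_, NA_real_)
)

# Share of missing values in each column (0 = complete, 1 = entirely NA)
na_share <- colMeans(is.na(toy))
print(na_share)

# Names of the columns that contain no NA at all
complete_cols <- names(na_share)[na_share == 0]
print(complete_cols)
```

Applied to the real data, the same `colMeans(is.na(...))` call would show whether a column is missing everywhere or only in most rows.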

To remove the columns with NA’s, “dplyr” is used. The first 5 columns are also removed, since they hold a row index, the user name, and timestamps rather than sensor measurements.

train_df[,-c(1:5)] %>%
     select_if(~ !any(is.na(.)))->train_data

test_df[,-c(1:5)] %>%
     select_if(~ !any(is.na(.)))->test_data

cat("Amount of NA's in new train set", sum(is.na(train_data)==TRUE), sep = "\n")

## Amount of NA's in new train set
## 0

cat("Amount of NA's in new test set",sum(is.na(test_data)==TRUE), sep = "\n")

## Amount of NA's in new test set
## 0

Now, the dimensions of the remaining datasets are:

cat("Dims. of training set", dim(train_data))

## Dims. of training set 19622 55

cat("Dims. of test set", dim(test_data))

## Dims. of test set 20 55

Only 55 columns remain in each set, and they hold most of the information relevant to the analysis.
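Filtering each file separately works here because both end up with 55 columns, but nothing in the code guarantees that the two sets keep the same predictors. A hedged alternative, sketched on hypothetical toy data frames (`toy_train` and `toy_test` are made up for illustration), is to choose the columns on the training set only and reuse that choice on the test set:

```r
# Hypothetical toy data mimicking the layout of the two files
toy_train <- data.frame(
  roll_belt     = c(1.4, 1.5, 1.6),
  avg_roll_belt = c(NA, NA, 2.0),
  classe        = c("A", "B", "A"),
  stringsAsFactors = FALSE
)
toy_test <- data.frame(
  roll_belt     = c(1.7, 1.8),
  avg_roll_belt = c(NA, NA),
  problem_id    = 1:2
)

# Predictors = training columns without any NA, excluding the outcome
keep <- names(toy_train)[colSums(is.na(toy_train)) == 0]
predictors <- setdiff(keep, "classe")

# Apply the same predictor list to both sets
train_clean <- toy_train[, c(predictors, "classe"), drop = FALSE]
test_clean  <- toy_test[, c(predictors, "problem_id"), drop = FALSE]

# Every predictor used for training must also exist in the test set
stopifnot(all(predictors %in% names(test_clean)))
print(names(train_clean))
print(names(test_clean))
```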

Exploratory analysis

The plot below shows how often each class occurs in the training data.

# Refer to the column by name inside aes() rather than with train_data$
g1 <- ggplot(data = train_data, aes(x=as.factor(classe)))
g1 + geom_bar(fill="firebrick3", colour="black")+theme_stata()+
     ylab("Frequency") + xlab("Classes") + ggtitle("Frequency of different classes")

The correlation plot below shows which pairs of variables are most strongly correlated, whether positively or negatively. The plot is ordered by the first principal component (order = "FPC").

corM <- cor(train_data[, -c(1,55)])
corrplot(corM, order = "FPC", method = "circle", type = "upper", 
         tl.cex = 0.7, tl.col="black", col=brewer.pal(n=8, name="RdBu"))
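The plot gives a visual impression; to list the strongly correlated pairs explicitly, the correlation matrix can be filtered directly. The sketch below uses simulated data and an arbitrary cutoff of 0.8 (both are assumptions for illustration, not values from this analysis):

```r
set.seed(1)  # arbitrary seed so the simulated data are reproducible
x <- rnorm(200)
toy <- data.frame(
  a = x,
  b = x + rnorm(200, sd = 0.1),  # nearly a copy of a
  c = rnorm(200)                 # independent noise
)

cor_toy <- cor(toy)

# Keep each pair once (upper triangle) and only pairs above the cutoff
cutoff <- 0.8
idx <- which(abs(cor_toy) > cutoff & upper.tri(cor_toy), arr.ind = TRUE)

high_pairs <- data.frame(
  var1 = rownames(cor_toy)[idx[, "row"]],
  var2 = colnames(cor_toy)[idx[, "col"]],
  correlation = cor_toy[idx],
  stringsAsFactors = FALSE
)
print(high_pairs)
```

The same filter could be applied to `corM` to name the sensor pairs that stand out in the plot.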

Prediction Models.

In this project, predictive analysis is performed with five widely used models:

*Random Forests.

*Decision Tree.

*Generalized Boosted Model (GBM).

*Linear Discriminant Analysis (LDA).

*K-Nearest Neighbors (KNN).

To fit the models, the training data are divided into a training set (75%) and a test set (25%) with the “caret” package. For more information, see the caret package documentation.

partition  <- createDataPartition(train_data$classe, p=0.75, list=FALSE)

train_set <- train_data[partition, ]

test_set <- train_data[-partition, ]
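Because `createDataPartition` samples at random, the split (and every accuracy figure that follows) changes from run to run unless a seed is set first. The base-R sketch below only illustrates the idea of a seeded, class-stratified 75/25 split; it is not caret's implementation, and the seed value and toy labels are arbitrary assumptions:

```r
set.seed(123)  # arbitrary seed; any fixed value makes the split repeatable

# Hypothetical outcome vector with unbalanced classes
classe <- rep(c("A", "B", "C"), times = c(40, 20, 20))

# Sample 75% of the row indices within each class
rows_by_class <- split(seq_along(classe), classe)
in_train <- unlist(lapply(rows_by_class, function(rows) {
  sample(rows, size = round(0.75 * length(rows)))
}), use.names = FALSE)

train_labels <- classe[in_train]
test_labels  <- classe[-in_train]

# Class proportions are preserved in both parts
print(prop.table(table(train_labels)))
print(prop.table(table(test_labels)))
```

In the report itself, calling `set.seed()` immediately before `createDataPartition()` would serve the same purpose.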

Random Forest.

control_rf <- trainControl(method="cv", number=4, verboseIter=FALSE)
fit_rf <- train(classe ~ ., data=train_set, method="rf",
                          trControl=control_rf)
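`trainControl(method = "cv", number = 4)` asks caret to estimate accuracy by 4-fold cross-validation: the training rows are divided into 4 folds, and each fold in turn is held out while the model is fit on the other 3. The sketch below reproduces that loop by hand on the built-in `iris` data with a simple nearest-centroid classifier; both are stand-ins chosen to keep the example small, not the data or model used in this report:

```r
set.seed(42)  # arbitrary seed
k <- 4
fold_id <- sample(rep(seq_len(k), length.out = nrow(iris)))
feats <- setdiff(names(iris), "Species")

# Assign each test row to the class whose training mean is closest
nearest_centroid <- function(train, test) {
  centroids <- aggregate(train[feats], by = list(Species = train$Species), FUN = mean)
  dists <- sapply(seq_len(nrow(centroids)), function(i) {
    centre <- unlist(centroids[i, feats], use.names = FALSE)
    rowSums(sweep(as.matrix(test[feats]), 2, centre)^2)
  })
  as.character(centroids$Species)[max.col(-dists, ties.method = "first")]
}

# Hold out each fold once and record the accuracy on it
fold_acc <- sapply(seq_len(k), function(f) {
  train <- iris[fold_id != f, ]
  test  <- iris[fold_id == f, ]
  mean(nearest_centroid(train, test) == as.character(test$Species))
})

print(round(fold_acc, 3))
print(mean(fold_acc))  # cross-validated accuracy estimate
```

The average of the per-fold accuracies is the kind of figure caret uses to compare tuning values such as `mtry`.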

This is the final model selected by the Random Forest training.

fit_rf$finalModel

## 
## Call:
##  randomForest(x = x, y = y, mtry = param$mtry) 
##                Type of random forest: classification
##                      Number of trees: 500
## No. of variables tried at each split: 28
## 
##         OOB estimate of  error rate: 0.2%
## Confusion matrix:
##      A    B    C    D    E  class.error
## A 4183    1    0    0    1 0.0004778973
## B    5 2840    3    0    0 0.0028089888
## C    0    6 2561    0    0 0.0023373588
## D    0    0   11 2401    0 0.0045605307
## E    0    0    0    2 2704 0.0007390983

Below are the predictions on the held-out test set and the resulting confusion matrix for the Random Forest model.

predict_rf<- predict(fit_rf, newdata=test_set)
matrix_rf <- confusionMatrix(predict_rf, as.factor(test_set$classe))
matrix_rf

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction    A    B    C    D    E
##          A 1395    2    0    0    0
##          B    0  946    3    0    2
##          C    0    1  852    6    0
##          D    0    0    0  797    0
##          E    0    0    0    1  899
## 
## Overall Statistics
##                                          
##                Accuracy : 0.9969         
##                  95% CI : (0.995, 0.9983)
##     No Information Rate : 0.2845         
##     P-Value [Acc > NIR] : < 2.2e-16      
##                                          
##                   Kappa : 0.9961         
##                                          
##  Mcnemar's Test P-Value : NA             
## 
## Statistics by Class:
## 
##                      Class: A Class: B Class: C Class: D Class: E
## Sensitivity            1.0000   0.9968   0.9965   0.9913   0.9978
## Specificity            0.9994   0.9987   0.9983   1.0000   0.9998
## Pos Pred Value         0.9986   0.9947   0.9919   1.0000   0.9989
## Neg Pred Value         1.0000   0.9992   0.9993   0.9983   0.9995
## Prevalence             0.2845   0.1935   0.1743   0.1639   0.1837
## Detection Rate         0.2845   0.1929   0.1737   0.1625   0.1833
## Detection Prevalence   0.2849   0.1939   0.1752   0.1625   0.1835
## Balanced Accuracy      0.9997   0.9978   0.9974   0.9956   0.9988

plot(fit_rf, main="Random Forest Plot.")
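The accuracy reported by `confusionMatrix` is the share of test rows on the diagonal of the table, and one minus that value is the estimated out-of-sample error. As a check, the sketch below re-enters the Random Forest counts printed above by hand (the object name `cm_rf` is introduced only for this illustration):

```r
classes <- c("A", "B", "C", "D", "E")

# Rows are predictions, columns are the reference classes
cm_rf <- matrix(
  c(1395,   2,   0,   0,   0,
       0, 946,   3,   0,   2,
       0,   1, 852,   6,   0,
       0,   0,   0, 797,   0,
       0,   0,   0,   1, 899),
  nrow = 5, byrow = TRUE,
  dimnames = list(Prediction = classes, Reference = classes)
)

accuracy    <- sum(diag(cm_rf)) / sum(cm_rf)  # correct predictions / all rows
oos_error   <- 1 - accuracy                   # estimated out-of-sample error
sensitivity <- diag(cm_rf) / colSums(cm_rf)   # per-class recall

print(round(accuracy, 4))
print(round(oos_error, 4))
print(round(sensitivity, 4))
```

With 4889 of 4904 rows on the diagonal, the accuracy is about 0.9969 and the estimated out-of-sample error about 0.0031, or roughly 0.3%.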

Gradient Boosted Model (GBM)

controlGBM <- trainControl(method = "cv", number = 5)
fit_gbm  <- train(classe ~ ., data=train_set, method = "gbm",
                    trControl = controlGBM, verbose = FALSE)

This is the final model from the GBM training.

fit_gbm$finalModel

## A gradient boosted model with multinomial loss function.
## 150 iterations were performed.
## There were 54 predictors of which 53 had non-zero influence.

Below are the predictions on the held-out test set and the resulting confusion matrix for the GBM model.

predict_gbm<- predict(fit_gbm, newdata=test_set)
matrix_gbm <- confusionMatrix(predict_gbm, as.factor(test_set$classe))
matrix_gbm

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction    A    B    C    D    E
##          A 1393    9    0    0    0
##          B    1  931    8    2    6
##          C    0    7  846   15    0
##          D    1    2    1  786    2
##          E    0    0    0    1  893
## 
## Overall Statistics
##                                           
##                Accuracy : 0.9888          
##                  95% CI : (0.9854, 0.9915)
##     No Information Rate : 0.2845          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.9858          
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: A Class: B Class: C Class: D Class: E
## Sensitivity            0.9986   0.9810   0.9895   0.9776   0.9911
## Specificity            0.9974   0.9957   0.9946   0.9985   0.9998
## Pos Pred Value         0.9936   0.9821   0.9747   0.9924   0.9989
## Neg Pred Value         0.9994   0.9954   0.9978   0.9956   0.9980
## Prevalence             0.2845   0.1935   0.1743   0.1639   0.1837
## Detection Rate         0.2841   0.1898   0.1725   0.1603   0.1821
## Detection Prevalence   0.2859   0.1933   0.1770   0.1615   0.1823
## Balanced Accuracy      0.9980   0.9884   0.9920   0.9881   0.9954

Decision Trees.

This is the Decision Tree model fit, together with its control parameters.

fit_tree <- rpart(classe ~ ., data=train_set, method="class")
fit_tree$control

## $minsplit
## [1] 20
## 
## $minbucket
## [1] 7
## 
## $cp
## [1] 0.01
## 
## $maxcompete
## [1] 4
## 
## $maxsurrogate
## [1] 5
## 
## $usesurrogate
## [1] 2
## 
## $surrogatestyle
## [1] 0
## 
## $maxdepth
## [1] 30
## 
## $xval
## [1] 10

Below are the predictions on the held-out test set and the resulting confusion matrix for the Decision Tree model.

predict_tree<- predict(fit_tree, newdata=test_set, type="class")
matrix_tree <- confusionMatrix(predict_tree, as.factor(test_set$classe))
matrix_tree

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction    A    B    C    D    E
##          A 1260  206   42   94   74
##          B   37  536   69   24   95
##          C    7   40  665  109   48
##          D   78  131   57  543  104
##          E   13   36   22   34  580
## 
## Overall Statistics
##                                           
##                Accuracy : 0.7308          
##                  95% CI : (0.7182, 0.7432)
##     No Information Rate : 0.2845          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.6574          
##                                           
##  Mcnemar's Test P-Value : < 2.2e-16       
## 
## Statistics by Class:
## 
##                      Class: A Class: B Class: C Class: D Class: E
## Sensitivity            0.9032   0.5648   0.7778   0.6754   0.6437
## Specificity            0.8814   0.9431   0.9496   0.9098   0.9738
## Pos Pred Value         0.7518   0.7043   0.7652   0.5947   0.8467
## Neg Pred Value         0.9582   0.9003   0.9529   0.9346   0.9239
## Prevalence             0.2845   0.1935   0.1743   0.1639   0.1837
## Detection Rate         0.2569   0.1093   0.1356   0.1107   0.1183
## Detection Prevalence   0.3418   0.1552   0.1772   0.1862   0.1397
## Balanced Accuracy      0.8923   0.7540   0.8637   0.7926   0.8087

fancyRpartPlot(fit_tree, main="Decision Tree Plot.")

Linear Discriminant Analysis (LDA)

control <- trainControl(method = "cv", number = 5)

fit_lda <- train(classe~., data=train_set, 
                 method="lda", metric="Accuracy", trControl=control)

This is the final model from the LDA training.

fit_lda$finalModel

## Call:
## lda(x, grouping = y)
## 
## Prior probabilities of groups:
##         A         B         C         D         E 
## 0.2843457 0.1935046 0.1744123 0.1638810 0.1838565 
## 
## Group means:
##   new_windowyes num_window roll_belt  pitch_belt   yaw_belt total_accel_belt
## A    0.02054958   383.9221  60.07460  0.37210753 -11.266750         10.77276
## B    0.01896067   508.1647  65.29072  0.08350421 -13.466815         11.16573
## C    0.01830931   484.0374  64.91887 -0.80421114  -7.959264         11.19322
## D    0.02363184   428.8130  60.56257  2.00286070 -18.598661         11.22139
## E    0.02291205   369.4265  74.54440  0.75266445  -5.055514         12.72801
##   gyros_belt_x gyros_belt_y gyros_belt_z accel_belt_x accel_belt_y accel_belt_z
## A -0.005739546   0.04077419   -0.1231971    -6.281481     29.28029    -63.86547
## B -0.006551966   0.04267556   -0.1349649    -5.074438     32.08006    -73.80513
## C -0.015383716   0.03941956   -0.1357733    -4.314764     31.24854    -71.35021
## D -0.016314262   0.03687396   -0.1332960    -8.614842     30.38184    -68.87396
## E  0.014146341   0.03918699   -0.1236881    -4.382853     29.22875    -91.88655
##   magnet_belt_x magnet_belt_y magnet_belt_z  roll_arm  pitch_arm    yaw_arm
## A      57.85376      602.2686     -337.6791 -1.066858   3.396385 -12.095840
## B      49.07058      599.5945     -336.3606 31.343227  -6.248683   7.384916
## C      56.58083      599.8231     -337.7106 24.915442  -2.091395   4.569790
## D      48.07463      594.1318     -340.7554 23.567956 -10.575825   5.205228
## E      63.30118      567.4656     -378.9379 19.587701 -12.355148  -1.940739
##   total_accel_arm gyros_arm_x gyros_arm_y gyros_arm_z accel_arm_x accel_arm_y
## A        27.36320  0.04203584  -0.2305759   0.2683441  -133.19068    46.30036
## B        26.44031  0.04372893  -0.2923736   0.2713167   -42.18539    25.21243
## C        24.33970  0.10316323  -0.2680483   0.2768017   -75.33035    39.95676
## D        23.50124  0.04174129  -0.2543367   0.2671393    18.40672    24.97637
## E        24.40946  0.05500000  -0.2909719   0.2867738   -19.98189    16.09202
##   accel_arm_z magnet_arm_x magnet_arm_y magnet_arm_z roll_dumbbell
## A   -74.33501    -23.48602    237.31063     412.8760      21.67189
## B   -95.19803    235.09515    131.23244     197.2521      35.36759
## C   -54.35801    161.84885    186.62446     359.0935     -14.00015
## D   -48.61443    402.60738     94.60987     293.8495      50.79367
## E   -75.27901    323.65188     84.20140     221.1508      25.93793
##   pitch_dumbbell yaw_dumbbell total_accel_dumbbell gyros_dumbbell_x
## A     -18.587123     1.606308             14.65878        0.1215603
## B       2.872817    14.414811             14.43294        0.1648736
## C     -24.782856   -15.613114             12.83210        0.1927815
## D      -1.714069     2.638046             11.27405        0.2002446
## E      -7.530309     4.996935             14.32262        0.1413378
##   gyros_dumbbell_y gyros_dumbbell_z accel_dumbbell_x accel_dumbbell_y
## A       0.03604301      -0.07706093       -50.143130         52.66930
## B       0.01477177      -0.14757022        -1.236306         69.49965
## C       0.05625633      -0.15349825       -39.903000         29.70822
## D       0.01662106      -0.12817579       -21.966003         52.96144
## E       0.11931264      -0.15385809       -18.662602         54.37472
##   accel_dumbbell_z magnet_dumbbell_x magnet_dumbbell_y magnet_dumbbell_z
## A        -57.00550         -385.3157          217.0205           9.40779
## B        -16.54635         -248.3553          264.3789          47.43223
## C        -51.59564         -368.9626          157.5738          62.16595
## D        -32.24751         -313.1439          217.0514          57.40257
## E        -24.52033         -294.8673          240.6948          72.11789
##   roll_forearm pitch_forearm yaw_forearm total_accel_forearm gyros_forearm_x
## A     25.80915      -6.71838   23.357305            32.16129       0.1737252
## B     32.37202      14.79500   14.267303            35.36763       0.1388097
## C     58.91059      12.41519   38.250545            34.79042       0.2053448
## D     16.13575      28.21249    4.572396            36.11111       0.1239345
## E     38.37289      16.89978   11.474010            36.73762       0.1316149
##   gyros_forearm_y gyros_forearm_z accel_forearm_x accel_forearm_y
## A      0.14534050       0.1722389       -0.288172        172.0736
## B      0.10713132       0.1788588      -76.759831        136.6475
## C      0.02211531       0.1288274      -47.809895        212.9817
## D     -0.07008706       0.1047886     -154.168740        152.3068
## E      0.05138950       0.1509645      -73.373984        143.6559
##   accel_forearm_z magnet_forearm_x magnet_forearm_y magnet_forearm_z
## A       -60.40263        -196.3622         480.6511         411.5646
## B       -47.23947        -323.8343         278.8062         377.0312
## C       -60.56525        -335.9073         507.5730         460.9809
## D       -48.32960        -456.2251         317.1779         356.0817
## E       -57.50554        -330.7642         280.2882         349.3174
## 
## Coefficients of linear discriminants:
##                                LD1           LD2           LD3           LD4
## new_windowyes         1.410898e-02  0.0540335689 -0.1123579126 -1.287712e-01
## num_window            3.875566e-04 -0.0006921719  0.0017038579  4.558236e-05
## roll_belt             5.728992e-02  0.0958846156  0.0020966025  7.143647e-02
## pitch_belt            3.191313e-02  0.0124402762 -0.0697584063  1.124627e-02
## yaw_belt             -9.168320e-03  0.0009167361 -0.0099609902 -3.745603e-03
## total_accel_belt     -3.032154e-02 -0.0213058341 -0.2719956759 -1.779606e-01
## gyros_belt_x          7.827373e-01  0.1233781604  1.0011016630  3.582490e-01
## gyros_belt_y         -1.614869e+00 -2.1262117261 -0.9184486798  7.185556e-01
## gyros_belt_z          5.711864e-01  0.4754734818  0.4920069166 -6.017151e-01
## accel_belt_x         -1.671178e-03 -0.0017146653  0.0198718587  6.140797e-03
## accel_belt_y         -2.415415e-02 -0.0314962633  0.0513149856  7.567886e-03
## accel_belt_z          5.390848e-03  0.0274268185 -0.0103606718  1.638474e-02
## magnet_belt_x        -1.112164e-02  0.0029612523 -0.0207206682 -3.923765e-03
## magnet_belt_y        -2.264492e-02 -0.0081327120 -0.0008368685 -3.362297e-03
## magnet_belt_z         7.732861e-03 -0.0006912649  0.0113318387  3.221598e-03
## roll_arm              7.586419e-04  0.0001864800  0.0021328556  3.299443e-04
## pitch_arm            -3.342949e-03  0.0060646524  0.0057067414  1.540546e-03
## yaw_arm               1.269323e-03 -0.0009582550  0.0013385543 -1.165249e-03
## total_accel_arm       5.460354e-03 -0.0257422641 -0.0218413376 -1.738358e-02
## gyros_arm_x           1.315659e-01  0.0221614121 -0.0338702422  3.316230e-02
## gyros_arm_y           9.545561e-02 -0.0778275830 -0.0449007509  1.639741e-01
## gyros_arm_z          -1.442944e-01 -0.1283810464  0.0145724396  1.337536e-01
## accel_arm_x          -3.160178e-03 -0.0050909582 -0.0083688796 -2.026154e-03
## accel_arm_y          -3.366176e-03  0.0148460372 -0.0000681902  3.770973e-03
## accel_arm_z           1.003463e-02 -0.0025474502  0.0018974126 -7.191813e-03
## magnet_arm_x          1.185377e-04 -0.0002132317  0.0020993113  1.192414e-03
## magnet_arm_y         -1.181672e-03 -0.0051798393  0.0049534005  4.028490e-04
## magnet_arm_z         -3.789780e-03 -0.0019871402 -0.0056961671  2.070360e-03
## roll_dumbbell         2.555527e-03 -0.0041369185 -0.0030878286 -7.749850e-03
## pitch_dumbbell       -6.085364e-03 -0.0035968322 -0.0041259799 -4.488973e-03
## yaw_dumbbell         -7.715146e-03  0.0068360593 -0.0034305793 -3.781699e-03
## total_accel_dumbbell  7.053979e-02  0.0664972858 -0.0006281945  3.617539e-03
## gyros_dumbbell_x      2.656716e-01 -0.4794798548  0.1069648811  1.010587e-01
## gyros_dumbbell_y      2.299391e-01 -0.2761991296 -0.0411082721  1.899719e-01
## gyros_dumbbell_z      8.553272e-02 -0.3290458686  0.1072592291  9.902016e-02
## accel_dumbbell_x      1.272327e-02  0.0090672749  0.0008222507  6.514479e-03
## accel_dumbbell_y      1.724172e-03  0.0034017540  0.0026632472 -2.035604e-03
## accel_dumbbell_z      2.592574e-03  0.0021780761  0.0022402441  1.147031e-03
## magnet_dumbbell_x    -4.014102e-03 -0.0005656641  0.0036224924 -2.464291e-03
## magnet_dumbbell_y    -1.025281e-03  0.0020666193 -0.0006005302 -2.184794e-03
## magnet_dumbbell_z     1.301087e-02 -0.0096281693 -0.0018781249  9.603372e-03
## roll_forearm          1.609084e-03  0.0012436728  0.0002241800  1.045248e-03
## pitch_forearm         1.608954e-02 -0.0131477651  0.0056129195  4.140853e-04
## yaw_forearm          -5.370017e-05  0.0008515176  0.0008448289  1.003078e-03
## total_accel_forearm   3.284034e-02  0.0073544020 -0.0066639277  1.906764e-03
## gyros_forearm_x      -3.193387e-02 -0.0856981261  0.2085450940  1.544631e-01
## gyros_forearm_y      -1.712144e-02 -0.0266365309  0.0209540403  8.358760e-03
## gyros_forearm_z       7.621466e-02  0.1392370025 -0.0520512547 -6.530117e-02
## accel_forearm_x       3.627254e-03  0.0109065884  0.0002624948  3.678135e-03
## accel_forearm_y       6.662715e-04 -0.0007770403 -0.0007328203 -2.051744e-03
## accel_forearm_z      -7.026805e-03  0.0027573026  0.0038478633 -4.455119e-03
## magnet_forearm_x     -1.824348e-03 -0.0036244001  0.0000687956 -1.117217e-03
## magnet_forearm_y     -9.707870e-04 -0.0016079266  0.0003647398  3.981076e-04
## magnet_forearm_z     -1.118505e-04 -0.0014203321 -0.0003700630  1.208097e-03
## 
## Proportion of trace:
##    LD1    LD2    LD3    LD4 
## 0.4792 0.2360 0.1731 0.1117

Below are the predictions on the held-out test set and the resulting confusion matrix for the LDA model.

predict_lda <- predict(fit_lda, newdata=test_set)
matrix_lda<- confusionMatrix(predict_lda, as.factor(test_set$classe))
matrix_lda

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction    A    B    C    D    E
##          A 1165  132   76   50   39
##          B   44  625   85   33  131
##          C   83  109  577  117   84
##          D   98   37   87  574   80
##          E    5   46   30   30  567
## 
## Overall Statistics
##                                           
##                Accuracy : 0.7153          
##                  95% CI : (0.7025, 0.7279)
##     No Information Rate : 0.2845          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.6396          
##                                           
##  Mcnemar's Test P-Value : < 2.2e-16       
## 
## Statistics by Class:
## 
##                      Class: A Class: B Class: C Class: D Class: E
## Sensitivity            0.8351   0.6586   0.6749   0.7139   0.6293
## Specificity            0.9154   0.9259   0.9029   0.9263   0.9723
## Pos Pred Value         0.7969   0.6808   0.5948   0.6553   0.8363
## Neg Pred Value         0.9332   0.9187   0.9293   0.9429   0.9210
## Prevalence             0.2845   0.1935   0.1743   0.1639   0.1837
## Detection Rate         0.2376   0.1274   0.1177   0.1170   0.1156
## Detection Prevalence   0.2981   0.1872   0.1978   0.1786   0.1383
## Balanced Accuracy      0.8752   0.7923   0.7889   0.8201   0.8008

K-Nearest Neighbors (KNN)

fit_knn <- train(classe~., data=train_set, method="knn", metric="Accuracy", 
                 trControl=control)

Below are the predictions on the held-out test set and the resulting confusion matrix for the KNN model.

predict_knn <- predict(fit_knn, newdata=test_set)
matrix_knn <- confusionMatrix(predict_knn, as.factor(test_set$classe))
matrix_knn

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction    A    B    C    D    E
##          A 1347   35    8   18    8
##          B   12  828   22    7   23
##          C    7   44  805   62   11
##          D   24   30   18  706   29
##          E    5   12    2   11  830
## 
## Overall Statistics
##                                          
##                Accuracy : 0.9209         
##                  95% CI : (0.913, 0.9283)
##     No Information Rate : 0.2845         
##     P-Value [Acc > NIR] : < 2.2e-16      
##                                          
##                   Kappa : 0.8999         
##                                          
##  Mcnemar's Test P-Value : 2.438e-12      
## 
## Statistics by Class:
## 
##                      Class: A Class: B Class: C Class: D Class: E
## Sensitivity            0.9656   0.8725   0.9415   0.8781   0.9212
## Specificity            0.9803   0.9838   0.9694   0.9754   0.9925
## Pos Pred Value         0.9513   0.9283   0.8665   0.8748   0.9651
## Neg Pred Value         0.9862   0.9698   0.9874   0.9761   0.9824
## Prevalence             0.2845   0.1935   0.1743   0.1639   0.1837
## Detection Rate         0.2747   0.1688   0.1642   0.1440   0.1692
## Detection Prevalence   0.2887   0.1819   0.1894   0.1646   0.1754
## Balanced Accuracy      0.9730   0.9282   0.9554   0.9267   0.9569

plot(fit_knn, main="KNN model Plot.")

Performance of the models.

The model with the best performance by the accuracy metric is the Random Forest (0.9969), followed closely by the Gradient Boosted Model (0.9888).

perform_acuracy <- c(round(matrix_lda$overall[[1]], 4),
             round(matrix_knn$overall[[1]], 4),
             round(matrix_gbm$overall[[1]], 4),
             round(matrix_rf$overall[[1]], 4),
             round(matrix_tree$overall[[1]], 4))
model_names<-c('Linear Discrimination Analysis (LDA)', 
                         'K- Nearest Neighbors (KNN)',
                         'Gradient Boosting (GBM)',
                         'Random Forest (RF)',
                         'Decision Tree')

# data.frame() keeps the accuracies numeric; cbind() would coerce them to text
(accuracies<-data.frame(model_names, perform_acuracy))

##                            model_names perform_acuracy
## 1 Linear Discrimination Analysis (LDA)          0.7153
## 2           K- Nearest Neighbors (KNN)          0.9209
## 3              Gradient Boosting (GBM)          0.9888
## 4                   Random Forest (RF)          0.9969
## 5                        Decision Tree          0.7308
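Sorting the table makes the ranking explicit and turns each accuracy into an estimated out-of-sample error (one minus accuracy). The sketch below re-enters the accuracies printed above; the `results` data frame and the short model labels are introduced only for illustration:

```r
results <- data.frame(
  model    = c("LDA", "KNN", "GBM", "RF", "Decision Tree"),
  accuracy = c(0.7153, 0.9209, 0.9888, 0.9969, 0.7308),
  stringsAsFactors = FALSE
)

# Estimated out-of-sample error on the held-out 25%
results$est_error <- 1 - results$accuracy

# Best model first
results <- results[order(results$accuracy, decreasing = TRUE), ]
print(results, row.names = FALSE)

best_model <- results$model[1]
print(best_model)
```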

Applying the selected Model to the Test Data.

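The table below shows the label predicted for each of the 20 test cases. These predictions come from the GBM fit (fit_gbm), although the Random Forest scored slightly higher on the held-out set.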

predict_test_data <- predict(fit_gbm, newdata=test_data)
table(predict_test_data,test_data$problem_id)

##                  
## predict_test_data 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
##                 A 0 1 0 1 1 0 0 0 1  1  0  0  0  1  0  0  1  0  0  0
##                 B 1 0 1 0 0 0 0 1 0  0  1  0  1  0  0  0  0  1  1  1
##                 C 0 0 0 0 0 0 0 0 0  0  0  1  0  0  0  0  0  0  0  0
##                 D 0 0 0 0 0 0 1 0 0  0  0  0  0  0  0  0  0  0  0  0
##                 E 0 0 0 0 0 1 0 0 0  0  0  0  0  0  1  1  0  0  0  0
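The cross-table is hard to read at a glance. A more compact view pairs each `problem_id` with its predicted class; the sketch below re-enters the labels from the table above by hand rather than calling `predict`, so it runs without the fitted model (the `submission` name is hypothetical):

```r
# Predicted classes for problem_id 1 to 20, read off the table above
predicted <- c("B", "A", "B", "A", "A", "E", "D", "B", "A", "A",
               "B", "C", "B", "A", "E", "E", "A", "B", "B", "B")

submission <- data.frame(
  problem_id = 1:20,
  predicted  = factor(predicted, levels = c("A", "B", "C", "D", "E"))
)
print(submission, row.names = FALSE)

# Number of test cases assigned to each class
class_counts <- table(submission$predicted)
print(class_counts)
```

With the fitted model in memory, the same table could be built as `data.frame(problem_id = test_data$problem_id, predicted = predict_test_data)`.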