This document aims to predict the manner in which six participants performed a weight lifting exercise.
Using devices such as Jawbone Up, Nike FuelBand, and Fitbit it is now possible to collect a large amount of data about personal activity relatively inexpensively. These types of devices are part of the quantified self movement, a group of enthusiasts who take measurements about themselves regularly to improve their health, to find patterns in their behavior, or because they are tech geeks. One thing that people regularly do is quantify how much of a particular activity they do, but they rarely quantify how well they do it. In this project, the goal is to use data from accelerometers on the belt, forearm, arm, and dumbbell of 6 participants. They were asked to perform barbell lifts correctly and incorrectly in 5 different ways. More information is available from the website here: http://groupware.les.inf.puc-rio.br/har (see the section on the Weight Lifting Exercise Dataset).
The data for this project come from the Weight Lifting Exercise Dataset referenced above.
The training and test files are downloaded from the URLs defined in the code below.
The goal of this project is to predict the manner in which the participants did the exercise. This is the “classe” variable in the training set. Any of the other variables may be used as predictors. The report describes how the model was built, how cross-validation was used, what the expected out-of-sample error is, and why the choices were made. The prediction model is also used to predict 20 different test cases.
library(data.table)
library(caret)
library(dplyr)
library(ggthemes)
library(corrplot)
library(RColorBrewer)
library(rpart)
library(rattle)
library(gbm)
library(MASS)
Download the files and check that they are in the specified working directory.
setwd("C:/Users/aleja/Documents/Cursos/Coursera R pratices/Prediction Assignment Writeup")
train_url<-"https://d396qusza40orc.cloudfront.net/predmachlearn/pml-training.csv"
test_url<-"https://d396qusza40orc.cloudfront.net/predmachlearn/pml-testing.csv"
# Create the Data folder only if it does not exist yet.
if (!dir.exists(file.path(getwd(), "Data"))) {
  dir.create(file.path(getwd(), "Data"))
}
Download the datasets and save them in the Data folder.
download.file(url = train_url, destfile = file.path("./Data", "self_movement_train_data.csv"),
method = "curl")
download.file(url = test_url, destfile = file.path("./Data", "self_movement_test_data.csv"),
method = "curl")
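Each run of the report downloads both files again. A minimal sketch of a guard that skips the download when the file is already on disk is shown here; `fetch_once` is a hypothetical helper that is not part of the original analysis, and the demonstration uses a temporary file so that no network access is needed.

```r
# Hypothetical helper: download `url` to `dest` only when `dest` is missing.
fetch_once <- function(url, dest) {
  # Create the target folder if needed; no warning if it already exists.
  dir.create(dirname(dest), recursive = TRUE, showWarnings = FALSE)
  if (file.exists(dest)) {
    return(FALSE)  # nothing was downloaded
  }
  download.file(url = url, destfile = dest, mode = "wb")
  TRUE  # a download was performed
}

# Offline demonstration: the file already exists, so no network call is made.
demo_file <- file.path(tempdir(), "Data", "already_here.csv")
dir.create(dirname(demo_file), recursive = TRUE, showWarnings = FALSE)
writeLines("a,b\n1,2", demo_file)
downloaded <- fetch_once("https://example.invalid/file.csv", demo_file)
```

In the report itself, the same helper could be called with `train_url` and `test_url` and the paths under `./Data`.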
Verify that the files were downloaded correctly.
list.files("./Data")
## [1] "self_movement_test_data.csv" "self_movement_train_data.csv"
Before any analysis, the data must be loaded and cleaned.
fread("./Data/self_movement_train_data.csv")->train_df
fread("./Data/self_movement_test_data.csv")->test_df
The train and test files have many columns that consist largely or entirely of NA values, so these columns will be dropped.
str(train_df)
## Classes 'data.table' and 'data.frame': 19622 obs. of 160 variables:
## $ V1 : int 1 2 3 4 5 6 7 8 9 10 ...
## $ user_name : chr "carlitos" "carlitos" "carlitos" "carlitos" ...
## $ raw_timestamp_part_1 : int 1323084231 1323084231 1323084231 1323084232 1323084232 1323084232 1323084232 1323084232 1323084232 1323084232 ...
## $ raw_timestamp_part_2 : int 788290 808298 820366 120339 196328 304277 368296 440390 484323 484434 ...
## $ cvtd_timestamp : chr "05/12/2011 11:23" "05/12/2011 11:23" "05/12/2011 11:23" "05/12/2011 11:23" ...
## $ new_window : chr "no" "no" "no" "no" ...
## $ num_window : int 11 11 11 12 12 12 12 12 12 12 ...
## $ roll_belt : num 1.41 1.41 1.42 1.48 1.48 1.45 1.42 1.42 1.43 1.45 ...
## $ pitch_belt : num 8.07 8.07 8.07 8.05 8.07 8.06 8.09 8.13 8.16 8.17 ...
## $ yaw_belt : num -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 ...
## $ total_accel_belt : int 3 3 3 3 3 3 3 3 3 3 ...
## $ kurtosis_roll_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ kurtosis_picth_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ kurtosis_yaw_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ skewness_roll_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ skewness_roll_belt.1 : num NA NA NA NA NA NA NA NA NA NA ...
## $ skewness_yaw_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ max_roll_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ max_picth_belt : int NA NA NA NA NA NA NA NA NA NA ...
## $ max_yaw_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ min_roll_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ min_pitch_belt : int NA NA NA NA NA NA NA NA NA NA ...
## $ min_yaw_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ amplitude_roll_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ amplitude_pitch_belt : int NA NA NA NA NA NA NA NA NA NA ...
## $ amplitude_yaw_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ var_total_accel_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ avg_roll_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ stddev_roll_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ var_roll_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ avg_pitch_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ stddev_pitch_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ var_pitch_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ avg_yaw_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ stddev_yaw_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ var_yaw_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ gyros_belt_x : num 0 0.02 0 0.02 0.02 0.02 0.02 0.02 0.02 0.03 ...
## $ gyros_belt_y : num 0 0 0 0 0.02 0 0 0 0 0 ...
## $ gyros_belt_z : num -0.02 -0.02 -0.02 -0.03 -0.02 -0.02 -0.02 -0.02 -0.02 0 ...
## $ accel_belt_x : int -21 -22 -20 -22 -21 -21 -22 -22 -20 -21 ...
## $ accel_belt_y : int 4 4 5 3 2 4 3 4 2 4 ...
## $ accel_belt_z : int 22 22 23 21 24 21 21 21 24 22 ...
## $ magnet_belt_x : int -3 -7 -2 -6 -6 0 -4 -2 1 -3 ...
## $ magnet_belt_y : int 599 608 600 604 600 603 599 603 602 609 ...
## $ magnet_belt_z : int -313 -311 -305 -310 -302 -312 -311 -313 -312 -308 ...
## $ roll_arm : num -128 -128 -128 -128 -128 -128 -128 -128 -128 -128 ...
## $ pitch_arm : num 22.5 22.5 22.5 22.1 22.1 22 21.9 21.8 21.7 21.6 ...
## $ yaw_arm : num -161 -161 -161 -161 -161 -161 -161 -161 -161 -161 ...
## $ total_accel_arm : int 34 34 34 34 34 34 34 34 34 34 ...
## $ var_accel_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ avg_roll_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ stddev_roll_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ var_roll_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ avg_pitch_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ stddev_pitch_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ var_pitch_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ avg_yaw_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ stddev_yaw_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ var_yaw_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ gyros_arm_x : num 0 0.02 0.02 0.02 0 0.02 0 0.02 0.02 0.02 ...
## $ gyros_arm_y : num 0 -0.02 -0.02 -0.03 -0.03 -0.03 -0.03 -0.02 -0.03 -0.03 ...
## $ gyros_arm_z : num -0.02 -0.02 -0.02 0.02 0 0 0 0 -0.02 -0.02 ...
## $ accel_arm_x : int -288 -290 -289 -289 -289 -289 -289 -289 -288 -288 ...
## $ accel_arm_y : int 109 110 110 111 111 111 111 111 109 110 ...
## $ accel_arm_z : int -123 -125 -126 -123 -123 -122 -125 -124 -122 -124 ...
## $ magnet_arm_x : int -368 -369 -368 -372 -374 -369 -373 -372 -369 -376 ...
## $ magnet_arm_y : int 337 337 344 344 337 342 336 338 341 334 ...
## $ magnet_arm_z : int 516 513 513 512 506 513 509 510 518 516 ...
## $ kurtosis_roll_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ kurtosis_picth_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ kurtosis_yaw_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ skewness_roll_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ skewness_pitch_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ skewness_yaw_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ max_roll_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ max_picth_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ max_yaw_arm : int NA NA NA NA NA NA NA NA NA NA ...
## $ min_roll_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ min_pitch_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ min_yaw_arm : int NA NA NA NA NA NA NA NA NA NA ...
## $ amplitude_roll_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ amplitude_pitch_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ amplitude_yaw_arm : int NA NA NA NA NA NA NA NA NA NA ...
## $ roll_dumbbell : num 13.1 13.1 12.9 13.4 13.4 ...
## $ pitch_dumbbell : num -70.5 -70.6 -70.3 -70.4 -70.4 ...
## $ yaw_dumbbell : num -84.9 -84.7 -85.1 -84.9 -84.9 ...
## $ kurtosis_roll_dumbbell : num NA NA NA NA NA NA NA NA NA NA ...
## $ kurtosis_picth_dumbbell : num NA NA NA NA NA NA NA NA NA NA ...
## $ kurtosis_yaw_dumbbell : num NA NA NA NA NA NA NA NA NA NA ...
## $ skewness_roll_dumbbell : num NA NA NA NA NA NA NA NA NA NA ...
## $ skewness_pitch_dumbbell : num NA NA NA NA NA NA NA NA NA NA ...
## $ skewness_yaw_dumbbell : num NA NA NA NA NA NA NA NA NA NA ...
## $ max_roll_dumbbell : num NA NA NA NA NA NA NA NA NA NA ...
## $ max_picth_dumbbell : num NA NA NA NA NA NA NA NA NA NA ...
## $ max_yaw_dumbbell : num NA NA NA NA NA NA NA NA NA NA ...
## $ min_roll_dumbbell : num NA NA NA NA NA NA NA NA NA NA ...
## $ min_pitch_dumbbell : num NA NA NA NA NA NA NA NA NA NA ...
## $ min_yaw_dumbbell : num NA NA NA NA NA NA NA NA NA NA ...
## $ amplitude_roll_dumbbell : num NA NA NA NA NA NA NA NA NA NA ...
## [list output truncated]
## - attr(*, ".internal.selfref")=<externalptr>
cat("Amount of NA's in train set", sum(is.na(train_df)==TRUE), sep = "\n")
## Amount of NA's in train set
## 1925102
cat("Amount of NA's in test set",sum(is.na(test_df)==TRUE), sep = "\n")
## Amount of NA's in test set
## 2000
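The totals above do not show which columns carry the missing values. A small sketch of a per-column check follows; the data frame `toy` and the 0.9 threshold are illustrative assumptions, not values taken from the original analysis.

```r
# Toy data standing in for train_df (assumption: not the real dataset).
toy <- data.frame(
  roll_belt          = c(1.41, 1.42, 1.48, 1.45),
  kurtosis_roll_belt = c(NA, NA, NA, 0.5),
  max_roll_belt      = c(NA, NA, NA, NA)
)

# Share of missing values in each column.
na_share <- colMeans(is.na(toy))

# Columns that are missing in more than 90% of the rows.
mostly_na <- names(na_share)[na_share > 0.9]
```

Applied to `train_df`, `colMeans(is.na(train_df))` would give the same per-column summary.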
The columns containing any NA are removed with “dplyr”. The first 5 columns are also removed, since they do not contain relevant information.
train_df[,-c(1:5)] %>%
select_if(~ !any(is.na(.)))->train_data
test_df[,-c(1:5)] %>%
select_if(~ !any(is.na(.)))->test_data
cat("Amount of NA's in new train set", sum(is.na(train_data)==TRUE), sep = "\n")
## Amount of NA's in new train set
## 0
cat("Amount of NA's in new test set",sum(is.na(test_data)==TRUE), sep = "\n")
## Amount of NA's in new test set
## 0
Now, the dimensions of the remaining datasets are:
cat("Dims. of training set", dim(train_data))
## Dims. of training set 19622 55
cat("Dims. of test set", dim(test_data))
## Dims. of test set 20 55
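Because the NA filter is applied to the two tables separately, matching dimensions do not by themselves guarantee matching columns. The sketch below uses toy tables (an assumption, not the real data) and a hypothetical helper `drop_na_cols`, a base R counterpart of the `select_if` call, to compare the surviving column names.

```r
# Toy tables standing in for train_df and test_df (assumption).
train_toy <- data.frame(roll_belt = c(1.4, 1.5), max_roll_belt = c(NA, 2.1),
                        classe = c("A", "B"))
test_toy  <- data.frame(roll_belt = c(1.3, 1.6), max_roll_belt = c(NA, NA),
                        problem_id = 1:2)

# Hypothetical helper: keep only the columns without any missing value.
drop_na_cols <- function(df) df[, colSums(is.na(df)) == 0, drop = FALSE]

train_clean <- drop_na_cols(train_toy)
test_clean  <- drop_na_cols(test_toy)

# Columns present in one table but not in the other.
only_in_train <- setdiff(names(train_clean), names(test_clean))
only_in_test  <- setdiff(names(test_clean), names(train_clean))
```

For the real data, the same `setdiff` calls on `names(train_data)` and `names(test_data)` would list any column that exists in only one of the two sets.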
Only 55 columns remain, which contain most of the relevant information for the analysis.
The plot below shows how often each class occurs.
g1 <- ggplot(data = train_data, aes(x=as.factor(classe)))
g1 + geom_bar(fill="firebrick3", colour="black")+theme_stata()+
ylab("Frequency") + xlab("Classes") + ggtitle("Frequency of different classes")
The correlation plot below shows which pairs of variables are most strongly correlated, whether positively or negatively. The plot is ordered by the first principal component (FPC).
corM <- cor(train_data[, -c(1,55)])
corrplot(corM, order = "FPC", method = "circle", type = "upper",
tl.cex = 0.7, tl.col="black", col=brewer.pal(n=8, name="RdBu"))
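The plot gives a visual impression; the strongly correlated pairs can also be listed explicitly. The sketch below does this on simulated data; the variables `a`, `b`, `c` and the 0.9 cutoff are assumptions chosen for illustration.

```r
set.seed(1)
x <- rnorm(200)
toy <- data.frame(
  a = x,
  b = x + rnorm(200, sd = 0.05),  # nearly a copy of `a`
  c = rnorm(200)                  # independent noise
)
corM_toy <- cor(toy)

# Positions in the upper triangle whose absolute correlation exceeds 0.9.
idx <- which(abs(corM_toy) > 0.9 & upper.tri(corM_toy), arr.ind = TRUE)

high_pairs <- data.frame(
  var1 = rownames(corM_toy)[idx[, "row"]],
  var2 = colnames(corM_toy)[idx[, "col"]],
  r    = corM_toy[idx]
)
```

The same indexing applied to `corM` would return the sensor pairs that appear as the darkest circles in the plot.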
In this project, predictive analysis is performed with five widely used models:
*Random Forest.
*Generalized Boosted Model (GBM).
*Decision Tree.
*Linear Discriminant Analysis (LDA).
*K-Nearest Neighbors (KNN).
To fit the models, the training data is divided into a training set (75%) and a held-out test set (25%). The “caret” package is used for this task.
partition <- createDataPartition(train_data$classe, p=0.75, list=FALSE)
train_set <- train_data[partition, ]
test_set <- train_data[-partition, ]
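For a factor outcome, `createDataPartition` samples within each class so that the class proportions are roughly preserved. The sketch below imitates that idea in base R on a made-up outcome vector; it is not caret's exact algorithm, and the seed is an assumption (the original code sets none, so its split changes from run to run).

```r
set.seed(123)  # assumption: any fixed seed makes the split reproducible

# Made-up outcome with unbalanced classes.
classe <- factor(rep(c("A", "B", "C"), times = c(40, 20, 20)))

# Sample 75% of the row indices inside each class, then combine them.
in_train <- unlist(lapply(split(seq_along(classe), classe), function(rows) {
  sample(rows, size = ceiling(0.75 * length(rows)))
}))
train_rows <- sort(unname(in_train))
test_rows  <- setdiff(seq_along(classe), train_rows)

# Class counts in the training part.
train_counts <- table(classe[train_rows])
```

Here the training part keeps the 2:1:1 class ratio of the full vector, which is the property the stratified split is meant to provide.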
control_rf <- trainControl(method="cv", number=4, verboseIter=FALSE)
fit_rf <- train(classe ~ ., data=train_set, method="rf",
trControl=control_rf)
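`trainControl(method="cv", number=4)` asks caret to estimate performance with 4-fold cross-validation. The following sketch shows the mechanics of k-fold cross-validation on the built-in `iris` data, with a simple nearest-centroid rule as a stand-in classifier; both the data set and the classifier are assumptions for illustration and have nothing to do with the random forest itself.

```r
set.seed(42)
k <- 4

# Assign every row to one of k folds at random.
folds <- sample(rep(seq_len(k), length.out = nrow(iris)))

fold_acc <- sapply(seq_len(k), function(i) {
  train <- iris[folds != i, ]
  held  <- iris[folds == i, ]

  # Stand-in classifier: assign each row to the class with the nearest mean.
  centroids <- aggregate(. ~ Species, data = train, FUN = mean)
  d <- sapply(seq_len(nrow(centroids)), function(j) {
    center <- unlist(centroids[j, -1])
    rowSums(sweep(as.matrix(held[, 1:4]), 2, center)^2)
  })
  pred <- centroids$Species[max.col(-d, ties.method = "first")]

  # Accuracy on the fold that was left out of training.
  mean(as.character(pred) == as.character(held$Species))
})

# The cross-validated estimate is the mean over the k held-out folds.
cv_accuracy <- mean(fold_acc)
```

Each row is used for evaluation exactly once, which is why the averaged accuracy is a less optimistic estimate than accuracy on the training rows.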
This is the final model selected for the Random Forest.
fit_rf$finalModel
##
## Call:
## randomForest(x = x, y = y, mtry = param$mtry)
## Type of random forest: classification
## Number of trees: 500
## No. of variables tried at each split: 28
##
## OOB estimate of error rate: 0.2%
## Confusion matrix:
## A B C D E class.error
## A 4183 1 0 0 1 0.0004778973
## B 5 2840 3 0 0 0.0028089888
## C 0 6 2561 0 0 0.0023373588
## D 0 0 11 2401 0 0.0045605307
## E 0 0 0 2 2704 0.0007390983
Below are the predictions on the held-out test set and the confusion matrix for the Random Forest model.
predict_rf<- predict(fit_rf, newdata=test_set)
matrix_rf <- confusionMatrix(predict_rf, as.factor(test_set$classe))
matrix_rf
## Confusion Matrix and Statistics
##
## Reference
## Prediction A B C D E
## A 1395 2 0 0 0
## B 0 946 3 0 2
## C 0 1 852 6 0
## D 0 0 0 797 0
## E 0 0 0 1 899
##
## Overall Statistics
##
## Accuracy : 0.9969
## 95% CI : (0.995, 0.9983)
## No Information Rate : 0.2845
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.9961
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: A Class: B Class: C Class: D Class: E
## Sensitivity 1.0000 0.9968 0.9965 0.9913 0.9978
## Specificity 0.9994 0.9987 0.9983 1.0000 0.9998
## Pos Pred Value 0.9986 0.9947 0.9919 1.0000 0.9989
## Neg Pred Value 1.0000 0.9992 0.9993 0.9983 0.9995
## Prevalence 0.2845 0.1935 0.1743 0.1639 0.1837
## Detection Rate 0.2845 0.1929 0.1737 0.1625 0.1833
## Detection Prevalence 0.2849 0.1939 0.1752 0.1625 0.1835
## Balanced Accuracy 0.9997 0.9978 0.9974 0.9956 0.9988
plot(fit_rf, main="Random Forest Plot.")
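The headline statistics reported by `confusionMatrix` can be recomputed by hand from the table. The sketch below re-enters the Random Forest confusion matrix printed above and derives the overall accuracy and the per-class sensitivity; the object names are illustrative.

```r
# Random Forest confusion matrix from the output above
# (rows = prediction, columns = reference).
cm <- matrix(
  c(1395,   2,   0,   0,   0,
       0, 946,   3,   0,   2,
       0,   1, 852,   6,   0,
       0,   0,   0, 797,   0,
       0,   0,   0,   1, 899),
  nrow = 5, byrow = TRUE,
  dimnames = list(Prediction = LETTERS[1:5], Reference = LETTERS[1:5])
)

# Accuracy: correctly classified cases (diagonal) over all cases.
accuracy <- sum(diag(cm)) / sum(cm)

# Sensitivity: correct predictions per class over that class's reference total.
sensitivity <- diag(cm) / colSums(cm)
```

The diagonal holds 4889 of the 4904 held-out cases, which gives the reported accuracy of 0.9969.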
controlGBM <- trainControl(method = "cv", number = 5)
fit_gbm <- train(classe ~ ., data=train_set, method = "gbm",
trControl = controlGBM, verbose = FALSE)
This is the final model from the GBM fit.
fit_gbm$finalModel
## A gradient boosted model with multinomial loss function.
## 150 iterations were performed.
## There were 54 predictors of which 53 had non-zero influence.
Below are the predictions on the held-out test set and the confusion matrix for the GBM model.
predict_gbm<- predict(fit_gbm, newdata=test_set)
matrix_gbm <- confusionMatrix(predict_gbm, as.factor(test_set$classe))
matrix_gbm
## Confusion Matrix and Statistics
##
## Reference
## Prediction A B C D E
## A 1393 9 0 0 0
## B 1 931 8 2 6
## C 0 7 846 15 0
## D 1 2 1 786 2
## E 0 0 0 1 893
##
## Overall Statistics
##
## Accuracy : 0.9888
## 95% CI : (0.9854, 0.9915)
## No Information Rate : 0.2845
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.9858
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: A Class: B Class: C Class: D Class: E
## Sensitivity 0.9986 0.9810 0.9895 0.9776 0.9911
## Specificity 0.9974 0.9957 0.9946 0.9985 0.9998
## Pos Pred Value 0.9936 0.9821 0.9747 0.9924 0.9989
## Neg Pred Value 0.9994 0.9954 0.9978 0.9956 0.9980
## Prevalence 0.2845 0.1935 0.1743 0.1639 0.1837
## Detection Rate 0.2841 0.1898 0.1725 0.1603 0.1821
## Detection Prevalence 0.2859 0.1933 0.1770 0.1615 0.1823
## Balanced Accuracy 0.9980 0.9884 0.9920 0.9881 0.9954
This is the Decision Tree model, fitted with “rpart”, and its control parameters.
fit_tree <- rpart(classe ~ ., data=train_set, method="class")
fit_tree$control
## $minsplit
## [1] 20
##
## $minbucket
## [1] 7
##
## $cp
## [1] 0.01
##
## $maxcompete
## [1] 4
##
## $maxsurrogate
## [1] 5
##
## $usesurrogate
## [1] 2
##
## $surrogatestyle
## [1] 0
##
## $maxdepth
## [1] 30
##
## $xval
## [1] 10
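The values printed above are the `rpart` defaults, since the call passes no `control` argument. As a sketch of how they could be set explicitly, the example below fits a tree on the built-in `iris` data (an assumption for illustration) with `rpart.control`.

```r
library(rpart)

# Same values as the defaults shown above, written out explicitly.
ctrl <- rpart.control(minsplit = 20, cp = 0.01, maxdepth = 30, xval = 10)

# Illustrative fit on iris, not on the weight lifting data.
fit_demo <- rpart(Species ~ ., data = iris, method = "class", control = ctrl)

# The control list is stored with the fitted tree.
cp_used <- fit_demo$control$cp
```

Changing `cp` or `maxdepth` in `ctrl` is one way to trade tree size against fit, but whether that would help on this dataset is not tested here.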
Below are the predictions on the held-out test set and the confusion matrix for the Decision Tree model.
predict_tree<- predict(fit_tree, newdata=test_set, type="class")
matrix_tree <- confusionMatrix(predict_tree, as.factor(test_set$classe))
matrix_tree
## Confusion Matrix and Statistics
##
## Reference
## Prediction A B C D E
## A 1260 206 42 94 74
## B 37 536 69 24 95
## C 7 40 665 109 48
## D 78 131 57 543 104
## E 13 36 22 34 580
##
## Overall Statistics
##
## Accuracy : 0.7308
## 95% CI : (0.7182, 0.7432)
## No Information Rate : 0.2845
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.6574
##
## Mcnemar's Test P-Value : < 2.2e-16
##
## Statistics by Class:
##
## Class: A Class: B Class: C Class: D Class: E
## Sensitivity 0.9032 0.5648 0.7778 0.6754 0.6437
## Specificity 0.8814 0.9431 0.9496 0.9098 0.9738
## Pos Pred Value 0.7518 0.7043 0.7652 0.5947 0.8467
## Neg Pred Value 0.9582 0.9003 0.9529 0.9346 0.9239
## Prevalence 0.2845 0.1935 0.1743 0.1639 0.1837
## Detection Rate 0.2569 0.1093 0.1356 0.1107 0.1183
## Detection Prevalence 0.3418 0.1552 0.1772 0.1862 0.1397
## Balanced Accuracy 0.8923 0.7540 0.8637 0.7926 0.8087
fancyRpartPlot(fit_tree, main="Decision Tree Plot.")
control <- trainControl(method = "cv", number = 5)
fit_lda <- train(classe~., data=train_set,
method="lda", metric="Accuracy", trControl=control)
This is the final model from the LDA fit.
fit_lda$finalModel
## Call:
## lda(x, grouping = y)
##
## Prior probabilities of groups:
## A B C D E
## 0.2843457 0.1935046 0.1744123 0.1638810 0.1838565
##
## Group means:
## new_windowyes num_window roll_belt pitch_belt yaw_belt total_accel_belt
## A 0.02054958 383.9221 60.07460 0.37210753 -11.266750 10.77276
## B 0.01896067 508.1647 65.29072 0.08350421 -13.466815 11.16573
## C 0.01830931 484.0374 64.91887 -0.80421114 -7.959264 11.19322
## D 0.02363184 428.8130 60.56257 2.00286070 -18.598661 11.22139
## E 0.02291205 369.4265 74.54440 0.75266445 -5.055514 12.72801
## gyros_belt_x gyros_belt_y gyros_belt_z accel_belt_x accel_belt_y accel_belt_z
## A -0.005739546 0.04077419 -0.1231971 -6.281481 29.28029 -63.86547
## B -0.006551966 0.04267556 -0.1349649 -5.074438 32.08006 -73.80513
## C -0.015383716 0.03941956 -0.1357733 -4.314764 31.24854 -71.35021
## D -0.016314262 0.03687396 -0.1332960 -8.614842 30.38184 -68.87396
## E 0.014146341 0.03918699 -0.1236881 -4.382853 29.22875 -91.88655
## magnet_belt_x magnet_belt_y magnet_belt_z roll_arm pitch_arm yaw_arm
## A 57.85376 602.2686 -337.6791 -1.066858 3.396385 -12.095840
## B 49.07058 599.5945 -336.3606 31.343227 -6.248683 7.384916
## C 56.58083 599.8231 -337.7106 24.915442 -2.091395 4.569790
## D 48.07463 594.1318 -340.7554 23.567956 -10.575825 5.205228
## E 63.30118 567.4656 -378.9379 19.587701 -12.355148 -1.940739
## total_accel_arm gyros_arm_x gyros_arm_y gyros_arm_z accel_arm_x accel_arm_y
## A 27.36320 0.04203584 -0.2305759 0.2683441 -133.19068 46.30036
## B 26.44031 0.04372893 -0.2923736 0.2713167 -42.18539 25.21243
## C 24.33970 0.10316323 -0.2680483 0.2768017 -75.33035 39.95676
## D 23.50124 0.04174129 -0.2543367 0.2671393 18.40672 24.97637
## E 24.40946 0.05500000 -0.2909719 0.2867738 -19.98189 16.09202
## accel_arm_z magnet_arm_x magnet_arm_y magnet_arm_z roll_dumbbell
## A -74.33501 -23.48602 237.31063 412.8760 21.67189
## B -95.19803 235.09515 131.23244 197.2521 35.36759
## C -54.35801 161.84885 186.62446 359.0935 -14.00015
## D -48.61443 402.60738 94.60987 293.8495 50.79367
## E -75.27901 323.65188 84.20140 221.1508 25.93793
## pitch_dumbbell yaw_dumbbell total_accel_dumbbell gyros_dumbbell_x
## A -18.587123 1.606308 14.65878 0.1215603
## B 2.872817 14.414811 14.43294 0.1648736
## C -24.782856 -15.613114 12.83210 0.1927815
## D -1.714069 2.638046 11.27405 0.2002446
## E -7.530309 4.996935 14.32262 0.1413378
## gyros_dumbbell_y gyros_dumbbell_z accel_dumbbell_x accel_dumbbell_y
## A 0.03604301 -0.07706093 -50.143130 52.66930
## B 0.01477177 -0.14757022 -1.236306 69.49965
## C 0.05625633 -0.15349825 -39.903000 29.70822
## D 0.01662106 -0.12817579 -21.966003 52.96144
## E 0.11931264 -0.15385809 -18.662602 54.37472
## accel_dumbbell_z magnet_dumbbell_x magnet_dumbbell_y magnet_dumbbell_z
## A -57.00550 -385.3157 217.0205 9.40779
## B -16.54635 -248.3553 264.3789 47.43223
## C -51.59564 -368.9626 157.5738 62.16595
## D -32.24751 -313.1439 217.0514 57.40257
## E -24.52033 -294.8673 240.6948 72.11789
## roll_forearm pitch_forearm yaw_forearm total_accel_forearm gyros_forearm_x
## A 25.80915 -6.71838 23.357305 32.16129 0.1737252
## B 32.37202 14.79500 14.267303 35.36763 0.1388097
## C 58.91059 12.41519 38.250545 34.79042 0.2053448
## D 16.13575 28.21249 4.572396 36.11111 0.1239345
## E 38.37289 16.89978 11.474010 36.73762 0.1316149
## gyros_forearm_y gyros_forearm_z accel_forearm_x accel_forearm_y
## A 0.14534050 0.1722389 -0.288172 172.0736
## B 0.10713132 0.1788588 -76.759831 136.6475
## C 0.02211531 0.1288274 -47.809895 212.9817
## D -0.07008706 0.1047886 -154.168740 152.3068
## E 0.05138950 0.1509645 -73.373984 143.6559
## accel_forearm_z magnet_forearm_x magnet_forearm_y magnet_forearm_z
## A -60.40263 -196.3622 480.6511 411.5646
## B -47.23947 -323.8343 278.8062 377.0312
## C -60.56525 -335.9073 507.5730 460.9809
## D -48.32960 -456.2251 317.1779 356.0817
## E -57.50554 -330.7642 280.2882 349.3174
##
## Coefficients of linear discriminants:
## LD1 LD2 LD3 LD4
## new_windowyes 1.410898e-02 0.0540335689 -0.1123579126 -1.287712e-01
## num_window 3.875566e-04 -0.0006921719 0.0017038579 4.558236e-05
## roll_belt 5.728992e-02 0.0958846156 0.0020966025 7.143647e-02
## pitch_belt 3.191313e-02 0.0124402762 -0.0697584063 1.124627e-02
## yaw_belt -9.168320e-03 0.0009167361 -0.0099609902 -3.745603e-03
## total_accel_belt -3.032154e-02 -0.0213058341 -0.2719956759 -1.779606e-01
## gyros_belt_x 7.827373e-01 0.1233781604 1.0011016630 3.582490e-01
## gyros_belt_y -1.614869e+00 -2.1262117261 -0.9184486798 7.185556e-01
## gyros_belt_z 5.711864e-01 0.4754734818 0.4920069166 -6.017151e-01
## accel_belt_x -1.671178e-03 -0.0017146653 0.0198718587 6.140797e-03
## accel_belt_y -2.415415e-02 -0.0314962633 0.0513149856 7.567886e-03
## accel_belt_z 5.390848e-03 0.0274268185 -0.0103606718 1.638474e-02
## magnet_belt_x -1.112164e-02 0.0029612523 -0.0207206682 -3.923765e-03
## magnet_belt_y -2.264492e-02 -0.0081327120 -0.0008368685 -3.362297e-03
## magnet_belt_z 7.732861e-03 -0.0006912649 0.0113318387 3.221598e-03
## roll_arm 7.586419e-04 0.0001864800 0.0021328556 3.299443e-04
## pitch_arm -3.342949e-03 0.0060646524 0.0057067414 1.540546e-03
## yaw_arm 1.269323e-03 -0.0009582550 0.0013385543 -1.165249e-03
## total_accel_arm 5.460354e-03 -0.0257422641 -0.0218413376 -1.738358e-02
## gyros_arm_x 1.315659e-01 0.0221614121 -0.0338702422 3.316230e-02
## gyros_arm_y 9.545561e-02 -0.0778275830 -0.0449007509 1.639741e-01
## gyros_arm_z -1.442944e-01 -0.1283810464 0.0145724396 1.337536e-01
## accel_arm_x -3.160178e-03 -0.0050909582 -0.0083688796 -2.026154e-03
## accel_arm_y -3.366176e-03 0.0148460372 -0.0000681902 3.770973e-03
## accel_arm_z 1.003463e-02 -0.0025474502 0.0018974126 -7.191813e-03
## magnet_arm_x 1.185377e-04 -0.0002132317 0.0020993113 1.192414e-03
## magnet_arm_y -1.181672e-03 -0.0051798393 0.0049534005 4.028490e-04
## magnet_arm_z -3.789780e-03 -0.0019871402 -0.0056961671 2.070360e-03
## roll_dumbbell 2.555527e-03 -0.0041369185 -0.0030878286 -7.749850e-03
## pitch_dumbbell -6.085364e-03 -0.0035968322 -0.0041259799 -4.488973e-03
## yaw_dumbbell -7.715146e-03 0.0068360593 -0.0034305793 -3.781699e-03
## total_accel_dumbbell 7.053979e-02 0.0664972858 -0.0006281945 3.617539e-03
## gyros_dumbbell_x 2.656716e-01 -0.4794798548 0.1069648811 1.010587e-01
## gyros_dumbbell_y 2.299391e-01 -0.2761991296 -0.0411082721 1.899719e-01
## gyros_dumbbell_z 8.553272e-02 -0.3290458686 0.1072592291 9.902016e-02
## accel_dumbbell_x 1.272327e-02 0.0090672749 0.0008222507 6.514479e-03
## accel_dumbbell_y 1.724172e-03 0.0034017540 0.0026632472 -2.035604e-03
## accel_dumbbell_z 2.592574e-03 0.0021780761 0.0022402441 1.147031e-03
## magnet_dumbbell_x -4.014102e-03 -0.0005656641 0.0036224924 -2.464291e-03
## magnet_dumbbell_y -1.025281e-03 0.0020666193 -0.0006005302 -2.184794e-03
## magnet_dumbbell_z 1.301087e-02 -0.0096281693 -0.0018781249 9.603372e-03
## roll_forearm 1.609084e-03 0.0012436728 0.0002241800 1.045248e-03
## pitch_forearm 1.608954e-02 -0.0131477651 0.0056129195 4.140853e-04
## yaw_forearm -5.370017e-05 0.0008515176 0.0008448289 1.003078e-03
## total_accel_forearm 3.284034e-02 0.0073544020 -0.0066639277 1.906764e-03
## gyros_forearm_x -3.193387e-02 -0.0856981261 0.2085450940 1.544631e-01
## gyros_forearm_y -1.712144e-02 -0.0266365309 0.0209540403 8.358760e-03
## gyros_forearm_z 7.621466e-02 0.1392370025 -0.0520512547 -6.530117e-02
## accel_forearm_x 3.627254e-03 0.0109065884 0.0002624948 3.678135e-03
## accel_forearm_y 6.662715e-04 -0.0007770403 -0.0007328203 -2.051744e-03
## accel_forearm_z -7.026805e-03 0.0027573026 0.0038478633 -4.455119e-03
## magnet_forearm_x -1.824348e-03 -0.0036244001 0.0000687956 -1.117217e-03
## magnet_forearm_y -9.707870e-04 -0.0016079266 0.0003647398 3.981076e-04
## magnet_forearm_z -1.118505e-04 -0.0014203321 -0.0003700630 1.208097e-03
##
## Proportion of trace:
## LD1 LD2 LD3 LD4
## 0.4792 0.2360 0.1731 0.1117
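The "Proportion of trace" is the share of between-class variance captured by each linear discriminant; with five classes there are four discriminants. A sketch of how these shares are obtained from a fitted `MASS::lda` object follows, using the built-in `iris` data as a stand-in (an assumption, so the numbers differ from those above).

```r
library(MASS)

# Illustrative fit on iris: three classes give two discriminants.
lda_demo <- lda(Species ~ ., data = iris)

# Squared singular values, normalised to sum to one.
prop_trace <- lda_demo$svd^2 / sum(lda_demo$svd^2)
```

In the output above, LD1 and LD2 together account for 0.4792 + 0.2360 = 0.7152 of the trace, so no single discriminant dominates the separation.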
Below are the predictions on the held-out test set and the confusion matrix for the LDA model.
predict_lda <- predict(fit_lda, newdata=test_set)
matrix_lda<- confusionMatrix(predict_lda, as.factor(test_set$classe))
matrix_lda
## Confusion Matrix and Statistics
##
## Reference
## Prediction A B C D E
## A 1165 132 76 50 39
## B 44 625 85 33 131
## C 83 109 577 117 84
## D 98 37 87 574 80
## E 5 46 30 30 567
##
## Overall Statistics
##
## Accuracy : 0.7153
## 95% CI : (0.7025, 0.7279)
## No Information Rate : 0.2845
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.6396
##
## Mcnemar's Test P-Value : < 2.2e-16
##
## Statistics by Class:
##
## Class: A Class: B Class: C Class: D Class: E
## Sensitivity 0.8351 0.6586 0.6749 0.7139 0.6293
## Specificity 0.9154 0.9259 0.9029 0.9263 0.9723
## Pos Pred Value 0.7969 0.6808 0.5948 0.6553 0.8363
## Neg Pred Value 0.9332 0.9187 0.9293 0.9429 0.9210
## Prevalence 0.2845 0.1935 0.1743 0.1639 0.1837
## Detection Rate 0.2376 0.1274 0.1177 0.1170 0.1156
## Detection Prevalence 0.2981 0.1872 0.1978 0.1786 0.1383
## Balanced Accuracy 0.8752 0.7923 0.7889 0.8201 0.8008
fit_knn <- train(classe~., data=train_set, method="knn", metric="Accuracy",
trControl=control)
Below are the predictions on the held-out test set and the confusion matrix for the KNN model.
predict_knn <- predict(fit_knn, newdata=test_set)
matrix_knn <- confusionMatrix(predict_knn, as.factor(test_set$classe))
matrix_knn
## Confusion Matrix and Statistics
##
## Reference
## Prediction A B C D E
## A 1347 35 8 18 8
## B 12 828 22 7 23
## C 7 44 805 62 11
## D 24 30 18 706 29
## E 5 12 2 11 830
##
## Overall Statistics
##
## Accuracy : 0.9209
## 95% CI : (0.913, 0.9283)
## No Information Rate : 0.2845
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.8999
##
## Mcnemar's Test P-Value : 2.438e-12
##
## Statistics by Class:
##
## Class: A Class: B Class: C Class: D Class: E
## Sensitivity 0.9656 0.8725 0.9415 0.8781 0.9212
## Specificity 0.9803 0.9838 0.9694 0.9754 0.9925
## Pos Pred Value 0.9513 0.9283 0.8665 0.8748 0.9651
## Neg Pred Value 0.9862 0.9698 0.9874 0.9761 0.9824
## Prevalence 0.2845 0.1935 0.1743 0.1639 0.1837
## Detection Rate 0.2747 0.1688 0.1642 0.1440 0.1692
## Detection Prevalence 0.2887 0.1819 0.1894 0.1646 0.1754
## Balanced Accuracy 0.9730 0.9282 0.9554 0.9267 0.9569
plot(fit_knn, main="KNN model Plot.")
By accuracy on the held-out test set, the model with the best performance is the Random Forest (0.9969), followed by the Gradient Boosted Model (0.9888).
perform_accuracy <- c(round(matrix_lda$overall[[1]], 4),
round(matrix_knn$overall[[1]], 4),
round(matrix_gbm$overall[[1]], 4),
round(matrix_rf$overall[[1]], 4),
round(matrix_tree$overall[[1]], 4))
model_names<-c('Linear Discriminant Analysis (LDA)',
'K-Nearest Neighbors (KNN)',
'Gradient Boosting (GBM)',
'Random Forest (RF)',
'Decision Tree')
(accuracies<-as.data.frame(cbind(model_names, perform_accuracy)))
## model_names perform_accuracy
## 1 Linear Discriminant Analysis (LDA) 0.7153
## 2 K-Nearest Neighbors (KNN) 0.9209
## 3 Gradient Boosting (GBM) 0.9888
## 4 Random Forest (RF) 0.9969
## 5 Decision Tree 0.7308
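The expected out-of-sample error can be estimated as one minus the accuracy on the held-out test set. The sketch below re-enters the accuracies from the table above and computes that estimate for each model; it also builds the table with `data.frame`, which keeps the accuracy column numeric (after `cbind` it is stored as text). The short model labels are illustrative.

```r
# Held-out accuracies copied from the table above.
accuracy <- c(LDA = 0.7153, KNN = 0.9209, GBM = 0.9888, RF = 0.9969, Tree = 0.7308)

# Estimated out-of-sample error for each model.
oos_error <- 1 - accuracy

# Model with the smallest estimated error.
best <- names(which.min(oos_error))

# data.frame() keeps the numeric type, so the table can be sorted by accuracy.
accuracies <- data.frame(model = names(accuracy), accuracy = unname(accuracy))
accuracies <- accuracies[order(-accuracies$accuracy), ]
```

On these numbers the Random Forest has the lowest estimated error, about 0.31%, and the GBM follows with about 1.12%.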
The table below shows the class predicted by the GBM model for each of the 20 test cases.
predict_test_data <- predict(fit_gbm, newdata=test_data)
table(predict_test_data,test_data$problem_id)
##
## predict_test_data 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## A 0 1 0 1 1 0 0 0 1 1 0 0 0 1 0 0 1 0 0 0
## B 1 0 1 0 0 0 0 1 0 0 1 0 1 0 0 0 0 1 1 1
## C 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
## D 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
## E 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0
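The wide table is hard to scan. The sketch below re-enters the predictions read off the table above as one label per problem and stores them in a two-column data frame; the object names are illustrative.

```r
# Predicted classes for problems 1 to 20, read from the table above.
predicted <- c("B", "A", "B", "A", "A", "E", "D", "B", "A", "A",
               "B", "C", "B", "A", "E", "E", "A", "B", "B", "B")

# One row per test case.
result <- data.frame(problem_id = 1:20, predicted = predicted)

# Number of test cases assigned to each class.
class_counts <- table(result$predicted)
```

In the report, the same layout could be built directly with `data.frame(problem_id = test_data$problem_id, predicted = predict_test_data)`.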