Are you doing your exercises right?

Abstract

This analysis corresponds to the Project Assignment for the Practical Machine Learning course of the Johns Hopkins Bloomberg School of Public Health Data Science Specialization at Coursera.

Using devices such as Jawbone Up, Nike FuelBand, and Fitbit it is now possible to collect a large amount of data about personal activity relatively inexpensively.

These types of devices are part of the quantified self movement, a group of enthusiasts who take measurements about themselves regularly to improve their health, to find patterns in their behavior, or because they are tech geeks.

One thing that people regularly do is quantify how much of a particular activity they do, but they rarely quantify how well they do it.

In this project, the goal is to use data from accelerometers on the belt, forearm, arm, and dumbbell of 6 participants to predict the manner in which they did the exercise. They were asked to perform barbell lifts correctly and incorrectly in 5 different ways: A is the correct way, and B, C, D and E are four different incorrect ways of doing the exercise. This is the “classe” variable in the training set. Any of the other variables may be used as predictors.

More information is available from the website here: http://web.archive.org/web/20161224072740/http:/groupware.les.inf.puc-rio.br/har (see the section on the Weight Lifting Exercise Dataset). If you use this document for any purpose, please cite the authors, as they have been very generous in allowing their data to be used for this kind of assignment.

The training and test data for this project are available at these two URLs:

https://d396qusza40orc.cloudfront.net/predmachlearn/pml-training.csv

https://d396qusza40orc.cloudfront.net/predmachlearn/pml-testing.csv

Data Processing

library(caret); library(rattle); library(rpart); library(rpart.plot)
library(randomForest); library(corrplot)
# Load the downloaded data from local files
trainRead<-read.csv("C:/Coursera/08_Practical_Machine_learning/pml-training.csv", na.strings=c("NA","#DIV/0!", ""))  
testRead<-read.csv("C:/Coursera/08_Practical_Machine_learning/pml-testing.csv", na.strings=c("NA","#DIV/0!", ""))
dim(trainRead);dim(testRead)

## [1] 19622   160

## [1]  20 160

# Since we want to use the columns as predictors, drop every column that contains missing values.
trainClean<-trainRead[,colSums(is.na(trainRead))==0]
testClean<-testRead[,colSums(is.na(testRead))==0]
# This reduces the data to 60 columns
dim(trainClean);dim(testClean)

## [1] 19622    60

## [1] 20 60
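The column filter above can be tried on a tiny made-up data frame. This is only a sketch: the toy column `kurtosis_x` and all values are hypothetical, and it takes the list of complete columns from the training data and reuses it for the test data, which is one cautious way to keep both sets aligned (the report itself filters each file separately).

```r
# Toy data standing in for the real files (all values are made up)
toy_train <- data.frame(
  roll_belt  = c(1.4, 1.5, 1.6),
  kurtosis_x = c(NA, NA, 0.2),   # hypothetical, mostly missing column
  classe     = c("A", "B", "A")
)
toy_test <- data.frame(
  roll_belt  = c(1.3, 1.7),
  kurtosis_x = c(NA, NA),
  problem_id = 1:2
)

# Decide which columns to keep by looking at the training data only
complete_cols <- colSums(is.na(toy_train)) == 0
keep <- names(toy_train)[complete_cols]

train_kept <- toy_train[, keep, drop = FALSE]
# Reuse the same column names for the test set (it has no classe column)
test_kept <- toy_test[, intersect(keep, names(toy_test)), drop = FALSE]

names(train_kept)
names(test_kept)
```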

# Inspecting the data, the first seven columns hold a sequential row number, the user name,
# timestamps and window markers, none of which we use in this analysis. Dropping them leaves 53 columns.
trainOK<-trainClean[,-c(1:7)]
testOK<-testClean[,-c(1:7)]
dim(trainOK);dim(testOK)

## [1] 19622    53

## [1] 20 53

# The dataset is now ready for the study. First, check whether the variables are correlated with each other.
exerCorrmatrix<-cor(trainOK[sapply(trainOK, is.numeric)])  
png(file="C:/Coursera/08_Practical_Machine_learning/corrpng.png", res=96, width=1000, height=1000)  
corrplot(exerCorrmatrix,order="FPC", method="circle", tl.cex=0.45, tl.col="black", number.cex=0.25)  
title("Correlation Matrix of the variables used", line = 1)
dev.off()

## png 
##   2
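Besides the plot, strongly correlated pairs can be listed directly from the correlation matrix. The sketch below uses simulated data and a cut-off of 0.8; both are assumptions chosen for illustration and are not taken from the analysis.

```r
set.seed(1)
x1 <- rnorm(200)
toy <- data.frame(
  x1 = x1,
  x2 = x1 + rnorm(200, sd = 0.1),  # nearly a copy of x1
  x3 = rnorm(200)                  # independent of the others
)
cm <- cor(toy)

# Look only at the upper triangle so each pair is reported once
high <- which(abs(cm) > 0.8 & upper.tri(cm), arr.ind = TRUE)
high_pairs <- data.frame(
  var1 = rownames(cm)[high[, "row"]],
  var2 = colnames(cm)[high[, "col"]],
  r    = cm[high]
)
high_pairs
```

Many such pairs among the 52 predictors would be one reason to consider a dimension reduction step such as PCA.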

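### Create the training and validation datasets
# No seed is set before this call, so the exact split changes from run to run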

inTrain<-createDataPartition(trainOK$classe, p=3/4, list=FALSE)
train<-trainOK[inTrain,]
valid<-trainOK[-inTrain,]
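For a factor outcome, createDataPartition draws the sample within each class so that the class proportions are preserved. The base-R sketch below mimics that idea on a made-up outcome vector; it is not caret's implementation, and it sets the seed before sampling so that the split can be repeated.

```r
set.seed(2018)  # seed first, so the same rows are drawn on every run

# Made-up outcome with unequal class sizes
classe <- factor(rep(c("A", "B", "C"), times = c(40, 20, 20)))

# Draw 75% of the row indices within each class
idx <- unlist(lapply(
  split(seq_along(classe), classe),
  function(rows) sample(rows, size = round(0.75 * length(rows)))
))

train_y <- classe[idx]
valid_y <- classe[-idx]

table(train_y)
table(valid_y)
```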

Analysing the principal components, we found that 25 components are needed to capture 95% of the variance. Because that demands a lot of processing, we chose a threshold of 0.80 instead, which captures 80% of the variance using 12 components.

set.seed(2018)
PropPCA<-preProcess(train,method="pca", thresh=0.8)
PropPCA

## Created from 14718 samples and 53 variables
## 
## Pre-processing:
##   - centered (52)
##   - ignored (1)
##   - principal component signal extraction (52)
##   - scaled (52)
## 
## PCA needed 12 components to capture 80 percent of the variance
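The count reported by preProcess can be understood as the smallest number of components whose cumulative share of variance reaches the threshold. A minimal sketch with base R's prcomp on simulated data follows; the toy columns and the helper `n_comp` are hypothetical and only illustrate the calculation.

```r
set.seed(2018)
# Toy data: six columns driven by two hidden factors plus a little noise
f1 <- rnorm(300)
f2 <- rnorm(300)
toy <- data.frame(
  v1 = f1 + rnorm(300, sd = 0.1), v2 = f1 + rnorm(300, sd = 0.1),
  v3 = f1 + rnorm(300, sd = 0.1), v4 = f2 + rnorm(300, sd = 0.1),
  v5 = f2 + rnorm(300, sd = 0.1), v6 = f2 + rnorm(300, sd = 0.1)
)

pca <- prcomp(toy, center = TRUE, scale. = TRUE)

# Cumulative proportion of variance explained by the first k components
cum_var <- cumsum(pca$sdev^2) / sum(pca$sdev^2)

# Smallest number of components that reaches a given threshold
n_comp <- function(thresh) which(cum_var >= thresh)[1]

round(cum_var, 3)
n_comp(0.80)
n_comp(0.95)
```

Raising the threshold can only keep the count the same or increase it, which is the trade-off between 12 and 25 components described above.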

# Create the preProc object, excluding the response (classe, column 53).
# pcaComp = 12 fixes the number of components; when pcaComp is given it takes precedence over thresh.
preProc  <- preProcess(train[,-53], 
                       method = "pca",
                       pcaComp = 12, thresh=0.8) 
# Apply the preprocessing to the training and validation data, then add the
# response back to the data frames
train_pca <- predict(preProc, train[,-53])
train_pca$classe <- train$classe
# train_pca now has the 12 principal components plus classe
valid_pca <- predict(preProc, valid[,-53])
valid_pca$classe <- valid$classe
# valid_pca now has the 12 principal components plus classe
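The important point in this step is that the centering, scaling and rotation are learned from the training rows only and then reused unchanged on the held-out rows. The sketch below shows the same pattern with base R's prcomp and predict on simulated data; it illustrates the principle and is not a reproduction of caret's internals.

```r
set.seed(2018)
toy <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
toy$x2 <- toy$x1 + 0.2 * toy$x2   # make two columns correlated
train_rows <- 1:75

# Fit the PCA on the training rows only
pca <- prcomp(toy[train_rows, ], center = TRUE, scale. = TRUE)

# predict() reuses the training means, scales and rotation on any rows
train_scores <- predict(pca, toy[train_rows, ])[, 1:2]
valid_scores <- predict(pca, toy[-train_rows, ])[, 1:2]

# The same projection written out by hand for the held-out rows
manual <- scale(as.matrix(toy[-train_rows, ]),
                center = pca$center, scale = pca$scale) %*% pca$rotation[, 1:2]

dim(train_scores)
dim(valid_scores)
```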

### Choose algorithms to predict
# Two methods will be tested: gbm (Generalized Boosted Regression) and rf (Random Forest)
### GBM
fitControl<-trainControl(method="cv", number=5, allowParallel=TRUE)
# verbose=FALSE is passed through to gbm and suppresses its per-iteration deviance trace
fit_gbm<-train(classe ~., data=train_pca, method="gbm", trControl=fitControl, verbose=FALSE)

print(fit_gbm, digits=4)

## Stochastic Gradient Boosting 
## 
## 14718 samples
##    12 predictor
##     5 classes: 'A', 'B', 'C', 'D', 'E' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 11774, 11774, 11775, 11775, 11774 
## Resampling results across tuning parameters:
## 
##   interaction.depth  n.trees  Accuracy  Kappa 
##   1                   50      0.5069    0.3660
##   1                  100      0.5590    0.4373
##   1                  150      0.5798    0.4652
##   2                   50      0.6087    0.5026
##   2                  100      0.6694    0.5804
##   2                  150      0.7008    0.6207
##   3                   50      0.6540    0.5609
##   3                  100      0.7199    0.6449
##   3                  150      0.7550    0.6896
## 
## Tuning parameter 'shrinkage' was held constant at a value of 0.1
## 
## Tuning parameter 'n.minobsinnode' was held constant at a value of 10
## Accuracy was used to select the optimal model using  the largest value.
## The final values used for the model were n.trees = 150,
##  interaction.depth = 3, shrinkage = 0.1 and n.minobsinnode = 10.

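predict_gbm<-predict(fit_gbm,valid_pca)
# Note: confusionMatrix(data, reference) expects the predictions first. Here the true classes
# are passed first, so the rows labelled "Prediction" are the true classes and the columns
# labelled "Reference" are the predictions. Overall accuracy is unaffected.
(conf_gbm<-confusionMatrix(valid_pca$classe, predict_gbm))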

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction    A    B    C    D    E
##          A 1180   56   71   70   18
##          B  109  659  103   44   34
##          C   77   57  674   28   19
##          D   52   43   75  604   30
##          E   59   65   74   43  660
## 
## Overall Statistics
##                                           
##                Accuracy : 0.7702          
##                  95% CI : (0.7582, 0.7819)
##     No Information Rate : 0.3012          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.7088          
##  Mcnemar's Test P-Value : < 2.2e-16       
## 
## Statistics by Class:
## 
##                      Class: A Class: B Class: C Class: D Class: E
## Sensitivity            0.7989   0.7489   0.6760   0.7655   0.8673
## Specificity            0.9373   0.9279   0.9537   0.9514   0.9418
## Pos Pred Value         0.8459   0.6944   0.7883   0.7512   0.7325
## Neg Pred Value         0.9154   0.9441   0.9202   0.9549   0.9748
## Prevalence             0.3012   0.1794   0.2033   0.1609   0.1552
## Detection Rate         0.2406   0.1344   0.1374   0.1232   0.1346
## Detection Prevalence   0.2845   0.1935   0.1743   0.1639   0.1837
## Balanced Accuracy      0.8681   0.8384   0.8149   0.8585   0.9046

(accuracy_gbm<-conf_gbm$overall['Accuracy'])

##  Accuracy 
## 0.7701876
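The accuracy value is simply the share of validation rows on the diagonal of the confusion matrix. As a quick check, the sketch below re-enters the counts from the GBM table printed above by hand and recomputes it.

```r
# Counts copied from the GBM confusion matrix printed above
cm <- matrix(c(1180,  56,  71,  70,  18,
                109, 659, 103,  44,  34,
                 77,  57, 674,  28,  19,
                 52,  43,  75, 604,  30,
                 59,  65,  74,  43, 660),
             nrow = 5, byrow = TRUE,
             dimnames = list(LETTERS[1:5], LETTERS[1:5]))

n_valid  <- sum(cm)        # number of validation rows
correct  <- sum(diag(cm))  # rows where both labels agree
accuracy <- correct / n_valid

round(accuracy, 4)  # 0.7702
```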

### Random Forest
fitControl<-trainControl(method="cv", number=5, allowParallel=TRUE)
fit_rf<-train(classe ~., data=train_pca, method="rf", trControl=fitControl)
print(fit_rf, digits=4)

## Random Forest 
## 
## 14718 samples
##    12 predictor
##     5 classes: 'A', 'B', 'C', 'D', 'E' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 11775, 11775, 11774, 11775, 11773 
## Resampling results across tuning parameters:
## 
##   mtry  Accuracy  Kappa 
##    2    0.9543    0.9422
##    7    0.9471    0.9331
##   12    0.9403    0.9246
## 
## Accuracy was used to select the optimal model using  the largest value.
## The final value used for the model was mtry = 2.

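predict_rf<-predict(fit_rf,valid_pca)
# The true classes are passed as the first argument here, so the rows of the table are
# the true classes and the columns are the predictions. Overall accuracy is unaffected.
(conf_rf<-confusionMatrix(valid_pca$classe, predict_rf))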

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction    A    B    C    D    E
##          A 1361    8   16    7    3
##          B   25  898   23    2    1
##          C   10   14  818   11    2
##          D    4    8   30  756    6
##          E    2    6    4    5  884
## 
## Overall Statistics
##                                           
##                Accuracy : 0.9619          
##                  95% CI : (0.9561, 0.9671)
##     No Information Rate : 0.2859          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.9518          
##  Mcnemar's Test P-Value : 0.0008301       
## 
## Statistics by Class:
## 
##                      Class: A Class: B Class: C Class: D Class: E
## Sensitivity            0.9708   0.9615   0.9181   0.9680   0.9866
## Specificity            0.9903   0.9872   0.9908   0.9884   0.9958
## Pos Pred Value         0.9756   0.9463   0.9567   0.9403   0.9811
## Neg Pred Value         0.9883   0.9909   0.9820   0.9939   0.9970
## Prevalence             0.2859   0.1905   0.1817   0.1593   0.1827
## Detection Rate         0.2775   0.1831   0.1668   0.1542   0.1803
## Detection Prevalence   0.2845   0.1935   0.1743   0.1639   0.1837
## Balanced Accuracy      0.9805   0.9743   0.9544   0.9782   0.9912

(accuracy_rf<-conf_rf$overall['Accuracy'])

##  Accuracy 
## 0.9618679

For this dataset, the random forest method clearly outperforms Generalized Boosted Regression: its accuracy on the validation set is 0.962, against 0.770 for GBM, so we expect an out-of-sample accuracy above 0.95.
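The expected out-of-sample error is one minus the validation accuracy. The sketch below recomputes it for the random forest from the counts in its confusion matrix (4717 agreeing rows out of 4904) and adds an exact binomial interval with base R's binom.test, which should be close to the 95% CI printed in the random forest output; treating that interval as the same calculation is an assumption.

```r
n_valid <- 4904   # rows in the validation set
correct <- 4717   # diagonal sum of the random forest confusion matrix

accuracy  <- correct / n_valid
oos_error <- 1 - accuracy

# Exact (Clopper-Pearson) 95% interval for the accuracy
ci <- binom.test(correct, n_valid)$conf.int

round(c(accuracy = accuracy, oos_error = oos_error), 4)
round(ci, 4)
```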

Results - Prediction on Testing Set

We apply the random forest model to predict the outcome variable classe for the test set.

test_pca <- predict(preProc, testOK[,-53])
test_pca$problem_id <- testOK$problem_id
(predict(fit_rf, test_pca))

##  [1] B A A A A E D B A A B C B A E E A B B B
## Levels: A B C D E

With those 20 predictions we conclude the Course Project.

If you have one of those devices, collect your own data, do the exercises, and find out whether you are doing them the right way.

References

Ugulino, W.; Cardador, D.; Vega, K.; Velloso, E.; Milidiu, R.; Fuks, H. Wearable Computing: Accelerometers’ Data Classification of Body Postures and Movements. Proceedings of 21st Brazilian Symposium on Artificial Intelligence. Advances in Artificial Intelligence - SBIA 2012. In: Lecture Notes in Computer Science, pp. 52-61. Curitiba, PR: Springer Berlin / Heidelberg, 2012. ISBN 978-3-642-34458-9. DOI: 10.1007/978-3-642-34459-6_6.


CWerneck - Claudia Werneck

January 25, 2018


Thanks for reading!