Background

Using devices such as Jawbone Up, Nike FuelBand, and Fitbit, it is now possible to collect a large amount of data about personal activity relatively inexpensively. These types of devices are part of the quantified self movement – a group of enthusiasts who take measurements about themselves regularly to improve their health, to find patterns in their behavior, or because they are tech geeks. One thing that people regularly do is quantify how much of a particular activity they do, but they rarely quantify how well they do it. In this project, the goal is to use data from accelerometers on the belt, forearm, arm, and dumbbell of 6 participants. They were asked to perform barbell lifts correctly and incorrectly in 5 different ways. More information is available from the website here: http://groupware.les.inf.puc-rio.br/har (see the section on the Weight Lifting Exercise Dataset).

Data

The training data for this project are available here:

https://d396qusza40orc.cloudfront.net/predmachlearn/pml-training.csv

The test data are available here:

https://d396qusza40orc.cloudfront.net/predmachlearn/pml-testing.csv

The data for this project come from this source: http://groupware.les.inf.puc-rio.br/har. If you use the document you create for this class for any purpose, please cite the authors, as they have been very generous in allowing their data to be used for this kind of assignment.

library(caret)          # model training and evaluation helpers
library(randomForest)   # random forest models
library(rpart)          # recursive partitioning trees
library(RColorBrewer)   # colour palettes for plots
library(e1071)          # support vector machines

Loading Files

trainUrl <- "Data/pml-training.csv"
testUrl <- "Data/pml-testing.csv"
# treat empty strings and spreadsheet division errors ("#DIV/0!") as missing values
training <- read.csv(trainUrl, na.strings=c("NA","#DIV/0!",""))
testing <- read.csv(testUrl, na.strings=c("NA","#DIV/0!",""))
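
A quick sanity check of the raw dimensions (output omitted here) can confirm that both files loaded correctly before any cleaning.

# number of rows and columns in the raw training and test files
dim(training)
dim(testing)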

Cleaning Files

# drop columns that contain any missing values
trainingset <- training[, colSums(is.na(training)) == 0]
testingset.final <- testing[, colSums(is.na(testing)) == 0]
# drop the first seven columns (row id, user name, timestamps and window
# indicators), which are not sensor measurements
testingset.final <- testingset.final[, -c(1:7)]
trainingset <- trainingset[, -c(1:7)]
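
Before partitioning, it is worth confirming that the two cleaned data sets keep the same predictor columns. A minimal check could look like this; it assumes, as in the original data, that the training file carries the outcome column classe while the test file carries a problem_id column instead.

# columns present in one cleaned set but not the other; ideally only
# "classe" and "problem_id" should show up here
setdiff(names(trainingset), names(testingset.final))
setdiff(names(testingset.final), names(trainingset))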

Making Partitions

# hold out 25% of the cleaned training data as a validation set
inTrain <- createDataPartition(y=trainingset$classe, p=0.75, list=FALSE)
myTraining <- trainingset[inTrain, ]
myTesting <- trainingset[-inTrain, ]
dim(myTraining)
## [1] 14718    53
dim(myTesting)
## [1] 4904   53
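
For reproducibility, a seed could be set before the partition is created; the value below is an arbitrary choice and not part of the original analysis.

# hypothetical reproducibility step: fix the random seed before partitioning
set.seed(12345)
inTrain <- createDataPartition(y=trainingset$classe, p=0.75, list=FALSE)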

Description of the Dataset

The plot below shows the distribution of the classe variable in the training partition.

# classe is coerced to a factor so plot() draws a bar chart of class frequencies
plot(as.factor(myTraining$classe), col="green", main="Plot of levels of variable classe", xlab="classe", ylab="Frequency")

We can see that class A is the most common, while the remaining observations are fairly evenly distributed across the other classes.
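
The same distribution can also be inspected numerically; a short sketch using base R:

# relative frequency of each classe level in the training partition
round(prop.table(table(myTraining$classe)), 3)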

Model Creation

I created several models to fit the data and then selected the best ones for a joint or individual prediction.
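
One optional refinement, not used in the fits below, would be a shared resampling scheme so that the models train faster and are compared under the same conditions. A sketch with 5-fold cross-validation (the object name and fold count are my own choices):

# hypothetical shared control object: 5-fold cross-validation instead of
# caret's default bootstrap resampling
fit.control <- trainControl(method="cv", number=5)
# it would be passed to train(), e.g. train(classe ~ ., data=myTraining, method="rf", trControl=fit.control)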

Random forests

model.rf <- train(classe ~ ., data=myTraining, method="rf", type="class")
## Warning: model fit failed for Resample01: mtry= 2 Error : cannot allocate vector of size 112.3 Mb
## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info =
## trainInfo, : There were missing values in resampled performance measures.

These warnings mean that one bootstrap resample ran out of memory, so its performance measures are missing; the final random forest model was still fitted and is evaluated below.

prediction2 <- predict(model.rf, myTesting)
confusionMatrix(prediction2, myTesting$classe)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction    A    B    C    D    E
##          A 1395    7    0    0    0
##          B    0  941    4    0    1
##          C    0    1  849    7    3
##          D    0    0    2  796    2
##          E    0    0    0    1  895
## 
## Overall Statistics
##                                           
##                Accuracy : 0.9943          
##                  95% CI : (0.9918, 0.9962)
##     No Information Rate : 0.2845          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.9928          
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: A Class: B Class: C Class: D Class: E
## Sensitivity            1.0000   0.9916   0.9930   0.9900   0.9933
## Specificity            0.9980   0.9987   0.9973   0.9990   0.9998
## Pos Pred Value         0.9950   0.9947   0.9872   0.9950   0.9989
## Neg Pred Value         1.0000   0.9980   0.9985   0.9981   0.9985
## Prevalence             0.2845   0.1935   0.1743   0.1639   0.1837
## Detection Rate         0.2845   0.1919   0.1731   0.1623   0.1825
## Detection Prevalence   0.2859   0.1929   0.1754   0.1631   0.1827
## Balanced Accuracy      0.9990   0.9952   0.9951   0.9945   0.9965
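
The expected out-of-sample error can be read directly from this confusion matrix; a small sketch that extracts it:

# estimated out-of-sample error for the random forest = 1 - validation accuracy
rf.accuracy <- confusionMatrix(prediction2, myTesting$classe)$overall["Accuracy"]
1 - rf.accuracy   # about 0.0057, given the accuracy of 0.9943 reported above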

SVM

model.svm <-svm(classe~.,  data=myTraining)
prediction3 <- predict(model.svm, myTesting, type = "class")
confusionMatrix(prediction3, myTesting$classe)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction    A    B    C    D    E
##          A 1392   67    1    1    0
##          B    2  861   24    0    3
##          C    1   20  818   76   29
##          D    0    1   12  726   27
##          E    0    0    0    1  842
## 
## Overall Statistics
##                                           
##                Accuracy : 0.946           
##                  95% CI : (0.9393, 0.9521)
##     No Information Rate : 0.2845          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.9315          
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: A Class: B Class: C Class: D Class: E
## Sensitivity            0.9978   0.9073   0.9567   0.9030   0.9345
## Specificity            0.9803   0.9927   0.9689   0.9902   0.9998
## Pos Pred Value         0.9528   0.9674   0.8665   0.9478   0.9988
## Neg Pred Value         0.9991   0.9781   0.9907   0.9812   0.9855
## Prevalence             0.2845   0.1935   0.1743   0.1639   0.1837
## Detection Rate         0.2838   0.1756   0.1668   0.1480   0.1717
## Detection Prevalence   0.2979   0.1815   0.1925   0.1562   0.1719
## Balanced Accuracy      0.9891   0.9500   0.9628   0.9466   0.9671
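
Before combining the models, it helps to put the two validation accuracies side by side; a small comparison sketch:

# validation accuracy of each individual model
data.frame(model = c("random forest", "svm"),
           accuracy = c(confusionMatrix(prediction2, myTesting$classe)$overall["Accuracy"],
                        confusionMatrix(prediction3, myTesting$classe)$overall["Accuracy"]))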

Creating a Combined Prediction

Using the random forest and SVM predictions as inputs to a second random forest.

# stack the two sets of predictions and fit a second random forest on them
sum.pred2 <- data.frame(prediction2, prediction3, classe=myTesting$classe)
mode.all2 <- train(classe~., method="rf", data=sum.pred2)
prediction.all2 <- predict(mode.all2, sum.pred2)
confusionMatrix(prediction.all2, myTesting$classe)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction    A    B    C    D    E
##          A 1395    7    0    0    0
##          B    0  941    4    0    1
##          C    0    1  849    7    3
##          D    0    0    2  796    2
##          E    0    0    0    1  895
## 
## Overall Statistics
##                                           
##                Accuracy : 0.9943          
##                  95% CI : (0.9918, 0.9962)
##     No Information Rate : 0.2845          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.9928          
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: A Class: B Class: C Class: D Class: E
## Sensitivity            1.0000   0.9916   0.9930   0.9900   0.9933
## Specificity            0.9980   0.9987   0.9973   0.9990   0.9998
## Pos Pred Value         0.9950   0.9947   0.9872   0.9950   0.9989
## Neg Pred Value         1.0000   0.9980   0.9985   0.9981   0.9985
## Prevalence             0.2845   0.1935   0.1743   0.1639   0.1837
## Detection Rate         0.2845   0.1919   0.1731   0.1623   0.1825
## Detection Prevalence   0.2859   0.1929   0.1754   0.1631   0.1827
## Balanced Accuracy      0.9990   0.9952   0.9951   0.9945   0.9965
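
The combined model's confusion matrix is identical to the one obtained from the random forest alone. How often the two prediction vectors agree can be checked directly:

# count how often the stacked model agrees with the random forest on the validation set
table(prediction.all2 == prediction2)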

Predicting on the Test Data Set

# apply both models to the cleaned 20-case test set
prediction.test2 <- predict(model.svm, testingset.final)
prediction.test3 <- predict(model.rf, testingset.final)

We check the results of both models; if there were any differences, we would need a ranking rule to resolve them.

prediction.test2 == prediction.test3
##  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [15] TRUE TRUE TRUE TRUE TRUE TRUE

The predictions are identical for the two models, so no ranking method is needed.
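
Had the models disagreed on some cases, one simple rule (sketched below, not run here) would be to keep the random forest's answer for those cases, since it had the higher validation accuracy. The variable names final.prediction and disagree are my own.

# hypothetical tie-break rule: start from the SVM answers and, wherever the two
# models disagree, keep the random forest prediction instead
final.prediction <- as.character(prediction.test2)
disagree <- which(prediction.test2 != prediction.test3)
final.prediction[disagree] <- as.character(prediction.test3)[disagree]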

Testing Prediction Result
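
The final labels for the 20 test cases come from the random forest model (the SVM gives the same labels):

prediction.test3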

##  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 
##  B  A  B  A  A  E  D  B  A  A  B  C  B  A  E  E  A  B  B  B 
## Levels: A B C D E
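
If the answers need to be written out as one file per test case (the usual course submission format), a small helper along these lines could be used; the function name and file naming are assumptions, not part of the original analysis.

# hypothetical helper: write each predicted label to its own text file
pml_write_files <- function(x) {
  for (i in seq_along(x)) {
    filename <- paste0("problem_id_", i, ".txt")
    write.table(x[i], file=filename, quote=FALSE, row.names=FALSE, col.names=FALSE)
  }
}
pml_write_files(as.character(prediction.test3))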