First, let’s download our data.
download.file('https://d396qusza40orc.cloudfront.net/predmachlearn/pml-training.csv', destfile = 'training.csv', method = "wget")
download.file('https://d396qusza40orc.cloudfront.net/predmachlearn/pml-testing.csv', destfile = 'testing.csv', method = "wget")
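# stringsAsFactors = TRUE (the default before R 4.0) reads classe as a factor,
# so that randomForest fits a classification forest rather than a regression
training <- read.csv('training.csv', stringsAsFactors = TRUE)
testing <- read.csv('testing.csv', stringsAsFactors = TRUE)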
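If the analysis is re-run often, it can help to download each file only once. The sketch below assumes a hypothetical helper named `fetch_csv`, which is not part of the original analysis. It skips the download when the file already exists and uses R's default download method, so `wget` does not need to be installed.

```r
# Hypothetical helper: download a CSV only if it is not already on disk,
# then read it with string columns converted to factors.
fetch_csv <- function(url, destfile) {
  if (!file.exists(destfile)) {
    # Let R choose its default download method instead of requiring wget
    download.file(url, destfile = destfile, mode = "wb")
  }
  read.csv(destfile, stringsAsFactors = TRUE)
}

# Usage (needs network access the first time it runs):
# training <- fetch_csv("https://d396qusza40orc.cloudfront.net/predmachlearn/pml-training.csv",
#                       "training.csv")
```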
Let’s drop the variables that are entirely NA in the testing data, since they carry no information for the cases we need to predict.
# drop the same hard-coded column indices from both files (they share the same column layout)
training<-training[,-c(6,12:36,50:59,69:83,87:101,103:112,125:150)]
testing<-testing[,-c(6,12:36,50:59,69:83,87:101,103:112,125:150)]
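Hard-coded indices break silently if the column order ever changes. As an alternative sketch, the all-NA columns can be detected directly from the test set. The helper name `drop_empty_test_cols` is hypothetical. This version only covers the all-NA columns, so any other column removed by the index list above would still need to be dropped separately.

```r
# Hypothetical helper: keep the columns that have at least one non-NA value
# in the test set, and keep the same columns (plus the outcome) in training.
drop_empty_test_cols <- function(train, test, outcome = "classe") {
  keep <- names(test)[colSums(is.na(test)) < nrow(test)]
  # The outcome column exists only in the training data, so add it explicitly
  train_keep <- intersect(c(keep, outcome), names(train))
  list(training = train[, train_keep, drop = FALSE],
       testing  = test[, keep, drop = FALSE])
}
```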
Let’s also omit the variables that are unlikely to influence classe: the row ID, the timestamps, and num_window.
training<-training[,-c(1,3:6)]
testing<-testing[,-c(1,3:6)]
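Dropping these columns by name rather than by position makes the step easier to read and safer to re-run. In the sketch below, the column names in `id_cols` are assumed to match the header of `pml-training.csv` and should be checked against `names(training)`. The helper `drop_by_name` is hypothetical. `setdiff` ignores names that are absent, so the helper is safe to apply to both data frames.

```r
# Assumed names of the ID, timestamp and window columns; verify with names(training)
id_cols <- c("X", "raw_timestamp_part_1", "raw_timestamp_part_2",
             "cvtd_timestamp", "new_window", "num_window")

# Hypothetical helper: remove the listed columns, ignoring names that are absent
drop_by_name <- function(df, cols) {
  df[, setdiff(names(df), cols), drop = FALSE]
}

# Usage:
# training <- drop_by_name(training, id_cols)
# testing  <- drop_by_name(testing, id_cols)
```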
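We will use the randomForest function from the randomForest package to build our model, with classe as the outcome and all remaining variables as predictors.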
library(caret)
library(randomForest)
## randomForest 4.6-10
fit<-randomForest(classe~.,data=training)
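A random forest already provides an out-of-bag (OOB) error estimate. Each tree is evaluated on the rows left out of its bootstrap sample. The sketch below shows where that estimate is stored. It uses the built-in `iris` data as a stand-in and an arbitrary seed. For the model above, the corresponding values would be `fit$err.rate[fit$ntree, "OOB"]` or the output of `print(fit)`.

```r
library(randomForest)

set.seed(1)  # forests are random; an arbitrary seed makes the result reproducible
rf <- randomForest(Species ~ ., data = iris, ntree = 200)

# err.rate has one row per tree; the "OOB" column of the last row is the
# out-of-bag error estimate for the full forest.
oob_error <- rf$err.rate[rf$ntree, "OOB"]
print(rf)  # also reports the "OOB estimate of error rate"
```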
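Let’s check how it does on the training data. Keep in mind that this in-sample accuracy is optimistic: a random forest usually fits its own training data almost perfectly, so these numbers say little about performance on new cases.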
# predictions first (data), true classes second (reference)
confusionMatrix(predict(fit, newdata = training), training$classe)
## Confusion Matrix and Statistics
##
##           Reference
## Prediction    A    B    C    D    E
##          A 5580    0    0    0    0
##          B    0 3797    0    0    0
##          C    0    0 3422    0    0
##          D    0    0    0 3216    0
##          E    0    0    0    0 3607
##
## Overall Statistics
##
##               Accuracy : 1
##                 95% CI : (0.9998, 1)
##    No Information Rate : 0.2844
##    P-Value [Acc > NIR] : < 2.2e-16
##
##                  Kappa : 1
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
##                      Class: A Class: B Class: C Class: D Class: E
## Sensitivity            1.0000   1.0000   1.0000   1.0000   1.0000
## Specificity            1.0000   1.0000   1.0000   1.0000   1.0000
## Pos Pred Value         1.0000   1.0000   1.0000   1.0000   1.0000
## Neg Pred Value         1.0000   1.0000   1.0000   1.0000   1.0000
## Prevalence             0.2844   0.1935   0.1744   0.1639   0.1838
## Detection Rate         0.2844   0.1935   0.1744   0.1639   0.1838
## Detection Prevalence   0.2844   0.1935   0.1744   0.1639   0.1838
## Balanced Accuracy      1.0000   1.0000   1.0000   1.0000   1.0000
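To get an accuracy figure that reflects unseen data, one option is to hold out part of the labelled data before fitting. The sketch below uses caret's `createDataPartition` for a stratified 70/30 split. It uses `iris` as a stand-in dataset with an arbitrary seed and split ratio. Applied here, the split would be taken on `training$classe` instead.

```r
library(caret)
library(randomForest)

set.seed(42)  # arbitrary seed for a reproducible split and forest
# Stratified 70/30 split on the outcome
in_train <- createDataPartition(iris$Species, p = 0.7, list = FALSE)
train_set <- iris[in_train, ]
valid_set <- iris[-in_train, ]

rf <- randomForest(Species ~ ., data = train_set)
pred <- predict(rf, newdata = valid_set)

# Predictions first, true labels second
cm <- confusionMatrix(pred, valid_set$Species)
holdout_accuracy <- cm$overall[["Accuracy"]]
```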
Now, let’s predict classe for the 20 test cases:
answers<-predict(fit,newdata=testing)
answers
##  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
##  B  A  B  A  A  E  D  B  A  A  B  C  B  A  E  E  A  B  B  B
## Levels: A B C D E
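Besides the predicted labels, `predict` on a random forest can return each class's share of the tree votes with `type = "prob"`. This shows how confident the forest is about each prediction. The sketch below uses `iris` as a stand-in with an arbitrary seed. For this analysis the call would be `predict(fit, newdata = testing, type = "prob")`.

```r
library(randomForest)

set.seed(7)  # arbitrary seed for reproducibility
rf <- randomForest(Species ~ ., data = iris, ntree = 200)

# Vote fractions per class for three example rows; each row sums to 1
new_rows <- iris[c(1, 51, 101), 1:4]
probs <- predict(rf, newdata = new_rows, type = "prob")
labels <- predict(rf, newdata = new_rows)  # the majority-vote class for each row
```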
That’s it.