Machine Learning Course Project

First, let’s download our data.

download.file('https://d396qusza40orc.cloudfront.net/predmachlearn/pml-training.csv', destfile='training.csv', method="wget")
download.file('https://d396qusza40orc.cloudfront.net/predmachlearn/pml-testing.csv', destfile='testing.csv', method="wget")
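# R >= 4.0 no longer converts strings to factors by default; randomForest needs classe to be a factor.
training<-read.csv('training.csv', stringsAsFactors=TRUE)
testing<-read.csv('testing.csv', stringsAsFactors=TRUE)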

Let’s drop all variables that contain only NA’s in the testing data.

training<-training[,-c(6,12:36,50:59,69:83,87:101,103:112,125:150)]
testing<-testing[,-c(6,12:36,50:59,69:83,87:101,103:112,125:150)]
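The hard-coded indices above are tied to this exact column layout. A minimal sketch of deriving such a selection programmatically, assuming the goal is to drop every column that is entirely NA in the testing set. The toy data frames and the name all_na are illustrative and not part of the original analysis, and the result may not reproduce the index list above exactly.

```r
# Toy stand-ins for the real data frames (hypothetical values).
toy_testing <- data.frame(
  roll_belt     = c(1.2, 3.4, 5.6),
  kurtosis_belt = c(NA, NA, NA),   # entirely NA, should be dropped
  pitch_belt    = c(0.1, NA, 0.3)  # only partly NA, kept
)
toy_training <- data.frame(
  roll_belt     = c(2.0, 2.5, 3.0),
  kurtosis_belt = c(0.5, NA, 0.7),
  pitch_belt    = c(0.2, 0.4, 0.6)
)

# TRUE for columns where every value in the testing set is NA.
all_na <- colSums(!is.na(toy_testing)) == 0

# Apply the same column mask to both data frames so they stay aligned.
toy_testing  <- toy_testing[, !all_na, drop = FALSE]
toy_training <- toy_training[, !all_na, drop = FALSE]

names(toy_training)
```

Computing the mask once from the testing set and reusing it keeps the two data frames with identical columns, which predict() relies on later.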

Let’s also omit variables that are unlikely to influence the class: the row IDs, the timestamps, and num_window.

training<-training[,-c(1,3:6)]
testing<-testing[,-c(1,3:6)]
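Dropping columns by name is less fragile than dropping them by position. The sketch below shows the idea on a toy data frame. The column names other than num_window are assumptions about the dataset, and the values are made up.

```r
# Toy data frame with the same kind of bookkeeping columns (made-up values).
toy <- data.frame(
  X                    = 1:3,
  user_name            = c("adelmo", "carlitos", "pedro"),
  raw_timestamp_part_1 = c(1323084231, 1323084232, 1323084233),
  num_window           = c(11, 11, 12),
  roll_belt            = c(1.41, 1.42, 1.48)
)

# Columns assumed to carry no predictive signal (names are assumptions).
drop_cols <- c("X", "raw_timestamp_part_1", "raw_timestamp_part_2",
               "cvtd_timestamp", "num_window")

# setdiff() keeps every column not listed in drop_cols; listed names that
# are absent from the data frame are simply ignored.
toy <- toy[, setdiff(names(toy), drop_cols), drop = FALSE]

names(toy)
```

The same drop_cols vector can be applied to both training and testing, so the two stay in sync even if the column order changes.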

We will use the randomForest function to build our model.

library(caret)

## Loading required package: lattice
## Loading required package: ggplot2

library(randomForest)

## randomForest 4.6-10
## Type rfNews() to see new features/changes/bug fixes.

fit<-randomForest(classe~.,data=training)

Let’s check how it performs on the training data (this is an in-sample check, so it overstates the accuracy we can expect on new data):

confusionMatrix(predict(fit,newdata=training),training$classe)

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction    A    B    C    D    E
##          A 5580    0    0    0    0
##          B    0 3797    0    0    0
##          C    0    0 3422    0    0
##          D    0    0    0 3216    0
##          E    0    0    0    0 3607
## 
## Overall Statistics
##                                      
##                Accuracy : 1          
##                  95% CI : (0.9998, 1)
##     No Information Rate : 0.2844     
##     P-Value [Acc > NIR] : < 2.2e-16  
##                                      
##                   Kappa : 1          
##  Mcnemar's Test P-Value : NA         
## 
## Statistics by Class:
## 
##                      Class: A Class: B Class: C Class: D Class: E
## Sensitivity            1.0000   1.0000   1.0000   1.0000   1.0000
## Specificity            1.0000   1.0000   1.0000   1.0000   1.0000
## Pos Pred Value         1.0000   1.0000   1.0000   1.0000   1.0000
## Neg Pred Value         1.0000   1.0000   1.0000   1.0000   1.0000
## Prevalence             0.2844   0.1935   0.1744   0.1639   0.1838
## Detection Rate         0.2844   0.1935   0.1744   0.1639   0.1838
## Detection Prevalence   0.2844   0.1935   0.1744   0.1639   0.1838
## Balanced Accuracy      1.0000   1.0000   1.0000   1.0000   1.0000
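A random forest typically fits its own training data almost perfectly, so the table above says little about out-of-sample error. One common alternative is the out-of-bag (OOB) estimate that randomForest computes while fitting. The sketch below uses the built-in iris data as a stand-in for the course data. The names toy_fit and oob_error, the seed, and ntree = 200 are arbitrary choices for illustration.

```r
library(randomForest)

set.seed(42)  # arbitrary seed, only to make the sketch reproducible

# iris stands in for the course data here.
toy_fit <- randomForest(Species ~ ., data = iris, ntree = 200)

# err.rate has one row per tree; the "OOB" column is the cumulative
# out-of-bag error, so the last row covers the whole forest.
oob_error <- toy_fit$err.rate[toy_fit$ntree, "OOB"]
oob_error

# Confusion matrix built from out-of-bag predictions only.
toy_fit$confusion
```

Applied to the model above, the same fields would be fit$err.rate and fit$confusion. Each observation is scored only by trees that did not see it, which makes the OOB error a more honest estimate than training accuracy.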

Now, let’s do the prediction:

answers<-predict(fit,newdata=testing)
answers

##  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 
##  B  A  B  A  A  E  D  B  A  A  B  C  B  A  E  E  A  B  B  B 
## Levels: A B C D E

That’s it.

Egor Ignatenkov

22.12.2014