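For this task the Weight Lifting Exercises (WLE) dataset is used. Devices such as Jawbone Up, Nike FuelBand, and Fitbit make it possible to collect large amounts of personal activity data inexpensively. The WLE dataset itself was recorded with accelerometers on the belt, forearm, arm, and dumbbell of six participants who performed the exercise correctly and incorrectly in five different ways. The dataset is used to investigate “how (well)” an activity was performed by the wearer.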
The goal of this project is to predict the manner in which the participants did the exercise, recorded in the “classe” variable of the training set (classes A to E). The task is to build a prediction model and write a report describing how the model was built.
To explore the dataset, the training and test splits are downloaded from the following URLs:
Train dataset: https://d396qusza40orc.cloudfront.net/predmachlearn/pml-training.csv
Test dataset: https://d396qusza40orc.cloudfront.net/predmachlearn/pml-testing.csv
The dataset was created by Velloso, E.; Bulling, A.; Gellersen, H.; Ugulino, W.; Fuks, H. Qualitative Activity Recognition of Weight Lifting Exercises. Proceedings of 4th International Conference in Cooperation with SIGCHI (Augmented Human ’13). Stuttgart, Germany: ACM SIGCHI, 2013.
Downloading the dataset and setting the seed for reproducibility.
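The download and seeding code itself is not shown in this report. A minimal base-R sketch of that step follows, under stated assumptions: read_pml() and load_wle() are hypothetical helpers, caching the files in tempdir() is an assumption, the seed value 1234 is arbitrary because the original seed is not given, and reading with the default na.strings = "NA" is also an assumption.

```r
# Hypothetical helpers for fetching the WLE course files; the names read_pml()
# and load_wle(), the tempdir() cache and the seed 1234 are assumptions.
train_url <- "https://d396qusza40orc.cloudfront.net/predmachlearn/pml-training.csv"
test_url  <- "https://d396qusza40orc.cloudfront.net/predmachlearn/pml-testing.csv"

# Download a CSV once into destfile (skipped if a copy already exists) and read it.
read_pml <- function(url, destfile = file.path(tempdir(), basename(url)),
                     na.strings = "NA") {
  if (!file.exists(destfile)) {
    download.file(url, destfile, mode = "wb")
  }
  read.csv(destfile, na.strings = na.strings)
}

# Set the seed first so that the later data split and forest are reproducible,
# then read both files.
load_wle <- function(seed = 1234) {
  set.seed(seed)
  list(train = read_pml(train_url), test = read_pml(test_url))
}
```

Calling wle <- load_wle() and then data_train <- wle$train and testing <- wle$test would give the two objects used in the rest of this report. Caching the files avoids downloading them again every time the report is knitted.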
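Remove unnecessary variables: the first seven columns (row index, user name, timestamps, and window markers), the columns that are almost entirely missing, and the predictors with near zero variance.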
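# drop the first seven columns: row index, user name, timestamps and window markers
data_train <- data_train[-(1:7)]
# count the missing values in each column
na_count <- sapply(data_train, function(y) sum(is.na(y)))
na_count <- data.frame(na_count)
unique(na_count$na_count)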
[1] 0 19216
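# drop the columns that have 19216 missing values
n <- which(na_count$na_count == 19216)
data_train <- data_train[-n]
# convert the predictors to numeric; as.character() first, because as.numeric() on a factor returns its level codes
last <- dim(data_train)[2]
data_train[, -last] <- suppressWarnings(lapply(data_train[, -last], function(x) as.numeric(as.character(x))))
data_train$classe <- factor(data_train$classe)
# drop the columns that now contain NAs (entries such as "" or "#DIV/0!" cannot be converted)
data_train <- data_train[, colSums(is.na(data_train)) == 0]
# remove predictors with near zero variance, always keeping the outcome column
library(caret)
last <- dim(data_train)[2]
nsv <- nearZeroVar(data_train[, -last], saveMetrics = TRUE)
data_train <- data_train[, c(!nsv$nzv, TRUE)]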
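nearZeroVar() flags a predictor when it has a single distinct value, or when its most common value is much more frequent than the second most common one (ratio above freqCut = 95/5) and the distinct values make up only a small share of the samples (at most uniqueCut = 10 percent). The function below is a simplified base-R re-implementation of that rule for one vector, written only to illustrate it; nzv_flag() is a hypothetical name and this is not caret's own code.

```r
# Simplified illustration of the near-zero-variance rule for a single vector.
# nzv_flag() is a hypothetical helper, not part of caret; NAs are ignored
# when counting values but still count towards the sample size.
nzv_flag <- function(x, freqCut = 95 / 5, uniqueCut = 10) {
  vals <- x[!is.na(x)]
  counts <- sort(table(vals), decreasing = TRUE)
  n_unique <- length(counts)
  # one (or no) distinct value: zero variance
  if (n_unique <= 1) return(TRUE)
  # frequency of the most common value divided by that of the second most common
  freq_ratio <- as.numeric(counts[1]) / as.numeric(counts[2])
  # distinct values as a percentage of all samples
  percent_unique <- 100 * n_unique / length(x)
  freq_ratio > freqCut && percent_unique <= uniqueCut
}

nzv_flag(c(rep(0, 99), 1))  # TRUE: ratio 99, 2 percent unique
nzv_flag(seq_len(100))      # FALSE: every value appears once
```

Applied column by column, for example sapply(data_train[, -last], nzv_flag), it should broadly agree with nsv$nzv, but caret's implementation remains the reference.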
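The cleaned data are split into a training set (75%) and a validation set (25%); the separate test file is kept for the final predictions. The validation set is used to evaluate the algorithm and its parameters before the model is applied to the test cases.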
train <- createDataPartition(y = data_train$classe, p=.75, list = FALSE)
training <- data_train[train,]
validation <- data_train[-train,]
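createDataPartition() samples within each level of classe, so that the training and validation sets keep roughly the class proportions of the full data. A minimal base-R sketch of such a stratified split, for illustration only: stratified_split() is a hypothetical helper, and rounding each class size up with ceiling() is an assumption rather than a statement about caret's internals.

```r
# Hypothetical stratified split: draw a fraction p of the row indices
# separately within each class of y and return them sorted.
stratified_split <- function(y, p = 0.75) {
  y <- as.factor(y)
  idx <- unlist(lapply(levels(y), function(lvl) {
    rows <- which(y == lvl)
    # index through sample.int() so that a class with a single row is handled correctly
    rows[sample.int(length(rows), ceiling(p * length(rows)))]
  }))
  sort(idx)
}

set.seed(1)
y <- factor(rep(c("A", "B", "C"), times = c(40, 20, 12)))
train_idx <- stratified_split(y, p = 0.75)
table(y[train_idx])  # A: 30, B: 15, C: 9
```

The rows not in train_idx would then form the validation set, in the same way as data_train[-train, ] above.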
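Fit a random forest model and estimate the error on the validation set.
library(randomForest)
# use the bare column name: with training$classe ~ . the "." would also pull classe in as a predictor
fitRF <- randomForest(classe ~ ., data = training, ntree = 100, na.action = na.roughfix)
predictionRF <- predict(fitRF, validation, type = "response")
# caret's signature is confusionMatrix(data, reference) with the predictions first; this call passes the true
# classes first, so below the rows labelled "Prediction" are the true classes (accuracy and kappa are unaffected)
confusionMatrix(validation$classe, predictionRF)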
Confusion Matrix and Statistics

          Reference
Prediction    A    B    C    D    E
         A 1395    0    0    0    0
         B    4  943    2    0    0
         C    0    8  847    0    0
         D    0    0    4  799    1
         E    0    0    0    1  900

Overall Statistics

               Accuracy : 0.9959
                 95% CI : (0.9937, 0.9975)
    No Information Rate : 0.2853
    P-Value [Acc > NIR] : < 2.2e-16

                  Kappa : 0.9948
 Mcnemar's Test P-Value : NA

Statistics by Class:

                     Class: A Class: B Class: C Class: D Class: E
Sensitivity            0.9971   0.9916   0.9930   0.9988   0.9989
Specificity            1.0000   0.9985   0.9980   0.9988   0.9998
Pos Pred Value         1.0000   0.9937   0.9906   0.9938   0.9989
Neg Pred Value         0.9989   0.9980   0.9985   0.9998   0.9998
Prevalence             0.2853   0.1939   0.1739   0.1631   0.1837
Detection Rate         0.2845   0.1923   0.1727   0.1629   0.1835
Detection Prevalence   0.2845   0.1935   0.1743   0.1639   0.1837
Balanced Accuracy      0.9986   0.9950   0.9955   0.9988   0.9993
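The overall accuracy and the expected out-of-sample error can be checked by hand from the confusion matrix. The base-R sketch below re-enters the printed counts (rows are the true validation classes and columns the predictions, because confusionMatrix() received validation$classe as its first argument) and computes the accuracy, the error rate, and the recall for each true class.

```r
# Counts copied from the printed confusion matrix: rows = true validation
# classes, columns = random forest predictions.
cm <- matrix(c(1395,   0,   0,   0,   0,
                  4, 943,   2,   0,   0,
                  0,   8, 847,   0,   0,
                  0,   0,   4, 799,   1,
                  0,   0,   0,   1, 900),
             nrow = 5, byrow = TRUE,
             dimnames = list(truth = LETTERS[1:5], predicted = LETTERS[1:5]))

accuracy  <- sum(diag(cm)) / sum(cm)  # share of correctly classified cases
oos_error <- 1 - accuracy             # estimated out-of-sample error
recall    <- diag(cm) / rowSums(cm)   # correctly predicted share of each true class

round(c(accuracy = accuracy, error = oos_error), 4)  # 0.9959 and 0.0041
round(recall, 4)
```

4884 of the 4904 validation cases are classified correctly, so the estimated out-of-sample error is about 0.41%. The per-class recall computed this way matches the Pos Pred Value row of the printed output, which is consistent with the rows and columns being swapped relative to caret's convention.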
Finally, the model is applied to the 20 cases in the test set.
predictSubmission <- predict(fitRF, testing, type = "response")
predictSubmission
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
 B  A  B  A  A  E  D  B  A  A  B  C  B  A  E  E  A  B  B  B
Levels: A B C D E
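For a classification forest with the default cutoffs, predict() returns for each case the class that receives the most votes across the individual trees. The base-R sketch below illustrates that aggregation on a small made-up matrix of per-tree predictions; majority_vote() is a hypothetical helper, and giving ties to the first level is a simplification, since randomForest documents that ties are broken at random.

```r
# Hypothetical helper: majority vote over per-tree class predictions.
# votes_matrix has one row per case and one column per tree; ties go to the
# first level in `levels` (randomForest itself breaks ties at random).
majority_vote <- function(votes_matrix, levels = LETTERS[1:5]) {
  winners <- apply(votes_matrix, 1, function(row) {
    counts <- table(factor(row, levels = levels))
    names(counts)[which.max(counts)]
  })
  factor(winners, levels = levels)
}

# Toy example: 3 cases, 5 trees
trees <- rbind(c("B", "B", "A", "B", "C"),
               c("A", "A", "A", "E", "A"),
               c("D", "C", "D", "D", "C"))
majority_vote(trees)  # B A D
```

With the fitted model, a matrix of this shape could be taken from predict(fitRF, testing, predict.all = TRUE)$individual. Using an odd number of trees avoids ties between two classes that split the votes evenly, although with five classes ties are still possible.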