A majority of research into human activity recognition (HAR) focuses on the quantitative aspects of activities rather than the qualitative. This project focuses instead on how well the activities are performed. It uses data from:
Velloso, E.; Bulling, A.; Gellersen, H.; Ugulino, W.; Fuks, H. Qualitative Activity Recognition of Weight Lifting Exercises. Proceedings of the 4th International Conference in Cooperation with SIGCHI (Augmented Human '13). Stuttgart, Germany: ACM SIGCHI, 2013.
The data come from 6 participants who performed the exercise in five ways, labelled A to E, where A is the correct form and the rest are common mistaken forms. The dataset is large and unclean, so it must be preprocessed before any algorithm is applied. The summary columns, i.e. those whose names contain phrases such as "var", "min", "max", or "kurt", are removed as they only aggregate information already present in the raw measurements. Additionally, columns that contain no information pertinent to the machine learning task, namely index numbers and timestamps, are removed.
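A minimal sketch of this preprocessing, assuming the raw data have been read from the standard pml-training.csv file (the column names below are those of the published Weight Lifting Exercises dataset):

```r
training <- read.csv("pml-training.csv")

# Drop the summary/aggregate columns (var, min, max, kurtosis, etc.)
summary_cols <- grepl("^(var|min|max|kurtosis|skewness|avg|stddev|amplitude)_",
                      names(training))

# Drop bookkeeping columns: row index, user name, timestamps, window markers
bookkeeping <- names(training) %in% c("X", "user_name", "raw_timestamp_part_1",
                                      "raw_timestamp_part_2", "cvtd_timestamp",
                                      "new_window", "num_window")

training <- training[, !(summary_cols | bookkeeping)]
```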
Columns containing NAs are removed rather than imputed, since imputing such sparsely populated columns could introduce errors into the predictions.
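Continuing the sketch above, the NA columns can be dropped in one line:

```r
# Keep only columns with no missing values
training <- training[, colSums(is.na(training)) == 0]
```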
The k-fold method is used for cross-validation, as it shows less bias and variance while also increasing the amount of data available for testing and validation; k = 10 is chosen. The data are separated into a training set and an (intermediate) testing set. This testing set is used to check the accuracy of the ML algorithm before applying it to the actual test set.
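A sketch of the split and the cross-validation setup using caret; the 70/30 split ratio and the seed are assumptions, not stated in the original:

```r
library(caret)
set.seed(123)  # hypothetical seed, for reproducibility

# Ensure the outcome is a factor (read.csv no longer does this by default)
training$classe <- factor(training$classe)

# Hold out an intermediate testing set (70/30 split is an assumption)
in_train  <- createDataPartition(training$classe, p = 0.7, list = FALSE)
train_set <- training[in_train, ]
test_set  <- training[-in_train, ]

# 10-fold cross-validation control, as described above
cv_ctrl <- trainControl(method = "cv", number = 10)
```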
The Random Forest algorithm is chosen to perform the predictions, with the number of trees set to 10.
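One way to fit and evaluate such a model, sketched with caret's train() interface (the exact call used to produce the output below is not shown in the original):

```r
library(randomForest)

# Fit a random forest with 10 trees under 10-fold cross-validation
rf_fit <- train(classe ~ ., data = train_set, method = "rf",
                trControl = cv_ctrl, ntree = 10)

# Accuracy on the training set and on the intermediate testing set
confusionMatrix(predict(rf_fit, train_set), train_set$classe)
confusionMatrix(predict(rf_fit, test_set),  test_set$classe)
```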
## Confusion Matrix and Statistics
##
##           Reference
## Prediction     A     B     C     D     E
##          A 50220     0     0     0     0
##          B     0 34173     0     0     0
##          C     0     0 30798     0     0
##          D     0     0     0 28944     0
##          E     0     0     0     0 32463
##
## Overall Statistics
##
##                Accuracy : 1
##                  95% CI : (1, 1)
##     No Information Rate : 0.2844
##     P-Value [Acc > NIR] : < 2.2e-16
##
##                   Kappa : 1
##  Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
##                      Class: A Class: B Class: C Class: D Class: E
## Sensitivity            1.0000   1.0000   1.0000   1.0000   1.0000
## Specificity            1.0000   1.0000   1.0000   1.0000   1.0000
## Pos Pred Value         1.0000   1.0000   1.0000   1.0000   1.0000
## Neg Pred Value         1.0000   1.0000   1.0000   1.0000   1.0000
## Prevalence             0.2844   0.1935   0.1744   0.1639   0.1838
## Detection Rate         0.2844   0.1935   0.1744   0.1639   0.1838
## Detection Prevalence   0.2844   0.1935   0.1744   0.1639   0.1838
## Balanced Accuracy      1.0000   1.0000   1.0000   1.0000   1.0000
## Confusion Matrix and Statistics
##
##           Reference
## Prediction    A    B    C    D    E
##          A 5580    0    0    0    0
##          B    0 3797    0    0    0
##          C    0    0 3422    0    0
##          D    0    0    0 3216    0
##          E    0    0    0    0 3607
##
## Overall Statistics
##
##                Accuracy : 1
##                  95% CI : (0.9998, 1)
##     No Information Rate : 0.2844
##     P-Value [Acc > NIR] : < 2.2e-16
##
##                   Kappa : 1
##  Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
##                      Class: A Class: B Class: C Class: D Class: E
## Sensitivity            1.0000   1.0000   1.0000   1.0000   1.0000
## Specificity            1.0000   1.0000   1.0000   1.0000   1.0000
## Pos Pred Value         1.0000   1.0000   1.0000   1.0000   1.0000
## Neg Pred Value         1.0000   1.0000   1.0000   1.0000   1.0000
## Prevalence             0.2844   0.1935   0.1744   0.1639   0.1838
## Detection Rate         0.2844   0.1935   0.1744   0.1639   0.1838
## Detection Prevalence   0.2844   0.1935   0.1744   0.1639   0.1838
## Balanced Accuracy      1.0000   1.0000   1.0000   1.0000   1.0000
As shown above, the accuracy of the model on both the training set (top matrix) and the intermediate testing set (bottom matrix) is 1, i.e. 100%, so it can be applied to the final testing set.
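A sketch of this final step, assuming the 20-case test file pml-testing.csv is given the same preprocessing as the training data:

```r
# Read the final test set and keep only the columns used for training
final_test <- read.csv("pml-testing.csv")
final_test <- final_test[, names(final_test) %in% names(train_set)]

# Predict the exercise class for each of the 20 test cases
predict(rf_fit, final_test)
```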
##  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
##  B  A  B  A  A  E  D  B  A  A  B  C  B  A  E  E  A  B  B  B
## Levels: A B C D E
Thus the predictions on the final test set are shown above.