Executive Summary: The objective of this analysis is to quantify how effectively the six participants in this research used the belt, forearm, arm, and dumbbell accelerometer equipment.
In short, the analysis of the provided data set suggests that, among the accelerometer variables, only accel_belt_z, accel_dumbbell_x, accel_dumbbell_y, accel_dumbbell_z, and accel_forearm_x are key variables, alongside other non-accelerometer variables. The contribution of these accelerometer variables to the model can be said to be in the range of 37% to 52%.
Assumption: The results of the model rest entirely on the assumption that the values of the outcome variable (classe) in the training data set were determined with at least 95% accuracy and that the probability of Type I and Type II errors was minimal, if not zero.
Loading data: Download both the training and test data.
# Download the training and test CSVs if they are not already present,
# then read them into the global data frames trainfull and testfull.
Loading = function(trainf, testf) {
    wd <<- getwd()
    file = file.path(wd, trainf)
    file1 = file.path(wd, testf)
    if (!file.exists(file)) {
        download.file("https://d396qusza40orc.cloudfront.net/predmachlearn/pml-training.csv",
            file)
    }
    if (!file.exists(file1)) {
        download.file("https://d396qusza40orc.cloudfront.net/predmachlearn/pml-testing.csv",
            file1)
    }
    # Markers that should be read as missing values
    nas = c("?", "NA", " ", "#DIV/0!")
    trainfull <<- read.csv(file, header = TRUE, sep = ",", stringsAsFactors = TRUE,
        na.strings = nas, blank.lines.skip = TRUE)
    testfull <<- read.csv(file1, header = TRUE, sep = ",", stringsAsFactors = TRUE,
        na.strings = nas, blank.lines.skip = TRUE)
}
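As an illustration of what the na.strings argument does, here is a minimal sketch on a three-row toy table. The column values are made up for the example and do not come from the real files; only the list of missing-value markers mirrors the loading function.

```r
# Toy CSV text standing in for the real files (hypothetical values)
csv_text <- "roll_belt,kurtosis_roll_belt,classe
1.41,#DIV/0!,A
1.42,NA,A
1.48,?,B"

# Same idea as in Loading(): list every marker that should become NA
toy <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = TRUE,
                na.strings = c("?", "NA", " ", "#DIV/0!"))

# Count missing values per column
na_counts <- colSums(is.na(toy))
print(na_counts)
```

A column made up only of such markers, like kurtosis_roll_belt here, ends up entirely NA, which is why the preprocessing step can later drop it.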
Preprocessing train: Function to preprocess the data.
- Removes columns containing NA values, together with the date, time, and total fields, as they add no value to the predictive model.
- Counts the zero and near-zero variance predictors.
library(caret)

# Drop NA columns and bookkeeping/total fields, store the result in the
# global data1, and report the zero and near-zero variance counts.
pprocess = function(data) {
    # Keep only columns with no missing values
    data = data[, colSums(is.na(data)) == 0]
    data1 <<- subset(data, select = -c(X, raw_timestamp_part_1, raw_timestamp_part_2,
        cvtd_timestamp, new_window, num_window, total_accel_arm, total_accel_belt,
        total_accel_dumbbell, total_accel_forearm))
    ncols <- ncol(data1)
    # PCA is computed on the predictors, but its result is not used downstream
    pca = preProcess(data1[, -ncols], method = "pca", thresh = 0.95)
    datanzv = nearZeroVar(data1, saveMetrics = TRUE)
    pstat = cbind(sum(datanzv$zeroVar), sum(datanzv$nzv))
    colnames(pstat) = c("#-of zero var", "#-near zero var")
    print(pstat)
}
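To make the zero and near-zero variance check more concrete, the following base-R sketch approximates the two metrics that caret's nearZeroVar reports. The helper nzv_metrics and the toy columns are hypothetical, and the cut-offs (a frequency ratio of 95/5 and 10% unique values) are assumed to match caret's documented defaults; it is not caret's own implementation.

```r
# Hypothetical helper: base-R approximation of the two nearZeroVar metrics
nzv_metrics <- function(x, freq_cut = 95 / 5, unique_cut = 10) {
  counts <- sort(table(x), decreasing = TRUE)
  # Ratio of the most common value to the second most common one
  freq_ratio <- if (length(counts) > 1) counts[[1]] / counts[[2]] else 0
  # Distinct values as a percentage of the number of observations
  percent_unique <- 100 * length(counts) / length(x)
  zero_var <- length(counts) == 1
  data.frame(freqRatio = freq_ratio, percentUnique = percent_unique,
             zeroVar = zero_var,
             nzv = zero_var || (freq_ratio > freq_cut && percent_unique <= unique_cut))
}

toy <- data.frame(
  constant = rep(1, 100),          # one unique value
  rare     = c(rep(0, 98), 1, 1),  # 98:2 split between two values
  varied   = seq_len(100)          # all values distinct
)
metrics <- do.call(rbind, lapply(toy, nzv_metrics))
print(metrics)
```

In this toy table the constant column is flagged as zero variance, the heavily skewed column as near-zero variance, and the varied column as neither. The report's own output of 0 and 0 means no remaining predictor was flagged.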
Training data: Function to create the train and test folds. The training data is partitioned into 55% training data and 45% test data. Parallel processing is enabled to cope with the data load, and a random forest is used to train the predictive model.
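library(doParallel)  # also attaches parallel (makeCluster, detectCores)

# Split the preprocessed data 55/45, train a random forest in parallel,
# and store the fitted model in the global tread.
# Note: the body works on the global data1; the pdata argument is kept only
# so that existing calls continue to work.
traindata = function(pdata = data1) {
    gc()
    mkcluster = makeCluster(detectCores() - 1, methods = FALSE)
    registerDoParallel(mkcluster)
    trctrl = trainControl(classProbs = TRUE, savePredictions = TRUE, allowParallel = TRUE,
        method = "oob")
    # One 55% partition; asking for 10 at once returns a 10-column index matrix
    # that selects duplicated rows when used to subset the data
    trainfold = createDataPartition(data1$classe, times = 1, p = 0.55, list = FALSE)
    train = data1[trainfold, ]
    testfold <<- data1[-trainfold, ]  # held-out 45%, kept for out-of-sample checks
    tread <<- train(classe ~ ., data = train, method = "rf", trControl = trctrl)
    stopCluster(mkcluster)
    tread$results
}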
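The 55/45 split described above is stratified by the outcome. A minimal base-R sketch of that idea follows, assuming that sampling 55% of the rows within each class is a fair approximation of what createDataPartition does for a factor outcome. The toy data frame and the seed are made up for the example.

```r
set.seed(2016)  # arbitrary seed so the sketch is reproducible

# Toy outcome with unbalanced classes (hypothetical data)
toy <- data.frame(x = rnorm(200),
                  classe = factor(rep(c("A", "B", "C"), times = c(100, 60, 40))))

# Sample 55% of the row indices within each class
rows_by_class <- split(seq_len(nrow(toy)), toy$classe)
train_idx <- unlist(lapply(rows_by_class, function(idx) {
  idx[sample.int(length(idx), size = round(0.55 * length(idx)))]
}))

train_part <- toy[train_idx, ]
test_part  <- toy[-train_idx, ]

# Class proportions are preserved in both partitions
print(prop.table(table(train_part$classe)))
print(prop.table(table(test_part$classe)))
```

Because every row index appears at most once, the two partitions do not overlap, which is what makes the held-out part usable for an honest error estimate.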
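Modelling & predicting: Function to predict data using the trained model.
# Predict with the trained model; compare against the true labels when present
predictdata = function(sdata = data1) {
    predictdata <- predict(tread, sdata)
    if (!"classe" %in% names(sdata)) {
        return(predictdata)  # unlabeled data: predictions only
    }
    # Compare predictions with the actual classes, not with themselves
    confusionMatrix(predictdata, sdata$classe)
}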
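For readers unfamiliar with confusion matrices, here is a small base-R sketch of the comparison this step relies on. The ten labels are invented for the example; table() stands in for caret's confusionMatrix, which reports the same cross-tabulation along with further statistics.

```r
# Hypothetical true and predicted labels for ten observations
lv <- c("A", "B", "C")
actual    <- factor(c("A", "A", "A", "B", "B", "B", "C", "C", "C", "C"), levels = lv)
predicted <- factor(c("A", "A", "B", "B", "B", "B", "C", "C", "A", "C"), levels = lv)

# Rows are predictions, columns are the reference labels
conf <- table(Prediction = predicted, Reference = actual)
print(conf)

# Accuracy is the share of observations on the diagonal
accuracy <- sum(diag(conf)) / sum(conf)
print(accuracy)

# Comparing predictions with themselves always gives a perfect score,
# so the reference must be the true labels
self_conf <- table(predicted, predicted)
self_accuracy <- sum(diag(self_conf)) / sum(self_conf)
print(self_accuracy)
```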
Predicting using the models: The model is fitted on the training data and then applied to the test data. The predicted values are stored in the variable “predictdata”.
Loading("pml_training.csv", "pml_testing.csv")
# Processing & training model on training dataset
pprocess(trainfull)
#-of zero var #-near zero var
[1,] 0 0
traindata(trainfull)
Accuracy Kappa mtry
1 1.0000000 1.0000000 2
2 0.9999907 0.9999883 27
3 1.0000000 1.0000000 53
# Predicting training dataset, results of the trained model
predictdata()
# Key variables that contribute to the model and predictions
varImp(tread)
rf variable importance
only 20 most important variables shown (out of 53)
Overall
roll_belt 100.00
yaw_belt 85.96
magnet_dumbbell_z 75.30
pitch_belt 68.71
magnet_dumbbell_y 67.88
pitch_forearm 66.07
roll_forearm 60.00
magnet_dumbbell_x 59.45
accel_belt_z 56.10
accel_dumbbell_y 53.56
magnet_belt_y 50.39
magnet_belt_z 50.26
roll_dumbbell 49.02
accel_dumbbell_z 46.90
roll_arm 46.28
accel_forearm_x 41.14
accel_dumbbell_x 40.77
yaw_dumbbell 39.51
gyros_belt_z 38.33
magnet_arm_y 37.56
# Processing test dataset
pprocess(testfull)
#-of zero var #-near zero var
[1,] 0 0
# Predicting test dataset and results
predictdata()
# Key variables that contribute to the model and predictions
varImp(tread)
rf variable importance
only 20 most important variables shown (out of 53)
Overall
roll_belt 100.00
yaw_belt 85.96
magnet_dumbbell_z 75.30
pitch_belt 68.71
magnet_dumbbell_y 67.88
pitch_forearm 66.07
roll_forearm 60.00
magnet_dumbbell_x 59.45
accel_belt_z 56.10
accel_dumbbell_y 53.56
magnet_belt_y 50.39
magnet_belt_z 50.26
roll_dumbbell 49.02
accel_dumbbell_z 46.90
roll_arm 46.28
accel_forearm_x 41.14
accel_dumbbell_x 40.77
yaw_dumbbell 39.51
gyros_belt_z 38.33
magnet_arm_y 37.56
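The importance scores above are on a 0 to 100 scale. The sketch below shows one way such a scale can be produced, assuming a min-max rescaling of raw importance scores as described in caret's varImp documentation. The raw numbers and the variable gyros_arm_x entry are hypothetical and are not taken from the fitted model.

```r
# Hypothetical raw importance scores (e.g. mean decrease in Gini)
raw_imp <- c(roll_belt = 820, yaw_belt = 710, accel_belt_z = 480, gyros_arm_x = 40)

# Min-max scaling: the largest score maps to 100, the smallest to 0
scaled_imp <- 100 * (raw_imp - min(raw_imp)) / (max(raw_imp) - min(raw_imp))
print(round(sort(scaled_imp, decreasing = TRUE), 2))
```

On such a scale a score describes importance relative to the top variable, so a value of 56 reads as "56% as important as roll_belt" and not as a share of the model's total predictive power.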
Results: When the predicted values are compared with the actual values in the training data set, the predictions match one for one. This suggests that the final model (mtry = 2) is a good fit. Further testing of the model on the test data gave sensitivity and specificity values close to those of the training set. The variables that contributed to the predictions have the same level of importance for both the training and the test data, so the model appears to be stable. The estimated out-of-sample error is 1 - Accuracy = 1 - 1 = 0. Because this accuracy is measured on the training data, the estimate is likely to be optimistic.
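The sensitivity, specificity, and out-of-sample error mentioned above can all be derived from a confusion matrix. The following sketch works through the arithmetic on a hypothetical three-class matrix; the counts are invented and would, in practice, come from predictions on held-out data that was not used for training.

```r
# Hypothetical confusion matrix: rows are predictions, columns are true classes
conf <- matrix(c(48,  2,  0,
                  1, 45,  4,
                  1,  3, 46),
               nrow = 3, byrow = TRUE,
               dimnames = list(Prediction = c("A", "B", "C"),
                               Reference  = c("A", "B", "C")))

tp <- diag(conf)
fn <- colSums(conf) - tp        # true class k predicted as something else
fp <- rowSums(conf) - tp        # other classes predicted as k
tn <- sum(conf) - tp - fn - fp  # everything not involving class k

sensitivity <- tp / (tp + fn)
specificity <- tn / (tn + fp)
oos_error   <- 1 - sum(tp) / sum(conf)

print(round(rbind(sensitivity, specificity), 3))
print(round(oos_error, 4))
```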
Environment:
1. OS: Windows 10
2. Tool: R version 3.2.5; RStudio version 0.99.893
3. Publishing tool: RPubs, HTML
4. Data: With thanks to source: http://groupware.les.inf.puc-rio.br/har
5. Reference: Velloso, E.; Bulling, A.; Gellersen, H.; Ugulino, W.; Fuks, H. Qualitative Activity Recognition of Weight Lifting Exercises. Proceedings of 4th International Conference in Cooperation with SIGCHI (Augmented Human ’13). Stuttgart, Germany: ACM SIGCHI, 2013. Read more: http://groupware.les.inf.puc-rio.br/har#weight_lifting_exercises
6. Analyst: Uma Venkataramani; Date of Analysis: May 2016