Summary
For this project, machine learning algorithms will be used in an attempt to correctly classify five ways of performing the Unilateral Dumbell Biceps Curl. To do so, the data is processed and cleaned, after which five different models are estimated and cross-validated to evaluate which model can classify the data best and how well it does so. The results show that the best performing model uses gradient boosting, with an in sample error of 0.29% and an out of sample error of 0.53%.
Introduction
Using devices such as Jawbone Up, Nike FuelBand, and Fitbit it is now possible to collect a large amount of data about personal activity relatively inexpensively. These type of devices are part of the quantified self movement - a group of enthusiasts who take measurements about themselves regularly to improve their health, to find patterns in their behavior, or because they are tech geeks. One thing that people regularly do is quantify how much of a particular activity they do, but they rarely quantify how well they do it. In this project, the goal is to use data from accelerometers on the belt, forearm, arm, and dumbell of 6 participants to predict in which way the Unilateral Dumbell Biceps Curl (UDBC) is performed by a participant. They were asked to perform UDBC lifts correctly and incorrectly in 5 different ways. The classe variable indicates how the activity was performed, where A indicates performance according to the specification, whereas B, C, D and E are classes which indicate common mistakes.
More information is available from the website here: http://web.archive.org/web/20161224072740/http:/groupware.les.inf.puc-rio.br/har
Library and data
First, the required packages for the analysis must be loaded and the data must be read into R.
# Library
library(ggplot2)
library(scales)
library(dplyr)
library(magrittr)
library(caret)
library(MASS)
library(nnet)
library(randomForest)
library(parallel)
library(doParallel)
library(e1071)
library(knitr)# Working directory
setwd("D:/R Directories/Coursera/Practical Machine Learning/PML")
# Load test- and train datasets
test <- read.csv("pml-testing.csv")
train <- read.csv("pml-training.csv")Cleaning the dataset
Using the str function, an initial idea of the dataset can be obtained, the output of which can be found in appendix A. Several issues are immediately apparent:
- The first 7 variables do not stem from accelerometer measurements, whereas the goal of this project is to use accelerometer data for predictions;
- Many variables contain missing values.
The first issue is easily tackled, as the first 7 variables can simply be removed. Additionally, the response variable classe will be put into a seperate vector, while column 160 of the test set (problem_id) will also be removed, as it is an ID column. To address the second issue, more insight in the occurence of missing values in the data is required.
# Remove response and unnecessary variables
response <- train[,160]
train <- train[,-c(1:7, 160)]
test <- test[,-c(1:7, 160)]
# Missing values
dfm <- rbind(train,test) %>%
apply(2, as.numeric) %>%
apply(2, function(x) sum(is.na(x))) %>%
{data.frame(Names=names(.), MV=.)}
tbl <- table(dfm$MV)
tbl <- data.frame(names(tbl), as.numeric(tbl))
colnames(tbl) <- c("Number of missing values", "Frequency")
kable(tbl, format="html", caption="Frequencies of missing values")| Number of missing values | Frequency |
|---|---|
| 0 | 52 |
| 19236 | 67 |
| 19237 | 1 |
| 19238 | 1 |
| 19240 | 1 |
| 19241 | 4 |
| 19245 | 1 |
| 19246 | 4 |
| 19247 | 2 |
| 19268 | 2 |
| 19313 | 1 |
| 19314 | 1 |
| 19316 | 2 |
| 19319 | 1 |
| 19320 | 4 |
| 19321 | 2 |
| 19642 | 6 |
The table shows the amount of columns which have a specific sum of missing values. Accordingly, there are 52 columns which have no missing values, while all other columns have between 19236 and 19642 missing values. Given that the combined dataset contains 19642 observations, imputation cannot be used to clean these variables for analysis. Hence, all columns which contain missing values shall be removed from the dataset.
# Remove variables with too many MV's
dfm <- dfm %>%
filter(MV >0) %>%
.$Names
train <- train[,!(names(train) %in% dfm)]
test <- test[,!(names(test) %in% dfm)]
rm(dfm)Before proceeding to the analysis, a last check on the variables is performed to evaluate if any of them have (near-) zero variance, as this may be problematic for further analyses. To do so, nearZeroVar from caret is used, the output of which can be found in appendix B.
As the output contains no (near-) zero variance variables, the next step is to split the data set.
Splitting the data set
As the project requires an estimate for both the in sample error and out of sample error, the training set is split into a training and validation set. The training set, which will contain 75% of the observations, will then be used to pick the best model for prediction, which will subsequently be used on the validation set, which contains the remaining 25% of observations, to estimate out of sample error. Note that the response vector is also split.
# Split training set
inTrain <- createDataPartition(response, p=3/4, list=F)
valid <- train[-inTrain,]
train <- train[inTrain,]
responseT <- response[inTrain]
responseV <- response[-inTrain]
rm(inTrain, response)Model testing
To evaluate how well the response variable can be predicted, five machine learning algorithms will be tested:
- Multinomial logistic regression;
- Linear discriminant analysis;
- Random forest;
- Support vector machine (with linear kernel);
- Gradient boosting.
To pick the best model, cross validation is used with 10 folds. Additionally, the used variables are centered and scaled, to further increase the predictive power of the models.
# Training parameters
ctrl <- trainControl(method="cv", number=10, allowParallel=T)
preP <- preProcess(train, method=c("center", "scale"))
train_stand <- predict(preP, train)As the estimation of the models can be quite computationally demanding, parallel processing is used to speed up the process. Note that this is completely optional, and can be opted out of by not running the subsequent code chunck and setting allowParallel in the previous code chunck equal to FALSE (F).
# Enable parallelized computing
cluster <- makeCluster(detectCores()-1)
registerDoParallel(cluster)Next, the models are estimated and saved for subsequent evaluation.
# Multinomial logistic regression
set.seed(2017)
logfit <- train(x=train_stand, y=responseT, method="multinom",
trControl=ctrl, trace=FALSE)
# Linear discriminant analysis
set.seed(2017)
ldafit <- train(x=train_stand, y=responseT, method="lda",
trControl=ctrl)
# Random forest
set.seed(2017)
rffit <- train(x=train_stand, y=responseT, method="rf",
trControl=ctrl)
# Support vector machine with linear kernel
set.seed(2017)
lsvmfit <- train(x=train_stand, y=responseT, method="svmLinear",
tunelength=14, trControl=ctrl)
# XGBOOST
set.seed(2017)
xgbfit <- train(x=train_stand, y=responseT, method="xgbLinear",
trControl=ctrl, nthread=1)
# Stop parallel
stopCluster(cluster)
rm(cluster)Having estimated all models, the subsequent code is used to combine all models into one object, after which a parallel plot is shown and the summary function is used to analyse the results.
#
allResamples <- resamples(list("Multinom log" = logfit,
"LDA" = ldafit,
"RandomForest" = rffit,
"Linear SVM" = lsvmfit,
"XGBoost" = xgbfit)
)
parallelplot(allResamples)As is apparant from the parallel plot, the RandomForest and XGBoost models vastly outperformed the other models on all folds, with accuracies close to 1. Furthermore, the XGBoost models seem to outperform the RandomForest models by a fraction, which is best further analyzed by using summary statistics.
summary(allResamples)
Call:
summary.resamples(object = allResamples)
Models: Multinom log, LDA, RandomForest, Linear SVM, XGBoost
Number of resamples: 10
Accuracy
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
Multinom log 0.7056 0.7313 0.7331 0.7350 0.7393 0.7595 0
LDA 0.6886 0.6955 0.6992 0.7021 0.7050 0.7235 0
RandomForest 0.9878 0.9888 0.9929 0.9924 0.9947 0.9973 0
Linear SVM 0.7607 0.7783 0.7813 0.7819 0.7867 0.8003 0
XGBoost 0.9939 0.9956 0.9969 0.9966 0.9973 0.9986 0
Kappa
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
Multinom log 0.6259 0.6597 0.6614 0.6640 0.6693 0.6950 0
LDA 0.6051 0.6148 0.6190 0.6230 0.6271 0.6499 0
RandomForest 0.9845 0.9858 0.9910 0.9904 0.9933 0.9966 0
Linear SVM 0.6953 0.7183 0.7223 0.7228 0.7288 0.7464 0
XGBoost 0.9923 0.9944 0.9961 0.9957 0.9966 0.9983 0
As can be seen from the accuracy table, overall the models performed quite well, as the worst model (LDA) already had an accuracy of 0.6836. The RandomForest and XGBoost models performed best, as was also seen in the parallel plot, with XGBoost achieving the highest accuracy, albeit by a fraction.
The XGBoost models produced the following classifications:
confusionMatrix(xgbfit)Cross-Validated (10 fold) Confusion Matrix
(entries are percentual average cell counts across resamples)
Reference
Prediction A B C D E
A 28.4 0.0 0.0 0.0 0.0
B 0.0 19.3 0.1 0.0 0.0
C 0.0 0.0 17.3 0.1 0.0
D 0.0 0.0 0.1 16.3 0.0
E 0.0 0.0 0.0 0.0 18.3
Accuracy (average) : 0.9966
The confusion matrix shows that the models only misclassified among the wrongfully executed UDBC’s, whereas it correctly classified all correct performances. With an overall in sample error of 0.0029, the next step is to evaluate the best model’s estimated out of sample error by using the validation set.
# Out of sample error
preddat <- predict(preP, valid)
yclass <- predict(xgbfit, preddat)
confusionMatrix(yclass, responseV)Confusion Matrix and Statistics
Reference
Prediction A B C D E
A 1393 1 0 0 0
B 1 947 2 0 0
C 1 1 852 0 3
D 0 0 1 803 1
E 0 0 0 1 897
Overall Statistics
Accuracy : 0.9976
95% CI : (0.9957, 0.9987)
No Information Rate : 0.2845
P-Value [Acc > NIR] : < 2.2e-16
Kappa : 0.9969
Mcnemar's Test P-Value : NA
Statistics by Class:
Class: A Class: B Class: C Class: D Class: E
Sensitivity 0.9986 0.9979 0.9965 0.9988 0.9956
Specificity 0.9997 0.9992 0.9988 0.9995 0.9998
Pos Pred Value 0.9993 0.9968 0.9942 0.9975 0.9989
Neg Pred Value 0.9994 0.9995 0.9993 0.9998 0.9990
Prevalence 0.2845 0.1935 0.1743 0.1639 0.1837
Detection Rate 0.2841 0.1931 0.1737 0.1637 0.1829
Detection Prevalence 0.2843 0.1937 0.1748 0.1642 0.1831
Balanced Accuracy 0.9991 0.9986 0.9976 0.9991 0.9977
Using the validation set, similar results arise from the confusion matrix. The model has been able to correctly classify all correct executions of the UDBC, while making only few mistakes in the classification of the wrongful executions of the exercise. With an out of sample error of 0.0053, the model is ready to be used on the test set.
# Preprocess testdata and classify
testdat <- predict(preP, test)
yclass <- predict(xgbfit, testdat)
yclass [1] B A B A A E D B A A B C B A E E A B B B
Levels: A B C D E
Having obtained these predictions, which will be used in the Course Project Prediction Quiz, this report has reached its end. For those who have read all the way through it, thank you for your time, and I hope you found my analysis interesting!
Appendix
A.
str(strdf[,1:99])## 'data.frame': 19622 obs. of 99 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ user_name : Factor w/ 6 levels "adelmo","carlitos",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ raw_timestamp_part_1 : int 1323084231 1323084231 1323084231 1323084232 1323084232 1323084232 1323084232 1323084232 1323084232 1323084232 ...
## $ raw_timestamp_part_2 : int 788290 808298 820366 120339 196328 304277 368296 440390 484323 484434 ...
## $ cvtd_timestamp : Factor w/ 20 levels "02/12/2011 13:32",..: 9 9 9 9 9 9 9 9 9 9 ...
## $ new_window : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
## $ num_window : int 11 11 11 12 12 12 12 12 12 12 ...
## $ roll_belt : num 1.41 1.41 1.42 1.48 1.48 1.45 1.42 1.42 1.43 1.45 ...
## $ pitch_belt : num 8.07 8.07 8.07 8.05 8.07 8.06 8.09 8.13 8.16 8.17 ...
## $ yaw_belt : num -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 ...
## $ total_accel_belt : int 3 3 3 3 3 3 3 3 3 3 ...
## $ kurtosis_roll_belt : Factor w/ 397 levels "","-0.016850",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ kurtosis_picth_belt : Factor w/ 317 levels "","-0.021887",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ kurtosis_yaw_belt : Factor w/ 2 levels "","#DIV/0!": 1 1 1 1 1 1 1 1 1 1 ...
## $ skewness_roll_belt : Factor w/ 395 levels "","-0.003095",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ skewness_roll_belt.1 : Factor w/ 338 levels "","-0.005928",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ skewness_yaw_belt : Factor w/ 2 levels "","#DIV/0!": 1 1 1 1 1 1 1 1 1 1 ...
## $ max_roll_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ max_picth_belt : int NA NA NA NA NA NA NA NA NA NA ...
## $ max_yaw_belt : Factor w/ 68 levels "","-0.1","-0.2",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ min_roll_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ min_pitch_belt : int NA NA NA NA NA NA NA NA NA NA ...
## $ min_yaw_belt : Factor w/ 68 levels "","-0.1","-0.2",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ amplitude_roll_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ amplitude_pitch_belt : int NA NA NA NA NA NA NA NA NA NA ...
## $ amplitude_yaw_belt : Factor w/ 4 levels "","#DIV/0!","0.00",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ var_total_accel_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ avg_roll_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ stddev_roll_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ var_roll_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ avg_pitch_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ stddev_pitch_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ var_pitch_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ avg_yaw_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ stddev_yaw_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ var_yaw_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ gyros_belt_x : num 0 0.02 0 0.02 0.02 0.02 0.02 0.02 0.02 0.03 ...
## $ gyros_belt_y : num 0 0 0 0 0.02 0 0 0 0 0 ...
## $ gyros_belt_z : num -0.02 -0.02 -0.02 -0.03 -0.02 -0.02 -0.02 -0.02 -0.02 0 ...
## $ accel_belt_x : int -21 -22 -20 -22 -21 -21 -22 -22 -20 -21 ...
## $ accel_belt_y : int 4 4 5 3 2 4 3 4 2 4 ...
## $ accel_belt_z : int 22 22 23 21 24 21 21 21 24 22 ...
## $ magnet_belt_x : int -3 -7 -2 -6 -6 0 -4 -2 1 -3 ...
## $ magnet_belt_y : int 599 608 600 604 600 603 599 603 602 609 ...
## $ magnet_belt_z : int -313 -311 -305 -310 -302 -312 -311 -313 -312 -308 ...
## $ roll_arm : num -128 -128 -128 -128 -128 -128 -128 -128 -128 -128 ...
## $ pitch_arm : num 22.5 22.5 22.5 22.1 22.1 22 21.9 21.8 21.7 21.6 ...
## $ yaw_arm : num -161 -161 -161 -161 -161 -161 -161 -161 -161 -161 ...
## $ total_accel_arm : int 34 34 34 34 34 34 34 34 34 34 ...
## $ var_accel_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ avg_roll_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ stddev_roll_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ var_roll_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ avg_pitch_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ stddev_pitch_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ var_pitch_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ avg_yaw_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ stddev_yaw_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ var_yaw_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ gyros_arm_x : num 0 0.02 0.02 0.02 0 0.02 0 0.02 0.02 0.02 ...
## $ gyros_arm_y : num 0 -0.02 -0.02 -0.03 -0.03 -0.03 -0.03 -0.02 -0.03 -0.03 ...
## $ gyros_arm_z : num -0.02 -0.02 -0.02 0.02 0 0 0 0 -0.02 -0.02 ...
## $ accel_arm_x : int -288 -290 -289 -289 -289 -289 -289 -289 -288 -288 ...
## $ accel_arm_y : int 109 110 110 111 111 111 111 111 109 110 ...
## $ accel_arm_z : int -123 -125 -126 -123 -123 -122 -125 -124 -122 -124 ...
## $ magnet_arm_x : int -368 -369 -368 -372 -374 -369 -373 -372 -369 -376 ...
## $ magnet_arm_y : int 337 337 344 344 337 342 336 338 341 334 ...
## $ magnet_arm_z : int 516 513 513 512 506 513 509 510 518 516 ...
## $ kurtosis_roll_arm : Factor w/ 330 levels "","-0.02438",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ kurtosis_picth_arm : Factor w/ 328 levels "","-0.00484",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ kurtosis_yaw_arm : Factor w/ 395 levels "","-0.01548",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ skewness_roll_arm : Factor w/ 331 levels "","-0.00051",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ skewness_pitch_arm : Factor w/ 328 levels "","-0.00184",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ skewness_yaw_arm : Factor w/ 395 levels "","-0.00311",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ max_roll_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ max_picth_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ max_yaw_arm : int NA NA NA NA NA NA NA NA NA NA ...
## $ min_roll_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ min_pitch_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ min_yaw_arm : int NA NA NA NA NA NA NA NA NA NA ...
## $ amplitude_roll_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ amplitude_pitch_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ amplitude_yaw_arm : int NA NA NA NA NA NA NA NA NA NA ...
## $ roll_dumbbell : num 13.1 13.1 12.9 13.4 13.4 ...
## $ pitch_dumbbell : num -70.5 -70.6 -70.3 -70.4 -70.4 ...
## $ yaw_dumbbell : num -84.9 -84.7 -85.1 -84.9 -84.9 ...
## $ kurtosis_roll_dumbbell : Factor w/ 398 levels "","-0.0035","-0.0073",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ kurtosis_picth_dumbbell: Factor w/ 401 levels "","-0.0163","-0.0233",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ kurtosis_yaw_dumbbell : Factor w/ 2 levels "","#DIV/0!": 1 1 1 1 1 1 1 1 1 1 ...
## $ skewness_roll_dumbbell : Factor w/ 401 levels "","-0.0082","-0.0096",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ skewness_pitch_dumbbell: Factor w/ 402 levels "","-0.0053","-0.0084",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ skewness_yaw_dumbbell : Factor w/ 2 levels "","#DIV/0!": 1 1 1 1 1 1 1 1 1 1 ...
## $ max_roll_dumbbell : num NA NA NA NA NA NA NA NA NA NA ...
## $ max_picth_dumbbell : num NA NA NA NA NA NA NA NA NA NA ...
## $ max_yaw_dumbbell : Factor w/ 73 levels "","-0.1","-0.2",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ min_roll_dumbbell : num NA NA NA NA NA NA NA NA NA NA ...
## $ min_pitch_dumbbell : num NA NA NA NA NA NA NA NA NA NA ...
## $ min_yaw_dumbbell : Factor w/ 73 levels "","-0.1","-0.2",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ amplitude_roll_dumbbell: num NA NA NA NA NA NA NA NA NA NA ...
str(strdf[,100:160])## 'data.frame': 19622 obs. of 61 variables:
## $ amplitude_pitch_dumbbell: num NA NA NA NA NA NA NA NA NA NA ...
## $ amplitude_yaw_dumbbell : Factor w/ 3 levels "","#DIV/0!","0.00": 1 1 1 1 1 1 1 1 1 1 ...
## $ total_accel_dumbbell : int 37 37 37 37 37 37 37 37 37 37 ...
## $ var_accel_dumbbell : num NA NA NA NA NA NA NA NA NA NA ...
## $ avg_roll_dumbbell : num NA NA NA NA NA NA NA NA NA NA ...
## $ stddev_roll_dumbbell : num NA NA NA NA NA NA NA NA NA NA ...
## $ var_roll_dumbbell : num NA NA NA NA NA NA NA NA NA NA ...
## $ avg_pitch_dumbbell : num NA NA NA NA NA NA NA NA NA NA ...
## $ stddev_pitch_dumbbell : num NA NA NA NA NA NA NA NA NA NA ...
## $ var_pitch_dumbbell : num NA NA NA NA NA NA NA NA NA NA ...
## $ avg_yaw_dumbbell : num NA NA NA NA NA NA NA NA NA NA ...
## $ stddev_yaw_dumbbell : num NA NA NA NA NA NA NA NA NA NA ...
## $ var_yaw_dumbbell : num NA NA NA NA NA NA NA NA NA NA ...
## $ gyros_dumbbell_x : num 0 0 0 0 0 0 0 0 0 0 ...
## $ gyros_dumbbell_y : num -0.02 -0.02 -0.02 -0.02 -0.02 -0.02 -0.02 -0.02 -0.02 -0.02 ...
## $ gyros_dumbbell_z : num 0 0 0 -0.02 0 0 0 0 0 0 ...
## $ accel_dumbbell_x : int -234 -233 -232 -232 -233 -234 -232 -234 -232 -235 ...
## $ accel_dumbbell_y : int 47 47 46 48 48 48 47 46 47 48 ...
## $ accel_dumbbell_z : int -271 -269 -270 -269 -270 -269 -270 -272 -269 -270 ...
## $ magnet_dumbbell_x : int -559 -555 -561 -552 -554 -558 -551 -555 -549 -558 ...
## $ magnet_dumbbell_y : int 293 296 298 303 292 294 295 300 292 291 ...
## $ magnet_dumbbell_z : num -65 -64 -63 -60 -68 -66 -70 -74 -65 -69 ...
## $ roll_forearm : num 28.4 28.3 28.3 28.1 28 27.9 27.9 27.8 27.7 27.7 ...
## $ pitch_forearm : num -63.9 -63.9 -63.9 -63.9 -63.9 -63.9 -63.9 -63.8 -63.8 -63.8 ...
## $ yaw_forearm : num -153 -153 -152 -152 -152 -152 -152 -152 -152 -152 ...
## $ kurtosis_roll_forearm : Factor w/ 322 levels "","-0.0227","-0.0359",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ kurtosis_picth_forearm : Factor w/ 323 levels "","-0.0073","-0.0442",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ kurtosis_yaw_forearm : Factor w/ 2 levels "","#DIV/0!": 1 1 1 1 1 1 1 1 1 1 ...
## $ skewness_roll_forearm : Factor w/ 323 levels "","-0.0004","-0.0013",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ skewness_pitch_forearm : Factor w/ 319 levels "","-0.0113","-0.0131",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ skewness_yaw_forearm : Factor w/ 2 levels "","#DIV/0!": 1 1 1 1 1 1 1 1 1 1 ...
## $ max_roll_forearm : num NA NA NA NA NA NA NA NA NA NA ...
## $ max_picth_forearm : num NA NA NA NA NA NA NA NA NA NA ...
## $ max_yaw_forearm : Factor w/ 45 levels "","-0.1","-0.2",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ min_roll_forearm : num NA NA NA NA NA NA NA NA NA NA ...
## $ min_pitch_forearm : num NA NA NA NA NA NA NA NA NA NA ...
## $ min_yaw_forearm : Factor w/ 45 levels "","-0.1","-0.2",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ amplitude_roll_forearm : num NA NA NA NA NA NA NA NA NA NA ...
## $ amplitude_pitch_forearm : num NA NA NA NA NA NA NA NA NA NA ...
## $ amplitude_yaw_forearm : Factor w/ 3 levels "","#DIV/0!","0.00": 1 1 1 1 1 1 1 1 1 1 ...
## $ total_accel_forearm : int 36 36 36 36 36 36 36 36 36 36 ...
## $ var_accel_forearm : num NA NA NA NA NA NA NA NA NA NA ...
## $ avg_roll_forearm : num NA NA NA NA NA NA NA NA NA NA ...
## $ stddev_roll_forearm : num NA NA NA NA NA NA NA NA NA NA ...
## $ var_roll_forearm : num NA NA NA NA NA NA NA NA NA NA ...
## $ avg_pitch_forearm : num NA NA NA NA NA NA NA NA NA NA ...
## $ stddev_pitch_forearm : num NA NA NA NA NA NA NA NA NA NA ...
## $ var_pitch_forearm : num NA NA NA NA NA NA NA NA NA NA ...
## $ avg_yaw_forearm : num NA NA NA NA NA NA NA NA NA NA ...
## $ stddev_yaw_forearm : num NA NA NA NA NA NA NA NA NA NA ...
## $ var_yaw_forearm : num NA NA NA NA NA NA NA NA NA NA ...
## $ gyros_forearm_x : num 0.03 0.02 0.03 0.02 0.02 0.02 0.02 0.02 0.03 0.02 ...
## $ gyros_forearm_y : num 0 0 -0.02 -0.02 0 -0.02 0 -0.02 0 0 ...
## $ gyros_forearm_z : num -0.02 -0.02 0 0 -0.02 -0.03 -0.02 0 -0.02 -0.02 ...
## $ accel_forearm_x : int 192 192 196 189 189 193 195 193 193 190 ...
## $ accel_forearm_y : int 203 203 204 206 206 203 205 205 204 205 ...
## $ accel_forearm_z : int -215 -216 -213 -214 -214 -215 -215 -213 -214 -215 ...
## $ magnet_forearm_x : int -17 -18 -18 -16 -17 -9 -18 -9 -16 -22 ...
## $ magnet_forearm_y : num 654 661 658 658 655 660 659 660 653 656 ...
## $ magnet_forearm_z : num 476 473 469 469 473 478 470 474 476 473 ...
## $ classe : Factor w/ 5 levels "A","B","C","D",..: 1 1 1 1 1 1 1 1 1 1 ...
B.
nzvdf## freqRatio percentUnique zeroVar nzv
## roll_belt 1.101904 6.7762957 FALSE FALSE
## pitch_belt 1.036082 9.3676815 FALSE FALSE
## yaw_belt 1.056420 9.9633439 FALSE FALSE
## total_accel_belt 1.064128 0.1476428 FALSE FALSE
## gyros_belt_x 1.059341 0.7127584 FALSE FALSE
## gyros_belt_y 1.145626 0.3512881 FALSE FALSE
## gyros_belt_z 1.063770 0.8604012 FALSE FALSE
## accel_belt_x 1.052699 0.8349455 FALSE FALSE
## accel_belt_y 1.112198 0.7280318 FALSE FALSE
## accel_belt_z 1.078767 1.5222482 FALSE FALSE
## magnet_belt_x 1.090141 1.6647999 FALSE FALSE
## magnet_belt_y 1.097978 1.5171571 FALSE FALSE
## magnet_belt_z 1.004228 2.3266470 FALSE FALSE
## roll_arm 52.461538 13.5169535 FALSE FALSE
## pitch_arm 87.461538 15.7214133 FALSE FALSE
## yaw_arm 33.106796 14.6420935 FALSE FALSE
## total_accel_arm 1.025612 0.3360147 FALSE FALSE
## gyros_arm_x 1.019380 3.2735974 FALSE FALSE
## gyros_arm_y 1.458252 1.9142653 FALSE FALSE
## gyros_arm_z 1.110476 1.2626005 FALSE FALSE
## accel_arm_x 1.005714 3.9558090 FALSE FALSE
## accel_arm_y 1.140187 2.7339375 FALSE FALSE
## accel_arm_z 1.119048 4.0321759 FALSE FALSE
## magnet_arm_x 1.000000 6.8170247 FALSE FALSE
## magnet_arm_y 1.068182 4.4394664 FALSE FALSE
## magnet_arm_z 1.036364 6.4402810 FALSE FALSE
## roll_dumbbell 1.022388 84.1971286 FALSE FALSE
## pitch_dumbbell 2.284672 81.7330211 FALSE FALSE
## yaw_dumbbell 1.132231 83.4741880 FALSE FALSE
## total_accel_dumbbell 1.071795 0.2189186 FALSE FALSE
## gyros_dumbbell_x 1.003268 1.2269626 FALSE FALSE
## gyros_dumbbell_y 1.266212 1.4153345 FALSE FALSE
## gyros_dumbbell_z 1.060100 1.0487730 FALSE FALSE
## accel_dumbbell_x 1.018018 2.1688219 FALSE FALSE
## accel_dumbbell_y 1.053061 2.3724672 FALSE FALSE
## accel_dumbbell_z 1.133333 2.0873638 FALSE FALSE
## magnet_dumbbell_x 1.098266 5.7478872 FALSE FALSE
## magnet_dumbbell_y 1.197740 4.2969148 FALSE FALSE
## magnet_dumbbell_z 1.020833 3.4416047 FALSE FALSE
## roll_forearm 11.592262 11.0783016 FALSE FALSE
## pitch_forearm 64.900000 14.8457387 FALSE FALSE
## yaw_forearm 15.266667 10.1364423 FALSE FALSE
## total_accel_forearm 1.128721 0.3563792 FALSE FALSE
## gyros_forearm_x 1.061069 1.5171571 FALSE FALSE
## gyros_forearm_y 1.033854 3.7725283 FALSE FALSE
## gyros_forearm_z 1.125000 1.5629773 FALSE FALSE
## accel_forearm_x 1.126437 4.0423582 FALSE FALSE
## accel_forearm_y 1.059406 5.1064046 FALSE FALSE
## accel_forearm_z 1.006250 2.9528561 FALSE FALSE
## magnet_forearm_x 1.000000 7.7588840 FALSE FALSE
## magnet_forearm_y 1.246914 9.5305977 FALSE FALSE
## magnet_forearm_z 1.017241 8.5683739 FALSE FALSE