With the rise of wearable devices such as Jawbone Up, Nike FuelBand, and Fitbit, collecting large amounts of data on personal activity has become increasingly popular. These devices are central to the quantified self movement, in which individuals routinely track their own data to improve their health, identify behavioral patterns, or simply out of interest in technology. While many users quantify how much of an activity they do, they rarely quantify how well they do it.
This project aims to bridge that gap by analyzing data from accelerometers placed on the belt, forearm, arm, and dumbbell of six participants. The participants were instructed to perform barbell lifts in five distinct ways, covering both correct and incorrect form; these five ways are the levels A to E of the “classe” variable.
The training data can be downloaded at: https://d396qusza40orc.cloudfront.net/predmachlearn/pml-training.csv
## 'data.frame': 19622 obs. of 160 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ user_name : chr "carlitos" "carlitos" "carlitos" "carlitos" ...
## $ raw_timestamp_part_1 : int 1323084231 1323084231 1323084231 1323084232 1323084232 1323084232 1323084232 1323084232 1323084232 1323084232 ...
## $ raw_timestamp_part_2 : int 788290 808298 820366 120339 196328 304277 368296 440390 484323 484434 ...
## $ cvtd_timestamp : chr "05/12/2011 11:23" "05/12/2011 11:23" "05/12/2011 11:23" "05/12/2011 11:23" ...
## $ new_window : chr "no" "no" "no" "no" ...
## $ num_window : int 11 11 11 12 12 12 12 12 12 12 ...
## $ roll_belt : num 1.41 1.41 1.42 1.48 1.48 1.45 1.42 1.42 1.43 1.45 ...
## $ pitch_belt : num 8.07 8.07 8.07 8.05 8.07 8.06 8.09 8.13 8.16 8.17 ...
## $ yaw_belt : num -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 ...
## $ total_accel_belt : int 3 3 3 3 3 3 3 3 3 3 ...
## $ kurtosis_roll_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ kurtosis_picth_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ kurtosis_yaw_belt : logi NA NA NA NA NA NA ...
## $ skewness_roll_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ skewness_roll_belt.1 : num NA NA NA NA NA NA NA NA NA NA ...
## $ skewness_yaw_belt : logi NA NA NA NA NA NA ...
## $ max_roll_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ max_picth_belt : int NA NA NA NA NA NA NA NA NA NA ...
## $ max_yaw_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ min_roll_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ min_pitch_belt : int NA NA NA NA NA NA NA NA NA NA ...
## $ min_yaw_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ amplitude_roll_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ amplitude_pitch_belt : int NA NA NA NA NA NA NA NA NA NA ...
## $ amplitude_yaw_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ var_total_accel_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ avg_roll_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ stddev_roll_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ var_roll_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ avg_pitch_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ stddev_pitch_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ var_pitch_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ avg_yaw_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ stddev_yaw_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ var_yaw_belt : num NA NA NA NA NA NA NA NA NA NA ...
## $ gyros_belt_x : num 0 0.02 0 0.02 0.02 0.02 0.02 0.02 0.02 0.03 ...
## $ gyros_belt_y : num 0 0 0 0 0.02 0 0 0 0 0 ...
## $ gyros_belt_z : num -0.02 -0.02 -0.02 -0.03 -0.02 -0.02 -0.02 -0.02 -0.02 0 ...
## $ accel_belt_x : int -21 -22 -20 -22 -21 -21 -22 -22 -20 -21 ...
## $ accel_belt_y : int 4 4 5 3 2 4 3 4 2 4 ...
## $ accel_belt_z : int 22 22 23 21 24 21 21 21 24 22 ...
## $ magnet_belt_x : int -3 -7 -2 -6 -6 0 -4 -2 1 -3 ...
## $ magnet_belt_y : int 599 608 600 604 600 603 599 603 602 609 ...
## $ magnet_belt_z : int -313 -311 -305 -310 -302 -312 -311 -313 -312 -308 ...
## $ roll_arm : num -128 -128 -128 -128 -128 -128 -128 -128 -128 -128 ...
## $ pitch_arm : num 22.5 22.5 22.5 22.1 22.1 22 21.9 21.8 21.7 21.6 ...
## $ yaw_arm : num -161 -161 -161 -161 -161 -161 -161 -161 -161 -161 ...
## $ total_accel_arm : int 34 34 34 34 34 34 34 34 34 34 ...
## $ var_accel_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ avg_roll_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ stddev_roll_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ var_roll_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ avg_pitch_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ stddev_pitch_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ var_pitch_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ avg_yaw_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ stddev_yaw_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ var_yaw_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ gyros_arm_x : num 0 0.02 0.02 0.02 0 0.02 0 0.02 0.02 0.02 ...
## $ gyros_arm_y : num 0 -0.02 -0.02 -0.03 -0.03 -0.03 -0.03 -0.02 -0.03 -0.03 ...
## $ gyros_arm_z : num -0.02 -0.02 -0.02 0.02 0 0 0 0 -0.02 -0.02 ...
## $ accel_arm_x : int -288 -290 -289 -289 -289 -289 -289 -289 -288 -288 ...
## $ accel_arm_y : int 109 110 110 111 111 111 111 111 109 110 ...
## $ accel_arm_z : int -123 -125 -126 -123 -123 -122 -125 -124 -122 -124 ...
## $ magnet_arm_x : int -368 -369 -368 -372 -374 -369 -373 -372 -369 -376 ...
## $ magnet_arm_y : int 337 337 344 344 337 342 336 338 341 334 ...
## $ magnet_arm_z : int 516 513 513 512 506 513 509 510 518 516 ...
## $ kurtosis_roll_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ kurtosis_picth_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ kurtosis_yaw_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ skewness_roll_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ skewness_pitch_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ skewness_yaw_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ max_roll_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ max_picth_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ max_yaw_arm : int NA NA NA NA NA NA NA NA NA NA ...
## $ min_roll_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ min_pitch_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ min_yaw_arm : int NA NA NA NA NA NA NA NA NA NA ...
## $ amplitude_roll_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ amplitude_pitch_arm : num NA NA NA NA NA NA NA NA NA NA ...
## $ amplitude_yaw_arm : int NA NA NA NA NA NA NA NA NA NA ...
## $ roll_dumbbell : num 13.1 13.1 12.9 13.4 13.4 ...
## $ pitch_dumbbell : num -70.5 -70.6 -70.3 -70.4 -70.4 ...
## $ yaw_dumbbell : num -84.9 -84.7 -85.1 -84.9 -84.9 ...
## $ kurtosis_roll_dumbbell : num NA NA NA NA NA NA NA NA NA NA ...
## $ kurtosis_picth_dumbbell : num NA NA NA NA NA NA NA NA NA NA ...
## $ kurtosis_yaw_dumbbell : logi NA NA NA NA NA NA ...
## $ skewness_roll_dumbbell : num NA NA NA NA NA NA NA NA NA NA ...
## $ skewness_pitch_dumbbell : num NA NA NA NA NA NA NA NA NA NA ...
## $ skewness_yaw_dumbbell : logi NA NA NA NA NA NA ...
## $ max_roll_dumbbell : num NA NA NA NA NA NA NA NA NA NA ...
## $ max_picth_dumbbell : num NA NA NA NA NA NA NA NA NA NA ...
## $ max_yaw_dumbbell : num NA NA NA NA NA NA NA NA NA NA ...
## $ min_roll_dumbbell : num NA NA NA NA NA NA NA NA NA NA ...
## $ min_pitch_dumbbell : num NA NA NA NA NA NA NA NA NA NA ...
## $ min_yaw_dumbbell : num NA NA NA NA NA NA NA NA NA NA ...
## $ amplitude_roll_dumbbell : num NA NA NA NA NA NA NA NA NA NA ...
## [list output truncated]
Table: Data
Preprocessing the data is crucial to ensure the model’s accuracy and performance. The following steps were taken:
Removal of Near-Zero Variance Predictors: Variables with very little variance were removed as they provide little to no information for model training.
Handling Missing Data: Columns with excessive missing values (more than 50% NA) were excluded from the dataset.
Removing Irrelevant Columns: Columns like user_name, raw_timestamp_part_1, raw_timestamp_part_2, and cvtd_timestamp were removed as they don’t contribute to predicting “classe”.
Factorizing the Target Variable: The “classe” variable was converted to a factor to ensure it was treated as a categorical variable.
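The preprocessing steps can be sketched in base R on a small made-up data frame. This is only an illustration under stated assumptions: the toy values, the helper names (`na_share`, `id_cols`, `is_constant`), and the order of the steps are not taken from the original analysis, and the constant-column check is a simplified stand-in for a near-zero-variance filter such as caret's `nearZeroVar`.

```r
# Toy data frame standing in for the raw training data (values are made up).
raw <- data.frame(
  user_name      = c("a", "a", "b", "b", "c", "c"),
  cvtd_timestamp = rep("05/12/2011 11:23", 6),
  roll_belt      = c(1.41, 1.42, 1.48, 1.45, 1.43, 1.50),
  max_roll_belt  = c(NA, NA, NA, NA, 2.1, NA),  # mostly missing
  constant_col   = rep(0, 6),                   # no variance at all
  classe         = c("A", "A", "B", "B", "C", "C"),
  stringsAsFactors = FALSE
)

# 1. Drop columns where more than 50% of the values are NA.
na_share <- colMeans(is.na(raw))
clean <- raw[, na_share <= 0.5, drop = FALSE]

# 2. Drop identifier and timestamp columns that should not act as predictors.
id_cols <- c("user_name", "cvtd_timestamp")
clean <- clean[, !(names(clean) %in% id_cols), drop = FALSE]

# 3. Drop columns with a single distinct value (simplified stand-in for a
#    near-zero-variance filter; assumption, not the original code).
is_constant <- vapply(clean, function(col) length(unique(col)) <= 1, logical(1))
clean <- clean[, !is_constant, drop = FALSE]

# 4. Treat the target as a categorical variable.
clean$classe <- factor(clean$classe)

str(clean)
```

On the toy data only `roll_belt` and `classe` survive, which mirrors how the 160 raw columns shrink to 53 in the real data set.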
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.884173 -0.120137 0.001538 0.002088 0.097787 0.849132
## 'data.frame': 13737 obs. of 53 variables:
## $ roll_belt : num 1.41 1.41 1.48 1.42 1.43 1.45 1.43 1.42 1.42 1.48 ...
## $ pitch_belt : num 8.07 8.07 8.07 8.09 8.16 8.17 8.18 8.2 8.21 8.15 ...
## $ yaw_belt : num -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 ...
## $ total_accel_belt : int 3 3 3 3 3 3 3 3 3 3 ...
## $ gyros_belt_x : num 0 0.02 0.02 0.02 0.02 0.03 0.02 0.02 0.02 0 ...
## $ gyros_belt_y : num 0 0 0.02 0 0 0 0 0 0 0 ...
## $ gyros_belt_z : num -0.02 -0.02 -0.02 -0.02 -0.02 0 -0.02 0 -0.02 0 ...
## $ accel_belt_x : int -21 -22 -21 -22 -20 -21 -22 -22 -22 -21 ...
## $ accel_belt_y : int 4 4 2 3 2 4 2 4 4 4 ...
## $ accel_belt_z : int 22 22 24 21 24 22 23 21 21 23 ...
## $ magnet_belt_x : int -3 -7 -6 -4 1 -3 -2 -3 -8 0 ...
## $ magnet_belt_y : int 599 608 600 599 602 609 602 606 598 592 ...
## $ magnet_belt_z : int -313 -311 -302 -311 -312 -308 -319 -309 -310 -305 ...
## $ roll_arm : num -128 -128 -128 -128 -128 -128 -128 -128 -128 -129 ...
## $ pitch_arm : num 22.5 22.5 22.1 21.9 21.7 21.6 21.5 21.4 21.4 21.3 ...
## $ yaw_arm : num -161 -161 -161 -161 -161 -161 -161 -161 -161 -161 ...
## $ total_accel_arm : int 34 34 34 34 34 34 34 34 34 34 ...
## $ gyros_arm_x : num 0 0.02 0 0 0.02 0.02 0.02 0.02 0.02 0.02 ...
## $ gyros_arm_y : num 0 -0.02 -0.03 -0.03 -0.03 -0.03 -0.03 -0.02 0 0 ...
## $ gyros_arm_z : num -0.02 -0.02 0 0 -0.02 -0.02 0 -0.02 -0.03 -0.03 ...
## $ accel_arm_x : int -288 -290 -289 -289 -288 -288 -288 -287 -288 -289 ...
## $ accel_arm_y : int 109 110 111 111 109 110 111 111 111 109 ...
## $ accel_arm_z : int -123 -125 -123 -125 -122 -124 -123 -124 -124 -121 ...
## $ magnet_arm_x : int -368 -369 -374 -373 -369 -376 -363 -372 -371 -367 ...
## $ magnet_arm_y : int 337 337 337 336 341 334 343 338 331 340 ...
## $ magnet_arm_z : int 516 513 506 509 518 516 520 509 523 509 ...
## $ roll_dumbbell : num 13.1 13.1 13.4 13.1 13.2 ...
## $ pitch_dumbbell : num -70.5 -70.6 -70.4 -70.2 -70.4 ...
## $ yaw_dumbbell : num -84.9 -84.7 -84.9 -85.1 -84.9 ...
## $ total_accel_dumbbell: int 37 37 37 37 37 37 37 37 37 37 ...
## $ gyros_dumbbell_x : num 0 0 0 0 0 0 0 0 0.02 0 ...
## $ gyros_dumbbell_y : num -0.02 -0.02 -0.02 -0.02 -0.02 -0.02 -0.02 -0.02 -0.02 -0.02 ...
## $ gyros_dumbbell_z : num 0 0 0 0 0 0 0 -0.02 -0.02 0 ...
## $ accel_dumbbell_x : int -234 -233 -233 -232 -232 -235 -233 -234 -234 -233 ...
## $ accel_dumbbell_y : int 47 47 48 47 47 48 47 48 48 48 ...
## $ accel_dumbbell_z : int -271 -269 -270 -270 -269 -270 -270 -269 -268 -271 ...
## $ magnet_dumbbell_x : int -559 -555 -554 -551 -549 -558 -554 -552 -554 -554 ...
## $ magnet_dumbbell_y : int 293 296 292 295 292 291 291 302 295 297 ...
## $ magnet_dumbbell_z : num -65 -64 -68 -70 -65 -69 -65 -69 -68 -73 ...
## $ roll_forearm : num 28.4 28.3 28 27.9 27.7 27.7 27.5 27.2 27.2 27.1 ...
## $ pitch_forearm : num -63.9 -63.9 -63.9 -63.9 -63.8 -63.8 -63.8 -63.9 -63.9 -64 ...
## $ yaw_forearm : num -153 -153 -152 -152 -152 -152 -152 -151 -151 -151 ...
## $ total_accel_forearm : int 36 36 36 36 36 36 36 36 36 36 ...
## $ gyros_forearm_x : num 0.03 0.02 0.02 0.02 0.03 0.02 0.02 0 0 0.02 ...
## $ gyros_forearm_y : num 0 0 0 0 0 0 0.02 0 -0.02 0 ...
## $ gyros_forearm_z : num -0.02 -0.02 -0.02 -0.02 -0.02 -0.02 -0.03 -0.03 -0.03 0 ...
## $ accel_forearm_x : int 192 192 189 195 193 190 191 193 193 194 ...
## $ accel_forearm_y : int 203 203 206 205 204 205 203 205 202 204 ...
## $ accel_forearm_z : int -215 -216 -214 -215 -214 -215 -215 -215 -214 -215 ...
## $ magnet_forearm_x : int -17 -18 -17 -18 -16 -22 -11 -15 -14 -13 ...
## $ magnet_forearm_y : num 654 661 655 659 653 656 657 655 659 656 ...
## $ magnet_forearm_z : num 476 473 473 470 476 473 478 472 478 471 ...
## $ classe : Factor w/ 5 levels "A","B","C","D",..: 1 1 1 1 1 1 1 1 1 1 ...
Table: Train Data
## 'data.frame': 5885 obs. of 53 variables:
## $ roll_belt : num 1.42 1.48 1.45 1.42 1.45 1.45 1.57 1.56 1.51 1.43 ...
## $ pitch_belt : num 8.07 8.05 8.06 8.13 8.18 8.2 8.09 8.1 8.1 8.17 ...
## $ yaw_belt : num -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.4 -94.3 -94.4 -94.4 ...
## $ total_accel_belt : int 3 3 3 3 3 3 3 3 3 3 ...
## $ gyros_belt_x : num 0 0.02 0.02 0.02 0.03 0 0.02 0.02 0.02 0 ...
## $ gyros_belt_y : num 0 0 0 0 0 0 0.02 0 0 0 ...
## $ gyros_belt_z : num -0.02 -0.03 -0.02 -0.02 -0.02 0 -0.02 -0.02 -0.02 -0.03 ...
## $ accel_belt_x : int -20 -22 -21 -22 -21 -21 -21 -21 -20 -22 ...
## $ accel_belt_y : int 5 3 4 4 2 2 3 4 4 4 ...
## $ accel_belt_z : int 23 21 21 21 23 22 21 21 22 19 ...
## $ magnet_belt_x : int -2 -6 0 -2 -5 -1 -2 -4 -3 4 ...
## $ magnet_belt_y : int 600 604 603 603 596 597 604 606 601 602 ...
## $ magnet_belt_z : int -305 -310 -312 -313 -317 -310 -313 -311 -318 -316 ...
## $ roll_arm : num -128 -128 -128 -128 -128 -129 -129 -129 -129 -129 ...
## $ pitch_arm : num 22.5 22.1 22 21.8 21.5 21.4 20.8 20.7 20.7 20.5 ...
## $ yaw_arm : num -161 -161 -161 -161 -161 -161 -161 -161 -161 -161 ...
## $ total_accel_arm : int 34 34 34 34 34 34 34 34 34 34 ...
## $ gyros_arm_x : num 0.02 0.02 0.02 0.02 0.02 0.02 0.03 0.02 -0.02 0.03 ...
## $ gyros_arm_y : num -0.02 -0.03 -0.03 -0.02 -0.03 0 -0.02 -0.02 0 -0.02 ...
## $ gyros_arm_z : num -0.02 0.02 0 0 0 -0.03 -0.02 -0.02 -0.02 0 ...
## $ accel_arm_x : int -289 -289 -289 -289 -290 -289 -289 -290 -289 -290 ...
## $ accel_arm_y : int 110 111 111 111 110 111 111 110 110 110 ...
## $ accel_arm_z : int -126 -123 -122 -124 -123 -124 -123 -123 -125 -126 ...
## $ magnet_arm_x : int -368 -372 -369 -372 -366 -374 -372 -373 -374 -375 ...
## $ magnet_arm_y : int 344 344 342 338 339 342 338 333 350 339 ...
## $ magnet_arm_z : int 513 512 513 510 509 510 510 509 516 508 ...
## $ roll_dumbbell : num 12.9 13.4 13.4 12.8 13.1 ...
## $ pitch_dumbbell : num -70.3 -70.4 -70.8 -70.3 -70.6 ...
## $ yaw_dumbbell : num -85.1 -84.9 -84.5 -85.1 -84.7 ...
## $ total_accel_dumbbell: int 37 37 37 37 37 37 37 37 37 37 ...
## $ gyros_dumbbell_x : num 0 0 0 0 0 0 0 0 0 0 ...
## $ gyros_dumbbell_y : num -0.02 -0.02 -0.02 -0.02 -0.02 -0.02 -0.02 -0.02 -0.02 -0.02 ...
## $ gyros_dumbbell_z : num 0 -0.02 0 0 0 0 0 0 0 -0.02 ...
## $ accel_dumbbell_x : int -232 -232 -234 -234 -233 -234 -233 -234 -235 -234 ...
## $ accel_dumbbell_y : int 46 48 48 46 47 47 48 48 47 48 ...
## $ accel_dumbbell_z : int -270 -269 -269 -272 -269 -270 -270 -270 -271 -272 ...
## $ magnet_dumbbell_x : int -561 -552 -558 -555 -564 -554 -554 -557 -558 -556 ...
## $ magnet_dumbbell_y : int 298 303 294 300 299 294 301 294 291 298 ...
## $ magnet_dumbbell_z : num -63 -60 -66 -74 -64 -63 -65 -69 -71 -62 ...
## $ roll_forearm : num 28.3 28.1 27.9 27.8 27.6 27.2 27 26.9 27.1 26.7 ...
## $ pitch_forearm : num -63.9 -63.9 -63.9 -63.8 -63.8 -63.9 -63.9 -63.8 -63.7 -63.7 ...
## $ yaw_forearm : num -152 -152 -152 -152 -152 -151 -151 -151 -151 -151 ...
## $ total_accel_forearm : int 36 36 36 36 36 36 36 36 36 36 ...
## $ gyros_forearm_x : num 0.03 0.02 0.02 0.02 0.02 0 0.02 0.02 0.03 0 ...
## $ gyros_forearm_y : num -0.02 -0.02 -0.02 -0.02 -0.02 -0.02 -0.03 -0.02 -0.03 -0.02 ...
## $ gyros_forearm_z : num 0 0 -0.03 0 -0.02 -0.02 -0.02 -0.02 0 -0.02 ...
## $ accel_forearm_x : int 196 189 193 193 193 192 191 194 193 196 ...
## $ accel_forearm_y : int 204 206 203 205 205 201 206 206 203 207 ...
## $ accel_forearm_z : int -213 -214 -215 -213 -214 -214 -213 -214 -213 -216 ...
## $ magnet_forearm_x : int -18 -16 -9 -9 -17 -16 -17 -10 -11 -15 ...
## $ magnet_forearm_y : num 658 658 660 660 657 656 654 653 661 650 ...
## $ magnet_forearm_z : num 469 469 478 474 465 472 478 467 470 473 ...
## $ classe : Factor w/ 5 levels "A","B","C","D",..: 1 1 1 1 1 1 1 1 1 1 ...
Table: Test Data
Highly correlated predictors can cause multicollinearity, which can affect the stability and interpretability of the model.
Correlation Matrix: Helps to understand the relationships between numeric predictors. findCorrelation Function: Identifies highly correlated predictors, based on a specified correlation threshold, so that they can be removed. Data Reduction: Improves model stability by removing redundant information.
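The idea behind a correlation filter can be sketched in base R. The function `drop_correlated` below is a hypothetical, simplified greedy stand-in, not caret's `findCorrelation` algorithm, and the 0.9 cutoff is an assumption because the threshold used in the analysis is not stated.

```r
set.seed(42)
n <- 200
a <- rnorm(n)
predictors <- data.frame(
  a = a,
  b = a + rnorm(n, sd = 0.05),  # nearly a copy of a
  c = rnorm(n),
  d = rnorm(n)
)

# Greedy filter: while any pair exceeds the cutoff, drop the member of the
# most correlated pair that is, on average, more correlated with the rest.
drop_correlated <- function(df, cutoff = 0.9) {
  cm <- abs(cor(df))
  diag(cm) <- 0
  dropped <- character(0)
  while (max(cm) > cutoff) {
    idx <- which(cm == max(cm), arr.ind = TRUE)[1, ]
    pair <- colnames(cm)[idx]
    mean_abs <- rowMeans(cm[pair, , drop = FALSE])
    victim <- pair[which.max(mean_abs)]
    dropped <- c(dropped, victim)
    keep <- setdiff(colnames(cm), victim)
    cm <- cm[keep, keep, drop = FALSE]
  }
  list(kept = colnames(cm), dropped = dropped)
}

result <- drop_correlated(predictors, cutoff = 0.9)
print(result)
```

Exactly one of the near-duplicate columns `a` and `b` is dropped, while the independent columns `c` and `d` are kept.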
## [1] "Random Forest"
## Random Forest
##
## 13737 samples
## 52 predictor
## 5 classes: 'A', 'B', 'C', 'D', 'E'
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 12362, 12363, 12364, 12364, 12364, 12363, ...
## Resampling results across tuning parameters:
##
## mtry Accuracy Kappa
## 2 0.9916286 0.9894096
## 27 0.9924291 0.9904234
## 52 0.9873331 0.9839752
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 27.
## [1] "Gradient Boosting"
## Stochastic Gradient Boosting
##
## 13737 samples
## 52 predictor
## 5 classes: 'A', 'B', 'C', 'D', 'E'
##
## No pre-processing
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 12362, 12363, 12364, 12364, 12364, 12363, ...
## Resampling results across tuning parameters:
##
## interaction.depth n.trees Accuracy Kappa
## 1 50 0.7530025 0.6870463
## 1 100 0.8187374 0.7706256
## 1 150 0.8530229 0.8140483
## 2 50 0.8561540 0.8178078
## 2 100 0.9060186 0.8810658
## 2 150 0.9304792 0.9120372
## 3 50 0.8943715 0.8663085
## 3 100 0.9418351 0.9264136
## 3 150 0.9609081 0.9505472
##
## Tuning parameter 'shrinkage' was held constant at a value of 0.1
##
## Tuning parameter 'n.minobsinnode' was held constant at a value of 10
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were n.trees = 150, interaction.depth =
## 3, shrinkage = 0.1 and n.minobsinnode = 10.
## Confusion Matrix and Statistics
##
## Reference
## Prediction A B C D E
## A 1653 32 0 2 3
## B 12 1073 28 7 10
## C 5 34 980 24 21
## D 1 0 16 923 17
## E 3 0 2 8 1031
##
## Overall Statistics
##
## Accuracy : 0.9618
## 95% CI : (0.9565, 0.9665)
## No Information Rate : 0.2845
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.9516
##
## Mcnemar's Test P-Value : 9.061e-08
##
## Statistics by Class:
##
## Class: A Class: B Class: C Class: D Class: E
## Sensitivity 0.9875 0.9421 0.9552 0.9575 0.9529
## Specificity 0.9912 0.9880 0.9827 0.9931 0.9973
## Pos Pred Value 0.9781 0.9496 0.9211 0.9645 0.9875
## Neg Pred Value 0.9950 0.9861 0.9905 0.9917 0.9895
## Prevalence 0.2845 0.1935 0.1743 0.1638 0.1839
## Detection Rate 0.2809 0.1823 0.1665 0.1568 0.1752
## Detection Prevalence 0.2872 0.1920 0.1808 0.1626 0.1774
## Balanced Accuracy 0.9893 0.9650 0.9689 0.9753 0.9751
## Confusion Matrix and Statistics
##
## Reference
## Prediction A B C D E
## A 1674 5 0 1 0
## B 0 1133 3 0 1
## C 0 1 1016 12 0
## D 0 0 7 947 1
## E 0 0 0 4 1080
##
## Overall Statistics
##
## Accuracy : 0.9941
## 95% CI : (0.9917, 0.9959)
## No Information Rate : 0.2845
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.9925
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: A Class: B Class: C Class: D Class: E
## Sensitivity 1.0000 0.9947 0.9903 0.9824 0.9982
## Specificity 0.9986 0.9992 0.9973 0.9984 0.9992
## Pos Pred Value 0.9964 0.9965 0.9874 0.9916 0.9963
## Neg Pred Value 1.0000 0.9987 0.9979 0.9966 0.9996
## Prevalence 0.2845 0.1935 0.1743 0.1638 0.1839
## Detection Rate 0.2845 0.1925 0.1726 0.1609 0.1835
## Detection Prevalence 0.2855 0.1932 0.1749 0.1623 0.1842
## Balanced Accuracy 0.9993 0.9969 0.9938 0.9904 0.9987
For this multiclass classification problem, several models could be considered. A Random Forest model was selected due to its robustness, its ability to handle large, high-dimensional datasets, and its relatively minimal tuning requirements. Its tolerance of correlated features also made it a good fit for this dataset.
The model was trained using the caret package with a 10-fold cross-validation strategy to ensure the model’s performance was robust and generalizable.
Cross-Validation: This technique divides the training data into 10 parts, trains the model on 9 parts, and validates it on the remaining part. This process is repeated 10 times, with each part serving as the validation set once. The results are averaged to provide an estimate of model performance on unseen data.
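The fold mechanics described above can be sketched in base R. This is an illustration only: it uses the built-in `iris` data and a hypothetical nearest-centroid classifier (`fit_centroids`, `predict_centroids`) as stand-ins, not the caret workflow or the Random Forest model used in this analysis.

```r
set.seed(123)
k <- 10
n <- nrow(iris)

# Randomly assign every row to exactly one of k folds.
fold_id <- sample(rep(seq_len(k), length.out = n))

# Toy classifier: one centroid (mean vector) per class.
fit_centroids <- function(x, y) {
  sapply(x, function(col) tapply(col, y, mean))
}

# Predict the class whose centroid is nearest in squared Euclidean distance.
predict_centroids <- function(centers, x) {
  x <- as.matrix(x)
  d <- sapply(rownames(centers), function(cl) {
    rowSums(sweep(x, 2, centers[cl, ])^2)
  })
  d <- matrix(d, nrow = nrow(x), dimnames = list(NULL, rownames(centers)))
  colnames(d)[max.col(-d, ties.method = "first")]
}

# Train on k - 1 folds, validate on the remaining fold, repeat k times.
fold_acc <- vapply(seq_len(k), function(i) {
  train <- iris[fold_id != i, ]
  valid <- iris[fold_id == i, ]
  centers <- fit_centroids(train[, 1:4], train$Species)
  pred <- predict_centroids(centers, valid[, 1:4])
  mean(pred == valid$Species)
}, numeric(1))

# The average over the folds estimates performance on unseen data.
mean(fold_acc)
```

Each observation is used for validation exactly once, so the averaged accuracy never scores a model on data it was trained on.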
The three candidate models (random forest, SVM, and gradient boosting) are compared below.
##
## Call:
## summary.resamples(object = resamples)
##
## Models: rf, svm, gbm
## Number of resamples: 10
##
## Accuracy
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## rf 0.9876184 0.9912632 0.9923578 0.9924291 0.9934486 0.9970888 0
## svm 0.9898108 0.9912600 0.9916305 0.9914830 0.9919927 0.9934450 0
## gbm 0.9533867 0.9581670 0.9606987 0.9609081 0.9632590 0.9679767 0
##
## Kappa
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## rf 0.9843351 0.9889503 0.9903318 0.9904234 0.9917127 0.9963177 0
## svm 0.9871118 0.9889428 0.9894149 0.9892271 0.9898732 0.9917082 0
## gbm 0.9410101 0.9470939 0.9502992 0.9505472 0.9535224 0.9594746 0
Because its mean cross-validated accuracy is slightly higher (0.9924, versus 0.9915 for SVM and 0.9609 for gradient boosting), the random forest model is used to evaluate the test set.
## Confusion Matrix and Statistics
##
## Reference
## Prediction A B C D E
## A 1673 6 0 0 0
## B 1 1126 5 0 0
## C 0 7 1018 10 4
## D 0 0 3 954 4
## E 0 0 0 0 1074
##
## Overall Statistics
##
## Accuracy : 0.9932
## 95% CI : (0.9908, 0.9951)
## No Information Rate : 0.2845
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.9914
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: A Class: B Class: C Class: D Class: E
## Sensitivity 0.9994 0.9886 0.9922 0.9896 0.9926
## Specificity 0.9986 0.9987 0.9957 0.9986 1.0000
## Pos Pred Value 0.9964 0.9947 0.9798 0.9927 1.0000
## Neg Pred Value 0.9998 0.9973 0.9983 0.9980 0.9983
## Prevalence 0.2845 0.1935 0.1743 0.1638 0.1839
## Detection Rate 0.2843 0.1913 0.1730 0.1621 0.1825
## Detection Prevalence 0.2853 0.1924 0.1766 0.1633 0.1825
## Balanced Accuracy 0.9990 0.9937 0.9939 0.9941 0.9963
Confusion Matrix: The confusion matrix breaks down the model’s performance by class, showing how often each predicted class matched the reference class.
Accuracy: The overall accuracy is the share of observations on the diagonal of the confusion matrix. On the 5885-observation test set the model reached an accuracy of 0.9932 (95% CI: 0.9908 to 0.9951), with a sensitivity of at least 0.9886 for every class, indicating effective classification of the different exercise forms.
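As a check on how these statistics follow from the table, the counts of the test-set confusion matrix can be re-entered by hand and the metrics recomputed in base R. The variable names are illustrative; the counts are copied from the output above.

```r
# Test-set confusion matrix (rows = predicted class, columns = reference class).
classes <- c("A", "B", "C", "D", "E")
cm <- matrix(
  c(1673,    6,    0,   0,    0,
       1, 1126,    5,   0,    0,
       0,    7, 1018,  10,    4,
       0,    0,    3, 954,    4,
       0,    0,    0,   0, 1074),
  nrow = 5, byrow = TRUE,
  dimnames = list(Prediction = classes, Reference = classes)
)

# Accuracy: correctly classified observations (the diagonal) over all observations.
accuracy <- sum(diag(cm)) / sum(cm)

# Sensitivity per class: correct predictions over all true members of the class.
sensitivity <- diag(cm) / colSums(cm)

# Positive predictive value per class: correct predictions over all predictions of the class.
pos_pred_value <- diag(cm) / rowSums(cm)

round(accuracy, 4)
round(sensitivity, 4)
round(pos_pred_value, 4)
```

The recomputed values agree with the reported accuracy of 0.9932 and with the per-class sensitivity and positive predictive value rows.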
# Summary of the Visualizations
Feature Importance Plot: This plot will help you understand which features contributed most to the model’s predictions. Higher importance values indicate more influential features.
Confusion Matrix Visualization: This heatmap-style visualization will clearly show the distribution of correct and incorrect predictions, making it easier to spot any misclassification patterns.
Accuracy Plot from Cross-Validation: This plot will show how the model’s accuracy varied across different cross-validation folds, helping assess its consistency.
The expected out-of-sample error was estimated from the cross-validation results. Because cross-validation averages performance over held-out subsets of the training data, it gives a reasonable estimate of how the model will perform on unseen data. The Random Forest model was evaluated using 10-fold cross-validation, and the mean accuracy across the folds was approximately 99.24%. The expected out-of-sample error, that is, the error rate when the model is applied to new, unseen data, is therefore estimated at about 0.76% (1 - 0.9924 = 0.0076).
## [1] "expected out of sample_error"
## [1] 0.007570892
## [1] "mean accuracy"
## [1] 0.9924291
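The arithmetic behind this estimate is short enough to spell out. The sketch below starts from the mean cross-validated accuracy printed above; the variable names are illustrative, and the 20-case figure simply applies the error rate to a set the size of the validation data.

```r
# Mean 10-fold cross-validated accuracy of the random forest (from the output above).
mean_cv_accuracy <- 0.9924291

# Expected out-of-sample error is the complement of the expected accuracy.
expected_error <- 1 - mean_cv_accuracy

# Report it as a proportion and as a percentage.
cat(sprintf("Expected out-of-sample error: %.4f (%.2f%%)\n",
            expected_error, 100 * expected_error))

# Expected number of misclassifications among 20 new cases.
expected_misses_in_20 <- 20 * expected_error
round(expected_misses_in_20, 2)
```

An error proportion of 0.0076 corresponds to 0.76%, or roughly 0.15 expected mistakes in a set of 20 cases.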
No. | Accuracy |
---|---|
1 | 0.9965986 |
2 | 0.9897959 |
3 | 1.0000000 |
4 | 0.9966102 |
5 | 0.9965986 |
6 | 0.9965986 |
7 | 0.9795918 |
8 | 0.9931973 |
9 | 1.0000000 |
10 | 0.9830508 |
11 | 0.9864407 |
12 | 0.9931973 |
13 | 0.9931973 |
14 | 0.9965986 |
15 | 0.9966102 |
16 | 1.0000000 |
17 | 0.9830508 |
18 | 0.9863946 |
19 | 1.0000000 |
20 | 0.9965986 |
Mean accuracy |
---|
0.9932065 |
## 'data.frame': 20 obs. of 53 variables:
## $ roll_belt : num 123 1.02 0.87 125 1.35 -5.92 1.2 0.43 0.93 114 ...
## $ pitch_belt : num 27 4.87 1.82 -41.6 3.33 1.59 4.44 4.15 6.72 22.4 ...
## $ yaw_belt : num -4.75 -88.9 -88.5 162 -88.6 -87.7 -87.3 -88.5 -93.7 -13.1 ...
## $ total_accel_belt : int 20 4 5 17 3 4 4 4 4 18 ...
## $ gyros_belt_x : num -0.5 -0.06 0.05 0.11 0.03 0.1 -0.06 -0.18 0.1 0.14 ...
## $ gyros_belt_y : num -0.02 -0.02 0.02 0.11 0.02 0.05 0 -0.02 0 0.11 ...
## $ gyros_belt_z : num -0.46 -0.07 0.03 -0.16 0 -0.13 0 -0.03 -0.02 -0.16 ...
## $ accel_belt_x : int -38 -13 1 46 -8 -11 -14 -10 -15 -25 ...
## $ accel_belt_y : int 69 11 -1 45 4 -16 2 -2 1 63 ...
## $ accel_belt_z : int -179 39 49 -156 27 38 35 42 32 -158 ...
## $ magnet_belt_x : int -13 43 29 169 33 31 50 39 -6 10 ...
## $ magnet_belt_y : int 581 636 631 608 566 638 622 635 600 601 ...
## $ magnet_belt_z : int -382 -309 -312 -304 -418 -291 -315 -305 -302 -330 ...
## $ roll_arm : num 40.7 0 0 -109 76.1 0 0 0 -137 -82.4 ...
## $ pitch_arm : num -27.8 0 0 55 2.76 0 0 0 11.2 -63.8 ...
## $ yaw_arm : num 178 0 0 -142 102 0 0 0 -167 -75.3 ...
## $ total_accel_arm : int 10 38 44 25 29 14 15 22 34 32 ...
## $ gyros_arm_x : num -1.65 -1.17 2.1 0.22 -1.96 0.02 2.36 -3.71 0.03 0.26 ...
## $ gyros_arm_y : num 0.48 0.85 -1.36 -0.51 0.79 0.05 -1.01 1.85 -0.02 -0.5 ...
## $ gyros_arm_z : num -0.18 -0.43 1.13 0.92 -0.54 -0.07 0.89 -0.69 -0.02 0.79 ...
## $ accel_arm_x : int 16 -290 -341 -238 -197 -26 99 -98 -287 -301 ...
## $ accel_arm_y : int 38 215 245 -57 200 130 79 175 111 -42 ...
## $ accel_arm_z : int 93 -90 -87 6 -30 -19 -67 -78 -122 -80 ...
## $ magnet_arm_x : int -326 -325 -264 -173 -170 396 702 535 -367 -420 ...
## $ magnet_arm_y : int 385 447 474 257 275 176 15 215 335 294 ...
## $ magnet_arm_z : int 481 434 413 633 617 516 217 385 520 493 ...
## $ roll_dumbbell : num -17.7 54.5 57.1 43.1 -101.4 ...
## $ pitch_dumbbell : num 25 -53.7 -51.4 -30 -53.4 ...
## $ yaw_dumbbell : num 126.2 -75.5 -75.2 -103.3 -14.2 ...
## $ total_accel_dumbbell: int 9 31 29 18 4 29 29 29 3 2 ...
## $ gyros_dumbbell_x : num 0.64 0.34 0.39 0.1 0.29 -0.59 0.34 0.37 0.03 0.42 ...
## $ gyros_dumbbell_y : num 0.06 0.05 0.14 -0.02 -0.47 0.8 0.16 0.14 -0.21 0.51 ...
## $ gyros_dumbbell_z : num -0.61 -0.71 -0.34 0.05 -0.46 1.1 -0.23 -0.39 -0.21 -0.03 ...
## $ accel_dumbbell_x : int 21 -153 -141 -51 -18 -138 -145 -140 0 -7 ...
## $ accel_dumbbell_y : int -15 155 155 72 -30 166 150 159 25 -20 ...
## $ accel_dumbbell_z : int 81 -205 -196 -148 -5 -186 -190 -191 9 7 ...
## $ magnet_dumbbell_x : int 523 -502 -506 -576 -424 -543 -484 -515 -519 -531 ...
## $ magnet_dumbbell_y : int -528 388 349 238 252 262 354 350 348 321 ...
## $ magnet_dumbbell_z : int -56 -36 41 53 312 96 97 53 -32 -164 ...
## $ roll_forearm : num 141 109 131 0 -176 150 155 -161 15.5 13.2 ...
## $ pitch_forearm : num 49.3 -17.6 -32.6 0 -2.16 1.46 34.5 43.6 -63.5 19.4 ...
## $ yaw_forearm : num 156 106 93 0 -47.9 89.7 152 -89.5 -139 -105 ...
## $ total_accel_forearm : int 33 39 34 43 24 43 32 47 36 24 ...
## $ gyros_forearm_x : num 0.74 1.12 0.18 1.38 -0.75 -0.88 -0.53 0.63 0.03 0.02 ...
## $ gyros_forearm_y : num -3.34 -2.78 -0.79 0.69 3.1 4.26 1.8 -0.74 0.02 0.13 ...
## $ gyros_forearm_z : num -0.59 -0.18 0.28 1.8 0.8 1.35 0.75 0.49 -0.02 -0.07 ...
## $ accel_forearm_x : int -110 212 154 -92 131 230 -192 -151 195 -212 ...
## $ accel_forearm_y : int 267 297 271 406 -93 322 170 -331 204 98 ...
## $ accel_forearm_z : int -149 -118 -129 -39 172 -144 -175 -282 -217 -7 ...
## $ magnet_forearm_x : int -714 -237 -51 -233 375 -300 -678 -109 0 -403 ...
## $ magnet_forearm_y : int 419 791 698 783 -787 800 284 -619 652 723 ...
## $ magnet_forearm_z : int 617 873 783 521 91 884 585 -32 469 512 ...
## $ problem_id : int 1 2 3 4 5 6 7 8 9 10 ...
Table: Testing (Validation) Data
## [1] B A B A A E D B A A B C B A E E A B B B
## Levels: A B C D E
In this analysis, a Random Forest model was built using the caret package to predict the “classe” variable from a dataset of wearable device readings during exercise. The model was carefully trained and validated using cross-validation to ensure it generalizes well to unseen data. The expected out-of-sample error was estimated based on cross-validation results, and the model was evaluated on a test set, showing strong performance. This approach provides a reliable method for predicting exercise form based on sensor data.
Note: the predictions for the 20 observations in the testing (validation) data showed a 100% match.