This course project uses the tidyverse tool set (together with its tidymodels companion packages) throughout, so it follows a streamlined modelling process. The project is divided into four stages: data import, data wrangling, data modelling, and results.
First, the datasets “pml-testing.csv” and “pml-training.csv” must be available in the working directory so that the same code can run on any other machine. We then load the “readr” library from the tidyverse collection of data science packages.
library(readr)
training <- read_csv("pml-training.csv")
## New names:
## • `` -> `...1`
## Rows: 19622 Columns: 160
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (34): user_name, cvtd_timestamp, new_window, kurtosis_roll_belt, kurtos...
## dbl (126): ...1, raw_timestamp_part_1, raw_timestamp_part_2, num_window, rol...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
testing <- read_csv("pml-testing.csv")
## New names:
## • `` -> `...1`
## Rows: 20 Columns: 160
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr   (3): user_name, cvtd_timestamp, new_window
## dbl  (57): ...1, raw_timestamp_part_1, raw_timestamp_part_2, num_window, rol...
## lgl (100): kurtosis_roll_belt, kurtosis_picth_belt, kurtosis_yaw_belt, skewn...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
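The column specification message printed on import can be silenced with the argument the message itself suggests. The following is a minimal sketch, assuming a small throwaway CSV (its contents are made up for illustration) whose first header is blank, as in the project files:

```r
library(readr)

# Hypothetical three-column CSV with an unnamed first column,
# written to a temporary file so the example is self-contained.
demo_path <- tempfile(fileext = ".csv")
writeLines(c(",user_name,classe",
             "1,user_a,A",
             "2,user_b,B"),
           demo_path)

# show_col_types = FALSE suppresses the column specification message;
# the blank header is still repaired to `...1`.
demo <- read_csv(demo_path, show_col_types = FALSE)
names(demo)
```

The repaired name `...1` is simply a row index in this sketch, which is why a column like it carries no predictive information.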
Data wrangling refers to cleaning or transforming the data into an appropriate format for machine learning. This stage involves a series of tasks that we will go through one by one.
First, we convert the data objects into tibbles using the as_tibble function, which tidyr re-exports from the tibble package. (read_csv already returns a tibble, so this step mainly makes the intent explicit.)
library(tidyr)
training <- as_tibble(training)
testing <- as_tibble(testing)
Secondly, we will take a quick look at our dataset using skimr.
library(skimr)
training %>%
  skim()
| Name | Piped data |
| Number of rows | 19622 |
| Number of columns | 160 |
| _______________________ | |
| Column type frequency: | |
| character | 34 |
| numeric | 126 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| user_name | 0 | 1.00 | 5 | 8 | 0 | 6 | 0 |
| cvtd_timestamp | 0 | 1.00 | 16 | 16 | 0 | 20 | 0 |
| new_window | 0 | 1.00 | 2 | 3 | 0 | 2 | 0 |
| kurtosis_roll_belt | 19216 | 0.02 | 7 | 9 | 0 | 396 | 0 |
| kurtosis_picth_belt | 19216 | 0.02 | 7 | 9 | 0 | 316 | 0 |
| kurtosis_yaw_belt | 19216 | 0.02 | 7 | 7 | 0 | 1 | 0 |
| skewness_roll_belt.1 | 19216 | 0.02 | 7 | 9 | 0 | 337 | 0 |
| skewness_yaw_belt | 19216 | 0.02 | 7 | 7 | 0 | 1 | 0 |
| max_yaw_belt | 19216 | 0.02 | 3 | 7 | 0 | 67 | 0 |
| min_yaw_belt | 19216 | 0.02 | 3 | 7 | 0 | 67 | 0 |
| amplitude_yaw_belt | 19216 | 0.02 | 4 | 7 | 0 | 3 | 0 |
| kurtosis_roll_arm | 19216 | 0.02 | 7 | 8 | 0 | 329 | 0 |
| kurtosis_picth_arm | 19216 | 0.02 | 7 | 8 | 0 | 327 | 0 |
| kurtosis_yaw_arm | 19216 | 0.02 | 7 | 8 | 0 | 394 | 0 |
| skewness_roll_arm | 19216 | 0.02 | 7 | 8 | 0 | 330 | 0 |
| skewness_pitch_arm | 19216 | 0.02 | 7 | 8 | 0 | 327 | 0 |
| skewness_yaw_arm | 19216 | 0.02 | 7 | 8 | 0 | 394 | 0 |
| kurtosis_roll_dumbbell | 19216 | 0.02 | 6 | 7 | 0 | 397 | 0 |
| kurtosis_picth_dumbbell | 19216 | 0.02 | 6 | 7 | 0 | 400 | 0 |
| kurtosis_yaw_dumbbell | 19216 | 0.02 | 7 | 7 | 0 | 1 | 0 |
| skewness_yaw_dumbbell | 19216 | 0.02 | 7 | 7 | 0 | 1 | 0 |
| max_yaw_dumbbell | 19216 | 0.02 | 3 | 7 | 0 | 72 | 0 |
| min_yaw_dumbbell | 19216 | 0.02 | 3 | 7 | 0 | 72 | 0 |
| amplitude_yaw_dumbbell | 19216 | 0.02 | 4 | 7 | 0 | 2 | 0 |
| kurtosis_roll_forearm | 19216 | 0.02 | 6 | 7 | 0 | 321 | 0 |
| kurtosis_picth_forearm | 19216 | 0.02 | 6 | 7 | 0 | 322 | 0 |
| kurtosis_yaw_forearm | 19216 | 0.02 | 7 | 7 | 0 | 1 | 0 |
| skewness_roll_forearm | 19216 | 0.02 | 6 | 7 | 0 | 322 | 0 |
| skewness_pitch_forearm | 19216 | 0.02 | 6 | 7 | 0 | 318 | 0 |
| skewness_yaw_forearm | 19216 | 0.02 | 7 | 7 | 0 | 1 | 0 |
| max_yaw_forearm | 19216 | 0.02 | 3 | 7 | 0 | 44 | 0 |
| min_yaw_forearm | 19216 | 0.02 | 3 | 7 | 0 | 44 | 0 |
| amplitude_yaw_forearm | 19216 | 0.02 | 4 | 7 | 0 | 2 | 0 |
| classe | 0 | 1.00 | 1 | 1 | 0 | 5 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| ...1 | 0 | 1.00 | 9811.50 | 5664.53 | 1.00000e+00 | 4906.25 | 9811.50 | 14716.75 | 1.962200e+04 | ▇▇▇▇▇ |
| raw_timestamp_part_1 | 0 | 1.00 | 1322827119.27 | 204927.68 | 1.32249e+09 | 1322673099.00 | 1322832920.00 | 1323084264.00 | 1.323095e+09 | ▃▃▇▁▆ |
| raw_timestamp_part_2 | 0 | 1.00 | 500656.14 | 288222.88 | 2.94000e+02 | 252912.25 | 496380.00 | 751890.75 | 9.988010e+05 | ▇▇▇▇▇ |
| num_window | 0 | 1.00 | 430.64 | 247.91 | 1.00000e+00 | 222.00 | 424.00 | 644.00 | 8.640000e+02 | ▇▇▇▇▇ |
| roll_belt | 0 | 1.00 | 64.41 | 62.75 | -2.89000e+01 | 1.10 | 113.00 | 123.00 | 1.620000e+02 | ▇▁▁▅▅ |
| pitch_belt | 0 | 1.00 | 0.31 | 22.35 | -5.58000e+01 | 1.76 | 5.28 | 14.90 | 6.030000e+01 | ▃▁▇▅▁ |
| yaw_belt | 0 | 1.00 | -11.21 | 95.19 | -1.80000e+02 | -88.30 | -13.00 | 12.90 | 1.790000e+02 | ▁▇▅▁▃ |
| total_accel_belt | 0 | 1.00 | 11.31 | 7.74 | 0.00000e+00 | 3.00 | 17.00 | 18.00 | 2.900000e+01 | ▇▁▂▆▁ |
| skewness_roll_belt | 19225 | 0.02 | -0.03 | 0.92 | -5.74000e+00 | -0.44 | 0.00 | 0.42 | 3.600000e+00 | ▁▁▆▇▁ |
| max_roll_belt | 19216 | 0.02 | -6.67 | 94.59 | -9.43000e+01 | -88.00 | -5.10 | 18.50 | 1.800000e+02 | ▇▅▁▁▃ |
| max_picth_belt | 19216 | 0.02 | 12.92 | 8.01 | 3.00000e+00 | 5.00 | 18.00 | 19.00 | 3.000000e+01 | ▇▁▆▃▁ |
| min_roll_belt | 19216 | 0.02 | -10.44 | 93.62 | -1.80000e+02 | -88.40 | -7.85 | 9.05 | 1.730000e+02 | ▁▇▅▁▃ |
| min_pitch_belt | 19216 | 0.02 | 10.76 | 7.47 | 0.00000e+00 | 3.00 | 16.00 | 17.00 | 2.300000e+01 | ▇▁▁▇▂ |
| amplitude_roll_belt | 19216 | 0.02 | 3.77 | 25.26 | 0.00000e+00 | 0.30 | 1.00 | 2.08 | 3.600000e+02 | ▇▁▁▁▁ |
| amplitude_pitch_belt | 19216 | 0.02 | 2.17 | 2.36 | 0.00000e+00 | 1.00 | 1.00 | 2.00 | 1.200000e+01 | ▇▁▁▁▁ |
| var_total_accel_belt | 19216 | 0.02 | 0.93 | 2.22 | 0.00000e+00 | 0.10 | 0.20 | 0.30 | 1.650000e+01 | ▇▁▁▁▁ |
| avg_roll_belt | 19216 | 0.02 | 68.06 | 63.14 | -2.74000e+01 | 1.10 | 116.35 | 123.38 | 1.574000e+02 | ▇▁▁▃▆ |
| stddev_roll_belt | 19216 | 0.02 | 1.34 | 2.44 | 0.00000e+00 | 0.20 | 0.40 | 0.70 | 1.420000e+01 | ▇▁▁▁▁ |
| var_roll_belt | 19216 | 0.02 | 7.70 | 23.16 | 0.00000e+00 | 0.00 | 0.10 | 0.50 | 2.007000e+02 | ▇▁▁▁▁ |
| avg_pitch_belt | 19216 | 0.02 | 0.52 | 22.41 | -5.14000e+01 | 2.02 | 5.20 | 15.78 | 5.970000e+01 | ▃▁▇▃▁ |
| stddev_pitch_belt | 19216 | 0.02 | 0.60 | 0.64 | 0.00000e+00 | 0.20 | 0.40 | 0.70 | 4.000000e+00 | ▇▁▁▁▁ |
| var_pitch_belt | 19216 | 0.02 | 0.77 | 1.76 | 0.00000e+00 | 0.00 | 0.10 | 0.50 | 1.620000e+01 | ▇▁▁▁▁ |
| avg_yaw_belt | 19216 | 0.02 | -8.83 | 93.48 | -1.38300e+02 | -88.18 | -6.55 | 14.12 | 1.735000e+02 | ▇▁▅▁▃ |
| stddev_yaw_belt | 19216 | 0.02 | 1.34 | 10.29 | 0.00000e+00 | 0.10 | 0.30 | 0.70 | 1.766000e+02 | ▇▁▁▁▁ |
| var_yaw_belt | 19216 | 0.02 | 107.49 | 1655.52 | 0.00000e+00 | 0.01 | 0.09 | 0.48 | 3.118324e+04 | ▇▁▁▁▁ |
| gyros_belt_x | 0 | 1.00 | -0.01 | 0.21 | -1.04000e+00 | -0.03 | 0.03 | 0.11 | 2.220000e+00 | ▁▇▁▁▁ |
| gyros_belt_y | 0 | 1.00 | 0.04 | 0.08 | -6.40000e-01 | 0.00 | 0.02 | 0.11 | 6.400000e-01 | ▁▁▇▁▁ |
| gyros_belt_z | 0 | 1.00 | -0.13 | 0.24 | -1.46000e+00 | -0.20 | -0.10 | -0.02 | 1.620000e+00 | ▁▂▇▁▁ |
| accel_belt_x | 0 | 1.00 | -5.59 | 29.64 | -1.20000e+02 | -21.00 | -15.00 | -5.00 | 8.500000e+01 | ▁▁▇▁▂ |
| accel_belt_y | 0 | 1.00 | 30.15 | 28.58 | -6.90000e+01 | 3.00 | 35.00 | 61.00 | 1.640000e+02 | ▁▇▇▁▁ |
| accel_belt_z | 0 | 1.00 | -72.59 | 100.45 | -2.75000e+02 | -162.00 | -152.00 | 27.00 | 1.050000e+02 | ▁▇▁▅▃ |
| magnet_belt_x | 0 | 1.00 | 55.60 | 64.18 | -5.20000e+01 | 9.00 | 35.00 | 59.00 | 4.850000e+02 | ▇▁▂▁▁ |
| magnet_belt_y | 0 | 1.00 | 593.68 | 35.68 | 3.54000e+02 | 581.00 | 601.00 | 610.00 | 6.730000e+02 | ▁▁▁▇▃ |
| magnet_belt_z | 0 | 1.00 | -345.48 | 65.21 | -6.23000e+02 | -375.00 | -320.00 | -306.00 | 2.930000e+02 | ▁▇▁▁▁ |
| roll_arm | 0 | 1.00 | 17.83 | 72.74 | -1.80000e+02 | -31.78 | 0.00 | 77.30 | 1.800000e+02 | ▁▃▇▆▂ |
| pitch_arm | 0 | 1.00 | -4.61 | 30.68 | -8.88000e+01 | -25.90 | 0.00 | 11.20 | 8.850000e+01 | ▁▅▇▂▁ |
| yaw_arm | 0 | 1.00 | -0.62 | 71.36 | -1.80000e+02 | -43.10 | 0.00 | 45.88 | 1.800000e+02 | ▁▃▇▃▂ |
| total_accel_arm | 0 | 1.00 | 25.51 | 10.52 | 1.00000e+00 | 17.00 | 27.00 | 33.00 | 6.600000e+01 | ▃▆▇▁▁ |
| var_accel_arm | 19216 | 0.02 | 53.24 | 53.98 | 0.00000e+00 | 9.03 | 40.61 | 75.62 | 3.317000e+02 | ▇▂▁▁▁ |
| avg_roll_arm | 19216 | 0.02 | 12.68 | 68.58 | -1.66670e+02 | -38.37 | 0.00 | 76.32 | 1.633300e+02 | ▁▅▇▅▃ |
| stddev_roll_arm | 19216 | 0.02 | 11.20 | 17.10 | 0.00000e+00 | 1.38 | 5.70 | 14.92 | 1.619600e+02 | ▇▁▁▁▁ |
| var_roll_arm | 19216 | 0.02 | 417.26 | 2007.16 | 0.00000e+00 | 1.90 | 32.52 | 222.65 | 2.623221e+04 | ▇▁▁▁▁ |
| avg_pitch_arm | 19216 | 0.02 | -4.90 | 26.83 | -8.17700e+01 | -22.77 | 0.00 | 8.28 | 7.566000e+01 | ▁▃▇▂▁ |
| stddev_pitch_arm | 19216 | 0.02 | 10.38 | 9.40 | 0.00000e+00 | 1.64 | 8.13 | 16.33 | 4.341000e+01 | ▇▅▂▁▁ |
| var_pitch_arm | 19216 | 0.02 | 195.86 | 292.60 | 0.00000e+00 | 2.70 | 66.15 | 266.58 | 1.884560e+03 | ▇▁▁▁▁ |
| avg_yaw_arm | 19216 | 0.02 | 2.36 | 61.33 | -1.73440e+02 | -29.20 | 0.00 | 38.18 | 1.520000e+02 | ▁▂▇▃▂ |
| stddev_yaw_arm | 19216 | 0.02 | 22.27 | 23.69 | 0.00000e+00 | 2.58 | 16.68 | 35.98 | 1.770400e+02 | ▇▂▁▁▁ |
| var_yaw_arm | 19216 | 0.02 | 1055.93 | 2722.17 | 0.00000e+00 | 6.64 | 278.31 | 1294.85 | 3.134457e+04 | ▇▁▁▁▁ |
| gyros_arm_x | 0 | 1.00 | 0.04 | 1.99 | -6.37000e+00 | -1.33 | 0.08 | 1.57 | 4.870000e+00 | ▁▃▇▆▂ |
| gyros_arm_y | 0 | 1.00 | -0.26 | 0.85 | -3.44000e+00 | -0.80 | -0.24 | 0.14 | 2.840000e+00 | ▁▂▇▂▁ |
| gyros_arm_z | 0 | 1.00 | 0.27 | 0.55 | -2.33000e+00 | -0.07 | 0.23 | 0.72 | 3.020000e+00 | ▁▂▇▂▁ |
| accel_arm_x | 0 | 1.00 | -60.24 | 182.04 | -4.04000e+02 | -242.00 | -44.00 | 84.00 | 4.370000e+02 | ▇▅▇▅▁ |
| accel_arm_y | 0 | 1.00 | 32.60 | 109.87 | -3.18000e+02 | -54.00 | 14.00 | 139.00 | 3.080000e+02 | ▁▃▇▆▂ |
| accel_arm_z | 0 | 1.00 | -71.25 | 134.65 | -6.36000e+02 | -143.00 | -47.00 | 23.00 | 2.920000e+02 | ▁▁▅▇▁ |
| magnet_arm_x | 0 | 1.00 | 191.72 | 443.64 | -5.84000e+02 | -300.00 | 289.00 | 637.00 | 7.820000e+02 | ▆▃▂▃▇ |
| magnet_arm_y | 0 | 1.00 | 156.61 | 201.91 | -3.92000e+02 | -9.00 | 202.00 | 323.00 | 5.830000e+02 | ▁▅▅▇▂ |
| magnet_arm_z | 0 | 1.00 | 306.49 | 326.62 | -5.97000e+02 | 131.25 | 444.00 | 545.00 | 6.940000e+02 | ▁▂▂▃▇ |
| max_roll_arm | 19216 | 0.02 | 11.24 | 26.93 | -7.31000e+01 | -0.18 | 4.95 | 26.78 | 8.550000e+01 | ▁▂▇▃▁ |
| max_picth_arm | 19216 | 0.02 | 35.75 | 69.62 | -1.73000e+02 | -1.98 | 23.25 | 95.97 | 1.800000e+02 | ▁▂▇▆▃ |
| max_yaw_arm | 19216 | 0.02 | 35.46 | 10.45 | 4.00000e+00 | 29.00 | 34.00 | 41.00 | 6.500000e+01 | ▁▂▇▃▁ |
| min_roll_arm | 19216 | 0.02 | -21.22 | 28.72 | -8.91000e+01 | -41.98 | -22.45 | 0.00 | 6.640000e+01 | ▂▆▇▂▁ |
| min_pitch_arm | 19216 | 0.02 | -33.92 | 60.83 | -1.80000e+02 | -72.62 | -33.85 | 0.00 | 1.520000e+02 | ▁▆▇▂▁ |
| min_yaw_arm | 19216 | 0.02 | 14.67 | 9.11 | 1.00000e+00 | 8.00 | 13.00 | 19.00 | 3.800000e+01 | ▆▇▃▂▂ |
| amplitude_roll_arm | 19216 | 0.02 | 32.45 | 27.39 | 0.00000e+00 | 5.43 | 28.45 | 50.96 | 1.195000e+02 | ▇▆▃▁▁ |
| amplitude_pitch_arm | 19216 | 0.02 | 69.68 | 66.98 | 0.00000e+00 | 9.93 | 54.90 | 115.17 | 3.600000e+02 | ▇▅▂▁▁ |
| amplitude_yaw_arm | 19216 | 0.02 | 20.79 | 12.28 | 0.00000e+00 | 13.00 | 22.00 | 28.75 | 5.200000e+01 | ▅▅▇▃▁ |
| roll_dumbbell | 0 | 1.00 | 23.84 | 69.93 | -1.53710e+02 | -18.49 | 48.17 | 67.61 | 1.535500e+02 | ▂▂▃▇▂ |
| pitch_dumbbell | 0 | 1.00 | -10.78 | 36.99 | -1.49590e+02 | -40.89 | -20.96 | 17.50 | 1.494000e+02 | ▁▆▇▂▁ |
| yaw_dumbbell | 0 | 1.00 | 1.67 | 82.52 | -1.50870e+02 | -77.64 | -3.32 | 79.64 | 1.549500e+02 | ▃▇▅▅▆ |
| skewness_roll_dumbbell | 19220 | 0.02 | -0.12 | 0.82 | -7.38000e+00 | -0.58 | -0.08 | 0.40 | 1.960000e+00 | ▁▁▁▇▆ |
| skewness_pitch_dumbbell | 19217 | 0.02 | -0.03 | 0.86 | -7.45000e+00 | -0.53 | -0.09 | 0.51 | 3.770000e+00 | ▁▁▂▇▁ |
| max_roll_dumbbell | 19216 | 0.02 | 13.76 | 48.30 | -7.01000e+01 | -27.15 | 14.85 | 50.58 | 1.370000e+02 | ▆▇▇▅▂ |
| max_picth_dumbbell | 19216 | 0.02 | 32.75 | 93.37 | -1.12900e+02 | -66.70 | 40.05 | 133.23 | 1.550000e+02 | ▆▃▂▂▇ |
| min_roll_dumbbell | 19216 | 0.02 | -41.24 | 34.71 | -1.49600e+02 | -59.68 | -43.55 | -25.20 | 7.320000e+01 | ▁▃▇▃▁ |
| min_pitch_dumbbell | 19216 | 0.02 | -33.18 | 74.28 | -1.47000e+02 | -91.80 | -66.15 | 21.20 | 1.209000e+02 | ▆▇▅▃▃ |
| amplitude_roll_dumbbell | 19216 | 0.02 | 55.00 | 54.94 | 0.00000e+00 | 14.97 | 35.05 | 81.04 | 2.564800e+02 | ▇▂▁▁▁ |
| amplitude_pitch_dumbbell | 19216 | 0.02 | 65.93 | 65.23 | 0.00000e+00 | 17.06 | 41.72 | 99.54 | 2.735900e+02 | ▇▂▂▁▁ |
| total_accel_dumbbell | 0 | 1.00 | 13.72 | 10.23 | 0.00000e+00 | 4.00 | 10.00 | 19.00 | 5.800000e+01 | ▇▅▃▁▁ |
| var_accel_dumbbell | 19216 | 0.02 | 4.39 | 13.51 | 0.00000e+00 | 0.38 | 1.00 | 3.43 | 2.304300e+02 | ▇▁▁▁▁ |
| avg_roll_dumbbell | 19216 | 0.02 | 23.86 | 62.90 | -1.28960e+02 | -12.33 | 48.23 | 64.37 | 1.259900e+02 | ▂▂▃▇▂ |
| stddev_roll_dumbbell | 19216 | 0.02 | 20.76 | 24.30 | 0.00000e+00 | 4.64 | 12.20 | 26.36 | 1.237800e+02 | ▇▂▁▁▁ |
| var_roll_dumbbell | 19216 | 0.02 | 1020.27 | 2262.56 | 0.00000e+00 | 21.52 | 148.95 | 694.65 | 1.532101e+04 | ▇▁▁▁▁ |
| avg_pitch_dumbbell | 19216 | 0.02 | -12.33 | 32.06 | -7.07300e+01 | -42.00 | -19.90 | 13.21 | 9.428000e+01 | ▇▇▇▂▁ |
| stddev_pitch_dumbbell | 19216 | 0.02 | 13.15 | 13.34 | 0.00000e+00 | 3.48 | 8.09 | 19.24 | 8.268000e+01 | ▇▂▁▁▁ |
| var_pitch_dumbbell | 19216 | 0.02 | 350.31 | 673.96 | 0.00000e+00 | 12.13 | 65.43 | 370.11 | 6.836020e+03 | ▇▁▁▁▁ |
| avg_yaw_dumbbell | 19216 | 0.02 | 0.20 | 78.21 | -1.17950e+02 | -76.70 | -4.50 | 71.23 | 1.349000e+02 | ▇▃▅▃▅ |
| stddev_yaw_dumbbell | 19216 | 0.02 | 16.65 | 17.71 | 0.00000e+00 | 3.88 | 10.26 | 24.67 | 1.070900e+02 | ▇▂▁▁▁ |
| var_yaw_dumbbell | 19216 | 0.02 | 589.84 | 1244.59 | 0.00000e+00 | 15.09 | 105.35 | 608.79 | 1.146791e+04 | ▇▁▁▁▁ |
| gyros_dumbbell_x | 0 | 1.00 | 0.16 | 1.51 | -2.04000e+02 | -0.03 | 0.13 | 0.35 | 2.220000e+00 | ▁▁▁▁▇ |
| gyros_dumbbell_y | 0 | 1.00 | 0.05 | 0.61 | -2.10000e+00 | -0.14 | 0.03 | 0.21 | 5.200000e+01 | ▇▁▁▁▁ |
| gyros_dumbbell_z | 0 | 1.00 | -0.13 | 2.29 | -2.38000e+00 | -0.31 | -0.13 | 0.03 | 3.170000e+02 | ▇▁▁▁▁ |
| accel_dumbbell_x | 0 | 1.00 | -28.62 | 67.32 | -4.19000e+02 | -50.00 | -8.00 | 11.00 | 2.350000e+02 | ▁▁▆▇▁ |
| accel_dumbbell_y | 0 | 1.00 | 52.63 | 80.75 | -1.89000e+02 | -8.00 | 41.50 | 111.00 | 3.150000e+02 | ▁▇▇▅▁ |
| accel_dumbbell_z | 0 | 1.00 | -38.32 | 109.47 | -3.34000e+02 | -142.00 | -1.00 | 38.00 | 3.180000e+02 | ▁▆▇▃▁ |
| magnet_dumbbell_x | 0 | 1.00 | -328.48 | 339.72 | -6.43000e+02 | -535.00 | -479.00 | -304.00 | 5.920000e+02 | ▇▂▁▁▂ |
| magnet_dumbbell_y | 0 | 1.00 | 220.97 | 326.87 | -3.60000e+03 | 231.00 | 311.00 | 390.00 | 6.330000e+02 | ▁▁▁▁▇ |
| magnet_dumbbell_z | 0 | 1.00 | 46.05 | 139.96 | -2.62000e+02 | -45.00 | 13.00 | 95.00 | 4.520000e+02 | ▁▇▆▂▂ |
| roll_forearm | 0 | 1.00 | 33.83 | 108.04 | -1.80000e+02 | -0.74 | 21.70 | 140.00 | 1.800000e+02 | ▃▂▇▂▇ |
| pitch_forearm | 0 | 1.00 | 10.71 | 28.15 | -7.25000e+01 | 0.00 | 9.24 | 28.40 | 8.980000e+01 | ▁▁▇▃▁ |
| yaw_forearm | 0 | 1.00 | 19.21 | 103.22 | -1.80000e+02 | -68.60 | 0.00 | 110.00 | 1.800000e+02 | ▅▅▇▆▇ |
| max_roll_forearm | 19216 | 0.02 | 24.49 | 31.04 | -6.66000e+01 | 0.00 | 26.80 | 45.95 | 8.980000e+01 | ▁▁▇▇▂ |
| max_picth_forearm | 19216 | 0.02 | 81.49 | 95.54 | -1.51000e+02 | 0.00 | 113.00 | 174.75 | 1.800000e+02 | ▁▂▃▂▇ |
| min_roll_forearm | 19216 | 0.02 | -0.17 | 22.59 | -7.25000e+01 | -6.07 | 0.00 | 12.07 | 6.210000e+01 | ▁▁▇▃▁ |
| min_pitch_forearm | 19216 | 0.02 | -57.57 | 110.74 | -1.80000e+02 | -175.00 | -61.00 | 0.00 | 1.670000e+02 | ▇▂▅▂▂ |
| amplitude_roll_forearm | 19216 | 0.02 | 24.65 | 25.88 | 0.00000e+00 | 1.12 | 17.77 | 39.88 | 1.260000e+02 | ▇▃▂▁▁ |
| amplitude_pitch_forearm | 19216 | 0.02 | 139.06 | 147.86 | 0.00000e+00 | 2.00 | 83.70 | 350.00 | 3.600000e+02 | ▇▃▁▁▅ |
| total_accel_forearm | 0 | 1.00 | 34.72 | 10.06 | 0.00000e+00 | 29.00 | 36.00 | 41.00 | 1.080000e+02 | ▁▇▂▁▁ |
| var_accel_forearm | 19216 | 0.02 | 33.50 | 33.95 | 0.00000e+00 | 6.76 | 21.16 | 51.24 | 1.726100e+02 | ▇▃▂▁▁ |
| avg_roll_forearm | 19216 | 0.02 | 33.17 | 79.52 | -1.77230e+02 | -0.91 | 11.17 | 107.13 | 1.772600e+02 | ▁▂▇▃▅ |
| stddev_roll_forearm | 19216 | 0.02 | 41.99 | 59.33 | 0.00000e+00 | 0.43 | 8.03 | 85.37 | 1.791700e+02 | ▇▁▁▁▂ |
| var_roll_forearm | 19216 | 0.02 | 5274.10 | 9177.18 | 0.00000e+00 | 0.18 | 64.48 | 7289.08 | 3.210224e+04 | ▇▁▁▁▁ |
| avg_pitch_forearm | 19216 | 0.02 | 11.80 | 24.83 | -6.81700e+01 | 0.00 | 12.02 | 28.48 | 7.209000e+01 | ▁▁▇▆▁ |
| stddev_pitch_forearm | 19216 | 0.02 | 7.98 | 8.73 | 0.00000e+00 | 0.34 | 5.52 | 12.87 | 4.775000e+01 | ▇▃▁▁▁ |
| var_pitch_forearm | 19216 | 0.02 | 139.59 | 266.49 | 0.00000e+00 | 0.11 | 30.43 | 165.53 | 2.279620e+03 | ▇▁▁▁▁ |
| avg_yaw_forearm | 19216 | 0.02 | 18.00 | 77.56 | -1.55060e+02 | -26.26 | 0.00 | 85.79 | 1.692400e+02 | ▂▃▇▆▃ |
| stddev_yaw_forearm | 19216 | 0.02 | 44.85 | 51.33 | 0.00000e+00 | 0.52 | 24.74 | 85.82 | 1.975100e+02 | ▇▂▂▂▁ |
| var_yaw_forearm | 19216 | 0.02 | 4639.85 | 7284.97 | 0.00000e+00 | 0.27 | 612.21 | 7368.41 | 3.900933e+04 | ▇▂▁▁▁ |
| gyros_forearm_x | 0 | 1.00 | 0.16 | 0.65 | -2.20000e+01 | -0.22 | 0.05 | 0.56 | 3.970000e+00 | ▁▁▁▁▇ |
| gyros_forearm_y | 0 | 1.00 | 0.08 | 3.10 | -7.02000e+00 | -1.46 | 0.03 | 1.62 | 3.110000e+02 | ▇▁▁▁▁ |
| gyros_forearm_z | 0 | 1.00 | 0.15 | 1.75 | -8.09000e+00 | -0.18 | 0.08 | 0.49 | 2.310000e+02 | ▇▁▁▁▁ |
| accel_forearm_x | 0 | 1.00 | -61.65 | 180.59 | -4.98000e+02 | -178.00 | -57.00 | 76.00 | 4.770000e+02 | ▂▆▇▅▁ |
| accel_forearm_y | 0 | 1.00 | 163.66 | 200.13 | -6.32000e+02 | 57.00 | 201.00 | 312.00 | 9.230000e+02 | ▁▂▇▅▁ |
| accel_forearm_z | 0 | 1.00 | -55.29 | 138.40 | -4.46000e+02 | -182.00 | -39.00 | 26.00 | 2.910000e+02 | ▁▇▅▅▃ |
| magnet_forearm_x | 0 | 1.00 | -312.58 | 346.96 | -1.28000e+03 | -616.00 | -378.00 | -73.00 | 6.720000e+02 | ▁▇▇▅▁ |
| magnet_forearm_y | 0 | 1.00 | 380.12 | 509.37 | -8.96000e+02 | 2.00 | 591.00 | 737.00 | 1.480000e+03 | ▂▂▂▇▁ |
| magnet_forearm_z | 0 | 1.00 | 393.61 | 369.27 | -9.73000e+02 | 191.00 | 511.00 | 653.00 | 1.090000e+03 | ▁▁▂▇▃ |
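The complete_rate column in the tables above is the share of non-missing values in each variable. As a rough sketch of how the mostly empty columns could be listed programmatically, the snippet below uses a made-up toy data frame in place of training and a hypothetical 50% threshold:

```r
# Toy data frame standing in for `training` (values are made up).
toy <- data.frame(
  roll_belt     = c(1.4, 1.5, 1.6, 1.7),
  max_roll_belt = c(NA, NA, NA, 2.0),
  classe        = c("A", "A", "B", "B")
)

# Share of non-missing values per column (what skim reports as complete_rate).
complete_rate <- 1 - colMeans(is.na(toy))

# Columns that are mostly missing, using a hypothetical 50% threshold.
sparse_cols <- names(complete_rate)[complete_rate < 0.5]
sparse_cols
```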
As we can see, this is a large dataset with many variables, and many of them are almost entirely missing (complete_rate of 0.02). However, we are only interested in accelerometer data. All variables starting with accel_ are related to the accelerometers, so we need to select only these variables, together with our outcome variable classe, for the rest of the analysis.
Using stringr, we will create a character vector of the variable names that start with the accel prefix, called accel_names.
library(stringr)
names <- names(training)
accel_names <- str_subset(names, "^accel")
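The `^` in the pattern matters: it anchors the match to the start of the name, so summary columns such as total_accel_belt or var_accel_arm are left out. A small sketch on a handful of column names taken from the summary tables:

```r
library(stringr)

# A few column names copied from the skim output above.
some_names <- c("accel_belt_x", "total_accel_belt", "var_accel_arm",
                "accel_arm_y", "classe")

# "^" anchors the match at the start of the string.
anchored <- str_subset(some_names, "^accel")

# Without the anchor, any name that merely contains "accel" matches.
unanchored <- str_subset(some_names, "accel")

anchored
unanchored
```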
Now, using dplyr, the accel_names vector will be used to subset the variables related to the accelerometers on the belt, forearm, arm, and dumbbell. We also select classe as our outcome variable, and name the new dataset clean_data.
library(dplyr)
clean_data <- training %>%
  # all_of() makes it explicit that accel_names is an external character vector
  select(all_of(accel_names), classe)
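Two tidyselect idioms can express this kind of subsetting: passing a character vector of names through all_of(), or matching the prefix directly with starts_with(). The sketch below compares them on a made-up three-column tibble; the data values are purely illustrative.

```r
library(dplyr)

# Toy tibble standing in for `training` (values are made up).
toy <- tibble(
  accel_belt_x     = c(-21, -22),
  total_accel_belt = c(3, 3),
  classe           = c("A", "B")
)

# Selection through an external character vector of names.
wanted     <- c("accel_belt_x")
via_vector <- toy %>% select(all_of(wanted), classe)

# Selection through a prefix helper, with no intermediate vector.
via_helper <- toy %>% select(starts_with("accel_"), classe)

names(via_helper)
```

Both forms return the same columns here; the helper version skips the separate stringr step, at the cost of hiding the list of selected names.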
Let's see what our new cleaned dataset looks like.
clean_data %>%
  skim()
| Name | Piped data |
| Number of rows | 19622 |
| Number of columns | 13 |
| _______________________ | |
| Column type frequency: | |
| character | 1 |
| numeric | 12 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| classe | 0 | 1 | 1 | 1 | 0 | 5 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| accel_belt_x | 0 | 1 | -5.59 | 29.64 | -120 | -21 | -15.0 | -5 | 85 | ▁▁▇▁▂ |
| accel_belt_y | 0 | 1 | 30.15 | 28.58 | -69 | 3 | 35.0 | 61 | 164 | ▁▇▇▁▁ |
| accel_belt_z | 0 | 1 | -72.59 | 100.45 | -275 | -162 | -152.0 | 27 | 105 | ▁▇▁▅▃ |
| accel_arm_x | 0 | 1 | -60.24 | 182.04 | -404 | -242 | -44.0 | 84 | 437 | ▇▅▇▅▁ |
| accel_arm_y | 0 | 1 | 32.60 | 109.87 | -318 | -54 | 14.0 | 139 | 308 | ▁▃▇▆▂ |
| accel_arm_z | 0 | 1 | -71.25 | 134.65 | -636 | -143 | -47.0 | 23 | 292 | ▁▁▅▇▁ |
| accel_dumbbell_x | 0 | 1 | -28.62 | 67.32 | -419 | -50 | -8.0 | 11 | 235 | ▁▁▆▇▁ |
| accel_dumbbell_y | 0 | 1 | 52.63 | 80.75 | -189 | -8 | 41.5 | 111 | 315 | ▁▇▇▅▁ |
| accel_dumbbell_z | 0 | 1 | -38.32 | 109.47 | -334 | -142 | -1.0 | 38 | 318 | ▁▆▇▃▁ |
| accel_forearm_x | 0 | 1 | -61.65 | 180.59 | -498 | -178 | -57.0 | 76 | 477 | ▂▆▇▅▁ |
| accel_forearm_y | 0 | 1 | 163.66 | 200.13 | -632 | 57 | 201.0 | 312 | 923 | ▁▂▇▅▁ |
| accel_forearm_z | 0 | 1 | -55.29 | 138.40 | -446 | -182 | -39.0 | 26 | 291 | ▁▇▅▅▃ |
Finally, the data looks tidy and we have all the required variables. The next step is to use this data for modelling.
The modelling stage refers to applying machine learning or other statistical models to the data. Our objective in this project was to create a prediction model. As our outcome variable is a categorical class variable, we need to use a classification algorithm. We first tried a decision tree, but because of its lack of accuracy the random forest algorithm was adopted.
The modelling stage involves several steps: data splitting, recipe, workflow, model specification, fitting, and accuracy assessment.
The dataset was split using the rsample package from tidymodels.
library(rsample)
set.seed(1234)
split_data <- initial_split(clean_data, prop = 2/3)
training_data <- training(split_data)
testing_data <- testing(split_data)
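To illustrate what initial_split returns, here is a small sketch on made-up data. The strata argument is an optional variation that is not used in the split above; it keeps the class proportions similar in both parts.

```r
library(rsample)

set.seed(1234)

# Toy data with two balanced classes (values are made up).
toy <- data.frame(
  accel_belt_x = rnorm(40),
  classe       = rep(c("A", "B"), each = 20)
)

# strata = classe is an illustrative addition, not part of the project code.
toy_split <- initial_split(toy, prop = 0.75, strata = classe)
toy_train <- training(toy_split)
toy_test  <- testing(toy_split)

# Class counts in each part of the split.
table(toy_train$classe)
table(toy_test$classe)
```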
Feature engineering, or the recipe, was created using the recipes package. As there is one outcome variable and all other variables are predictors, this step was pretty simple.
library(recipes)
recipe <- training_data %>%
  recipe(classe ~ .)
summary(recipe)
## # A tibble: 13 × 4
## variable type role source
## <chr> <list> <chr> <chr>
## 1 accel_belt_x <chr [2]> predictor original
## 2 accel_belt_y <chr [2]> predictor original
## 3 accel_belt_z <chr [2]> predictor original
## 4 accel_arm_x <chr [2]> predictor original
## 5 accel_arm_y <chr [2]> predictor original
## 6 accel_arm_z <chr [2]> predictor original
## 7 accel_dumbbell_x <chr [2]> predictor original
## 8 accel_dumbbell_y <chr [2]> predictor original
## 9 accel_dumbbell_z <chr [2]> predictor original
## 10 accel_forearm_x <chr [2]> predictor original
## 11 accel_forearm_y <chr [2]> predictor original
## 12 accel_forearm_z <chr [2]> predictor original
## 13 classe <chr [3]> outcome original
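The formula classe ~ . is what assigns the roles listed in the summary: the left-hand side becomes the outcome and every remaining column becomes a predictor. A minimal sketch on made-up data follows; note that the toy outcome is stored as a factor, which many classification engines expect, whereas classe in the project data is a character column.

```r
library(recipes)

# Toy data with two predictors and a factor outcome (values are made up).
toy <- data.frame(
  accel_belt_x = c(-21, -22, -20, -19),
  accel_belt_y = c(4, 4, 5, 3),
  classe       = factor(c("A", "A", "B", "B"))
)

# Left-hand side of the formula -> outcome; "." -> all other columns.
toy_rec <- recipe(classe ~ ., data = toy)

roles <- summary(toy_rec)[, c("variable", "role")]
roles
```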
The parsnip package was used to set the model specification.
library(parsnip)
rf_model <- parsnip::rand_forest() %>%
  parsnip::set_mode("classification") %>%
  parsnip::set_engine("randomForest")
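The specification above relies on the engine's default hyperparameters. As a sketch of how they could be stated explicitly, rand_forest() accepts trees and mtry arguments; the values below are illustrative assumptions, not settings tuned for or used in this project.

```r
library(parsnip)

# trees and mtry are illustrative values, not tuned for this dataset.
rf_spec <- rand_forest(trees = 500, mtry = 3) |>
  set_mode("classification") |>
  set_engine("randomForest")

# The specification only records the choices; nothing is fitted yet.
rf_spec$mode
rf_spec$engine
```

The native pipe `|>` used here needs R 4.1 or later; `%>%` works the same way.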
Using the workflows package, the recipe and the model specification are bundled into a single workflow.
library(workflows)
rf_wflow <- workflows::workflow() %>%
  workflows::add_recipe(recipe) %>%
  workflows::add_model(rf_model)
Now, using the vfold_cv function from rsample, we will create 4-fold cross-validation resamples of the training data.
vfold <- vfold_cv(data = training_data, v = 4)
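Each of the v resamples holds out a different part of the data for assessment and uses the rest for fitting. The sketch below shows this on a made-up 20-row data frame, so the fold sizes are easy to check by hand.

```r
library(rsample)

set.seed(1234)

# Toy data (values are made up).
toy <- data.frame(x = rnorm(20), classe = rep(c("A", "B"), 10))

toy_folds <- vfold_cv(toy, v = 4)

# analysis() is the part used for fitting, assessment() the held-out part.
fold_sizes <- data.frame(
  analysis   = sapply(toy_folds$splits, function(s) nrow(analysis(s))),
  assessment = sapply(toy_folds$splits, function(s) nrow(assessment(s)))
)
fold_sizes
```

Across the four folds every row is held out exactly once, so the assessment sizes add up to the number of rows.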
The model is fitted to each resample using the fit_resamples function of the tune package.
library(tune)
rf_resample_fit <- fit_resamples(rf_wflow, vfold)
The accuracy is obtained using the collect_metrics function.
collect_metrics(rf_resample_fit)
## # A tibble: 2 × 6
## .metric .estimator mean n std_err .config
## <chr> <chr> <dbl> <int> <dbl> <chr>
## 1 accuracy multiclass 0.933 4 0.00443 Preprocessor1_Model1
## 2 roc_auc hand_till 0.994 4 0.000349 Preprocessor1_Model1
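The mean and std_err columns summarise the per-fold estimates: mean is the average over the n = 4 folds and, assuming the usual definition of a standard error, std_err is the standard deviation of the fold estimates divided by the square root of n. The per-fold accuracies below are made up for illustration, since the report only shows the summary.

```r
# Hypothetical per-fold accuracies (made up; not the project's actual folds).
fold_accuracy <- c(0.925, 0.930, 0.935, 0.942)

n        <- length(fold_accuracy)
mean_acc <- mean(fold_accuracy)

# Standard error of the mean across folds.
std_err <- sd(fold_accuracy) / sqrt(n)

round(c(mean = mean_acc, std_err = std_err), 4)
```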
Hurray! Using random forest we get about 93% cross-validated accuracy, so we are good to go with this model.
We then applied the model to the initial test dataset to obtain predictions for out-of-sample data.
pred_testing <- predict(fit(rf_wflow, training_data), new_data = testing)
pred_testing
## # A tibble: 20 × 1
## .pred_class
## <fct>
## 1 B
## 2 A
## 3 C
## 4 A
## 5 A
## 6 E
## 7 D
## 8 B
## 9 A
## 10 A
## 11 B
## 12 C
## 13 B
## 14 A
## 15 E
## 16 E
## 17 A
## 18 B
## 19 C
## 20 B
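The held-out testing_data split created earlier is not scored in the text. As a minimal sketch of how hold-out accuracy and a confusion table could be computed once predictions for it are available, the example below uses made-up true and predicted classes rather than the project's results.

```r
# Made-up true and predicted classes for six held-out rows.
truth <- factor(c("A", "B", "C", "A", "E", "D"), levels = LETTERS[1:5])
pred  <- factor(c("A", "B", "C", "B", "E", "D"), levels = LETTERS[1:5])

# Accuracy is the share of rows where the prediction matches the truth.
holdout_accuracy <- mean(pred == truth)

# Rows are predicted classes, columns are actual classes.
confusion <- table(predicted = pred, actual = truth)

holdout_accuracy
confusion
```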