1. Introduction

This course project uses the tidyverse tool set, together with its companion modelling packages, to carry out each task, so it follows a streamlined modelling process. The project is divided into four stages: data import, data wrangling, data modelling, and results.

2. Data Import

First of all, the datasets “pml-testing.csv” and “pml-training.csv” must be available in the working directory so that the same code can be run on any other machine. We then load the “readr” library from the tidyverse collection of data science libraries.

        library(readr)

        training <- read_csv("pml-training.csv")
## New names:
## Rows: 19622 Columns: 160
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (34): user_name, cvtd_timestamp, new_window, kurtosis_roll_belt, kurtos...
## dbl (126): ...1, raw_timestamp_part_1, raw_timestamp_part_2, num_window, rol...
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`
        testing <- read_csv("pml-testing.csv") 
## New names:
## Rows: 20 Columns: 160
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr   (3): user_name, cvtd_timestamp, new_window
## dbl  (57): ...1, raw_timestamp_part_1, raw_timestamp_part_2, num_window, rol...
## lgl (100): kurtosis_roll_belt, kurtosis_picth_belt, kurtosis_yaw_belt, skewn...
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`
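
The `New names` message appears because the first header cell in each file is empty, so readr labels that column `...1`. As a minimal sketch, assuming a small hypothetical CSV written to a temporary file (not the project data), the following shows how `show_col_types = FALSE` quiets the column specification and how the unnamed column could be given a readable name. The name `row_id` is only an illustrative choice.

```r
library(readr)

# Hypothetical stand-in for the project files: the first header cell is empty.
tmp <- tempfile(fileext = ".csv")
writeLines(c('"",user_name,accel_belt_x', '1,carlitos,-21', '2,pedro,-22'), tmp)

# show_col_types = FALSE suppresses the column specification message.
demo <- read_csv(tmp, show_col_types = FALSE)

# readr repairs the empty header to `...1`; give it a readable (assumed) name.
names(demo)[names(demo) == "...1"] <- "row_id"
names(demo)
```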

3. Data Wrangling

Data wrangling refers to cleaning or transforming the data into an appropriate format for machine learning. This stage involves a series of tasks that we will go through one by one.

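Firstly, we will convert the data objects into tibbles using the as_tibble function, which tidyr re-exports from the tibble package. Because read_csv already returns tibbles, this step leaves the data unchanged and serves only as a safeguard.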

        library(tidyr)
        
        training <- as_tibble(training)
        testing <- as_tibble(testing)

Secondly, we will take a quick look at our dataset using skimr.

        library(skimr)
        
        training %>%
                skim()
Data summary
Name Piped data
Number of rows 19622
Number of columns 160
_______________________
Column type frequency:
character 34
numeric 126
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
user_name 0 1.00 5 8 0 6 0
cvtd_timestamp 0 1.00 16 16 0 20 0
new_window 0 1.00 2 3 0 2 0
kurtosis_roll_belt 19216 0.02 7 9 0 396 0
kurtosis_picth_belt 19216 0.02 7 9 0 316 0
kurtosis_yaw_belt 19216 0.02 7 7 0 1 0
skewness_roll_belt.1 19216 0.02 7 9 0 337 0
skewness_yaw_belt 19216 0.02 7 7 0 1 0
max_yaw_belt 19216 0.02 3 7 0 67 0
min_yaw_belt 19216 0.02 3 7 0 67 0
amplitude_yaw_belt 19216 0.02 4 7 0 3 0
kurtosis_roll_arm 19216 0.02 7 8 0 329 0
kurtosis_picth_arm 19216 0.02 7 8 0 327 0
kurtosis_yaw_arm 19216 0.02 7 8 0 394 0
skewness_roll_arm 19216 0.02 7 8 0 330 0
skewness_pitch_arm 19216 0.02 7 8 0 327 0
skewness_yaw_arm 19216 0.02 7 8 0 394 0
kurtosis_roll_dumbbell 19216 0.02 6 7 0 397 0
kurtosis_picth_dumbbell 19216 0.02 6 7 0 400 0
kurtosis_yaw_dumbbell 19216 0.02 7 7 0 1 0
skewness_yaw_dumbbell 19216 0.02 7 7 0 1 0
max_yaw_dumbbell 19216 0.02 3 7 0 72 0
min_yaw_dumbbell 19216 0.02 3 7 0 72 0
amplitude_yaw_dumbbell 19216 0.02 4 7 0 2 0
kurtosis_roll_forearm 19216 0.02 6 7 0 321 0
kurtosis_picth_forearm 19216 0.02 6 7 0 322 0
kurtosis_yaw_forearm 19216 0.02 7 7 0 1 0
skewness_roll_forearm 19216 0.02 6 7 0 322 0
skewness_pitch_forearm 19216 0.02 6 7 0 318 0
skewness_yaw_forearm 19216 0.02 7 7 0 1 0
max_yaw_forearm 19216 0.02 3 7 0 44 0
min_yaw_forearm 19216 0.02 3 7 0 44 0
amplitude_yaw_forearm 19216 0.02 4 7 0 2 0
classe 0 1.00 1 1 0 5 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
...1 0 1.00 9811.50 5664.53 1.00000e+00 4906.25 9811.50 14716.75 1.962200e+04 ▇▇▇▇▇
raw_timestamp_part_1 0 1.00 1322827119.27 204927.68 1.32249e+09 1322673099.00 1322832920.00 1323084264.00 1.323095e+09 ▃▃▇▁▆
raw_timestamp_part_2 0 1.00 500656.14 288222.88 2.94000e+02 252912.25 496380.00 751890.75 9.988010e+05 ▇▇▇▇▇
num_window 0 1.00 430.64 247.91 1.00000e+00 222.00 424.00 644.00 8.640000e+02 ▇▇▇▇▇
roll_belt 0 1.00 64.41 62.75 -2.89000e+01 1.10 113.00 123.00 1.620000e+02 ▇▁▁▅▅
pitch_belt 0 1.00 0.31 22.35 -5.58000e+01 1.76 5.28 14.90 6.030000e+01 ▃▁▇▅▁
yaw_belt 0 1.00 -11.21 95.19 -1.80000e+02 -88.30 -13.00 12.90 1.790000e+02 ▁▇▅▁▃
total_accel_belt 0 1.00 11.31 7.74 0.00000e+00 3.00 17.00 18.00 2.900000e+01 ▇▁▂▆▁
skewness_roll_belt 19225 0.02 -0.03 0.92 -5.74000e+00 -0.44 0.00 0.42 3.600000e+00 ▁▁▆▇▁
max_roll_belt 19216 0.02 -6.67 94.59 -9.43000e+01 -88.00 -5.10 18.50 1.800000e+02 ▇▅▁▁▃
max_picth_belt 19216 0.02 12.92 8.01 3.00000e+00 5.00 18.00 19.00 3.000000e+01 ▇▁▆▃▁
min_roll_belt 19216 0.02 -10.44 93.62 -1.80000e+02 -88.40 -7.85 9.05 1.730000e+02 ▁▇▅▁▃
min_pitch_belt 19216 0.02 10.76 7.47 0.00000e+00 3.00 16.00 17.00 2.300000e+01 ▇▁▁▇▂
amplitude_roll_belt 19216 0.02 3.77 25.26 0.00000e+00 0.30 1.00 2.08 3.600000e+02 ▇▁▁▁▁
amplitude_pitch_belt 19216 0.02 2.17 2.36 0.00000e+00 1.00 1.00 2.00 1.200000e+01 ▇▁▁▁▁
var_total_accel_belt 19216 0.02 0.93 2.22 0.00000e+00 0.10 0.20 0.30 1.650000e+01 ▇▁▁▁▁
avg_roll_belt 19216 0.02 68.06 63.14 -2.74000e+01 1.10 116.35 123.38 1.574000e+02 ▇▁▁▃▆
stddev_roll_belt 19216 0.02 1.34 2.44 0.00000e+00 0.20 0.40 0.70 1.420000e+01 ▇▁▁▁▁
var_roll_belt 19216 0.02 7.70 23.16 0.00000e+00 0.00 0.10 0.50 2.007000e+02 ▇▁▁▁▁
avg_pitch_belt 19216 0.02 0.52 22.41 -5.14000e+01 2.02 5.20 15.78 5.970000e+01 ▃▁▇▃▁
stddev_pitch_belt 19216 0.02 0.60 0.64 0.00000e+00 0.20 0.40 0.70 4.000000e+00 ▇▁▁▁▁
var_pitch_belt 19216 0.02 0.77 1.76 0.00000e+00 0.00 0.10 0.50 1.620000e+01 ▇▁▁▁▁
avg_yaw_belt 19216 0.02 -8.83 93.48 -1.38300e+02 -88.18 -6.55 14.12 1.735000e+02 ▇▁▅▁▃
stddev_yaw_belt 19216 0.02 1.34 10.29 0.00000e+00 0.10 0.30 0.70 1.766000e+02 ▇▁▁▁▁
var_yaw_belt 19216 0.02 107.49 1655.52 0.00000e+00 0.01 0.09 0.48 3.118324e+04 ▇▁▁▁▁
gyros_belt_x 0 1.00 -0.01 0.21 -1.04000e+00 -0.03 0.03 0.11 2.220000e+00 ▁▇▁▁▁
gyros_belt_y 0 1.00 0.04 0.08 -6.40000e-01 0.00 0.02 0.11 6.400000e-01 ▁▁▇▁▁
gyros_belt_z 0 1.00 -0.13 0.24 -1.46000e+00 -0.20 -0.10 -0.02 1.620000e+00 ▁▂▇▁▁
accel_belt_x 0 1.00 -5.59 29.64 -1.20000e+02 -21.00 -15.00 -5.00 8.500000e+01 ▁▁▇▁▂
accel_belt_y 0 1.00 30.15 28.58 -6.90000e+01 3.00 35.00 61.00 1.640000e+02 ▁▇▇▁▁
accel_belt_z 0 1.00 -72.59 100.45 -2.75000e+02 -162.00 -152.00 27.00 1.050000e+02 ▁▇▁▅▃
magnet_belt_x 0 1.00 55.60 64.18 -5.20000e+01 9.00 35.00 59.00 4.850000e+02 ▇▁▂▁▁
magnet_belt_y 0 1.00 593.68 35.68 3.54000e+02 581.00 601.00 610.00 6.730000e+02 ▁▁▁▇▃
magnet_belt_z 0 1.00 -345.48 65.21 -6.23000e+02 -375.00 -320.00 -306.00 2.930000e+02 ▁▇▁▁▁
roll_arm 0 1.00 17.83 72.74 -1.80000e+02 -31.78 0.00 77.30 1.800000e+02 ▁▃▇▆▂
pitch_arm 0 1.00 -4.61 30.68 -8.88000e+01 -25.90 0.00 11.20 8.850000e+01 ▁▅▇▂▁
yaw_arm 0 1.00 -0.62 71.36 -1.80000e+02 -43.10 0.00 45.88 1.800000e+02 ▁▃▇▃▂
total_accel_arm 0 1.00 25.51 10.52 1.00000e+00 17.00 27.00 33.00 6.600000e+01 ▃▆▇▁▁
var_accel_arm 19216 0.02 53.24 53.98 0.00000e+00 9.03 40.61 75.62 3.317000e+02 ▇▂▁▁▁
avg_roll_arm 19216 0.02 12.68 68.58 -1.66670e+02 -38.37 0.00 76.32 1.633300e+02 ▁▅▇▅▃
stddev_roll_arm 19216 0.02 11.20 17.10 0.00000e+00 1.38 5.70 14.92 1.619600e+02 ▇▁▁▁▁
var_roll_arm 19216 0.02 417.26 2007.16 0.00000e+00 1.90 32.52 222.65 2.623221e+04 ▇▁▁▁▁
avg_pitch_arm 19216 0.02 -4.90 26.83 -8.17700e+01 -22.77 0.00 8.28 7.566000e+01 ▁▃▇▂▁
stddev_pitch_arm 19216 0.02 10.38 9.40 0.00000e+00 1.64 8.13 16.33 4.341000e+01 ▇▅▂▁▁
var_pitch_arm 19216 0.02 195.86 292.60 0.00000e+00 2.70 66.15 266.58 1.884560e+03 ▇▁▁▁▁
avg_yaw_arm 19216 0.02 2.36 61.33 -1.73440e+02 -29.20 0.00 38.18 1.520000e+02 ▁▂▇▃▂
stddev_yaw_arm 19216 0.02 22.27 23.69 0.00000e+00 2.58 16.68 35.98 1.770400e+02 ▇▂▁▁▁
var_yaw_arm 19216 0.02 1055.93 2722.17 0.00000e+00 6.64 278.31 1294.85 3.134457e+04 ▇▁▁▁▁
gyros_arm_x 0 1.00 0.04 1.99 -6.37000e+00 -1.33 0.08 1.57 4.870000e+00 ▁▃▇▆▂
gyros_arm_y 0 1.00 -0.26 0.85 -3.44000e+00 -0.80 -0.24 0.14 2.840000e+00 ▁▂▇▂▁
gyros_arm_z 0 1.00 0.27 0.55 -2.33000e+00 -0.07 0.23 0.72 3.020000e+00 ▁▂▇▂▁
accel_arm_x 0 1.00 -60.24 182.04 -4.04000e+02 -242.00 -44.00 84.00 4.370000e+02 ▇▅▇▅▁
accel_arm_y 0 1.00 32.60 109.87 -3.18000e+02 -54.00 14.00 139.00 3.080000e+02 ▁▃▇▆▂
accel_arm_z 0 1.00 -71.25 134.65 -6.36000e+02 -143.00 -47.00 23.00 2.920000e+02 ▁▁▅▇▁
magnet_arm_x 0 1.00 191.72 443.64 -5.84000e+02 -300.00 289.00 637.00 7.820000e+02 ▆▃▂▃▇
magnet_arm_y 0 1.00 156.61 201.91 -3.92000e+02 -9.00 202.00 323.00 5.830000e+02 ▁▅▅▇▂
magnet_arm_z 0 1.00 306.49 326.62 -5.97000e+02 131.25 444.00 545.00 6.940000e+02 ▁▂▂▃▇
max_roll_arm 19216 0.02 11.24 26.93 -7.31000e+01 -0.18 4.95 26.78 8.550000e+01 ▁▂▇▃▁
max_picth_arm 19216 0.02 35.75 69.62 -1.73000e+02 -1.98 23.25 95.97 1.800000e+02 ▁▂▇▆▃
max_yaw_arm 19216 0.02 35.46 10.45 4.00000e+00 29.00 34.00 41.00 6.500000e+01 ▁▂▇▃▁
min_roll_arm 19216 0.02 -21.22 28.72 -8.91000e+01 -41.98 -22.45 0.00 6.640000e+01 ▂▆▇▂▁
min_pitch_arm 19216 0.02 -33.92 60.83 -1.80000e+02 -72.62 -33.85 0.00 1.520000e+02 ▁▆▇▂▁
min_yaw_arm 19216 0.02 14.67 9.11 1.00000e+00 8.00 13.00 19.00 3.800000e+01 ▆▇▃▂▂
amplitude_roll_arm 19216 0.02 32.45 27.39 0.00000e+00 5.43 28.45 50.96 1.195000e+02 ▇▆▃▁▁
amplitude_pitch_arm 19216 0.02 69.68 66.98 0.00000e+00 9.93 54.90 115.17 3.600000e+02 ▇▅▂▁▁
amplitude_yaw_arm 19216 0.02 20.79 12.28 0.00000e+00 13.00 22.00 28.75 5.200000e+01 ▅▅▇▃▁
roll_dumbbell 0 1.00 23.84 69.93 -1.53710e+02 -18.49 48.17 67.61 1.535500e+02 ▂▂▃▇▂
pitch_dumbbell 0 1.00 -10.78 36.99 -1.49590e+02 -40.89 -20.96 17.50 1.494000e+02 ▁▆▇▂▁
yaw_dumbbell 0 1.00 1.67 82.52 -1.50870e+02 -77.64 -3.32 79.64 1.549500e+02 ▃▇▅▅▆
skewness_roll_dumbbell 19220 0.02 -0.12 0.82 -7.38000e+00 -0.58 -0.08 0.40 1.960000e+00 ▁▁▁▇▆
skewness_pitch_dumbbell 19217 0.02 -0.03 0.86 -7.45000e+00 -0.53 -0.09 0.51 3.770000e+00 ▁▁▂▇▁
max_roll_dumbbell 19216 0.02 13.76 48.30 -7.01000e+01 -27.15 14.85 50.58 1.370000e+02 ▆▇▇▅▂
max_picth_dumbbell 19216 0.02 32.75 93.37 -1.12900e+02 -66.70 40.05 133.23 1.550000e+02 ▆▃▂▂▇
min_roll_dumbbell 19216 0.02 -41.24 34.71 -1.49600e+02 -59.68 -43.55 -25.20 7.320000e+01 ▁▃▇▃▁
min_pitch_dumbbell 19216 0.02 -33.18 74.28 -1.47000e+02 -91.80 -66.15 21.20 1.209000e+02 ▆▇▅▃▃
amplitude_roll_dumbbell 19216 0.02 55.00 54.94 0.00000e+00 14.97 35.05 81.04 2.564800e+02 ▇▂▁▁▁
amplitude_pitch_dumbbell 19216 0.02 65.93 65.23 0.00000e+00 17.06 41.72 99.54 2.735900e+02 ▇▂▂▁▁
total_accel_dumbbell 0 1.00 13.72 10.23 0.00000e+00 4.00 10.00 19.00 5.800000e+01 ▇▅▃▁▁
var_accel_dumbbell 19216 0.02 4.39 13.51 0.00000e+00 0.38 1.00 3.43 2.304300e+02 ▇▁▁▁▁
avg_roll_dumbbell 19216 0.02 23.86 62.90 -1.28960e+02 -12.33 48.23 64.37 1.259900e+02 ▂▂▃▇▂
stddev_roll_dumbbell 19216 0.02 20.76 24.30 0.00000e+00 4.64 12.20 26.36 1.237800e+02 ▇▂▁▁▁
var_roll_dumbbell 19216 0.02 1020.27 2262.56 0.00000e+00 21.52 148.95 694.65 1.532101e+04 ▇▁▁▁▁
avg_pitch_dumbbell 19216 0.02 -12.33 32.06 -7.07300e+01 -42.00 -19.90 13.21 9.428000e+01 ▇▇▇▂▁
stddev_pitch_dumbbell 19216 0.02 13.15 13.34 0.00000e+00 3.48 8.09 19.24 8.268000e+01 ▇▂▁▁▁
var_pitch_dumbbell 19216 0.02 350.31 673.96 0.00000e+00 12.13 65.43 370.11 6.836020e+03 ▇▁▁▁▁
avg_yaw_dumbbell 19216 0.02 0.20 78.21 -1.17950e+02 -76.70 -4.50 71.23 1.349000e+02 ▇▃▅▃▅
stddev_yaw_dumbbell 19216 0.02 16.65 17.71 0.00000e+00 3.88 10.26 24.67 1.070900e+02 ▇▂▁▁▁
var_yaw_dumbbell 19216 0.02 589.84 1244.59 0.00000e+00 15.09 105.35 608.79 1.146791e+04 ▇▁▁▁▁
gyros_dumbbell_x 0 1.00 0.16 1.51 -2.04000e+02 -0.03 0.13 0.35 2.220000e+00 ▁▁▁▁▇
gyros_dumbbell_y 0 1.00 0.05 0.61 -2.10000e+00 -0.14 0.03 0.21 5.200000e+01 ▇▁▁▁▁
gyros_dumbbell_z 0 1.00 -0.13 2.29 -2.38000e+00 -0.31 -0.13 0.03 3.170000e+02 ▇▁▁▁▁
accel_dumbbell_x 0 1.00 -28.62 67.32 -4.19000e+02 -50.00 -8.00 11.00 2.350000e+02 ▁▁▆▇▁
accel_dumbbell_y 0 1.00 52.63 80.75 -1.89000e+02 -8.00 41.50 111.00 3.150000e+02 ▁▇▇▅▁
accel_dumbbell_z 0 1.00 -38.32 109.47 -3.34000e+02 -142.00 -1.00 38.00 3.180000e+02 ▁▆▇▃▁
magnet_dumbbell_x 0 1.00 -328.48 339.72 -6.43000e+02 -535.00 -479.00 -304.00 5.920000e+02 ▇▂▁▁▂
magnet_dumbbell_y 0 1.00 220.97 326.87 -3.60000e+03 231.00 311.00 390.00 6.330000e+02 ▁▁▁▁▇
magnet_dumbbell_z 0 1.00 46.05 139.96 -2.62000e+02 -45.00 13.00 95.00 4.520000e+02 ▁▇▆▂▂
roll_forearm 0 1.00 33.83 108.04 -1.80000e+02 -0.74 21.70 140.00 1.800000e+02 ▃▂▇▂▇
pitch_forearm 0 1.00 10.71 28.15 -7.25000e+01 0.00 9.24 28.40 8.980000e+01 ▁▁▇▃▁
yaw_forearm 0 1.00 19.21 103.22 -1.80000e+02 -68.60 0.00 110.00 1.800000e+02 ▅▅▇▆▇
max_roll_forearm 19216 0.02 24.49 31.04 -6.66000e+01 0.00 26.80 45.95 8.980000e+01 ▁▁▇▇▂
max_picth_forearm 19216 0.02 81.49 95.54 -1.51000e+02 0.00 113.00 174.75 1.800000e+02 ▁▂▃▂▇
min_roll_forearm 19216 0.02 -0.17 22.59 -7.25000e+01 -6.07 0.00 12.07 6.210000e+01 ▁▁▇▃▁
min_pitch_forearm 19216 0.02 -57.57 110.74 -1.80000e+02 -175.00 -61.00 0.00 1.670000e+02 ▇▂▅▂▂
amplitude_roll_forearm 19216 0.02 24.65 25.88 0.00000e+00 1.12 17.77 39.88 1.260000e+02 ▇▃▂▁▁
amplitude_pitch_forearm 19216 0.02 139.06 147.86 0.00000e+00 2.00 83.70 350.00 3.600000e+02 ▇▃▁▁▅
total_accel_forearm 0 1.00 34.72 10.06 0.00000e+00 29.00 36.00 41.00 1.080000e+02 ▁▇▂▁▁
var_accel_forearm 19216 0.02 33.50 33.95 0.00000e+00 6.76 21.16 51.24 1.726100e+02 ▇▃▂▁▁
avg_roll_forearm 19216 0.02 33.17 79.52 -1.77230e+02 -0.91 11.17 107.13 1.772600e+02 ▁▂▇▃▅
stddev_roll_forearm 19216 0.02 41.99 59.33 0.00000e+00 0.43 8.03 85.37 1.791700e+02 ▇▁▁▁▂
var_roll_forearm 19216 0.02 5274.10 9177.18 0.00000e+00 0.18 64.48 7289.08 3.210224e+04 ▇▁▁▁▁
avg_pitch_forearm 19216 0.02 11.80 24.83 -6.81700e+01 0.00 12.02 28.48 7.209000e+01 ▁▁▇▆▁
stddev_pitch_forearm 19216 0.02 7.98 8.73 0.00000e+00 0.34 5.52 12.87 4.775000e+01 ▇▃▁▁▁
var_pitch_forearm 19216 0.02 139.59 266.49 0.00000e+00 0.11 30.43 165.53 2.279620e+03 ▇▁▁▁▁
avg_yaw_forearm 19216 0.02 18.00 77.56 -1.55060e+02 -26.26 0.00 85.79 1.692400e+02 ▂▃▇▆▃
stddev_yaw_forearm 19216 0.02 44.85 51.33 0.00000e+00 0.52 24.74 85.82 1.975100e+02 ▇▂▂▂▁
var_yaw_forearm 19216 0.02 4639.85 7284.97 0.00000e+00 0.27 612.21 7368.41 3.900933e+04 ▇▂▁▁▁
gyros_forearm_x 0 1.00 0.16 0.65 -2.20000e+01 -0.22 0.05 0.56 3.970000e+00 ▁▁▁▁▇
gyros_forearm_y 0 1.00 0.08 3.10 -7.02000e+00 -1.46 0.03 1.62 3.110000e+02 ▇▁▁▁▁
gyros_forearm_z 0 1.00 0.15 1.75 -8.09000e+00 -0.18 0.08 0.49 2.310000e+02 ▇▁▁▁▁
accel_forearm_x 0 1.00 -61.65 180.59 -4.98000e+02 -178.00 -57.00 76.00 4.770000e+02 ▂▆▇▅▁
accel_forearm_y 0 1.00 163.66 200.13 -6.32000e+02 57.00 201.00 312.00 9.230000e+02 ▁▂▇▅▁
accel_forearm_z 0 1.00 -55.29 138.40 -4.46000e+02 -182.00 -39.00 26.00 2.910000e+02 ▁▇▅▅▃
magnet_forearm_x 0 1.00 -312.58 346.96 -1.28000e+03 -616.00 -378.00 -73.00 6.720000e+02 ▁▇▇▅▁
magnet_forearm_y 0 1.00 380.12 509.37 -8.96000e+02 2.00 591.00 737.00 1.480000e+03 ▂▂▂▇▁
magnet_forearm_z 0 1.00 393.61 369.27 -9.73000e+02 191.00 511.00 653.00 1.090000e+03 ▁▁▂▇▃
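
The `complete_rate` column is the share of non-missing values in each variable. As a quick check, the counts reported above (19216 missing values out of 19622 rows) reproduce the 0.02 shown for the sparse summary columns. The second half is a hedged sketch of the same calculation for every column of a data frame; the data frame `toy` is hypothetical.

```r
# Complete rate for the sparse summary columns, from the counts in the table.
n_rows    <- 19622
n_missing <- 19216
complete_rate <- 1 - n_missing / n_rows
round(complete_rate, 2)
#> [1] 0.02

# The same idea on a hypothetical toy data frame:
# share of non-NA values per column.
toy <- data.frame(a = c(1, NA, NA, NA), b = c(1, 2, 3, 4))
rates <- colMeans(!is.na(toy))
rates
```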

As we can see, this is a very big dataset with many variables, and a large share of them are almost entirely missing (19216 of 19622 rows). However, we are only interested in accelerometer data. All variables starting with accel_ hold accelerometer readings, so we need to select only these variables and our outcome variable classe for the rest of the analysis.

Using stringr, we will create a character vector of the variable names that start with the accel prefix, called accel_names.

        library(stringr)
        
        # col_names avoids shadowing the base function names()
        col_names <- names(training)
        accel_names <- str_subset(col_names, "^accel")
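
The `^` in the pattern anchors the match to the start of the name, which is what keeps derived columns such as total_accel_belt and var_accel_arm out of the selection. A minimal sketch on a handful of column names taken from the summary above:

```r
library(stringr)

# A few column names taken from the summary above.
cols <- c("accel_belt_x", "total_accel_belt", "var_accel_arm",
          "accel_arm_y", "classe")

# "^accel" matches only names that start with "accel".
matched <- str_subset(cols, "^accel")
matched
#> [1] "accel_belt_x" "accel_arm_y"

# Without the anchor, the derived summary columns are picked up as well.
unanchored <- str_subset(cols, "accel")
```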

Now, using dplyr, the accel_names character vector will be used to subset the variables related to the accelerometers on the belt, forearm, arm, and dumbbell. We will also select classe as our outcome variable, and the new dataset is named clean_data.

        library(dplyr)
        
        clean_data <- training %>%
                select(all_of(accel_names), classe)
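
Passing a bare character vector to `select()` is ambiguous, because the name could also refer to a column, and recent versions of tidyselect emit a note or warning for it. Wrapping the vector in `all_of()` makes the intent explicit. A minimal sketch on a hypothetical toy tibble (the values and the `wanted` vector are made up for illustration):

```r
library(dplyr)

# Hypothetical toy data with the same kind of columns as the project data.
toy <- tibble(
  accel_belt_x = c(-21, -22),
  roll_belt    = c(1.41, 1.42),
  accel_arm_y  = c(109, 110),
  classe       = c("A", "B")
)
wanted <- c("accel_belt_x", "accel_arm_y")

# all_of() states that `wanted` is an external character vector, not a column.
toy_clean <- toy %>%
  select(all_of(wanted), classe)
names(toy_clean)
```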

Let us see what our new cleaned dataset looks like.

        clean_data %>%
                skim()
Data summary
Name Piped data
Number of rows 19622
Number of columns 13
_______________________
Column type frequency:
character 1
numeric 12
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
classe 0 1 1 1 0 5 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
accel_belt_x 0 1 -5.59 29.64 -120 -21 -15.0 -5 85 ▁▁▇▁▂
accel_belt_y 0 1 30.15 28.58 -69 3 35.0 61 164 ▁▇▇▁▁
accel_belt_z 0 1 -72.59 100.45 -275 -162 -152.0 27 105 ▁▇▁▅▃
accel_arm_x 0 1 -60.24 182.04 -404 -242 -44.0 84 437 ▇▅▇▅▁
accel_arm_y 0 1 32.60 109.87 -318 -54 14.0 139 308 ▁▃▇▆▂
accel_arm_z 0 1 -71.25 134.65 -636 -143 -47.0 23 292 ▁▁▅▇▁
accel_dumbbell_x 0 1 -28.62 67.32 -419 -50 -8.0 11 235 ▁▁▆▇▁
accel_dumbbell_y 0 1 52.63 80.75 -189 -8 41.5 111 315 ▁▇▇▅▁
accel_dumbbell_z 0 1 -38.32 109.47 -334 -142 -1.0 38 318 ▁▆▇▃▁
accel_forearm_x 0 1 -61.65 180.59 -498 -178 -57.0 76 477 ▂▆▇▅▁
accel_forearm_y 0 1 163.66 200.13 -632 57 201.0 312 923 ▁▂▇▅▁
accel_forearm_z 0 1 -55.29 138.40 -446 -182 -39.0 26 291 ▁▇▅▅▃

Finally, the data looks tidy and we have all the required variables. The next step is to use this data for modelling.

4. Modelling

The modelling stage refers to applying machine learning or other statistical models to the data. Our objective in this project was to create a prediction model. As our outcome variable is a categorical class variable, we need to use a classification algorithm. We first tried a decision tree, but its accuracy was too low, so the random forest algorithm was adopted.

The modelling stage involves several steps: data splitting, recipe, model specification, workflow, fit, and accuracy.

The dataset was split using the rsample package from tidymodels, with two thirds of the rows kept for training.

        library(rsample)
        
        set.seed(1234)
        
        split_data <- initial_split(clean_data, prop = 2/3)
        
        training_data <- training(split_data)
        testing_data <- testing(split_data)
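
One optional refinement, not used in the original code, is to stratify the split on the outcome so that the class proportions stay similar in both parts. The sketch below assumes hypothetical toy data with two classes; `strata` is an existing argument of `initial_split()`, while the data, the proportion, and the object names are illustrative.

```r
library(rsample)

set.seed(1234)

# Hypothetical toy data: 100 rows, two classes of unequal size.
toy <- data.frame(
  x      = rnorm(100),
  classe = rep(c("A", "B"), times = c(75, 25))
)

# prop = 3/4 keeps about three quarters of the rows for training;
# strata = classe (an assumption, not in the original code) keeps the
# class proportions similar in the training and testing parts.
toy_split <- initial_split(toy, prop = 3/4, strata = classe)
toy_train <- training(toy_split)
toy_test  <- testing(toy_split)

c(train = nrow(toy_train), test = nrow(toy_test))
```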

The feature engineering step, or recipe, was created using the recipes package. As there is one outcome variable and all other variables are predictors, this step was pretty simple.

        library(recipes)
        
        recipe <- training_data %>%
                recipe(classe ~ .)
        
        summary(recipe)
## # A tibble: 13 × 4
##    variable         type      role      source  
##    <chr>            <list>    <chr>     <chr>   
##  1 accel_belt_x     <chr [2]> predictor original
##  2 accel_belt_y     <chr [2]> predictor original
##  3 accel_belt_z     <chr [2]> predictor original
##  4 accel_arm_x      <chr [2]> predictor original
##  5 accel_arm_y      <chr [2]> predictor original
##  6 accel_arm_z      <chr [2]> predictor original
##  7 accel_dumbbell_x <chr [2]> predictor original
##  8 accel_dumbbell_y <chr [2]> predictor original
##  9 accel_dumbbell_z <chr [2]> predictor original
## 10 accel_forearm_x  <chr [2]> predictor original
## 11 accel_forearm_y  <chr [2]> predictor original
## 12 accel_forearm_z  <chr [2]> predictor original
## 13 classe           <chr [3]> outcome   original

The parsnip package was used to set the model specification.

        library(parsnip)

        rf_model <- parsnip::rand_forest() %>%
             parsnip::set_mode("classification") %>%
             parsnip::set_engine("randomForest")
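
The specification above relies on the engine defaults. As a hedged sketch, the main random forest hyperparameters can also be stated explicitly through the `trees` and `mtry` arguments of `rand_forest()`. The values 500 and 3 and the object name `rf_model_explicit` are illustrative assumptions, not the settings used in this project.

```r
library(parsnip)

# trees and mtry are illustrative values, not the settings used in this project.
rf_model_explicit <- rand_forest(trees = 500, mtry = 3)
rf_model_explicit <- set_mode(rf_model_explicit, "classification")
rf_model_explicit <- set_engine(rf_model_explicit, "randomForest")

# The specification records the mode and engine; nothing is fitted yet.
c(mode = rf_model_explicit$mode, engine = rf_model_explicit$engine)
```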

The recipe and the model specification were then combined using the workflows package.

        library(workflows)

        rf_wflow <- workflows::workflow() %>%
                workflows::add_recipe(recipe) %>%
                workflows::add_model(rf_model)

Now, using the vfold_cv function from rsample, we will create 4-fold cross-validation resamples of the training data.

        vfold <- vfold_cv(data = training_data, v = 4)
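
With v = 4, each resample fits on three quarters of the rows and holds out the remaining quarter for assessment. A minimal sketch on hypothetical toy data with 20 rows shows the resulting sizes; `analysis()` and `assessment()` are the rsample accessors for the two parts of each split.

```r
library(rsample)

set.seed(1234)

# Hypothetical toy data with 20 rows.
toy <- data.frame(x = rnorm(20), y = rep(c("A", "B"), 10))

toy_folds <- vfold_cv(toy, v = 4)

# Each of the 4 resamples holds out a different quarter of the rows.
fold_sizes <- data.frame(
  analysis   = sapply(toy_folds$splits, function(s) nrow(analysis(s))),
  assessment = sapply(toy_folds$splits, function(s) nrow(assessment(s)))
)
fold_sizes
```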

The model is fitted to each resample using the fit_resamples function of the tune package.

        library(tune)
        rf_resample_fit <- fit_resamples(rf_wflow, vfold)

The accuracy is obtained using the collect_metrics function.

        collect_metrics(rf_resample_fit)
## # A tibble: 2 × 6
##   .metric  .estimator  mean     n  std_err .config             
##   <chr>    <chr>      <dbl> <int>    <dbl> <chr>               
## 1 accuracy multiclass 0.933     4 0.00443  Preprocessor1_Model1
## 2 roc_auc  hand_till  0.994     4 0.000349 Preprocessor1_Model1

Hurray! Using random forest we get about 93% cross-validated accuracy, so we are good to go with this model.
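
The `mean` and `std_err` columns summarise the metric across the four folds. The sketch below assumes the usual definition of the standard error as the standard deviation divided by the square root of the number of folds, and uses hypothetical per-fold accuracies because the real per-fold values are not shown above. The interval at the end uses the reported summary values; with only four folds it is a rough guide rather than a formal confidence interval.

```r
# Hypothetical per-fold accuracies (the real per-fold values are not shown above).
fold_acc <- c(0.92, 0.94, 0.93, 0.95)

n        <- length(fold_acc)
mean_acc <- mean(fold_acc)
std_err  <- sd(fold_acc) / sqrt(n)   # standard error of the mean across folds

# Rough interval from the reported summary: mean +/- 2 standard errors.
reported <- c(mean = 0.933, std_err = 0.00443)
interval <- reported[["mean"]] + c(-2, 2) * reported[["std_err"]]
round(interval, 3)
#> [1] 0.924 0.942
```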

5. Results

We applied our model to the initial test dataset to obtain predictions for the out-of-sample data.

        pred_testing <- predict(fit(rf_wflow, training_data), new_data = testing)
        
        pred_testing
## # A tibble: 20 × 1
##    .pred_class
##    <fct>      
##  1 B          
##  2 A          
##  3 C          
##  4 A          
##  5 A          
##  6 E          
##  7 D          
##  8 B          
##  9 A          
## 10 A          
## 11 B          
## 12 C          
## 13 B          
## 14 A          
## 15 E          
## 16 E          
## 17 A          
## 18 B          
## 19 C          
## 20 B
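
As a quick summary of the predictions, the 20 predicted classes can be tabulated. The vector below is copied by hand from the output above, so it is a sketch rather than part of the original pipeline.

```r
# The 20 predicted classes, copied from the output above.
pred <- c("B", "A", "C", "A", "A", "E", "D", "B", "A", "A",
          "B", "C", "B", "A", "E", "E", "A", "B", "C", "B")

# Count how often each class is predicted.
pred_counts <- table(factor(pred, levels = c("A", "B", "C", "D", "E")))
pred_counts
#> A B C D E 
#> 7 6 3 1 3
```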