Using devices such as Jawbone Up, Nike FuelBand, and Fitbit, it is now possible to collect a large amount of data about personal activity relatively inexpensively. These types of devices are part of the quantified self movement: a group of enthusiasts who take measurements about themselves regularly to improve their health, to find patterns in their behavior, or because they are tech geeks. People regularly quantify how much of a particular activity they do, but they rarely quantify how well they do it. This project aims to quantify how well participants perform a particular activity, using data from accelerometers on the belt, forearm, arm, and dumbbell of 6 participants. They were asked to perform barbell lifts correctly and incorrectly in 5 different ways. More information is available from the website here: http://web.archive.org/web/20161224072740/http:/groupware.les.inf.puc-rio.br/har (see the section on the Weight Lifting Exercise Dataset). The goal is to predict the manner in which each exercise was performed. This is the “classe” variable in the training set.
print("Operating System:")
[1] "Operating System:"
version
_
platform x86_64-w64-mingw32
arch x86_64
os mingw32
system x86_64, mingw32
status
major 3
minor 6.3
year 2020
month 02
day 29
svn rev 77875
language R
version.string R version 3.6.3 (2020-02-29)
nickname Holding the Windsock
Let’s start by checking whether the R packages needed for this project are installed. If not, the code below installs them.
# required packages for our project
pkgs <- c('kableExtra', 'tidyverse', 'caret', 'corrplot', 'randomForest')
for (pkg in pkgs) {
  # require() returns FALSE when the package is missing
  if (!require(pkg, character.only = TRUE)) {
    install.packages(pkg, repos = 'http://cran.us.r-project.org')
    # load the package after installing it
    library(pkg, character.only = TRUE)
  }
}
Now we are ready to download the data:
# Links saved in objects:
train_link <- "https://d396qusza40orc.cloudfront.net/predmachlearn/pml-training.csv"
test_link <- "https://d396qusza40orc.cloudfront.net/predmachlearn/pml-testing.csv"
Now let’s load our working data, treating empty fields and "#DIV/0!" entries as missing values:
# Load data
train_data <- read.csv(url(train_link), na.strings=c("NA","#DIV/0!",""))
test_data <- read.csv(url(test_link), na.strings=c("NA","#DIV/0!",""))
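To see what the `na.strings` argument does, here is a minimal sketch on a tiny in-memory CSV. The column names are borrowed from the dataset, but the values are made up for illustration. Without `na.strings`, a column containing a spreadsheet error marker such as "#DIV/0!" is read as text; with it, the markers become `NA` and the column is read as numeric.

```r
# Toy CSV text standing in for the real file (values are made up)
csv_text <- 'roll_belt,kurtosis_roll_belt,classe
1.41,#DIV/0!,A
1.42,,A
1.48,NA,B
1.45,0.53,B'

# Default parsing: "#DIV/0!" forces the whole column to be character
raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Same call as in the report: three kinds of markers become NA
clean <- read.csv(text = csv_text,
                  na.strings = c("NA", "#DIV/0!", ""),
                  stringsAsFactors = FALSE)

class(raw$kurtosis_roll_belt)
class(clean$kurtosis_roll_belt)
sum(is.na(clean$kurtosis_roll_belt))
```

Reading these columns as numeric with explicit `NA` values is what makes the later missing-value filter meaningful.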
When developing an algorithm, we usually have a dataset for which we know the outcomes. Therefore, to mimic the ultimate evaluation process, we typically split the data into two parts and act as if we don’t know the outcome for one of these.
We stop pretending we don’t know the outcome to evaluate the algorithm, but only after we are done constructing it. We refer to the group for which we know the outcome, and use to develop the algorithm, as the training set.
We refer to the group for which we pretend we don’t know the outcome as the test set.
A standard way of generating the training and test sets is by randomly splitting the data. The caret package includes the function createDataPartition, which helps us generate indexes for randomly splitting the data into training and test sets:
# Generate indexes for randomly splitting data
# Validation set will be 30% of train_data
set.seed(1, sample.kind='Rounding')
train_index <- createDataPartition(y = train_data$classe, p=0.7, times = 1, list = FALSE)
We use the result of the createDataPartition function call to define the training and test sets like this:
train <- train_data[train_index,]
test <- train_data[-train_index,]
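The idea behind a stratified split can be sketched in base R. This is a simplified illustration, not caret's actual implementation: the helper `stratified_index` and the toy data frame are hypothetical. It samples the same fraction of rows within each class, so the class proportions in the training part stay close to those in the full data.

```r
# Hypothetical helper: pick a fraction p of the row indices within each class
stratified_index <- function(y, p = 0.7) {
  idx_by_class <- split(seq_along(y), y)
  picked <- lapply(idx_by_class, function(idx) {
    idx[sample.int(length(idx), size = round(p * length(idx)))]
  })
  sort(unlist(picked, use.names = FALSE))
}

set.seed(1)
# Toy data with five classes of unequal size (made-up values)
toy <- data.frame(
  x = rnorm(100),
  classe = factor(rep(c("A", "B", "C", "D", "E"), times = c(30, 20, 20, 20, 10)))
)

idx <- stratified_index(toy$classe, p = 0.7)
toy_train <- toy[idx, ]
toy_test  <- toy[-idx, ]

# Class proportions are preserved in the training part
prop.table(table(toy$classe))
prop.table(table(toy_train$classe))
```

Negative indexing with `-idx` yields exactly the rows not selected for training, which is the same pattern used with `train_index` above.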
First, we check that both data sets are indeed data frames:
# Check format
class(train)
[1] "data.frame"
class(test)
[1] "data.frame"
Now let’s take a look at our data. We start with the first five rows of the training set:
as_tibble(train) %>%
slice(1:5) %>%
knitr::kable()
| X | user_name | raw_timestamp_part_1 | raw_timestamp_part_2 | cvtd_timestamp | new_window | num_window | roll_belt | pitch_belt | yaw_belt | total_accel_belt | kurtosis_roll_belt | kurtosis_picth_belt | kurtosis_yaw_belt | skewness_roll_belt | skewness_roll_belt.1 | skewness_yaw_belt | max_roll_belt | max_picth_belt | max_yaw_belt | min_roll_belt | min_pitch_belt | min_yaw_belt | amplitude_roll_belt | amplitude_pitch_belt | amplitude_yaw_belt | var_total_accel_belt | avg_roll_belt | stddev_roll_belt | var_roll_belt | avg_pitch_belt | stddev_pitch_belt | var_pitch_belt | avg_yaw_belt | stddev_yaw_belt | var_yaw_belt | gyros_belt_x | gyros_belt_y | gyros_belt_z | accel_belt_x | accel_belt_y | accel_belt_z | magnet_belt_x | magnet_belt_y | magnet_belt_z | roll_arm | pitch_arm | yaw_arm | total_accel_arm | var_accel_arm | avg_roll_arm | stddev_roll_arm | var_roll_arm | avg_pitch_arm | stddev_pitch_arm | var_pitch_arm | avg_yaw_arm | stddev_yaw_arm | var_yaw_arm | gyros_arm_x | gyros_arm_y | gyros_arm_z | accel_arm_x | accel_arm_y | accel_arm_z | magnet_arm_x | magnet_arm_y | magnet_arm_z | kurtosis_roll_arm | kurtosis_picth_arm | kurtosis_yaw_arm | skewness_roll_arm | skewness_pitch_arm | skewness_yaw_arm | max_roll_arm | max_picth_arm | max_yaw_arm | min_roll_arm | min_pitch_arm | min_yaw_arm | amplitude_roll_arm | amplitude_pitch_arm | amplitude_yaw_arm | roll_dumbbell | pitch_dumbbell | yaw_dumbbell | kurtosis_roll_dumbbell | kurtosis_picth_dumbbell | kurtosis_yaw_dumbbell | skewness_roll_dumbbell | skewness_pitch_dumbbell | skewness_yaw_dumbbell | max_roll_dumbbell | max_picth_dumbbell | max_yaw_dumbbell | min_roll_dumbbell | min_pitch_dumbbell | min_yaw_dumbbell | amplitude_roll_dumbbell | amplitude_pitch_dumbbell | amplitude_yaw_dumbbell | total_accel_dumbbell | var_accel_dumbbell | avg_roll_dumbbell | stddev_roll_dumbbell | var_roll_dumbbell | avg_pitch_dumbbell | stddev_pitch_dumbbell | var_pitch_dumbbell | avg_yaw_dumbbell | stddev_yaw_dumbbell | var_yaw_dumbbell | gyros_dumbbell_x | gyros_dumbbell_y | gyros_dumbbell_z | accel_dumbbell_x | accel_dumbbell_y | accel_dumbbell_z | magnet_dumbbell_x | magnet_dumbbell_y | magnet_dumbbell_z | roll_forearm | pitch_forearm | yaw_forearm | kurtosis_roll_forearm | kurtosis_picth_forearm | kurtosis_yaw_forearm | skewness_roll_forearm | skewness_pitch_forearm | skewness_yaw_forearm | max_roll_forearm | max_picth_forearm | max_yaw_forearm | min_roll_forearm | min_pitch_forearm | min_yaw_forearm | amplitude_roll_forearm | amplitude_pitch_forearm | amplitude_yaw_forearm | total_accel_forearm | var_accel_forearm | avg_roll_forearm | stddev_roll_forearm | var_roll_forearm | avg_pitch_forearm | stddev_pitch_forearm | var_pitch_forearm | avg_yaw_forearm | stddev_yaw_forearm | var_yaw_forearm | gyros_forearm_x | gyros_forearm_y | gyros_forearm_z | accel_forearm_x | accel_forearm_y | accel_forearm_z | magnet_forearm_x | magnet_forearm_y | magnet_forearm_z | classe |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | carlitos | 1323084231 | 808298 | 05/12/2011 11:23 | no | 11 | 1.41 | 8.07 | -94.4 | 3 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 0.02 | 0.00 | -0.02 | -22 | 4 | 22 | -7 | 608 | -311 | -128 | 22.5 | -161 | 34 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 0.02 | -0.02 | -0.02 | -290 | 110 | -125 | -369 | 337 | 513 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 13.13074 | -70.63751 | -84.71065 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 37 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 0 | -0.02 | 0.00 | -233 | 47 | -269 | -555 | 296 | -64 | 28.3 | -63.9 | -153 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 36 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 0.02 | 0.00 | -0.02 | 192 | 203 | -216 | -18 | 661 | 473 | A |
| 3 | carlitos | 1323084231 | 820366 | 05/12/2011 11:23 | no | 11 | 1.42 | 8.07 | -94.4 | 3 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 0.00 | 0.00 | -0.02 | -20 | 5 | 23 | -2 | 600 | -305 | -128 | 22.5 | -161 | 34 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 0.02 | -0.02 | -0.02 | -289 | 110 | -126 | -368 | 344 | 513 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 12.85075 | -70.27812 | -85.14078 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 37 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 0 | -0.02 | 0.00 | -232 | 46 | -270 | -561 | 298 | -63 | 28.3 | -63.9 | -152 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 36 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 0.03 | -0.02 | 0.00 | 196 | 204 | -213 | -18 | 658 | 469 | A |
| 4 | carlitos | 1323084232 | 120339 | 05/12/2011 11:23 | no | 12 | 1.48 | 8.05 | -94.4 | 3 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 0.02 | 0.00 | -0.03 | -22 | 3 | 21 | -6 | 604 | -310 | -128 | 22.1 | -161 | 34 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 0.02 | -0.03 | 0.02 | -289 | 111 | -123 | -372 | 344 | 512 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 13.43120 | -70.39379 | -84.87363 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 37 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 0 | -0.02 | -0.02 | -232 | 48 | -269 | -552 | 303 | -60 | 28.1 | -63.9 | -152 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 36 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 0.02 | -0.02 | 0.00 | 189 | 206 | -214 | -16 | 658 | 469 | A |
| 5 | carlitos | 1323084232 | 196328 | 05/12/2011 11:23 | no | 12 | 1.48 | 8.07 | -94.4 | 3 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 0.02 | 0.02 | -0.02 | -21 | 2 | 24 | -6 | 600 | -302 | -128 | 22.1 | -161 | 34 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 0.00 | -0.03 | 0.00 | -289 | 111 | -123 | -374 | 337 | 506 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 13.37872 | -70.42856 | -84.85306 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 37 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 0 | -0.02 | 0.00 | -233 | 48 | -270 | -554 | 292 | -68 | 28.0 | -63.9 | -152 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 36 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 0.02 | 0.00 | -0.02 | 189 | 206 | -214 | -17 | 655 | 473 | A |
| 7 | carlitos | 1323084232 | 368296 | 05/12/2011 11:23 | no | 12 | 1.42 | 8.09 | -94.4 | 3 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 0.02 | 0.00 | -0.02 | -22 | 3 | 21 | -4 | 599 | -311 | -128 | 21.9 | -161 | 34 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 0.00 | -0.03 | 0.00 | -289 | 111 | -125 | -373 | 336 | 509 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 13.12695 | -70.24757 | -85.09961 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 37 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 0 | -0.02 | 0.00 | -232 | 47 | -270 | -551 | 295 | -70 | 27.9 | -63.9 | -152 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 36 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 0.02 | 0.00 | -0.02 | 195 | 205 | -215 | -18 | 659 | 470 | A |
Now the same for the test set:
as_tibble(test) %>%
slice(1:5) %>%
knitr::kable()
| X | user_name | raw_timestamp_part_1 | raw_timestamp_part_2 | cvtd_timestamp | new_window | num_window | roll_belt | pitch_belt | yaw_belt | total_accel_belt | kurtosis_roll_belt | kurtosis_picth_belt | kurtosis_yaw_belt | skewness_roll_belt | skewness_roll_belt.1 | skewness_yaw_belt | max_roll_belt | max_picth_belt | max_yaw_belt | min_roll_belt | min_pitch_belt | min_yaw_belt | amplitude_roll_belt | amplitude_pitch_belt | amplitude_yaw_belt | var_total_accel_belt | avg_roll_belt | stddev_roll_belt | var_roll_belt | avg_pitch_belt | stddev_pitch_belt | var_pitch_belt | avg_yaw_belt | stddev_yaw_belt | var_yaw_belt | gyros_belt_x | gyros_belt_y | gyros_belt_z | accel_belt_x | accel_belt_y | accel_belt_z | magnet_belt_x | magnet_belt_y | magnet_belt_z | roll_arm | pitch_arm | yaw_arm | total_accel_arm | var_accel_arm | avg_roll_arm | stddev_roll_arm | var_roll_arm | avg_pitch_arm | stddev_pitch_arm | var_pitch_arm | avg_yaw_arm | stddev_yaw_arm | var_yaw_arm | gyros_arm_x | gyros_arm_y | gyros_arm_z | accel_arm_x | accel_arm_y | accel_arm_z | magnet_arm_x | magnet_arm_y | magnet_arm_z | kurtosis_roll_arm | kurtosis_picth_arm | kurtosis_yaw_arm | skewness_roll_arm | skewness_pitch_arm | skewness_yaw_arm | max_roll_arm | max_picth_arm | max_yaw_arm | min_roll_arm | min_pitch_arm | min_yaw_arm | amplitude_roll_arm | amplitude_pitch_arm | amplitude_yaw_arm | roll_dumbbell | pitch_dumbbell | yaw_dumbbell | kurtosis_roll_dumbbell | kurtosis_picth_dumbbell | kurtosis_yaw_dumbbell | skewness_roll_dumbbell | skewness_pitch_dumbbell | skewness_yaw_dumbbell | max_roll_dumbbell | max_picth_dumbbell | max_yaw_dumbbell | min_roll_dumbbell | min_pitch_dumbbell | min_yaw_dumbbell | amplitude_roll_dumbbell | amplitude_pitch_dumbbell | amplitude_yaw_dumbbell | total_accel_dumbbell | var_accel_dumbbell | avg_roll_dumbbell | stddev_roll_dumbbell | var_roll_dumbbell | avg_pitch_dumbbell | stddev_pitch_dumbbell | var_pitch_dumbbell | avg_yaw_dumbbell | stddev_yaw_dumbbell | var_yaw_dumbbell | gyros_dumbbell_x | gyros_dumbbell_y | gyros_dumbbell_z | accel_dumbbell_x | accel_dumbbell_y | accel_dumbbell_z | magnet_dumbbell_x | magnet_dumbbell_y | magnet_dumbbell_z | roll_forearm | pitch_forearm | yaw_forearm | kurtosis_roll_forearm | kurtosis_picth_forearm | kurtosis_yaw_forearm | skewness_roll_forearm | skewness_pitch_forearm | skewness_yaw_forearm | max_roll_forearm | max_picth_forearm | max_yaw_forearm | min_roll_forearm | min_pitch_forearm | min_yaw_forearm | amplitude_roll_forearm | amplitude_pitch_forearm | amplitude_yaw_forearm | total_accel_forearm | var_accel_forearm | avg_roll_forearm | stddev_roll_forearm | var_roll_forearm | avg_pitch_forearm | stddev_pitch_forearm | var_pitch_forearm | avg_yaw_forearm | stddev_yaw_forearm | var_yaw_forearm | gyros_forearm_x | gyros_forearm_y | gyros_forearm_z | accel_forearm_x | accel_forearm_y | accel_forearm_z | magnet_forearm_x | magnet_forearm_y | magnet_forearm_z | classe |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | carlitos | 1323084231 | 788290 | 05/12/2011 11:23 | no | 11 | 1.41 | 8.07 | -94.4 | 3 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 0.00 | 0 | -0.02 | -21 | 4 | 22 | -3 | 599 | -313 | -128 | 22.5 | -161 | 34 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 0.00 | 0.00 | -0.02 | -288 | 109 | -123 | -368 | 337 | 516 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 13.05217 | -70.49400 | -84.87394 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 37 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 0 | -0.02 | 0.00 | -234 | 47 | -271 | -559 | 293 | -65 | 28.4 | -63.9 | -153 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 36 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 0.03 | 0.00 | -0.02 | 192 | 203 | -215 | -17 | 654 | 476 | A |
| 6 | carlitos | 1323084232 | 304277 | 05/12/2011 11:23 | no | 12 | 1.45 | 8.06 | -94.4 | 3 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 0.02 | 0 | -0.02 | -21 | 4 | 21 | 0 | 603 | -312 | -128 | 22.0 | -161 | 34 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 0.02 | -0.03 | 0.00 | -289 | 111 | -122 | -369 | 342 | 513 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 13.38246 | -70.81759 | -84.46500 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 37 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 0 | -0.02 | 0.00 | -234 | 48 | -269 | -558 | 294 | -66 | 27.9 | -63.9 | -152 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 36 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 0.02 | -0.02 | -0.03 | 193 | 203 | -215 | -9 | 660 | 478 | A |
| 13 | carlitos | 1323084232 | 560359 | 05/12/2011 11:23 | no | 12 | 1.42 | 8.20 | -94.4 | 3 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 0.02 | 0 | 0.00 | -22 | 4 | 21 | -3 | 606 | -309 | -128 | 21.4 | -161 | 34 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 0.02 | -0.02 | -0.02 | -287 | 111 | -124 | -372 | 338 | 509 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 13.38246 | -70.81759 | -84.46500 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 37 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 0 | -0.02 | -0.02 | -234 | 48 | -269 | -552 | 302 | -69 | 27.2 | -63.9 | -151 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 36 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 0.00 | 0.00 | -0.03 | 193 | 205 | -215 | -15 | 655 | 472 | A |
| 15 | carlitos | 1323084232 | 604281 | 05/12/2011 11:23 | no | 12 | 1.45 | 8.20 | -94.4 | 3 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 0.00 | 0 | 0.00 | -21 | 2 | 22 | -1 | 597 | -310 | -129 | 21.4 | -161 | 34 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 0.02 | 0.00 | -0.03 | -289 | 111 | -124 | -374 | 342 | 510 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 13.07949 | -70.67116 | -84.69053 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 37 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 0 | -0.02 | 0.00 | -234 | 47 | -270 | -554 | 294 | -63 | 27.2 | -63.9 | -151 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 36 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 0.00 | -0.02 | -0.02 | 192 | 201 | -214 | -16 | 656 | 472 | A |
| 17 | carlitos | 1323084232 | 692324 | 05/12/2011 11:23 | no | 12 | 1.51 | 8.12 | -94.4 | 3 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 0.00 | 0 | -0.02 | -21 | 4 | 22 | -6 | 598 | -317 | -129 | 21.3 | -161 | 34 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 0.02 | 0.00 | -0.02 | -289 | 110 | -122 | -371 | 337 | 512 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 13.04835 | -70.10639 | -85.26058 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 37 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 0 | -0.02 | 0.00 | -233 | 47 | -272 | -551 | 296 | -56 | 27.1 | -64.0 | -151 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 36 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 0.02 | -0.02 | 0.00 | 192 | 204 | -213 | -13 | 653 | 481 | A |
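We see that the train data frame has 13737 rows and 160 variables, while the test data frame has 5885 rows and 160 variables.
We will clean the data by removing variables with near-zero variance, variables that are mostly NA, and identifier columns: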
# remove variables with variance nearly zero
nzv_index <- nearZeroVar(train)
# apply index to clean
# for train data
train <- train[, -nzv_index]
# for test data
test <- test[, -nzv_index]
# remove variables with more than 75% NA
na_index <- sapply(train, function(x) mean(is.na(x))) > 0.75
train <- train[, na_index ==FALSE]
test <- test[, na_index ==FALSE]
# remove identifier columns (the first five remaining variables)
train <- train[, -(1:5)]
test <- test[, -(1:5)]
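The missing-value filter can be illustrated on toy data. In this minimal sketch the data frames and their values are made up, and columns are selected by name rather than by logical index, which is a small variation on the code above. The key point it shows is that the filter is computed on the training set only and then applied unchanged to the test set, so both end up with identical columns.

```r
# Toy frames (made-up values); `mostly_na` mimics a sparsely filled column
toy_train <- data.frame(
  roll_belt = c(1.41, 1.42, 1.48, 1.45, 1.51),
  mostly_na = c(NA, NA, NA, NA, 0.3),
  classe    = c("A", "A", "B", "B", "C")
)
toy_test <- data.frame(
  roll_belt = c(1.40, 1.50),
  mostly_na = c(0.1, 0.2),
  classe    = c("A", "C")
)

# Fraction of missing values per column, computed on the training set only
na_frac <- sapply(toy_train, function(x) mean(is.na(x)))
keep <- names(na_frac)[na_frac <= 0.75]

# Apply the same selection to both frames
toy_train <- toy_train[, keep, drop = FALSE]
toy_test  <- toy_test[, keep, drop = FALSE]

names(toy_train)
names(toy_test)
```

Here `mostly_na` is 80% missing in the training data, so it is dropped from both frames even though it happens to be complete in the toy test data.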
Finally, let’s check the dimensions of our cleaned data frames.
# train data dim
dim(train)
[1] 13737 54
# test data dim
dim(test)
[1] 5885 54
We will use the Random Forest algorithm, which is a good choice for this case.
### Random Forest Algorithm
# Random Forest Algorithm
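# Model fitting for train data
# Note: method = "class" is an rpart-style argument; randomForest() has no
# such parameter and ignores it (classification is chosen because classe is a factor)
rf_model <- randomForest(classe ~., data=train, method="class")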
# Print model
print(rf_model)
Call:
randomForest(formula = classe ~ ., data = train, method = "class")
Type of random forest: classification
Number of trees: 500
No. of variables tried at each split: 7
OOB estimate of error rate: 0.28%
Confusion matrix:
A B C D E class.error
A 3905 0 0 0 1 0.0002560164
B 5 2650 3 0 0 0.0030097818
C 0 8 2388 0 0 0.0033388982
D 0 0 16 2236 0 0.0071047957
E 0 0 0 5 2520 0.0019801980
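The "No. of variables tried at each split: 7" line in the output above is consistent with the documented randomForest default for classification, the square root of the number of predictors, rounded down. A quick check, assuming the 54 cleaned columns consist of 53 predictors plus the outcome:

```r
# Assumption: 54 columns remain after cleaning, one of which is the outcome classe
n_predictors <- 54 - 1

# Default number of candidate variables per split for classification
default_mtry <- floor(sqrt(n_predictors))
default_mtry
```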
# Predicting on test data
# (the argument name is lowercase `type`; "response" returns class labels)
rf_pred <- predict(rf_model, test, type="response")
# Print prediction
print(head(rf_pred))
1 6 13 15 17 18
A A A A A A
Levels: A B C D E
Now let’s compute the confusion matrix to check the accuracy of the model:
# Confusion matrix
confusionMatrix(rf_pred, test$classe)
Confusion Matrix and Statistics
Reference
Prediction A B C D E
A 1674 1 0 0 0
B 0 1138 5 0 0
C 0 0 1021 3 0
D 0 0 0 960 3
E 0 0 0 1 1079
Overall Statistics
Accuracy : 0.9978
95% CI : (0.9962, 0.9988)
No Information Rate : 0.2845
P-Value [Acc > NIR] : < 2.2e-16
Kappa : 0.9972
Mcnemar's Test P-Value : NA
Statistics by Class:
Class: A Class: B Class: C Class: D Class: E
Sensitivity 1.0000 0.9991 0.9951 0.9959 0.9972
Specificity 0.9998 0.9989 0.9994 0.9994 0.9998
Pos Pred Value 0.9994 0.9956 0.9971 0.9969 0.9991
Neg Pred Value 1.0000 0.9998 0.9990 0.9992 0.9994
Prevalence 0.2845 0.1935 0.1743 0.1638 0.1839
Detection Rate 0.2845 0.1934 0.1735 0.1631 0.1833
Detection Prevalence 0.2846 0.1942 0.1740 0.1636 0.1835
Balanced Accuracy 0.9999 0.9990 0.9973 0.9976 0.9985
Let’s extract the overall accuracy on its own:
# Print overall accuracy
print(confusionMatrix(rf_pred, test$classe)$overall['Accuracy'])
Accuracy
0.997791
From the confusion matrix we see that the accuracy of the model is very high: \(99.78 \%\).
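The accuracy printed above (0.997791) is the share of test cases that fall on the diagonal of the confusion matrix. As a sanity check, the sketch below re-enters the counts from the printed matrix by hand and recomputes the overall accuracy and the per-class sensitivity in base R.

```r
# Counts copied from the confusion matrix above (rows = prediction, columns = reference)
cm <- matrix(
  c(1674,    1,    0,   0,    0,
       0, 1138,    5,   0,    0,
       0,    0, 1021,   3,    0,
       0,    0,    0, 960,    3,
       0,    0,    0,   1, 1079),
  nrow = 5, byrow = TRUE,
  dimnames = list(Prediction = LETTERS[1:5], Reference = LETTERS[1:5])
)

# Overall accuracy: correctly classified cases over all cases
accuracy <- sum(diag(cm)) / sum(cm)

# Sensitivity per class: correct predictions over all true members of the class
sensitivity <- diag(cm) / colSums(cm)

round(accuracy, 6)
round(sensitivity, 4)
```

That is 5872 correct predictions out of 5885 cases, and the rounded sensitivities agree with the "Statistics by Class" table.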
# Print error matrix
error_mat <- rf_model$err.rate
head(error_mat)
OOB A B C D E
[1,] 0.06838279 0.04993342 0.09045726 0.07142857 0.09068924 0.05142232
[2,] 0.07950145 0.04327731 0.11213802 0.09820789 0.10115607 0.06410256
[3,] 0.07290957 0.03819918 0.10039960 0.08673469 0.10187354 0.05838243
[4,] 0.06714483 0.02935835 0.08940545 0.08955224 0.09138381 0.05935423
[5,] 0.06078463 0.02896904 0.08322877 0.08101852 0.07692308 0.05260831
[6,] 0.05467112 0.02618658 0.06992724 0.06856634 0.08203678 0.04514768
# Final OOB error rate (last row of the matrix, i.e. after all trees are grown)
error_rate <- error_mat[nrow(error_mat), "OOB"]
print(error_rate)
OOB
0.002766252
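The final OOB error rate can be cross-checked against the out-of-bag confusion matrix printed with the model. The sketch below re-enters those counts by hand (rows are the actual classes, columns the OOB predictions) and recomputes both the per-class error and the overall rate.

```r
# Counts copied from the OOB confusion matrix in the model printout
oob_cm <- matrix(
  c(3905,    0,    0,    0,    1,
       5, 2650,    3,    0,    0,
       0,    8, 2388,    0,    0,
       0,    0,   16, 2236,    0,
       0,    0,    0,    5, 2520),
  nrow = 5, byrow = TRUE,
  dimnames = list(Actual = LETTERS[1:5], Predicted = LETTERS[1:5])
)

# Per-class error: misclassified share within each actual class (row)
class_error <- 1 - diag(oob_cm) / rowSums(oob_cm)

# Overall OOB error: all off-diagonal counts over all training rows
oob_error <- 1 - sum(diag(oob_cm)) / sum(oob_cm)

signif(class_error, 4)
signif(oob_error, 4)
```

With 38 misclassified rows out of 13737, the overall rate matches the value taken from the last row of `err.rate`.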
Finally, let’s use our model to predict the 20 test cases from the separate testing file.
# Use model on 20 cases
small_pred <- predict(rf_model, newdata = test_data, type="response")
small_pred
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
B A B A A E D B A A B C B A E E A B B B
Levels: A B C D E
We built a prediction model for our data using the Random Forest algorithm. The accuracy of our model on the held-out test set is very high, \(99.78 \%\), and the out-of-bag error rate is 0.0027663.