Background

From the course website:

“Using devices such as Jawbone Up, Nike FuelBand, and Fitbit it is now possible to collect a large amount of data about personal activity relatively inexpensively. These type of devices are part of the quantified self movement – a group of enthusiasts who take measurements about themselves regularly to improve their health, to find patterns in their behavior, or because they are tech geeks. One thing that people regularly do is quantify how much of a particular activity they do, but they rarely quantify how well they do it. In this project, your goal will be to use data from accelerometers on the belt, forearm, arm, and dumbell of 6 participants. They were asked to perform barbell lifts correctly and incorrectly in 5 different ways. More information is available from the website here: http://groupware.les.inf.puc-rio.br/har (see the section on the Weight Lifting Exercise Dataset).”



Download & Import Data

We use the RCurl package to download the data and load it into two data frames, training and testing_final.

library(RCurl)
## Loading required package: bitops
URL <- "https://d396qusza40orc.cloudfront.net/predmachlearn/pml-training.csv"
X <- getURL(URL, ssl.verifypeer = FALSE)
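# stringsAsFactors = TRUE keeps text columns such as classe as factors (the read.csv
# default before R 4.0.0); summary() and confusionMatrix() below expect a factor classe
training <- read.csv(textConnection(X), stringsAsFactors = TRUE)

URL <- "https://d396qusza40orc.cloudfront.net/predmachlearn/pml-testing.csv"
Y <- getURL(URL, ssl.verifypeer = FALSE)
testing_final <- read.csv(textConnection(Y), stringsAsFactors = TRUE)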

rm(list=c("URL","X","Y")) # clean up the workspace 
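The summary further down shows many columns that read.csv stored as factors with blank and “#DIV/0!” levels. One option is to treat those strings as missing at import time with read.csv’s na.strings argument. A minimal sketch on a small inline CSV (the toy data is hypothetical, not taken from the real files):

```r
# Toy CSV mimicking the problem: a blank cell and an Excel-style "#DIV/0!" string
csv <- "roll_belt,kurtosis_roll_belt,classe
1.41,#DIV/0!,A
1.42,,B
1.48,-0.0210,A"

# Treat "NA", empty strings and "#DIV/0!" as missing so numeric columns stay numeric
# (read.csv uses comment.char = "", so the leading "#" is read as data)
toy <- read.csv(text = csv, na.strings = c("NA", "", "#DIV/0!"),
                stringsAsFactors = TRUE)

str(toy)              # kurtosis_roll_belt is numeric, not a factor
colSums(is.na(toy))   # roll_belt 0, kurtosis_roll_belt 2, classe 0
```

With this setting, the kurtosis, skewness and similar columns would come in as numeric columns that are mostly (or entirely) NA rather than as factors.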



Subset the Data

Now we’re going to further divide the training data into two sets for cross-validation, which we’ll call training and testing. We’ll use the createDataPartition function in the caret package to put 80% of the rows in training and 20% in testing, splitting within each level of our response variable classe so that both sets keep roughly the same class proportions.

library(caret)
## Loading required package: lattice
## Loading required package: ggplot2
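inTrain <- createDataPartition(y=training$classe, p=0.8, list=FALSE)
# create testing first: once training is overwritten, training[-inTrain,]
# would index the reduced data and return rows that are also in training
testing <- training[-inTrain,]
training <- training[inTrain,]
rm(inTrain) # clean up the workspace 

Here are the sizes of our new training and testing sets:

dim(training)
## [1] 15699   160
dim(testing)
## [1] 3923  160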
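The split samples within each class, and it only works as intended if both subsets are taken from the original data frame. A base-R sketch of both points, using a hypothetical stratified_split() helper in the same spirit as createDataPartition (not caret’s implementation) on a toy data frame:

```r
# Hypothetical helper in the spirit of caret::createDataPartition (not its actual code):
# for each class level, sample round(p * n) of that level's row positions for training.
stratified_split <- function(y, p = 0.8) {
  idx_by_class <- split(seq_along(y), y)
  train_idx <- unlist(lapply(idx_by_class, function(ix) {
    ix[sample.int(length(ix), round(p * length(ix)))]
  }), use.names = FALSE)
  sort(train_idx)
}

set.seed(42)
# Toy data frame standing in for the real training set
toy <- data.frame(x = rnorm(100),
                  classe = factor(rep(c("A", "B", "C"), times = c(50, 30, 20))))
in_train <- stratified_split(toy$classe, p = 0.8)
table(toy$classe[in_train])   # A: 40, B: 24, C: 16, the same 50/30/20 proportions

# Correct order: both subsets come from the original data frame
test_ok  <- toy[-in_train, ]
train_ok <- toy[in_train, ]
length(intersect(rownames(train_ok), rownames(test_ok)))   # 0 shared rows

# Wrong order: once the data frame is overwritten, [-in_train, ] indexes the reduced copy
reduced  <- toy[in_train, ]
test_bad <- reduced[-in_train, ]
all(rownames(test_bad) %in% rownames(reduced))   # TRUE: every "test" row is a training row
```

The row-name check is a cheap way to confirm that a held-out set really is held out: on the real data, intersect(rownames(training), rownames(testing)) should be empty.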



Inspect the Data

Before we fit a model let’s take a closer look at our training data.

First, how balanced is the response variable (i.e., the outcome we’re trying to predict)?

summary(training$classe)
##    A    B    C    D    E 
## 4464 3038 2738 2573 2886
summary(training$classe)/nrow(training)
##         A         B         C         D         E 
## 0.2843493 0.1935155 0.1744060 0.1638958 0.1838334

The classes look fairly balanced, although “A” is somewhat more common (about 28% of rows, versus 16% to 19% for each of the other classes).

Let’s look at all of the predictors that we have:

summary(training)
##        X            user_name    raw_timestamp_part_1 raw_timestamp_part_2
##  Min.   :    1   adelmo  :3105   Min.   :1.322e+09    Min.   :   294      
##  1st Qu.: 4892   carlitos:2520   1st Qu.:1.323e+09    1st Qu.:252303      
##  Median : 9821   charles :2825   Median :1.323e+09    Median :500295      
##  Mean   : 9813   eurico  :2409   Mean   :1.323e+09    Mean   :500863      
##  3rd Qu.:14726   jeremy  :2740   3rd Qu.:1.323e+09    3rd Qu.:752292      
##  Max.   :19622   pedro   :2100   Max.   :1.323e+09    Max.   :998801      
##                                                                           
##           cvtd_timestamp new_window    num_window      roll_belt     
##  05/12/2011 11:24:1202   no :15374   Min.   :  1.0   Min.   :-28.90  
##  05/12/2011 11:25:1172   yes:  325   1st Qu.:223.0   1st Qu.:  1.10  
##  28/11/2011 14:14:1169               Median :424.0   Median :113.00  
##  30/11/2011 17:11:1149               Mean   :431.2   Mean   : 64.42  
##  02/12/2011 14:57:1107               3rd Qu.:645.0   3rd Qu.:123.00  
##  05/12/2011 14:23:1099               Max.   :864.0   Max.   :162.00  
##  (Other)         :8801                                               
##    pitch_belt         yaw_belt       total_accel_belt kurtosis_roll_belt
##  Min.   :-55.800   Min.   :-180.00   Min.   : 0.00             :15374   
##  1st Qu.:  1.830   1st Qu.: -88.30   1st Qu.: 3.00    #DIV/0!  :    9   
##  Median :  5.300   Median : -13.20   Median :17.00    -1.908453:    2   
##  Mean   :  0.373   Mean   : -11.40   Mean   :11.31    -0.021024:    1   
##  3rd Qu.: 15.100   3rd Qu.:  12.55   3rd Qu.:18.00    -0.025513:    1   
##  Max.   : 60.300   Max.   : 179.00   Max.   :29.00    -0.033935:    1   
##                                                       (Other)  :  311   
##  kurtosis_picth_belt kurtosis_yaw_belt skewness_roll_belt
##           :15374            :15374              :15374   
##  #DIV/0!  :   28     #DIV/0!:  325     #DIV/0!  :    8   
##  47.000000:    4                       0.000000 :    3   
##  -0.150950:    3                       0.422463 :    2   
##  1.216445 :    3                       -0.003095:    1   
##  1.326417 :    3                       -0.010002:    1   
##  (Other)  :  284                       (Other)  :  310   
##  skewness_roll_belt.1 skewness_yaw_belt max_roll_belt     max_picth_belt 
##           :15374             :15374     Min.   :-94.300   Min.   : 3.00  
##  #DIV/0!  :   28      #DIV/0!:  325     1st Qu.:-88.000   1st Qu.: 5.00  
##  -2.156553:    3                        Median : -4.900   Median :18.00  
##  -3.072669:    3                        Mean   : -4.574   Mean   :13.05  
##  0.000000 :    3                        3rd Qu.: 20.100   3rd Qu.:19.00  
##  6.855655 :    3                        Max.   :180.000   Max.   :30.00  
##  (Other)  :  285                        NA's   :15374     NA's   :15374  
##   max_yaw_belt   min_roll_belt      min_pitch_belt   min_yaw_belt  
##         :15374   Min.   :-180.000   Min.   : 0.00          :15374  
##  -1.4   :   27   1st Qu.: -88.400   1st Qu.: 3.00   -1.4   :   27  
##  -1.1   :   23   Median :  -7.000   Median :16.00   -1.1   :   23  
##  -1.2   :   22   Mean   :  -8.758   Mean   :10.87   -1.2   :   22  
##  -0.9   :   18   3rd Qu.:  13.600   3rd Qu.:17.00   -0.9   :   18  
##  -0.7   :   17   Max.   : 173.000   Max.   :23.00   -0.7   :   17  
##  (Other):  218   NA's   :15374      NA's   :15374   (Other):  218  
##  amplitude_roll_belt amplitude_pitch_belt amplitude_yaw_belt
##  Min.   :  0.000     Min.   : 0.000              :15374     
##  1st Qu.:  0.300     1st Qu.: 1.000       #DIV/0!:    9     
##  Median :  1.000     Median : 1.000       0.00   :   11     
##  Mean   :  4.183     Mean   : 2.175       0.0000 :  305     
##  3rd Qu.:  2.000     3rd Qu.: 2.000                         
##  Max.   :360.000     Max.   :12.000                         
##  NA's   :15374       NA's   :15374                          
##  var_total_accel_belt avg_roll_belt    stddev_roll_belt var_roll_belt    
##  Min.   : 0.000       Min.   :-20.90   Min.   : 0.000   Min.   :  0.000  
##  1st Qu.: 0.100       1st Qu.:  1.20   1st Qu.: 0.100   1st Qu.:  0.000  
##  Median : 0.200       Median :116.70   Median : 0.400   Median :  0.100  
##  Mean   : 1.004       Mean   : 69.24   Mean   : 1.353   Mean   :  8.153  
##  3rd Qu.: 0.300       3rd Qu.:123.90   3rd Qu.: 0.700   3rd Qu.:  0.440  
##  Max.   :16.500       Max.   :157.40   Max.   :14.200   Max.   :200.700  
##  NA's   :15374        NA's   :15374    NA's   :15374    NA's   :15374    
##  avg_pitch_belt    stddev_pitch_belt var_pitch_belt    avg_yaw_belt    
##  Min.   :-51.400   Min.   :0.000     Min.   : 0.000   Min.   :-138.30  
##  1st Qu.:  1.900   1st Qu.:0.200     1st Qu.: 0.000   1st Qu.: -88.10  
##  Median :  5.300   Median :0.300     Median : 0.100   Median :  -5.90  
##  Mean   :  0.013   Mean   :0.588     Mean   : 0.737   Mean   :  -7.02  
##  3rd Qu.: 15.700   3rd Qu.:0.700     3rd Qu.: 0.500   3rd Qu.:  18.20  
##  Max.   : 41.000   Max.   :4.000     Max.   :16.200   Max.   : 173.40  
##  NA's   :15374     NA's   :15374     NA's   :15374    NA's   :15374    
##  stddev_yaw_belt    var_yaw_belt       gyros_belt_x      
##  Min.   :  0.000   Min.   :    0.00   Min.   :-1.040000  
##  1st Qu.:  0.100   1st Qu.:    0.01   1st Qu.:-0.030000  
##  Median :  0.300   Median :    0.09   Median : 0.030000  
##  Mean   :  1.504   Mean   :  133.95   Mean   :-0.005337  
##  3rd Qu.:  0.700   3rd Qu.:    0.51   3rd Qu.: 0.110000  
##  Max.   :176.600   Max.   :31183.24   Max.   : 2.200000  
##  NA's   :15374     NA's   :15374                         
##   gyros_belt_y       gyros_belt_z      accel_belt_x       accel_belt_y  
##  Min.   :-0.64000   Min.   :-1.4600   Min.   :-120.000   Min.   :-69.0  
##  1st Qu.: 0.00000   1st Qu.:-0.2000   1st Qu.: -21.000   1st Qu.:  3.0  
##  Median : 0.02000   Median :-0.1000   Median : -15.000   Median : 34.0  
##  Mean   : 0.03924   Mean   :-0.1316   Mean   :  -5.695   Mean   : 30.2  
##  3rd Qu.: 0.11000   3rd Qu.:-0.0200   3rd Qu.:  -5.000   3rd Qu.: 61.0  
##  Max.   : 0.64000   Max.   : 1.6200   Max.   :  85.000   Max.   :164.0  
##                                                                         
##   accel_belt_z     magnet_belt_x    magnet_belt_y   magnet_belt_z   
##  Min.   :-275.00   Min.   :-52.00   Min.   :354.0   Min.   :-623.0  
##  1st Qu.:-162.00   1st Qu.:  9.00   1st Qu.:581.0   1st Qu.:-375.0  
##  Median :-152.00   Median : 35.00   Median :601.0   Median :-319.0  
##  Mean   : -72.62   Mean   : 55.48   Mean   :593.8   Mean   :-345.3  
##  3rd Qu.:  27.00   3rd Qu.: 59.00   3rd Qu.:610.0   3rd Qu.:-306.0  
##  Max.   : 105.00   Max.   :485.00   Max.   :673.0   Max.   : 293.0  
##                                                                     
##     roll_arm        pitch_arm          yaw_arm          total_accel_arm
##  Min.   :-180.0   Min.   :-88.800   Min.   :-180.0000   Min.   : 1.00  
##  1st Qu.: -31.5   1st Qu.:-26.000   1st Qu.: -42.7000   1st Qu.:17.00  
##  Median :   0.0   Median :  0.000   Median :   0.0000   Median :27.00  
##  Mean   :  17.7   Mean   : -4.758   Mean   :  -0.5804   Mean   :25.55  
##  3rd Qu.:  77.4   3rd Qu.: 11.100   3rd Qu.:  45.6500   3rd Qu.:33.00  
##  Max.   : 180.0   Max.   : 88.500   Max.   : 180.0000   Max.   :66.00  
##                                                                        
##  var_accel_arm      avg_roll_arm     stddev_roll_arm    var_roll_arm     
##  Min.   :  0.000   Min.   :-166.67   Min.   :  0.000   Min.   :    0.00  
##  1st Qu.:  9.682   1st Qu.: -38.31   1st Qu.:  1.643   1st Qu.:    2.70  
##  Median : 40.562   Median :   0.00   Median :  5.455   Median :   29.75  
##  Mean   : 51.126   Mean   :  13.42   Mean   : 10.298   Mean   :  329.58  
##  3rd Qu.: 70.608   3rd Qu.:  76.25   3rd Qu.: 13.929   3rd Qu.:  194.01  
##  Max.   :331.699   Max.   : 160.78   Max.   :161.452   Max.   :26066.58  
##  NA's   :15374     NA's   :15374     NA's   :15374     NA's   :15374     
##  avg_pitch_arm     stddev_pitch_arm var_pitch_arm       avg_yaw_arm      
##  Min.   :-77.019   Min.   : 0.000   Min.   :   0.000   Min.   :-173.440  
##  1st Qu.:-21.041   1st Qu.: 2.518   1st Qu.:   6.341   1st Qu.: -30.206  
##  Median :  0.000   Median : 8.219   Median :  67.546   Median :   0.000  
##  Mean   : -3.090   Mean   :10.758   Mean   : 204.430   Mean   :   2.987  
##  3rd Qu.:  9.755   3rd Qu.:16.813   3rd Qu.: 282.666   3rd Qu.:  41.600  
##  Max.   : 75.659   Max.   :43.097   Max.   :1857.367   Max.   : 152.000  
##  NA's   :15374     NA's   :15374    NA's   :15374      NA's   :15374     
##  stddev_yaw_arm     var_yaw_arm        gyros_arm_x        gyros_arm_y    
##  Min.   :  0.000   Min.   :    0.00   Min.   :-6.37000   Min.   :-3.440  
##  1st Qu.:  3.965   1st Qu.:   15.72   1st Qu.:-1.32000   1st Qu.:-0.790  
##  Median : 16.520   Median :  272.91   Median : 0.08000   Median :-0.240  
##  Mean   : 22.118   Mean   : 1062.50   Mean   : 0.03828   Mean   :-0.256  
##  3rd Qu.: 32.775   3rd Qu.: 1074.19   3rd Qu.: 1.54000   3rd Qu.: 0.140  
##  Max.   :177.044   Max.   :31344.57   Max.   : 4.87000   Max.   : 2.840  
##  NA's   :15374     NA's   :15374                                         
##   gyros_arm_z       accel_arm_x       accel_arm_y       accel_arm_z     
##  Min.   :-2.3300   Min.   :-404.00   Min.   :-315.00   Min.   :-636.00  
##  1st Qu.:-0.0700   1st Qu.:-242.00   1st Qu.: -54.00   1st Qu.:-144.00  
##  Median : 0.2300   Median : -44.00   Median :  14.00   Median : -47.00  
##  Mean   : 0.2674   Mean   : -60.31   Mean   :  32.57   Mean   : -71.47  
##  3rd Qu.: 0.7200   3rd Qu.:  83.00   3rd Qu.: 139.00   3rd Qu.:  24.00  
##  Max.   : 3.0200   Max.   : 437.00   Max.   : 308.00   Max.   : 292.00  
##                                                                         
##   magnet_arm_x     magnet_arm_y   magnet_arm_z    kurtosis_roll_arm
##  Min.   :-584.0   Min.   :-392   Min.   :-597.0           :15374   
##  1st Qu.:-304.0   1st Qu.:  -9   1st Qu.: 134.0   #DIV/0! :   56   
##  Median : 290.0   Median : 202   Median : 443.0   -0.02438:    1   
##  Mean   : 190.6   Mean   : 157   Mean   : 306.1   -0.04190:    1   
##  3rd Qu.: 637.0   3rd Qu.: 324   3rd Qu.: 544.0   -0.05051:    1   
##  Max.   : 782.0   Max.   : 583   Max.   : 694.0   -0.05695:    1   
##                                                   (Other) :  265   
##  kurtosis_picth_arm kurtosis_yaw_arm skewness_roll_arm skewness_pitch_arm
##          :15374             :15374           :15374            :15374    
##  #DIV/0! :   58     #DIV/0! :   10   #DIV/0! :   55    #DIV/0! :   58    
##  -0.00484:    1     -0.01548:    1   -0.00051:    1    -0.00184:    1    
##  -0.02967:    1     -0.01749:    1   -0.00696:    1    -0.01247:    1    
##  -0.07394:    1     -0.04059:    1   -0.01884:    1    -0.02063:    1    
##  -0.10385:    1     -0.04626:    1   -0.03359:    1    -0.02652:    1    
##  (Other) :  263     (Other) :  311   (Other) :  266    (Other) :  263    
##  skewness_yaw_arm  max_roll_arm    max_picth_arm      max_yaw_arm   
##          :15374   Min.   :-71.90   Min.   :-173.00   Min.   : 4.00  
##  #DIV/0! :   10   1st Qu.:  0.00   1st Qu.:  -5.30   1st Qu.:29.00  
##  -0.00311:    1   Median :  8.40   Median :  27.30   Median :34.00  
##  -0.04470:    1   Mean   : 13.14   Mean   :  36.03   Mean   :35.04  
##  -0.04866:    1   3rd Qu.: 28.10   3rd Qu.: 100.00   3rd Qu.:41.00  
##  -0.05413:    1   Max.   : 85.50   Max.   : 180.00   Max.   :65.00  
##  (Other) :  311   NA's   :15374    NA's   :15374     NA's   :15374  
##   min_roll_arm   min_pitch_arm      min_yaw_arm    amplitude_roll_arm
##  Min.   :-89.1   Min.   :-180.00   Min.   : 1.00   Min.   :  0.00    
##  1st Qu.:-41.4   1st Qu.: -75.30   1st Qu.: 8.00   1st Qu.:  9.70    
##  Median :-21.7   Median : -32.80   Median :13.00   Median : 28.64    
##  Mean   :-20.3   Mean   : -32.86   Mean   :14.59   Mean   : 33.44    
##  3rd Qu.:  0.0   3rd Qu.:   0.00   3rd Qu.:19.00   3rd Qu.: 51.90    
##  Max.   : 66.4   Max.   : 152.00   Max.   :38.00   Max.   :119.50    
##  NA's   :15374   NA's   :15374     NA's   :15374   NA's   :15374     
##  amplitude_pitch_arm amplitude_yaw_arm roll_dumbbell     pitch_dumbbell   
##  Min.   :  0.0       Min.   : 0.00     Min.   :-153.71   Min.   :-149.59  
##  1st Qu.: 14.6       1st Qu.:13.00     1st Qu.: -18.90   1st Qu.: -40.82  
##  Median : 55.4       Median :21.00     Median :  48.20   Median : -20.90  
##  Mean   : 68.9       Mean   :20.45     Mean   :  23.94   Mean   : -10.74  
##  3rd Qu.:110.8       3rd Qu.:27.00     3rd Qu.:  67.73   3rd Qu.:  17.44  
##  Max.   :360.0       Max.   :52.00     Max.   : 153.55   Max.   : 129.82  
##  NA's   :15374       NA's   :15374                                        
##   yaw_dumbbell      kurtosis_roll_dumbbell kurtosis_picth_dumbbell
##  Min.   :-148.766          :15374                 :15374          
##  1st Qu.: -77.592   #DIV/0!:    5          -0.5464:    2          
##  Median :  -2.282   -0.3705:    2          -0.9334:    2          
##  Mean   :   1.865   -0.5855:    2          -2.0833:    2          
##  3rd Qu.:  79.998   -2.0851:    2          -2.0851:    2          
##  Max.   : 154.952   -2.0889:    2          -2.0889:    2          
##                     (Other):  312          (Other):  315          
##  kurtosis_yaw_dumbbell skewness_roll_dumbbell skewness_pitch_dumbbell
##         :15374                :15374                 :15374          
##  #DIV/0!:  325         #DIV/0!:    4          -0.3521:    2          
##                        0.1110 :    2          0.1090 :    2          
##                        1.0312 :    2          1.0326 :    2          
##                        -0.0082:    1          -0.0053:    1          
##                        -0.0096:    1          -0.0166:    1          
##                        (Other):  315          (Other):  317          
##  skewness_yaw_dumbbell max_roll_dumbbell max_picth_dumbbell
##         :15374         Min.   :-70.10    Min.   :-112.90   
##  #DIV/0!:  325         1st Qu.:-26.90    1st Qu.: -67.80   
##                        Median : 16.50    Median :  42.60   
##                        Mean   : 14.21    Mean   :  32.91   
##                        3rd Qu.: 50.60    3rd Qu.: 133.00   
##                        Max.   :129.80    Max.   : 155.00   
##                        NA's   :15374     NA's   :15374     
##  max_yaw_dumbbell min_roll_dumbbell min_pitch_dumbbell min_yaw_dumbbell
##         :15374    Min.   :-149.6    Min.   :-147.00           :15374   
##  0.2    :   17    1st Qu.: -59.2    1st Qu.: -92.00    0.2    :   17   
##  -0.6   :   16    Median : -39.8    Median : -62.70    -0.6   :   16   
##  -0.4   :   13    Mean   : -39.5    Mean   : -31.46    -0.4   :   13   
##  -0.8   :   13    3rd Qu.: -19.3    3rd Qu.:  23.00    -0.8   :   13   
##  -0.3   :   12    Max.   :  73.2    Max.   : 120.90    -0.3   :   12   
##  (Other):  254    NA's   :15374     NA's   :15374      (Other):  254   
##  amplitude_roll_dumbbell amplitude_pitch_dumbbell amplitude_yaw_dumbbell
##  Min.   :  0.00          Min.   :  0.00                  :15374         
##  1st Qu.: 13.41          1st Qu.: 16.50           #DIV/0!:    5         
##  Median : 33.22          Median : 41.52           0.00   :  320         
##  Mean   : 53.71          Mean   : 64.36                                 
##  3rd Qu.: 76.14          3rd Qu.: 97.47                                 
##  Max.   :256.48          Max.   :270.84                                 
##  NA's   :15374           NA's   :15374                                  
##  total_accel_dumbbell var_accel_dumbbell avg_roll_dumbbell
##  Min.   : 0.00        Min.   :  0.000    Min.   :-128.96  
##  1st Qu.: 4.00        1st Qu.:  0.374    1st Qu.: -11.03  
##  Median :10.00        Median :  0.932    Median :  47.20  
##  Mean   :13.73        Mean   :  4.588    Mean   :  24.97  
##  3rd Qu.:19.00        3rd Qu.:  3.466    3rd Qu.:  65.01  
##  Max.   :58.00        Max.   :230.428    Max.   : 125.99  
##                       NA's   :15374      NA's   :15374    
##  stddev_roll_dumbbell var_roll_dumbbell  avg_pitch_dumbbell
##  Min.   :  0.00       Min.   :    0.00   Min.   :-70.73    
##  1st Qu.:  4.51       1st Qu.:   20.34   1st Qu.:-40.23    
##  Median : 11.31       Median :  127.98   Median :-15.61    
##  Mean   : 20.66       Mean   : 1043.02   Mean   :-10.96    
##  3rd Qu.: 26.18       3rd Qu.:  685.21   3rd Qu.: 15.44    
##  Max.   :123.78       Max.   :15321.01   Max.   : 94.28    
##  NA's   :15374        NA's   :15374      NA's   :15374     
##  stddev_pitch_dumbbell var_pitch_dumbbell avg_yaw_dumbbell  
##  Min.   : 0.000        Min.   :   0.00    Min.   :-117.950  
##  1st Qu.: 3.108        1st Qu.:   9.66    1st Qu.: -76.640  
##  Median : 7.938        Median :  63.01    Median :   4.815  
##  Mean   :12.930        Mean   : 349.00    Mean   :   1.297  
##  3rd Qu.:18.291        3rd Qu.: 334.57    3rd Qu.:  72.140  
##  Max.   :82.680        Max.   :6836.02    Max.   : 130.879  
##  NA's   :15374         NA's   :15374      NA's   :15374     
##  stddev_yaw_dumbbell var_yaw_dumbbell   gyros_dumbbell_x   
##  Min.   :  0.000     Min.   :    0.00   Min.   :-204.0000  
##  1st Qu.:  3.643     1st Qu.:   13.27   1st Qu.:  -0.0300  
##  Median :  9.587     Median :   91.92   Median :   0.1300  
##  Mean   : 16.192     Mean   :  565.21   Mean   :   0.1578  
##  3rd Qu.: 23.642     3rd Qu.:  558.93   3rd Qu.:   0.3500  
##  Max.   :107.088     Max.   :11467.91   Max.   :   2.2200  
##  NA's   :15374       NA's   :15374                         
##  gyros_dumbbell_y   gyros_dumbbell_z   accel_dumbbell_x  accel_dumbbell_y 
##  Min.   :-2.10000   Min.   : -2.3800   Min.   :-419.00   Min.   :-189.00  
##  1st Qu.:-0.14000   1st Qu.: -0.3100   1st Qu.: -50.00   1st Qu.:  -8.00  
##  Median : 0.05000   Median : -0.1300   Median :  -8.00   Median :  42.00  
##  Mean   : 0.04931   Mean   : -0.1244   Mean   : -28.53   Mean   :  52.82  
##  3rd Qu.: 0.21000   3rd Qu.:  0.0300   3rd Qu.:  11.00   3rd Qu.: 112.00  
##  Max.   :52.00000   Max.   :317.0000   Max.   : 234.00   Max.   : 315.00  
##                                                                           
##  accel_dumbbell_z  magnet_dumbbell_x magnet_dumbbell_y magnet_dumbbell_z
##  Min.   :-284.00   Min.   :-643.0    Min.   :-744      Min.   :-262.0   
##  1st Qu.:-142.00   1st Qu.:-535.0    1st Qu.: 232      1st Qu.: -45.0   
##  Median :  -1.00   Median :-480.0    Median : 311      Median :  13.0   
##  Mean   : -38.42   Mean   :-327.6    Mean   : 221      Mean   :  45.2   
##  3rd Qu.:  38.00   3rd Qu.:-301.0    3rd Qu.: 391      3rd Qu.:  94.0   
##  Max.   : 318.00   Max.   : 592.0    Max.   : 633      Max.   : 451.0   
##                                                                         
##   roll_forearm      pitch_forearm     yaw_forearm     
##  Min.   :-180.000   Min.   :-72.50   Min.   :-180.00  
##  1st Qu.:  -0.865   1st Qu.:  0.00   1st Qu.: -69.10  
##  Median :  21.500   Median :  9.22   Median :   0.00  
##  Mean   :  33.564   Mean   : 10.74   Mean   :  19.32  
##  3rd Qu.: 140.000   3rd Qu.: 28.60   3rd Qu.: 110.00  
##  Max.   : 180.000   Max.   : 88.70   Max.   : 180.00  
##                                                       
##  kurtosis_roll_forearm kurtosis_picth_forearm kurtosis_yaw_forearm
##         :15374                :15374                 :15374       
##  #DIV/0!:   70         #DIV/0!:   70          #DIV/0!:  325       
##  -0.8079:    2         -0.0489:    1                              
##  -0.0227:    1         -0.0523:    1                              
##  -0.0359:    1         -0.0891:    1                              
##  -0.0567:    1         -0.0920:    1                              
##  (Other):  250         (Other):  251                              
##  skewness_roll_forearm skewness_pitch_forearm skewness_yaw_forearm
##         :15374                :15374                 :15374       
##  #DIV/0!:   69         #DIV/0!:   70          #DIV/0!:  325       
##  -0.1912:    2         0.0000 :    4                              
##  -0.0004:    1         -0.6992:    2                              
##  -0.0013:    1         -0.0113:    1                              
##  -0.0088:    1         -0.0131:    1                              
##  (Other):  251         (Other):  247                              
##  max_roll_forearm max_picth_forearm max_yaw_forearm min_roll_forearm 
##  Min.   :-66.60   Min.   :-151.00          :15374   Min.   :-72.500  
##  1st Qu.:  0.00   1st Qu.:   0.00   #DIV/0!:   70   1st Qu.: -6.000  
##  Median : 26.90   Median : 112.00   -1.2   :   26   Median :  0.000  
##  Mean   : 24.34   Mean   :  81.33   -1.3   :   23   Mean   : -0.007  
##  3rd Qu.: 47.20   3rd Qu.: 175.00   -1.5   :   22   3rd Qu.: 12.600  
##  Max.   : 89.80   Max.   : 180.00   -1.6   :   21   Max.   : 62.100  
##  NA's   :15374    NA's   :15374     (Other):  163   NA's   :15374    
##  min_pitch_forearm min_yaw_forearm amplitude_roll_forearm
##  Min.   :-180.0           :15374   Min.   :  0.00        
##  1st Qu.:-175.0    #DIV/0!:   70   1st Qu.:  1.07        
##  Median : -65.5    -1.2   :   26   Median : 17.84        
##  Mean   : -59.1    -1.3   :   23   Mean   : 24.34        
##  3rd Qu.:   0.0    -1.5   :   22   3rd Qu.: 40.20        
##  Max.   : 167.0    -1.6   :   21   Max.   :120.30        
##  NA's   :15374     (Other):  163   NA's   :15374         
##  amplitude_pitch_forearm amplitude_yaw_forearm total_accel_forearm
##  Min.   :  0.0                  :15374         Min.   :  0.00     
##  1st Qu.:  1.8           #DIV/0!:   70         1st Qu.: 29.00     
##  Median : 85.6           0.00   :  255         Median : 36.00     
##  Mean   :140.4                                 Mean   : 34.67     
##  3rd Qu.:350.0                                 3rd Qu.: 41.00     
##  Max.   :360.0                                 Max.   :108.00     
##  NA's   :15374                                                    
##  var_accel_forearm avg_roll_forearm   stddev_roll_forearm
##  Min.   :  0.000   Min.   :-177.234   Min.   :  0.000    
##  1st Qu.:  6.768   1st Qu.:  -2.985   1st Qu.:  0.428    
##  Median : 20.892   Median :   5.499   Median :  8.455    
##  Mean   : 33.659   Mean   :  31.379   Mean   : 43.048    
##  3rd Qu.: 51.253   3rd Qu.: 104.021   3rd Qu.: 87.099    
##  Max.   :172.606   Max.   : 174.714   Max.   :179.171    
##  NA's   :15374     NA's   :15374      NA's   :15374      
##  var_roll_forearm   avg_pitch_forearm stddev_pitch_forearm
##  Min.   :    0.00   Min.   :-68.17    Min.   : 0.000      
##  1st Qu.:    0.18   1st Qu.:  0.00    1st Qu.: 0.298      
##  Median :   71.48   Median : 12.24    Median : 5.552      
##  Mean   : 5367.40   Mean   : 11.95    Mean   : 7.875      
##  3rd Qu.: 7586.30   3rd Qu.: 29.55    3rd Qu.:12.954      
##  Max.   :32102.24   Max.   : 72.09    Max.   :39.561      
##  NA's   :15374      NA's   :15374     NA's   :15374       
##  var_pitch_forearm  avg_yaw_forearm   stddev_yaw_forearm
##  Min.   :   0.000   Min.   :-155.06   Min.   :  0.00    
##  1st Qu.:   0.089   1st Qu.: -26.87   1st Qu.:  0.52    
##  Median :  30.825   Median :   0.00   Median : 26.16    
##  Mean   : 134.153   Mean   :  17.18   Mean   : 45.41    
##  3rd Qu.: 167.818   3rd Qu.:  84.15   3rd Qu.: 87.95    
##  Max.   :1565.055   Max.   : 169.24   Max.   :197.51    
##  NA's   :15374      NA's   :15374     NA's   :15374     
##  var_yaw_forearm    gyros_forearm_x    gyros_forearm_y   
##  Min.   :    0.00   Min.   :-22.0000   Min.   : -6.6200  
##  1st Qu.:    0.27   1st Qu.: -0.2200   1st Qu.: -1.4800  
##  Median :  684.62   Median :  0.0500   Median :  0.0300  
##  Mean   : 4710.02   Mean   :  0.1575   Mean   :  0.0747  
##  3rd Qu.: 7735.10   3rd Qu.:  0.5600   3rd Qu.:  1.6100  
##  Max.   :39009.33   Max.   :  3.9700   Max.   :311.0000  
##  NA's   :15374                                           
##  gyros_forearm_z   accel_forearm_x   accel_forearm_y  accel_forearm_z  
##  Min.   : -8.090   Min.   :-498.00   Min.   :-632.0   Min.   :-446.00  
##  1st Qu.: -0.180   1st Qu.:-178.00   1st Qu.:  54.0   1st Qu.:-182.00  
##  Median :  0.080   Median : -57.00   Median : 199.0   Median : -42.00  
##  Mean   :  0.153   Mean   : -61.26   Mean   : 162.5   Mean   : -56.63  
##  3rd Qu.:  0.490   3rd Qu.:  77.00   3rd Qu.: 312.0   3rd Qu.:  25.00  
##  Max.   :231.000   Max.   : 477.00   Max.   : 923.0   Max.   : 291.00  
##                                                                        
##  magnet_forearm_x  magnet_forearm_y magnet_forearm_z classe  
##  Min.   :-1280.0   Min.   :-896.0   Min.   :-973     A:4464  
##  1st Qu.: -615.0   1st Qu.:  -7.0   1st Qu.: 197     B:3038  
##  Median : -377.0   Median : 587.0   Median : 512     C:2738  
##  Mean   : -311.4   Mean   : 376.5   Mean   : 396     D:2573  
##  3rd Qu.:  -70.0   3rd Qu.: 735.0   3rd Qu.: 653     E:2886  
##  Max.   :  666.0   Max.   :1480.0   Max.   :1090             
## 

Wow. That’s a lot of variables (160 columns). Let’s see if we can remove anything that is not going to be informative.

We probably don’t want to predict with training$X: it is just a row index, numbering the observations of the original file from 1 to 19622.

summary(training$X)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       1    4892    9821    9813   14730   19620

Remove ’em! We’ll do this for all three data sets: training for model building, testing for cross-validation, and testing_final, the set we’re making our actual predictions on.

training$X <- NULL 
testing$X <- NULL 
testing_final$X <- NULL 

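Many variables are almost entirely missing. In our training set, most of them have 15374 missing values (NA or an empty string) out of 15699 rows; in the full 19622-row file that number is 19216. Most of these variables are summary statistics of other columns, e.g., kurtosis, skewness, var, min, max, etc., and they appear to be filled in only on the 325 rows where new_window is “yes”.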

Let’s remove the variables with tons of missing values. We’ll use a regular expression to find all of the columns we don’t want by their name prefixes.

# positions of the summary-statistic columns, matched by name prefix
var_names <- grep("^(var_|stddev_|avg_|min_|max_|skewness_|kurtosis_|amplitude_)",names(training))

# remove those columns from the training and testing sets and the final testing set too 
training02 <- training[,-var_names]
testing02 <- testing[,-var_names]
testing_final02 <- testing_final[,-var_names]

We’ll now call our data sets training02, testing02, and testing_final02.
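An alternative to matching name prefixes is to drop columns by how much of them is missing. A sketch with a hypothetical helper missing_fraction() that counts both NA and empty strings as missing; the 50% cut-off is an arbitrary assumption. On this data it should pick out the same summary-statistic columns, since each of them is missing or blank in 15374 of the 15699 training rows.

```r
# Hypothetical helper: share of entries per column that are NA or an empty string
missing_fraction <- function(df) {
  sapply(df, function(col) mean(is.na(col) | as.character(col) == ""))
}

# Toy data mimicking the pattern in training: summary columns mostly blank or NA
toy <- data.frame(
  roll_belt          = c(1.2, 1.5, 1.1, 1.3),
  kurtosis_roll_belt = c("", "", "#DIV/0!", ""),
  max_roll_belt      = c(NA, NA, NA, 2.0),
  classe             = factor(c("A", "B", "A", "C"))
)

frac <- missing_fraction(toy)
frac                              # 0, 0.75, 0.75, 0
keep <- names(frac)[frac < 0.5]   # columns with less than half missing
toy_clean <- toy[, keep, drop = FALSE]
names(toy_clean)                  # "roll_belt" "classe"
```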



Model Fitting: Random Forest

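We’re going to fit a random forest model to our data. Random forests work well for classification: they combine many decision trees, each grown on a bootstrap sample of the rows and considering a random subset of predictors at each split, so they can capture non-linear relationships and interactions between variables without any feature scaling.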
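For classification, the forest’s prediction for a row is the class that most of its trees vote for. A minimal sketch of that aggregation step alone, with a hypothetical matrix of per-tree predictions and a hypothetical majority_vote() helper (this is not randomForest’s internal code):

```r
# Hypothetical class predictions from 5 trees (columns) for 3 observations (rows)
tree_preds <- matrix(c("A", "A", "B", "A", "C",
                       "B", "B", "B", "D", "B",
                       "E", "C", "E", "E", "A"),
                     nrow = 3, byrow = TRUE)

# Majority vote: the most frequent class in each row wins
# (no ties here, so which.max's first-match rule does not matter)
majority_vote <- function(preds) {
  apply(preds, 1, function(v) names(which.max(table(v))))
}

majority_vote(tree_preds)   # "A" "B" "E"
```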

library(caret)

# run the model 
set.seed(1235)
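modelFit <- train(classe ~ . ,
                  method="rf",        # random forest, via the randomForest package
                  verbose=TRUE,
                  importance=TRUE,    # compute variable importance, used by varImp() below
                  data=training02)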
## Loading required package: randomForest
## randomForest 4.6-10
## Type rfNews() to see new features/changes/bug fixes.

Now, let’s look at how this model performed

print(modelFit)
## Random Forest 
## 
## 15699 samples
##    58 predictor
##     5 classes: 'A', 'B', 'C', 'D', 'E' 
## 
## No pre-processing
## Resampling: Bootstrapped (25 reps) 
## 
## Summary of sample sizes: 15699, 15699, 15699, 15699, 15699, 15699, ... 
## 
## Resampling results across tuning parameters:
## 
##   mtry  Accuracy   Kappa      Accuracy SD   Kappa SD   
##    2    0.9879858  0.9847964  0.0018194717  0.002303120
##   41    0.9987640  0.9984363  0.0007251904  0.000917370
##   80    0.9978913  0.9973324  0.0013177840  0.001665298
## 
## Accuracy was used to select the optimal model using  the largest value.
## The final value used for the model was mtry = 41.
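print(modelFit) reports 58 predictors, yet caret tried mtry values up to 80 (and varImp below reports 80 variables). That is because the formula interface expands each factor predictor, here user_name, cvtd_timestamp and new_window, into indicator columns. A minimal sketch with model.matrix on a hypothetical toy data frame; the last line assumes caret’s default grid is three values spaced evenly from 2 to the number of columns, which is consistent with the 2, 41, 80 shown above:

```r
# Hypothetical toy data: one numeric predictor and one 3-level factor predictor
toy <- data.frame(
  roll_belt = c(1.2, 1.5, 1.1, 1.3),
  user_name = factor(c("adelmo", "carlitos", "pedro", "adelmo")),
  classe    = factor(c("A", "B", "A", "C"))
)

# Treatment contrasts: the first level (adelmo) is the baseline,
# so a 3-level factor adds 2 indicator columns
mm <- model.matrix(classe ~ ., data = toy)[, -1]   # drop the intercept column
colnames(mm)   # "roll_belt" "user_namecarlitos" "user_namepedro"
ncol(mm)       # 3 columns from 2 predictors

# Three mtry values from 2 up to the number of columns (assumed default grid)
floor(seq(2, 80, length.out = 3))   # 2 41 80
```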

We built this model on the training02 set. Now let’s see how well it does on the held-out testing02 set.

# predict new values
pred <- predict(modelFit, testing02)

# confusion matrix on the test data 
confusionMatrix(pred, testing02$classe)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   A   B   C   D   E
##          A 879   0   0   0   0
##          B   0 625   0   0   0
##          C   0   0 550   0   0
##          D   0   0   0 507   0
##          E   0   0   0   0 580
## 
## Overall Statistics
##                                      
##                Accuracy : 1          
##                  95% CI : (0.9988, 1)
##     No Information Rate : 0.2798     
##     P-Value [Acc > NIR] : < 2.2e-16  
##                                      
##                   Kappa : 1          
##  Mcnemar's Test P-Value : NA         
## 
## Statistics by Class:
## 
##                      Class: A Class: B Class: C Class: D Class: E
## Sensitivity            1.0000    1.000   1.0000   1.0000   1.0000
## Specificity            1.0000    1.000   1.0000   1.0000   1.0000
## Pos Pred Value         1.0000    1.000   1.0000   1.0000   1.0000
## Neg Pred Value         1.0000    1.000   1.0000   1.0000   1.0000
## Prevalence             0.2798    0.199   0.1751   0.1614   0.1847
## Detection Rate         0.2798    0.199   0.1751   0.1614   0.1847
## Detection Prevalence   0.2798    0.199   0.1751   0.1614   0.1847
## Balanced Accuracy      1.0000    1.000   1.0000   1.0000   1.0000

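This looks pretty good: accuracy on testing02 is 100% (95% CI 0.9988 to 1), so the estimated out-of-sample error rate is 0%. Note that this is the error on the held-out testing02 set, not the random forest’s out-of-bag (OOB) error, which randomForest computes internally from the observations left out of each tree’s bootstrap sample. A perfect score is also worth a second look: check that testing02 shares no rows with training02, and note that several of the most important predictors below are timestamps and window numbers rather than sensor readings.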
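For reference, the headline numbers that confusionMatrix() reports can be computed by hand: Accuracy is the share of counts on the diagonal, and the No Information Rate is the share of the most common reference class (above, class A with 879 of 3141 test rows, about 0.2798). A sketch on a hypothetical 3-class matrix that, unlike ours, contains some errors:

```r
# Hypothetical confusion matrix: rows = predicted class, columns = reference class
cm <- matrix(c(50,  2,  0,
                3, 40,  1,
                0,  5, 49),
             nrow = 3, byrow = TRUE,
             dimnames = list(Prediction = c("A", "B", "C"),
                             Reference  = c("A", "B", "C")))

accuracy    <- sum(diag(cm)) / sum(cm)      # 139 / 150
error_rate  <- 1 - accuracy                 # estimated out-of-sample error
nir         <- max(colSums(cm)) / sum(cm)   # largest reference class: 53 / 150
sensitivity <- diag(cm) / colSums(cm)       # per-class recall

round(c(accuracy = accuracy, nir = nir), 4)   # accuracy 0.9267, nir 0.3533
```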


Let’s see which variables were the most important predictors

varImp(modelFit)
## rf variable importance
## 
##   variables are sorted by maximum importance across the classes
##   only 20 most important variables shown (out of 80)
## 
##                                    A      B      C     D     E
## raw_timestamp_part_1           84.70 100.00 95.762 84.55 50.61
## roll_belt                      56.78  85.67 76.215 68.94 57.56
## pitch_forearm                  33.57  48.01 67.647 50.31 40.95
## num_window                     42.31  65.33 54.562 43.65 46.35
## magnet_dumbbell_z              58.64  39.55 49.873 36.95 32.59
## cvtd_timestamp30/11/2011 17:12 19.36  37.00 44.796 41.60 53.68
## yaw_belt                       20.32  32.37 42.259 34.80 21.49
## cvtd_timestamp28/11/2011 14:15 15.43  25.18 26.136 22.44 36.72
## magnet_dumbbell_y              36.50  32.78 34.115 31.81 28.93
## pitch_belt                     17.92  30.78 34.499 24.39 21.26
## cvtd_timestamp02/12/2011 14:58 19.42  26.72 16.135 30.54 19.52
## cvtd_timestamp05/12/2011 11:24 16.73  27.80 14.835 23.88 17.43
## cvtd_timestamp02/12/2011 13:33 23.36  21.88 26.459 24.95 25.29
## cvtd_timestamp05/12/2011 14:24 12.01  14.74 14.780 22.61 21.11
## cvtd_timestamp05/12/2011 11:25 16.17  22.55  9.316 16.70 16.17
## roll_forearm                   22.34  18.12 19.380 16.26 15.69
## roll_dumbbell                  16.02  19.07 21.494 19.99 16.69
## gyros_dumbbell_y               20.40  16.42 19.450 13.16 12.09
## cvtd_timestamp02/12/2011 13:35 13.68  20.29 16.588 17.75 10.80
## magnet_dumbbell_x              19.05  19.88 19.096 19.95 17.38
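In this table, raw_timestamp_part_1, num_window and several cvtd_timestamp levels rank among the most important variables. These record when (and in which time window) a repetition was captured rather than how the lift was performed, so a model that leans on them may not carry over to new recordings. One option is to drop these bookkeeping columns before refitting. A sketch with a hypothetical helper drop_bookkeeping(), whose column list is an assumption based on the first columns of this dataset:

```r
# Hypothetical helper: remove identifier and timing columns by name, keeping
# sensor measurements and the outcome. The column list is an assumption.
drop_bookkeeping <- function(df) {
  bookkeeping <- c("user_name", "raw_timestamp_part_1", "raw_timestamp_part_2",
                   "cvtd_timestamp", "new_window", "num_window")
  df[, setdiff(names(df), bookkeeping), drop = FALSE]
}

# Toy example using a few of the real column names
toy <- data.frame(
  user_name            = factor(c("pedro", "jeremy")),
  raw_timestamp_part_1 = c(1322489729, 1322489730),
  num_window           = c(11, 12),
  roll_belt            = c(1.41, 1.42),
  classe               = factor(c("A", "B"))
)
names(drop_bookkeeping(toy))   # "roll_belt" "classe"
```

The same helper would be applied to training02, testing02 and testing_final02 so that all three keep identical predictor columns.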

Final predictions

Finally, let’s make our predictions on the unknown testing_final02 cases.

predictions_final <- predict(modelFit,newdata=testing_final02)
predictions_final
##  [1] B A B A A E D B A A B C B A E E A B B B
## Levels: A B C D E

There you have it: those are our predictions of how the exercise was performed in each of the 20 final test cases.