Introduction

While R provides a built-in function for calculating standard deviation, I often understand a programming language better by working out what is “going on behind the scenes.” I’m therefore going to calculate the standard deviation of a vector of values from scratch: applying the formula directly, creating intermediate vectors and variables, and using only basic R functions to arrive at the result.

The Formula

The formula I’m going to use to calculate the standard deviation of vector elements is:

(Standard deviation formulas, 2014)
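
Written out, the population standard deviation that the steps below follow can be expressed as shown here. The notation is an assumption on my part: N is the number of values, x_i is each value, and μ is their mean.

```latex
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2}
```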

The Vector

I’m going to calculate the standard deviation of miles per gallon (mpg) from the “mtcars” data set. This data set ships with R’s built-in “datasets” package, so loading “dplyr”, as the code below does, is not required to access it. I’ll create a numeric vector called “mpg” from the first column of mtcars, which holds the variable of the same name. I do this using the following R code:

require (dplyr)
## Loading required package: dplyr
## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
mtcars
##                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
mpg <- mtcars[,1]
mpg
##  [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2
## [15] 10.4 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4
## [29] 15.8 19.7 15.0 21.4
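
Selecting the column by position works, but it depends on mpg staying the first column. As a small sketch (the variable names here are my own, chosen for illustration), the same vector can also be pulled out by name, and the three forms can be checked against each other:

```r
# mtcars comes from R's built-in datasets package, so no library call is needed.
mpg_by_position <- mtcars[, 1]       # first column, selected by position
mpg_by_name     <- mtcars$mpg        # same column, selected by name
mpg_by_bracket  <- mtcars[["mpg"]]   # same column, selected with [[ ]]

# All three should be the same plain numeric vector.
identical(mpg_by_position, mpg_by_name)
identical(mpg_by_name, mpg_by_bracket)
length(mpg_by_name)
```

Selecting by name tends to be easier to read and keeps working if the columns are ever reordered.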

Calculating mean

Next, I need the mean of this vector for use in the formula above. I use the “mean” function and store the result in a new variable, mean_mpg:

mean_mpg <- mean(mpg)
mean_mpg
## [1] 20.09062

So the mean is 20.09062 miles per gallon.

Summing the Squared Differences

Summing the squared differences is a bit more involved and takes a little more “behind the scenes” code. First we must subtract the mean from each element in mpg. Because R arithmetic is vectorized, subtracting a single number from a vector subtracts it from every element. I create a new vector called “differences” using the following code:

differences <- mpg - mean_mpg
differences
##  [1]  0.909375  0.909375  2.709375  1.309375 -1.390625 -1.990625 -5.790625
##  [8]  4.309375  2.709375 -0.890625 -2.290625 -3.690625 -2.790625 -4.890625
## [15] -9.690625 -9.690625 -5.390625 12.309375 10.309375 13.809375  1.409375
## [22] -4.590625 -4.890625 -6.790625 -0.890625  7.209375  5.909375 10.309375
## [29] -4.290625 -0.390625 -5.090625  1.309375
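
A quick check helps explain why the differences are squared before they are combined. The positive and negative deviations cancel, so their plain sum is zero apart from floating-point rounding. A minimal sketch (sum_of_differences is a name introduced here for illustration):

```r
mpg <- mtcars$mpg
differences <- mpg - mean(mpg)

# Deviations from the mean cancel out, so their sum should be zero
# up to floating-point rounding error.
sum_of_differences <- sum(differences)
abs(sum_of_differences) < 1e-9
```

Squaring makes every term non-negative, so the spread no longer cancels itself out.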

Next I need to square each of these numbers and then sum them. This is easily done with the “^” operator, which also works element by element, and the “sum” function in R. The code and results look like this:

square_differences <- differences ^ 2
square_differences
##  [1]   0.8269629   0.8269629   7.3407129   1.7144629   1.9338379
##  [6]   3.9625879  33.5313379  18.5707129   7.3407129   0.7932129
## [11]   5.2469629  13.6207129   7.7875879  23.9182129  93.9082129
## [16]  93.9082129  29.0588379 151.5207129 106.2832129 190.6988379
## [21]   1.9863379  21.0738379  23.9182129  46.1125879   0.7932129
## [26]  51.9750879  34.9207129 106.2832129  18.4094629   0.1525879
## [31]  25.9144629   1.7144629
sum_of_squares <- sum(square_differences)
sum_of_squares
## [1] 1126.047

However, the separate summation step is not strictly necessary: the “mean” function sums the squared differences and divides by the number of elements in one call, giving us:

mean_of_sq_differences <- mean(square_differences)
mean_of_sq_differences
## [1] 35.18897
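
To confirm that the shortcut and the longhand route agree, the two can be compared directly. This is only a sketch; by_hand and with_mean are names made up for the comparison.

```r
mpg <- mtcars$mpg
square_differences <- (mpg - mean(mpg))^2

# Longhand: sum the squared differences, then divide by the number of elements.
by_hand <- sum(square_differences) / length(square_differences)

# Shortcut: let mean() do both steps.
with_mean <- mean(square_differences)

# all.equal() compares with a small numeric tolerance rather than bit for bit.
all.equal(by_hand, with_mean)
```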

Final Step

Finally, we need the square root of the mean of the squared differences we just calculated, which is easily done with the following code:

Stnd_Dev_mpg <- sqrt(mean_of_sq_differences)
Stnd_Dev_mpg 
## [1] 5.93203
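
The individual steps can also be collected into one small function. The sketch below assumes a numeric vector with no missing values; population_sd is a hypothetical helper name, not something base R provides.

```r
# population_sd is a hypothetical helper, not a base R function.
# Assumes x is a non-empty numeric vector without missing values.
population_sd <- function(x) {
  stopifnot(is.numeric(x), length(x) > 0, !anyNA(x))
  differences <- x - mean(x)   # deviation of each element from the mean
  sqrt(mean(differences^2))    # square root of the mean squared deviation
}

population_sd(mtcars$mpg)
```

Called on the mpg column, this should reproduce the value computed step by step above.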

Checking work against R functions

I found it interesting that R returns a different value for the standard deviation of mpg than my calculation did. Using R’s built-in standard deviation function (sd), the result was:

SD_of_mpg_calculated_by_R <- sd(mpg)
SD_of_mpg_calculated_by_R
## [1] 6.026948

I wondered why this was the case. It occurred to me that R might be calculating a sample standard deviation rather than treating the values as the whole population, as I had. I therefore recalculated the result using “Bessel’s correction”, which changes the formula to:

(Standard deviation formulas, 2014)
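
In the same assumed notation as before, with x̄ denoting the sample mean, the corrected formula differs only in its denominator:

```latex
s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2}
```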

Dividing by N - 1 instead of N, I obtained the same result as R, as the following code and calculations show:

N <- length(mpg)
N
## [1] 32
Sample_Stnd_Dev_mpg <- sqrt((sum_of_squares)/(N-1))
Sample_Stnd_Dev_mpg
## [1] 6.026948
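
Since the two versions differ only in the denominator, one can be converted into the other with a single scaling factor. The following sketch checks that relationship; the variable names are my own.

```r
mpg <- mtcars$mpg
N <- length(mpg)

sample_sd_value     <- sd(mpg)                          # divides by N - 1
population_sd_value <- sqrt(mean((mpg - mean(mpg))^2))  # divides by N

# var() also uses N - 1, so sd(x) should match sqrt(var(x)).
all.equal(sample_sd_value, sqrt(var(mpg)))

# Rescaling by sqrt((N - 1) / N) converts the sample value
# into the population value.
all.equal(sample_sd_value * sqrt((N - 1) / N), population_sd_value)
```

With 32 observations the factor is close to 1, which is why the two results differ only slightly.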

Reference

Standard deviation formulas. (2014). Retrieved from https://www.mathsisfun.com/data/standard-deviation-formulas.html.