Calculating Variance and Standard Deviation in R

Step 1: Define the dataset (i.e., enter the data)

dataset <- c(5, 6, 10, 8, 6)

Step 2: Calculate the mean:

mean(dataset)

## [1] 7
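As a cross-check, the mean is just the sum of the values divided by how many values there are. Here is a minimal sketch using only base R (`sum()` and `length()`). The names `n` and `manual_mean` are illustrative and do not come from the original:

```r
dataset <- c(5, 6, 10, 8, 6)

n           <- length(dataset)   # number of observations: 5
manual_mean <- sum(dataset) / n  # 35 / 5
manual_mean
## [1] 7
```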

Step 3: Calculate the differences between each data point and the mean.

dataset - mean(dataset)

## [1] -2 -1  3  1 -1
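The differences from the mean always add up to zero, apart from floating-point rounding. This is one reason we square them instead of simply summing them. A quick check in base R:

```r
dataset <- c(5, 6, 10, 8, 6)

diffs <- dataset - mean(dataset)  # -2 -1 3 1 -1
sum(diffs)                        # positive and negative differences cancel out
## [1] 0
```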

Step 4: Square each difference.

(dataset - mean(dataset))^2

## [1] 4 1 9 1 1

Step 5: Sum the squared differences, then divide by n-1 (here 5 - 1 = 4) instead of n. The result is the sample variance.

sum((dataset - mean(dataset))^2) # take the sum

## [1] 16

(sum((dataset - mean(dataset))^2))/4 # take the sum and divide by 4 (n-1)

## [1] 4

Voilà! The sample variance is 4!
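Written as a formula, Steps 2 to 5 compute the sample variance. The notation below (x_i for the data points, \bar{x} for the mean, s^2 for the sample variance) is the conventional one and does not appear in the original. Plugging in the numbers from above:

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{35}{5} = 7,
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2
    = \frac{4 + 1 + 9 + 1 + 1}{5 - 1}
    = \frac{16}{4} = 4
```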

Step 6: Take the square root of the variance to get the standard deviation: sqrt(4) = 2.

sqrt((sum((dataset - mean(dataset))^2))/4)

## [1] 2
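Base R also provides built-in functions for both quantities, `var()` and `sd()`. Like the manual calculation, they divide by n-1. The following is a minimal check that they agree with the step-by-step result. `manual_var` and `manual_sd` are illustrative names:

```r
dataset <- c(5, 6, 10, 8, 6)

# Manual calculation, as in Steps 2 to 6
manual_var <- sum((dataset - mean(dataset))^2) / (length(dataset) - 1)
manual_sd  <- sqrt(manual_var)

var(dataset)  # built-in sample variance (n-1 denominator)
## [1] 4
sd(dataset)   # built-in sample standard deviation
## [1] 2

all.equal(var(dataset), manual_var)  # TRUE
all.equal(sd(dataset), manual_sd)    # TRUE
```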

The same calculation is a little easier to follow if you store each intermediate result in its own variable:

diffs       <- dataset - mean(dataset)                  # differences from the mean ("diffs")
diffsquared <- diffs^2                                  # squared differences
variance    <- sum(diffsquared) / (length(dataset) - 1) # divide by n-1, i.e. by 4 here
std_dev     <- sqrt(variance)                           # standard deviation: 2
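If you need this calculation more than once, you can wrap the same steps in a small function. `sample_sd` is a hypothetical helper name and is not part of base R. It reuses the intermediate variables from the step-by-step version:

```r
dataset <- c(5, 6, 10, 8, 6)

# Hypothetical helper: sample standard deviation, computed step by step
sample_sd <- function(x) {
  diffs       <- x - mean(x)                        # differences from the mean
  diffsquared <- diffs^2                            # squared differences
  variance    <- sum(diffsquared) / (length(x) - 1) # divide by n-1
  sqrt(variance)                                    # standard deviation
}

sample_sd(dataset)
## [1] 2
```

The function deliberately performs no input checks. If there are fewer than two values, the n-1 denominator is zero. In real work, prefer the built-in `sd()`.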

Note: For more on why the sample variance and standard deviation divide by n-1 rather than n (that is, rather than just taking the mean of the squared differences), look up the lucid explanation on Khan Academy.
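To see how much the denominator matters, the sketch below applies both choices to the same dataset. Dividing by n gives what is usually called the population variance. Dividing by n-1 gives the sample variance that `var()` reports. The names `ss`, `pop_var`, and `sample_var` are illustrative:

```r
dataset <- c(5, 6, 10, 8, 6)

n  <- length(dataset)
ss <- sum((dataset - mean(dataset))^2)  # sum of squared differences: 16

pop_var    <- ss / n        # divide by n:   16 / 5 = 3.2
sample_var <- ss / (n - 1)  # divide by n-1: 16 / 4 = 4

c(population = pop_var, sample = sample_var)
```

With only five data points the two results differ noticeably (3.2 vs. 4). As n grows, the gap between them shrinks.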

CEU

September 21, 2016