Step 1: Define the dataset (i.e., enter the data)
dataset <- c(5, 6, 10, 8, 6)
Step 2: Calculate the mean:
mean(dataset)
## [1] 7
Step 3: Calculate the difference between each data point and the mean.
dataset - mean(dataset)
## [1] -2 -1 3 1 -1
Step 4: Square all the differences
(dataset - mean(dataset))^2
## [1] 4 1 9 1 1
Step 5: Sum the squared differences, then divide by n-1 (here 4) rather than n (here 5)
sum((dataset - mean(dataset))^2) # take the sum
## [1] 16
(sum((dataset - mean(dataset))^2))/4 # take the sum and divide by 4 (n-1)
## [1] 4
Voila! The variance is 4!
Step 6: The standard deviation is the square root of the variance. Here, the square root of 4 is 2.
sqrt((sum((dataset - mean(dataset))^2))/4)
## [1] 2
The same calculation is a little nicer (easier to follow) if you store each intermediate result in its own variable:
diffs <- dataset - mean(dataset) # I am choosing to call these differences "diffs"
diffsquared <- diffs^2 # square each difference
variance <- sum(diffsquared)/4 # alternatively, divide by "length(dataset) - 1" (same thing)
std_dev <- sqrt(variance)
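As a cross-check, the manual steps can be compared against R's built-in var() and sd() functions, which also divide by n-1. The sketch below is only an illustration: it redefines the dataset so it runs on its own, and the name n is introduced here and is not part of the steps above.

```r
dataset <- c(5, 6, 10, 8, 6)

# Manual calculation, using length() instead of a hard-coded 4
n <- length(dataset)
diffs <- dataset - mean(dataset)
variance <- sum(diffs^2) / (n - 1)
std_dev <- sqrt(variance)

# Built-in equivalents; both use the n-1 denominator
var(dataset)
## [1] 4
sd(dataset)
## [1] 2

# Confirm that the manual and built-in results agree
all.equal(variance, var(dataset))
## [1] TRUE
all.equal(std_dev, sd(dataset))
## [1] TRUE
```

Using length(dataset) - 1 rather than 4 means the same lines keep working if the dataset grows or shrinks.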
Note: For more on why the variance and standard deviation of a sample are calculated by dividing by n-1 (instead of dividing by n, as you would when taking an ordinary mean of the squared differences), google the lucid explanation available on Khan Academy.
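To see how much the choice of divisor matters for this dataset, here is a minimal sketch that computes the variance both ways. The variable names (sum_sq, sample_variance, population_variance) are chosen for illustration only.

```r
dataset <- c(5, 6, 10, 8, 6)
n <- length(dataset)

# Sum of squared differences from the mean (16 for this dataset)
sum_sq <- sum((dataset - mean(dataset))^2)

# Divide by n-1: matches the variance calculated in the steps above
sample_variance <- sum_sq / (n - 1)
sample_variance
## [1] 4

# Divide by n: the plain mean of the squared differences
population_variance <- sum_sq / n
population_variance
## [1] 3.2
```

With only five data points the two results differ noticeably (4 versus 3.2); the gap shrinks as n gets larger.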