Degrees of freedom play an important part in Student's t-test and the chi-squared test. For the t-test they are n-1, where n is the sample size. They were initially introduced as a practical bias correction without any substantial theoretical justification, and as the t-distribution and t-tests came into wider use the distribution became associated with degrees of freedom, thanks to Fisher's presentation of Gosset's work.
Eventually a more coherent argument was made for their use, showing that when you calculate statistics you duplicate the use of "information", and degrees of freedom correct for this reuse (Walker 1940).
It is easier to see with an example. I will go back to a simple set of eight weight measurements that I made of Mars bars. I have eight weights that I can enter into R and then calculate the mean.
# Assign the data to a variable (x) - a column or one dimensional array
# The <- symbol means assignment
# R uses brackets around all the arguments for commands
# The c indicates a list of data points separated by commas
x <- c(42.97, 43.08, 43.26, 43.18, 37.93, 39.62, 39.80, 40.18)
# You can display x by typing x
x
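The mean calculation itself is not shown above, so here is a minimal sketch of how it could be done in R, assuming the data are in x as defined:

# Calculate the mean of the eight weights
mean(x)               # 41.2525
# Equivalently, the sum of the weights divided by the number of values
sum(x) / length(x)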
This is the sum of the weights divided by the sample size, which is eight. Now imagine that I spill something on my notebook and lose one of the weights. Let's say the third one, but I still have the mean that I calculated before the spill.
Now I have 42.97g, 43.08g, ???g, 43.18g, 37.93g, 39.62g, 39.80g, 40.18g.
The calculation of the mean makes it possible to recover a missing number from the original data. It didn't have to be the third value; it could have been any one of them. By calculating the mean you make part of the original data redundant: knowing the mean reduces the number of free choices for the values in the dataset by one. These remaining free choices are the degrees of freedom of the data.
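To make this concrete, here is a rough sketch in R (using x and its mean from above) that recovers the "lost" third weight from the mean and the seven remaining values:

# The mean fixes the total of the eight weights, so the missing
# value is that total minus the sum of the values I still have
m <- mean(x)          # 41.2525, calculated before the spill
known <- x[-3]        # the seven weights still legible in the notebook
8 * m - sum(known)    # 43.26, the missing third weight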
To calculate the sample standard deviation you need to know the sample mean, so you have to take into account the decrease in the degrees of freedom, which is why we divide by n-1 and not n. The same applies to the t-distribution. In more complex cases such as ANOVA, the calculations reuse the same information several times, reducing the degrees of freedom further.
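As an illustrative sketch (not part of the original worked example), R's built-in sd() already divides by n-1; computing the standard deviation by hand shows the difference from dividing by n:

n <- length(x)
m <- mean(x)
sqrt(sum((x - m)^2) / (n - 1))   # sample standard deviation, matches sd(x)
sqrt(sum((x - m)^2) / n)         # dividing by n gives a smaller, biased estimate
sd(x)                            # R divides by n - 1, the degrees of freedom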