Overview
In this RStudio tutorial, I show how to assess whether a distribution is normal using quantile-quantile (QQ) plots. Below are two examples that show how to use QQ plots:
Example #1
Start by loading the data frame you want to analyze into R. The data frame used here has 14 variables and 178 rows. To make a normal QQ plot, use the function qqnorm. It plots the theoretical quantiles of a standard normal distribution along the x-axis and the sample quantiles of the data (the sorted data values) along the y-axis.
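The sketch below shows what qqnorm computes, using base R only. The wine data frame is not loaded here, so a simulated right-skewed sample of 178 values (the hypothetical vector `x`) stands in for one variable. Calling qqnorm with `plot.it = FALSE` returns the coordinates instead of drawing them, which lets us check that the x-axis holds standard normal quantiles at the plotting positions `ppoints(n)`, matched to each observation's rank.

```r
# Simulated stand-in for one variable of the data frame (hypothetical data)
set.seed(1)
x <- rlnorm(178, meanlog = 0.5, sdlog = 0.5)

# Ask qqnorm() for the plot coordinates without drawing anything
qq <- qqnorm(x, plot.it = FALSE)

# Theoretical quantiles: standard normal quantiles at the plotting
# positions ppoints(n), reordered to match the rank of each observation
theoretical <- qnorm(ppoints(length(x)))[order(order(x))]

all.equal(qq$x, theoretical)  # TRUE: x-axis holds theoretical normal quantiles
all.equal(qq$y, x)            # TRUE: y-axis holds the data values themselves
```

Because each data value is paired with the normal quantile of the same rank, plotting these pairs is equivalent to plotting the sorted data against the sorted theoretical quantiles.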
Now let's look at a normal QQ plot for malic acid to determine whether its values are normally distributed.
This is a normal QQ plot, with the sample quantiles on the y-axis and the theoretical normal quantiles on the x-axis.
Next, I add a reference line with qqline; if the distribution is normal, the points should lie close to this line. Adding col = "red" draws the line in red to distinguish it from the points.
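A minimal sketch of the two plotting calls, assuming a simulated right-skewed sample (the hypothetical vector `skewed`) in place of the malic acid column, which is not loaded here. With the real data frame you would pass the malic acid column instead.

```r
# Hypothetical stand-in for a skewed variable such as malic acid
set.seed(42)
skewed <- rlnorm(178, meanlog = 0.5, sdlog = 0.5)

# Draw the normal QQ plot; qqnorm() also returns the plotted coordinates invisibly
qq <- qqnorm(skewed, main = "Normal Q-Q Plot (simulated skewed data)")

# Add the reference line in red so it stands out from the points
qqline(skewed, col = "red", lwd = 2)
```

For right-skewed data like this, the points typically bend upward away from the red line at the upper end.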
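If the distribution were normal, the points would fall close to this line. Note that qqline does not draw the x = y line: by default it passes through the first and third quartiles of the sample paired with those of the standard normal distribution, so its intercept and slope reflect the location and spread of the data. Since many of the points deviate systematically from the line, we can conclude that the distribution of malic acid values is not normal.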
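To see why the reference line is generally not x = y, the sketch below reproduces the intercept and slope that qqline uses by default (quartile probabilities 0.25 and 0.75). It uses a simulated normal sample with mean 5 and standard deviation 2 (hypothetical data), so the slope should come out near the standard deviation and the intercept near the mean, rather than 1 and 0.

```r
# Simulated normal data that is NOT standard normal (hypothetical)
set.seed(7)
y <- rnorm(178, mean = 5, sd = 2)

probs <- c(0.25, 0.75)
yq <- quantile(y, probs, names = FALSE)  # sample first and third quartiles
xq <- qnorm(probs)                       # standard normal quartiles

# Line through the two (theoretical, sample) quartile points, as qqline() draws it
slope <- diff(yq) / diff(xq)
intercept <- yq[1] - slope * xq[1]

c(intercept = intercept, slope = slope)  # roughly the mean and the SD
```

Only if the data were standardized (mean 0, SD 1) would this line be close to x = y.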
Example #2
Let's look at a different variable. This time we will analyze hue, a measure of the color of the wines.
This is the QQ plot for hue. To assess whether the distribution is normal, let's add a QQ line.
In this plot, a few points at the ends deviate from the line, but the vast majority lie on or very close to it. Therefore, we can conclude that the hue variable is most likely approximately normally distributed.
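As a supplementary numeric summary (not part of the tutorial's visual method), one can measure how straight a QQ plot is with the correlation between its theoretical and sample quantiles: values very close to 1 mean the points hug a straight line. The sketch compares a simulated normal sample with a simulated skewed sample; both samples and the helper `qq_cor` are hypothetical.

```r
# Hypothetical samples: one normal, one clearly skewed
set.seed(123)
normal_sample <- rnorm(178)
skewed_sample <- rexp(178)

# Correlation between theoretical and sample quantiles of a normal QQ plot
qq_cor <- function(v) {
  q <- qqnorm(v, plot.it = FALSE)
  cor(q$x, q$y)
}

qq_cor(normal_sample)  # close to 1: points lie near a straight line
qq_cor(skewed_sample)  # noticeably lower: points curve away from the line
```

This number does not replace looking at the plot; it only summarizes how far the points stray from straightness overall, and a few tail points can lower it.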
Conclusion
This tutorial uses only two examples from one data frame, but QQ plots can help visualize the distribution of data in many other ways. They can also compare two samples directly: if the samples differ, the QQ plot shows how they differ, for example in location, spread, or tail behavior.
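Base R's qqplot function compares two samples directly by plotting the sorted values of one against the sorted values of the other; when the sample sizes differ, it interpolates the larger sample down to the size of the smaller one. The sketch below uses two hypothetical simulated samples, where `b` is shifted and more spread out than `a`. Here the x = y line is the right reference, because points fall on it only if both samples share the same distribution.

```r
# Two hypothetical samples of different sizes
set.seed(2024)
a <- rnorm(150, mean = 0, sd = 1)
b <- rnorm(120, mean = 1, sd = 2)  # shifted location, larger spread

# Plot the quantiles of b against the quantiles of a
qq2 <- qqplot(a, b, xlab = "Sample a quantiles", ylab = "Sample b quantiles")

# Reference line y = x: where the points would fall if a and b matched
abline(0, 1, col = "red")
```

A shift in location moves the points off y = x in parallel, while a difference in spread changes their slope; in this example the points should rise more steeply than the red line.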