Intro

In class we saw how quantile-quantile plots can be used to visually assess if data are consistent with a normal distribution. In this write-up, I’ll describe how you could compute one “by hand”.

Rather than work with the large data set we saw in class, I’ll do this work with a toy data set of just six observations.

library(tidyverse)  # provides tibble(), ggplot(), and %>% used below
data <- c(282, 289, 313, 318, 347, 355)

Compute the quantiles of the observed data

A quantile is a cutoff value such that a specified fraction of a batch of numbers is less than that value. One quantile that you are familiar with is the median. It is the number such that 1/2 of the batch of numbers is less than that value. Another example is the quartiles. They are the values such that 1/4, 1/2, and 3/4 of a batch of numbers are less than each value, respectively.
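As a quick check of these definitions, here is a minimal sketch using base R's median() and quantile() on the toy data set from above. Note that quantile() applies its own default interpolation rule, which is only one of several conventions, so its quartiles need not match other methods exactly.

```r
# Toy data set from the introduction
data <- c(282, 289, 313, 318, 347, 355)

# The median: half of the observations fall below this value
med <- median(data)
med

# The quartiles: cutoffs for 1/4, 1/2, and 3/4 of the observations
quartiles <- quantile(data, probs = c(0.25, 0.5, 0.75))
quartiles
```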

There are actually a number of algorithms for determining the quantiles of a batch of numbers. Here I’ll implement a simple one. We start by calculating a number called the \(f\) value for each observation \(i\):

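\[f_i = \frac{i - 0.5}{n}\]

Here \(i\) is the rank of the observation once the data are sorted from smallest to largest, and \(n\) is the number of observations. The \(f\) values lie between 0 and 1, and an observation with an \(f\) value of 0.5 is the median.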

# If what is happening in this code block is unclear,
# try copying and pasting individual parts into the console to see what happens.

num_obs <- length(data)

f_val <- (1:num_obs - 0.5)/num_obs

data_with_f <- tibble(data, f_val)
data_with_f

## # A tibble: 6 x 2
##    data  f_val
##   <dbl>  <dbl>
## 1   282 0.0833
## 2   289 0.25  
## 3   313 0.417 
## 4   318 0.583 
## 5   347 0.75  
## 6   355 0.917
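R's built-in quantile() function implements several quantile algorithms through its type argument. As far as I can tell from its documentation, type = 5 corresponds to the \((i - 0.5)/n\) rule used here, so asking for the quantiles at our \(f\) values should hand back the original observations. A small sketch to check that, with q_type5 as a name made up for illustration:

```r
data <- c(282, 289, 313, 318, 347, 355)
num_obs <- length(data)
f_val <- (1:num_obs - 0.5) / num_obs

# Quantiles of the data at the f values, using the type 5 rule
q_type5 <- quantile(data, probs = f_val, type = 5)

# Compare with the sorted observations (unname() drops the percent labels)
all.equal(unname(q_type5), sort(data))
```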

Calculate the expected values from a normal distribution for these quantiles.

Now that we have our quantiles (i.e. the values of \(f\)), we can find the values of a normal distribution that divide the distribution up into those same amounts. In this case, we want the values such that 0.0833, 0.25, 0.417, 0.583, 0.75, and 0.917 of the normal distribution is less than each value. You can do this with any normal distribution. R (and most other software) uses the standard normal. We can find these values using the function qnorm:

qnorm(f_val)

## [1] -1.3829941 -0.6744898 -0.2104284  0.2104284  0.6744898  1.3829941
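To see what qnorm is doing, it can help to go in the other direction with pnorm, which gives the fraction of the standard normal lying below a value. The sketch below checks that round trip. It also illustrates the point that any normal distribution would work: quantiles from a normal with a different mean and standard deviation are just a shifted and rescaled copy of the standard ones. Using the sample mean and standard deviation here is my own choice for illustration.

```r
data <- c(282, 289, 313, 318, 347, 355)
f_val <- (1:length(data) - 0.5) / length(data)

z <- qnorm(f_val)

# pnorm() undoes qnorm(): this recovers the f values
all.equal(pnorm(z), f_val)

# Quantiles of a normal with the sample mean and sd (an illustrative choice)
q_other <- qnorm(f_val, mean = mean(data), sd = sd(data))

# They are a linear function of the standard normal quantiles
all.equal(q_other, mean(data) + sd(data) * z)
```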

standard_norm_quants <- qnorm(f_val)

data_with_f_norm <- tibble(data_with_f, standard_norm_quants)
data_with_f_norm

## # A tibble: 6 x 3
##    data  f_val standard_norm_quants
##   <dbl>  <dbl>                <dbl>
## 1   282 0.0833               -1.38 
## 2   289 0.25                 -0.674
## 3   313 0.417                -0.210
## 4   318 0.583                 0.210
## 5   347 0.75                  0.674
## 6   355 0.917                 1.38

Make a q-q plot

Now that we know the values of the standard normal that have the same quantiles as our data, we can make a plot:

data_with_f_norm %>%
  ggplot(aes(x = standard_norm_quants, y = data)) +
  geom_point()
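If the data were exactly normal, the points would fall along a straight line. One way to make that easier to judge is to add a reference line. The sketch below uses the sample mean as the intercept and the sample standard deviation as the slope. That is one reasonable choice and my own addition here, not necessarily the line that other q-q functions draw. The name qq_df is made up for this example.

```r
library(ggplot2)

data <- c(282, 289, 313, 318, 347, 355)
f_val <- (1:length(data) - 0.5) / length(data)
qq_df <- data.frame(data = data, standard_norm_quants = qnorm(f_val))

# By-hand q-q plot with a reference line y = mean + sd * z
p <- ggplot(qq_df, aes(x = standard_norm_quants, y = data)) +
  geom_point() +
  geom_abline(intercept = mean(data), slope = sd(data))
p
```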

We can compare the plot above to the “automatically” generated q-q plot:

data_with_f_norm %>%
  ggplot(aes(sample = data)) +
  geom_point(stat = "qq")
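The two plots may not line up exactly on the horizontal axis. As far as I know, base R's qqnorm() (and, I believe, the ggplot2 qq stat as well) picks its probabilities with ppoints(), which for ten or fewer observations uses \((i - 3/8)/(n + 1/4)\) rather than the \((i - 0.5)/n\) rule used above. A small sketch comparing the two:

```r
data <- c(282, 289, 313, 318, 347, 355)
n <- length(data)
f_val <- (1:n - 0.5) / n

# Default plotting positions for n = 6
ppoints(n)

# Setting a = 0.5 gives the rule used in this write-up
all.equal(ppoints(n, a = 0.5), f_val)

# Theoretical quantiles that qqnorm() would plot, without drawing anything
qq <- qqnorm(data, plot.it = FALSE)
all.equal(sort(qq$x), qnorm(ppoints(n)))
```

With larger data sets the difference between the two rules shrinks, so it matters mostly for toy examples like this one.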

For more information, see this detailed discussion by Manny Gimond: https://mgimond.github.io/ES218/Week05a.html#Quantile_plots

About quantile-quantile plots

Dan Stoebel

3/23/2021
