Definition of quantiles

Quantiles

Quantiles are cutoff points that divide a dataset into intervals with set probabilities. The qth quantile is the value at or below which a proportion q of the observations fall. For example, 25% of the observations are less than or equal to the 0.25 quantile.

Using the quantile function

Given a dataset data and a desired quantile q, you can find the qth quantile of data with:

# quantile(data,q)
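As a minimal illustration, the sketch below computes one quantile and checks the defining property. The vector x and the choice q = 0.25 are made up for this example and are not part of the course data.

```r
# Hypothetical data: the integers 1 to 100 (an assumption for illustration)
x <- 1:100
q <- 0.25

# The 0.25 quantile of x, using quantile()'s default interpolation method
q25 <- quantile(x, q)
q25

# Proportion of observations at or below that value; it matches q here
prop_at_or_below <- mean(x <= q25)
prop_at_or_below
```

With real data the proportion may differ slightly from q, because quantile() interpolates between observations and ties can occur.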

Percentiles

Percentiles are the quantiles that divide a dataset into 100 intervals each with 1% probability. You can determine all percentiles of a dataset data like this:

# p <- seq(0.01, 0.99, 0.01)
# quantile(data, p)

Quartiles

Quartiles divide a dataset into 4 parts each with 25% probability. They are equal to the 25th, 50th and 75th percentiles. The 25th percentile is also known as the 1st quartile, the 50th percentile is also known as the median, and the 75th percentile is also known as the 3rd quartile.

The summary() function returns the minimum, quartiles, mean and maximum of a numeric vector.
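To see how the quartiles relate to summary(), here is a small sketch on a made-up vector (x is an assumption, chosen so the quartiles are easy to verify by hand). It computes the quartiles as the 25th, 50th and 75th percentiles and then prints the summary for comparison.

```r
# Hypothetical data, chosen so the quartiles are easy to check by hand
x <- c(2, 4, 6, 8, 10)

# Quartiles computed as the 25th, 50th and 75th percentiles
quartiles <- quantile(x, c(0.25, 0.50, 0.75))
quartiles

# summary() reports the same three values, plus the minimum, mean and maximum
s <- summary(x)
s
```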

Examples

Load the heights dataset from the dslabs package:

library(dslabs)
data(heights)

Use summary() on the heights$height variable to find the quartiles:

summary(heights$height)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   50.00   66.00   68.50   68.32   71.00   82.68

Find the percentiles of heights$height:

p <- seq(0.01, 0.99, 0.01)
percentiles <- quantile(heights$height, p)

Confirm that the 25th and 75th percentiles match the 1st and 3rd quartiles. Note that quantile() returns a named vector. You can access the 25th and 75th percentiles like this (adapt the code for other percentile values):

percentiles[names(percentiles) == "25%"]
## 25% 
##  66
percentiles[names(percentiles) == "75%"]
## 75% 
##  71
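Because quantile() returns a named vector, the same elements can also be selected directly by name, which is shorter than comparing against names(). The sketch below uses a stand-in vector x (an assumption, so that it runs without dslabs). The same indexing should work on the percentiles of heights$height.

```r
# Stand-in data (an assumption); replace with heights$height when dslabs is loaded
x <- 1:100
p <- seq(0.01, 0.99, 0.01)
percentiles <- quantile(x, p)

# Select one percentile by its name
percentiles["25%"]

# Select several percentiles at once
quartile_ends <- percentiles[c("25%", "75%")]
quartile_ends

# Drop the name when only the number is needed
unname(percentiles["25%"])
```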

Finding quantiles with qnorm

Definition of qnorm

The qnorm() function returns a theoretical quantile of a normal distribution with mean mu and standard deviation sigma. For a probability p, it gives the value such that the probability of observing a value less than or equal to it is p:

# qnorm(p, mu, sigma)

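By default, mu=0 and sigma=1 (these arguments are named mean and sd in R). Therefore, calling qnorm() with only the probabilities p gives quantiles for the standard normal distribution. Here p is the vector of probabilities seq(0.01, 0.99, 0.01) defined earlier.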

qnorm(p)
##  [1] -2.32634787 -2.05374891 -1.88079361 -1.75068607 -1.64485363 -1.55477359
##  [7] -1.47579103 -1.40507156 -1.34075503 -1.28155157 -1.22652812 -1.17498679
## [13] -1.12639113 -1.08031934 -1.03643339 -0.99445788 -0.95416525 -0.91536509
## [19] -0.87789630 -0.84162123 -0.80642125 -0.77219321 -0.73884685 -0.70630256
## [25] -0.67448975 -0.64334541 -0.61281299 -0.58284151 -0.55338472 -0.52440051
## [31] -0.49585035 -0.46769880 -0.43991317 -0.41246313 -0.38532047 -0.35845879
## [37] -0.33185335 -0.30548079 -0.27931903 -0.25334710 -0.22754498 -0.20189348
## [43] -0.17637416 -0.15096922 -0.12566135 -0.10043372 -0.07526986 -0.05015358
## [49] -0.02506891  0.00000000  0.02506891  0.05015358  0.07526986  0.10043372
## [55]  0.12566135  0.15096922  0.17637416  0.20189348  0.22754498  0.25334710
## [61]  0.27931903  0.30548079  0.33185335  0.35845879  0.38532047  0.41246313
## [67]  0.43991317  0.46769880  0.49585035  0.52440051  0.55338472  0.58284151
## [73]  0.61281299  0.64334541  0.67448975  0.70630256  0.73884685  0.77219321
## [79]  0.80642125  0.84162123  0.87789630  0.91536509  0.95416525  0.99445788
## [85]  1.03643339  1.08031934  1.12639113  1.17498679  1.22652812  1.28155157
## [91]  1.34075503  1.40507156  1.47579103  1.55477359  1.64485363  1.75068607
## [97]  1.88079361  2.05374891  2.32634787

Recall that quantiles are defined such that p is the probability of a random observation less than or equal to the quantile.
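A few spot checks may make the definition concrete. The sketch below looks at two familiar standard normal quantiles. It then checks that quantiles for a general normal distribution are shifted and rescaled standard normal quantiles. The values mu = 69 and sigma = 3 are illustrative assumptions.

```r
# Median of the standard normal distribution
qnorm(0.5)

# 97.5th percentile of the standard normal, approximately 1.96
qnorm(0.975)

# Illustrative parameters (assumptions for this example)
mu <- 69
sigma <- 3
p <- c(0.025, 0.5, 0.975)

# Quantiles of a normal with mean mu and sd sigma...
direct <- qnorm(p, mu, sigma)

# ...equal mu plus sigma times the standard normal quantiles
rescaled <- mu + sigma * qnorm(p)
all.equal(direct, rescaled)
```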

Relation to pnorm

The pnorm() function gives the probability that a value from a standard normal distribution will be less than or equal to a z-score value z. Consider:

pnorm(-1.96) ≈ 0.025

The result of pnorm() is a probability; the z-score passed to it, -1.96, is the corresponding quantile. Note that:

qnorm(0.025) ≈ -1.96

qnorm() and pnorm() are inverse functions:

pnorm(qnorm(0.025)) = 0.025
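The inverse relationship can be checked numerically in both directions. This is a small sketch. The grids p and z are arbitrary choices, and all.equal() is used because the round trip is only exact up to floating-point tolerance.

```r
# Arbitrary grids of probabilities and z-scores (assumptions for illustration)
p <- seq(0.01, 0.99, 0.01)
z <- seq(-3, 3, 0.5)

# Probability -> quantile -> probability
round_trip_p <- pnorm(qnorm(p))
all.equal(round_trip_p, p)

# z-score -> probability -> z-score
round_trip_z <- qnorm(pnorm(z))
all.equal(round_trip_z, z)

# The same holds for a non-standard normal when both calls use the same mean and sd
round_trip_general <- pnorm(qnorm(p, 69, 3), 69, 3)
all.equal(round_trip_general, p)
```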

Theoretical quantiles

You can use qnorm() to determine the theoretical quantiles of a dataset: that is, the theoretical value of quantiles assuming that a dataset follows a normal distribution. Run the qnorm() function with the desired probabilities p, mean mu and standard deviation sigma.

Suppose male heights follow a normal distribution with a mean of 69 inches and standard deviation of 3 inches. The theoretical quantiles are:

p <- seq(0.01, 0.99, 0.01)
theoretical_quantiles <- qnorm(p, 69, 3)

Theoretical quantiles can be compared to sample quantiles determined with the quantile() function in order to evaluate whether the sample follows a normal distribution.
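The comparison can be sketched on simulated data, so that it runs without the heights dataset. Everything here is an assumption for illustration: the seed, the sample size, and the use of rnorm() to stand in for real measurements.

```r
# Simulated "heights" (not real data): 1000 draws from a normal distribution
set.seed(1)  # arbitrary seed, only for reproducibility
x <- rnorm(1000, mean = 69, sd = 3)

# Probabilities at which to compare quantiles
p <- seq(0.05, 0.95, 0.05)

# Sample quantiles from the data, theoretical quantiles from a normal
# distribution with the sample's own mean and standard deviation
sample_quantiles <- quantile(x, p)
theoretical_quantiles <- qnorm(p, mean = mean(x), sd = sd(x))

# Side-by-side comparison
comparison <- data.frame(
  p = p,
  sample = unname(sample_quantiles),
  theoretical = theoretical_quantiles
)
head(comparison)

# For normally distributed data the two sets of quantiles track each other closely
quantile_cor <- cor(comparison$sample, comparison$theoretical)
quantile_cor
```

A high correlation alone does not prove normality. Plotting the two sets of quantiles against each other is the more informative check.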

Quantile-Quantile Plots

Key points

Code

# define x and z
library(tidyverse)
library(dslabs)
data(heights)
index <- heights$sex=="Male"
x <- heights$height[index]
z <- scale(x)

# proportion of data at or below 69.5
mean(x <= 69.5)
## [1] 0.5147783
# calculate observed and theoretical quantiles
p <- seq(0.05, 0.95, 0.05)
observed_quantiles <- quantile(x, p)
theoretical_quantiles <- qnorm(p, mean = mean(x), sd = sd(x))

# make QQ-plot
plot(theoretical_quantiles, observed_quantiles)
abline(0,1)

# make QQ-plot with scaled values
observed_quantiles <- quantile(z, p)
theoretical_quantiles <- qnorm(p)
plot(theoretical_quantiles, observed_quantiles)
abline(0,1)
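Base R also provides qqnorm(), which draws a normal QQ-plot without computing the quantiles by hand. The sketch below is an alternative to the manual approach above, not a reproduction of it. It uses simulated data in place of the male heights, and qqnorm() uses one plotting position per observation rather than the grid p.

```r
# Simulated data standing in for the male heights (an assumption)
set.seed(1)  # arbitrary seed, only for reproducibility
x <- rnorm(500, mean = 69, sd = 3)

# Standardize, dropping the matrix attributes that scale() adds
z <- as.numeric(scale(x))

# qqnorm() plots sample values against standard normal quantiles and
# invisibly returns both: qq$x (theoretical) and qq$y (sample)
qq <- qqnorm(z)

# Identity line: points near it suggest the data are close to normal
abline(0, 1)
```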

Percentiles

Key points

Boxplots

Key points

Distribution of Female Heights

Key points