Basic Probability Distributions in R

R comes with built-in implementations of many probability distributions. This document will show how to generate these distributions in R by focusing on making plots, and so give the reader an intuitive feel for what all the different R functions are actually calculating.

Each probability distribution in R is associated with four functions which follow a naming convention: the probability density function always begins with ‘d’, the cumulative distribution function always begins with ‘p’, the inverse cumulative distrobution (or quantile function) always beings with ‘q’, and a function that produces random variables always begins with ‘r’. Each function takes a single argument at which to evaluate the function followed by specific parameters that define the particular distribution function to evaluate.

In the following we will demonstrate usage for the density, disribution and quantile functions only. Each demonstration will include plots and simple examples.

Table 1: Common Probability Distribution Functions in R

Name	Probability Density	Cumulative Distribution	Quantile
Normal	dnorm(Z,mean,sd)	pnorm(Z,mean,sd)	qnorm(Q,mean,sd)
Poisson	dnorm(N,lambda)	pnorm(N,lambda)	qnorm(Q,lambda)
Binomial	dbinom(N,size,prob)	pbinom(N,size,prob)	qbinom(Q,size,prob)
Exponential	dexp(N,rate)	pexp(N,rate)	qexp(Q,rate)
\(\chi^2\)	dchisq(X,df)	pchisq(X.df)	qchisq(X,df)

Normal Distribution

The normal distribution N(\(\mu,\sigma\)) is represented R by dnorm, pnorm, and qnorm, where \(\mu\) is the mean and \(\sigma\) is the standard deviation. The probability density dnorm and cumulative distribution pnorm are defined on the entire real axis.

For the example, we will use the standard normal distribution, given by \(\mu=0\) and \(\sigma=1\). The density is very small outside of the interval (-3.5,3.5), so we will restrict the plots to this domain.

  z<-seq(-3.5,3.5,0.1)  # 71 points from -3.5 to 3.5 in 0.1 steps
  q<-seq(0.001,0.999,0.001)  # 1999 points from 0.1% to 99.9% on 0.1% steps
  dStandardNormal <- data.frame(Z=z, 
                               Density=dnorm(z, mean=0, sd=1),
                               Distribution=pnorm(z, mean=0, sd=1))  
  qStandardNormal <- data.frame(Q=q, 
                                Quantile=qnorm(q, mean=0, sd=1))  
  head(dStandardNormal)

##      Z      Density Distribution
## 1 -3.5 0.0008726827 0.0002326291
## 2 -3.4 0.0012322192 0.0003369293
## 3 -3.3 0.0017225689 0.0004834241
## 4 -3.2 0.0023840882 0.0006871379
## 5 -3.1 0.0032668191 0.0009676032
## 6 -3.0 0.0044318484 0.0013498980

  head(qStandardNormal)

##       Q  Quantile
## 1 0.001 -3.090232
## 2 0.002 -2.878162
## 3 0.003 -2.747781
## 4 0.004 -2.652070
## 5 0.005 -2.575829
## 6 0.006 -2.512144

Figure 1: Standard Normal Distribution Functions

Examples

Q: Assume a random variable Z is distributed according to the normal distribution with mean 6 and standard deviation 4. What is the probability that Z takes on a value between -1 and 3 ?

# A: subtract the c.d. at -1 from the c.d. at 3
pnorm(3, 6, 4) - pnorm(-1, 6, 4)

## [1] 0.1865682

Q: Assume a random variable Z is distributed according to the normal distribution with mean 20 and standard deviation 10. What is the 90% confidence interval around the mean for the expected value of Z?

# A: Use the quantile function
upper <- qnorm(0.95, 20, 10)
lower <- qnorm(0.05, 20, 10)
c(lower, upper)

## [1]  3.551464 36.448536

Poisson Distribution

The Poisson distribution f(\(\lambda\)) is represented R by dpois, ppois, and qpois. The probability density dpois and cumulative distribution ppois are defined on non-negative integers.

For the example, we’ll use \(\lambda = 2.5\). To figure out a good range for plotting, we will use the qpois function to find out for a given mean, what is the least integer that bounds the cumulative Poisson distribution above 99.9%, and what is the greatest integer that bounds below at 0.1%.

  lower<-qpois(0.001, lambda=2.5)
  upper<-qpois(0.999, lambda=2,5)
  n<-seq(lower,upper,1)
  q<-seq(0.001,0.999,0.001)
  dPoisson25 <- data.frame(N=n, 
                          Density=dpois(n, lambda=2.5),
                          Distribution=ppois(n, lambda=2.5))  
  qPoisson25 <- data.frame(Q=q, Quantile=qpois(q, lambda=2.5))  
  head(dPoisson25)

##   N    Density Distribution
## 1 0 0.08208500    0.0820850
## 2 1 0.20521250    0.2872975
## 3 2 0.25651562    0.5438131
## 4 3 0.21376302    0.7575761
## 5 4 0.13360189    0.8911780
## 6 5 0.06680094    0.9579790

  head(qPoisson25)

##       Q Quantile
## 1 0.001        0
## 2 0.002        0
## 3 0.003        0
## 4 0.004        0
## 5 0.005        0
## 6 0.006        0

Figure 2: Poisson Distribution Functions, \(\lambda\) = 2.5

Examples

Q: Assume a a ball from the driving range next door lands in your yard at an average rate of 3 balls per hour during the day. What is the probability that 10 or fewer golf balls will land in your yard during the afternoon, assuming the afternoon is 5 hours long?

# A: mean is 15 = 3 * 5 for the entire afternoon
ppois(10, 15)

## [1] 0.1184644

Q: If a bird flies overhead at an average rate of 1 every 4 hours, what is the probability that at least one bird will fly overhead in the next hour?

# A: The mean is 0.25 birds per hour.  Subtract of the case that no birds 
#    will fly over
1 - ppois(0, 0.25)

## [1] 0.2211992

Binomial Distribution

The Binomial distribution f(n,p) is represented R by dbinom, pbinom, and qbinom. In the formula, n is the number of trials of some random process that can take on one of two discrete values, say 1 for success and 0 for failure, and p is the probability of success for each trial. The probability density dbinom and cumulative distribution pbinom are defined on the non-negative integers up to and including n.

For the example, we’ll look at n=100 and p=0.5, like 100 coin flips. To figure out a good range for plotting, we will use the qbinom function to find out for a given n and p, what is the least integer that bounds the cumulative Binomial distribution above 99.9%, and what is the greatest integer that bounds below at 0.1%.

  lower<-qbinom(0.001, size=100, prob=0.5)
  upper<-qbinom(0.999, size=100, prob=0.5)
  n<-seq(lower,upper,1)
  q<-seq(0.001,0.999,0.001)
  dBinom100 <- data.frame(N=n, 
                          Density=dbinom(n, size=100, prob=0.5),
                          Distribution=pbinom(n, size=100, prob=0.5))  
  qBinom100 <- data.frame(Q=q, Quantile=qbinom(q, size=100, prob=0.5))  
  head(dBinom100)

##    N      Density Distribution
## 1 35 0.0008638557  0.001758821
## 2 36 0.0015597394  0.003318560
## 3 37 0.0026979276  0.006016488
## 4 38 0.0044728800  0.010489368
## 5 39 0.0071107323  0.017600100
## 6 40 0.0108438667  0.028443967

  head(qBinom100)

##       Q Quantile
## 1 0.001       35
## 2 0.002       36
## 3 0.003       36
## 4 0.004       37
## 5 0.005       37
## 6 0.006       37

Figure 3: Binomial Distribution Functions, N=100, p=0.5

Examples

Q: Assume a coin is weighted so that it comes up heads 60% of the time. What is the prbability that you will obtain 25 or more heads after 50 flips?

# A: Use pbinom to get the probability of 25 or less heads, and subtract from 1 
1 - pbinom(25,50,0.6)

## [1] 0.9021926

Q: Assume a standard die is rolled 10 times. What it the probability that you will roll fewer than 5 sixes?

# A: Use pbinom to sum the cases 0, 1, 2, 3, and 4 and subtract from 1.
pbinom(4, 10, 0.16)

## [1] 0.9869899

Q: Assume you flip a fair coin 100 times. What is the number N that, 90% of the time, the number of heads is less than or equal to N?

# A: Use the quantile function.  
qbinom(0.9, 100, 0.5)

## [1] 56

Exponential Distribution

The Exponential distribution f(r) is represented R by dexp, pexp, and qexp. In the formula, r ia the decay rate of the exponential.The probability density dexp and cumulative distribution pexp are defined on the non-negative reals.

For the example, we’ll use r=0.2. To figure out a good range for plotting, we will use the qexp function to find out for a given rate, what is the least number that bounds the cumulative Exponential distribution above 99.9%, and what is the greatest integer that bounds below at 0.1%.

  lower <- floor(qexp(0.001, rate=0.2))
  upper <- ceiling(qexp(0.999, rate=0.2))
  t <- seq(lower,upper,0.1)
  q <- seq(0.001,0.999,0.001)
  dexp02 <- data.frame(T=t, 
                        Density=dexp(t, rate=0.2),
                        Distribution=pexp(t, rate=0.2))  
  qexp02 <- data.frame(Q=q, Quantile=qexp(q, rate=0.2))  
  head(dexp02)

##     T   Density Distribution
## 1 0.0 0.2000000   0.00000000
## 2 0.1 0.1960397   0.01980133
## 3 0.2 0.1921579   0.03921056
## 4 0.3 0.1883529   0.05823547
## 5 0.4 0.1846233   0.07688365
## 6 0.5 0.1809675   0.09516258

  head(qexp02)

##       Q    Quantile
## 1 0.001 0.005002502
## 2 0.002 0.010010013
## 3 0.003 0.015022545
## 4 0.004 0.020040107
## 5 0.005 0.025062709
## 6 0.006 0.030090362

Figure 4: Exponential Distribution Functions, r=0.2

Examples

Q: Assume the lifetime of a metastable nuclear isomer \(\zeta\) is exponentially distributed with a mean lifetime of 20 minutes. What is the probabilty that a \(\zeta\) nucleus will decay within the next 15 minutes?

# A: The rate is 1/20 = 0.05.  Use pexp to get the probability of decay
#    in 15 or less minutes 
pexp(15, 0.05)

## [1] 0.5276334

Q: Assume that a light bulb has a mean lifetime of 1000 hours. What is the probability that the light bulb survives to 2000 hours?

# A: The rate is 1/1000 = 0.001.  Use pexp to get the probability of burnout within
#    2000 hours, and subtract from 1.
1 - pexp(2000, 0.001)

## [1] 0.1353353

\(\chi^2\) Distribution

The \(\chi^2\)(df) distribution is represented R by dchisq, pchisq, and qchisq. In the formula, df is the number of degrees of freedom. The probability density dchisq and cumulative distribution pchisq are defined on the non-negative reals.

For the example, we’ll use df=10. To figure out a good range for plotting, we will use the qchisq function to find out for a given rate, what is the least number that bounds the cumulative \(\chi^2\) distribution above 99.9%, and what is the greatest integer that bounds below at 0.1%.

  lower <- floor(qchisq(0.001, df=10))
  upper <- ceiling(qchisq(0.999, df=10))
  x <- seq(lower,upper,0.1)
  q <- seq(0.001,0.999,0.001)
  dchisq10 <- data.frame(X=x, 
                        Density=dchisq(x, df=10),
                        Distribution=pchisq(x, df=10))  
  qchisq10 <- data.frame(Q=q, Quantile=qchisq(q, df=10))  
  head(dchisq10)

##     X      Density Distribution
## 1 1.0 0.0007897535 0.0001721156
## 2 1.1 0.0010998857 0.0002660262
## 3 1.2 0.0014817914 0.0003944860
## 4 1.3 0.0019414257 0.0005649763
## 5 1.4 0.0024839611 0.0007855354
## 6 1.5 0.0031137444 0.0010646778

  head(qchisq10)

##       Q Quantile
## 1 0.001 1.478743
## 2 0.002 1.734460
## 3 0.003 1.907676
## 4 0.004 2.042980
## 5 0.005 2.155856
## 6 0.006 2.253695

Basic Probability Distributions in R

Greg Graham

Sunday, August 16, 2015