Part I: Univariate Distribution Graphics

We now turn to some univariate distributions. For that we need to load some home-made functions (written by Prof. Daniel Carr):

source("classFunction.r")
source("classEcdf.r")
source("classEda.r")
source("classDensity.r")
source("panelFunctions.r")
source("gridPlotFunctions.r")

Now we can start plotting a density for a Normal distribution with mean 100 and standard deviation 16:

p <- ppoints(300)  # 300 cumulative probabilities
q <- qnorm(p, mean = 100, sd = 16)
d <- dnorm(q, mean = 100, sd = 16)
classFunction(q, d, main = "Normal Distribution: Mean = 100, SD = 16", col = rgb(51, 
    255, 51, maxColorValue = 255), ylab = "Density", xlab = "Quantiles")

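
The three steps above (probabilities, then quantiles, then densities) can be checked with base R alone. The following is a minimal sketch that assumes nothing about classFunction(): it uses plain plot() as a stand-in and confirms that pnorm() maps the quantiles back to the probabilities produced by ppoints().

```r
# Evenly spaced cumulative probabilities strictly between 0 and 1
p <- ppoints(300)

# Quantile function: the x value at which the Normal CDF equals each p
q <- qnorm(p, mean = 100, sd = 16)

# Density evaluated at those quantiles
d <- dnorm(q, mean = 100, sd = 16)

# pnorm() inverts qnorm(), so we should get the probabilities back
roundTrip <- pnorm(q, mean = 100, sd = 16)
print(all.equal(roundTrip, p))

# Base-graphics stand-in for classFunction() (an assumption, not its actual output)
plot(q, d, type = "l", lwd = 2, xlab = "Quantiles", ylab = "Density",
     main = "Normal Distribution: Mean = 100, SD = 16")
```

Starting from probabilities rather than from a grid of x values puts more points where the distribution has more mass, which is why the same recipe is reused for every family below.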

EXPONENTIAL FAMILY

One way to think about the exponential distribution is as the waiting time to the first event of a Poisson process. This family has one parameter, which R calls the rate.
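
The link to the Poisson process can be made concrete: the waiting time to the first event exceeds a given time exactly when no events occur up to that time. Here is a small sketch of that identity, where the rate and the grid of times are arbitrary values chosen only for illustration.

```r
rate <- 2                            # illustrative: events per unit time
times <- seq(0.1, 3, by = 0.1)       # illustrative grid of waiting times

# P(first event happens after each time), from the exponential distribution
survExp <- pexp(times, rate = rate, lower.tail = FALSE)

# P(zero events in an interval of that length), from the Poisson distribution
noEvents <- dpois(0, lambda = rate * times)

print(all.equal(survExp, noEvents))
```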

First, let's plot an exponential with rate = 1 (no need to write “1” since it is the default value):

p <- ppoints(300)
q <- qexp(p)  # Find quantiles from cum. probs.
d <- dexp(q)  # Find densities for quantiles
classFunction(q, d, main = "Exponential Distribution: Rate = 1", col = rgb(255, 
    51, 255, maxColorValue = 255), ylab = "Density", xlab = "Quantiles")


Now, let's plot an exponential with rate = 2:

p <- ppoints(300)
q <- qexp(p, rate = 2)  # Poisson mean of 2 events per unit time (mean time to first event = 1/2)
d <- dexp(q, rate = 2)
classFunction(q, d, main = "Exponential Distribution: Rate = 2 (Mean = 1/2)", 
    col = rgb(255, 51, 255, maxColorValue = 255), ylab = "Density", xlab = "Quantiles")

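
The "Mean = 1/2" in the plot title follows from mean = 1/rate. As a quick numerical check (a sketch, not part of the sourced helper functions), we can integrate x times the density and also look at the median from the quantile function:

```r
rate <- 2

# Mean waiting time: integrate x * f(x) over the support
meanTime <- integrate(function(x) x * dexp(x, rate = rate),
                      lower = 0, upper = Inf)$value

# Median waiting time from the quantile function; equals log(2) / rate
medianTime <- qexp(0.5, rate = rate)

print(c(mean = meanTime, median = medianTime))
```

The median is smaller than the mean, which matches the long right tail seen in the density plot.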

GAMMA FAMILY

This family has two parameters, shape and rate, and contains the exponential family as the special case shape = 1. For instance, shape = 3 means the waiting time to the 3rd Poisson event, and rate = 2 means the Poisson mean is 2 events per unit time. When the shape parameter is an integer the distribution is sometimes called the Erlang distribution, but the shape parameter does not have to be an integer.
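
Both claims can be checked numerically. In the sketch below (the grid of times is an arbitrary choice for illustration), shape = 1 reproduces the exponential density, and the wait for the 3rd event exceeds a given time exactly when at most 2 events have occurred by then.

```r
rate <- 2
times <- seq(0.1, 5, by = 0.1)       # illustrative grid

# Shape = 1 reduces the gamma density to the exponential density
print(all.equal(dgamma(times, shape = 1, rate = rate), dexp(times, rate = rate)))

# Waiting for the 3rd event takes longer than a given time exactly when
# at most 2 events have occurred by that time
survGamma <- pgamma(times, shape = 3, rate = rate, lower.tail = FALSE)
atMostTwo <- ppois(2, lambda = rate * times)
print(all.equal(survGamma, atMostTwo))

# A non-integer shape is also valid
print(dgamma(1, shape = 2.5, rate = rate))
```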

p <- ppoints(300)
q <- qgamma(p, shape = 3, rate = 2)
d <- dgamma(q, shape = 3, rate = 2)

classFunction(q, d, main = "Gamma Distribution: Shape = 3, Rate = 2", col = rgb(0, 
    153, 0, maxColorValue = 255), ylab = "Density", xlab = "Quantiles")


This has a different shape than the exponential distribution with rate = 2. Suppose the shape were 30. The distribution can then be viewed as the distribution of the sum of 30 independent, identically distributed waiting times from an exponential distribution. By the Central Limit Theorem, the sum behaves more like a normal distribution:

p <- ppoints(300)
q <- qgamma(p, shape = 30, rate = 2)
d <- dgamma(q, shape = 30, rate = 2)
classFunction(q, d, main = "Gamma Distribution: Shape = 30, Rate = 2", col = rgb(0, 
    153, 0, maxColorValue = 255), ylab = "Density", xlab = "Quantiles")

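
To put a number on "more like a normal distribution", one option is to compare the gamma CDF with the CDF of a Normal that has the same mean (shape / rate) and standard deviation (sqrt(shape) / rate). The helper maxCdfGap() below is a hypothetical name introduced only for this sketch.

```r
shape <- 30
rate <- 2

# Mean and SD of a sum of `shape` independent exponential(rate) waits
mu <- shape / rate               # 15
sigma <- sqrt(shape) / rate      # about 2.74

# Largest gap between the gamma CDF and the matching Normal CDF,
# evaluated at 300 gamma quantiles (hypothetical helper for illustration)
maxCdfGap <- function(shape, rate) {
  p <- ppoints(300)
  q <- qgamma(p, shape = shape, rate = rate)
  normalCdf <- pnorm(q, mean = shape / rate, sd = sqrt(shape) / rate)
  max(abs(p - normalCdf))
}

print(c(shape3 = maxCdfGap(3, rate), shape30 = maxCdfGap(30, rate)))
```

The gap for shape = 30 should come out clearly smaller than for shape = 3, which is the Central Limit Theorem at work.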

SOME EDA

set.seed(131)
x <- rgamma(100, shape = 30, rate = 2)
classEda(x, varLab1 = "Gamma Sample Size=100", varLab2 = "Shape=30, Rate =2", 
    varUnits = "Diet Plan: Months to Lose 20 Pounds", densityCol = "red")
## Loading required package: car
## Loading required package: MASS
## Loading required package: nnet

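
If the sourced helper files are not at hand, a few standard exploratory views can be produced with base R. The sketch below is not classEda() and makes no claim about which panels that function draws; it only shows a numerical summary, a histogram with a kernel density estimate, and a box plot for the same sample.

```r
set.seed(131)
x <- rgamma(100, shape = 30, rate = 2)

# Numerical summary; the theoretical mean is shape / rate = 15
print(summary(x))
print(sd(x))                     # theoretical SD is sqrt(30) / 2

# Histogram on the density scale with a kernel density estimate overlaid
hist(x, freq = FALSE, col = "grey90", main = "Gamma Sample Size=100",
     xlab = "Diet Plan: Months to Lose 20 Pounds")
lines(density(x), col = "red", lwd = 2)

# Box plot to flag potential outliers
boxplot(x, horizontal = TRUE, xlab = "Diet Plan: Months to Lose 20 Pounds")
```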

CUMULATIVE PROBABILITIES FOR A SAMPLE

We use the function ppoints(), which generates one cumulative probability for each sorted sample value:

set.seed(151)
x <- rgamma(20, shape = 3, rate = 2)
prob <- ppoints(x)
quant <- sort(x)
gPlot(quant, prob, type = "l", lwd = 3, xlab = "Quantiles", ylab = "Cumulative Probability", 
    fillCol = rgb(255, 255, 204, maxColorValue = 255), gridCol = rgb(229, 255, 
        204, maxColorValue = 255), main = "Sample size=20, Gamma shape=3, Rate=2")

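
It helps to see what ppoints() actually returns. In base R it computes (i - a) / (n + 1 - 2a) for i = 1, ..., n, with a = 1/2 when n > 10 and a = 3/8 otherwise. A short check of those formulas (the small size m is an illustrative value):

```r
n <- 20

# For n > 10, ppoints() uses a = 1/2, giving (i - 1/2) / n
print(all.equal(ppoints(n), (1:n - 0.5) / n))

# For n <= 10, it uses a = 3/8, giving (i - 3/8) / (n + 1/4)
m <- 5                            # illustrative small sample size
print(all.equal(ppoints(m), (1:m - 3/8) / (m + 1 - 2 * 3/8)))

# Passing a vector of length > 1 is the same as passing its length
set.seed(151)
x <- rgamma(n, shape = 3, rate = 2)
print(all.equal(ppoints(x), ppoints(n)))
```

Because the probabilities depend only on the sample size, they have to be paired with sort(x), as in the plot above.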

THE EMPIRICAL CUMULATIVE DISTRIBUTION

This has a jump of 1/n at each of the n quantiles of the sample (when there are no ties). This is good for theorems about asymptotic distributions. The distribution function is continuous from the right but not from the left.


classEcdf(x)

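
The jump size and the one-sided continuity can be verified with base R's ecdf(), used here in place of classEcdf() since that helper's internals are not shown. This is a sketch on the same seeded sample as above.

```r
set.seed(151)
x <- rgamma(20, shape = 3, rate = 2)
n <- length(x)
xs <- sort(x)

Fn <- ecdf(x)                    # base R empirical CDF, a step function

# Right-continuous: at the i-th ordered value the ECDF already equals i / n
atJump <- Fn(xs)

# Just to the left of each ordered value it is still (i - 1) / n
eps <- min(diff(xs)) / 2         # small enough to stay between neighbours
leftOfJump <- Fn(xs - eps)

print(all.equal(atJump, (1:n) / n))
print(all.equal(leftOfJump, (0:(n - 1)) / n))
print(all.equal(atJump - leftOfJump, rep(1 / n, n)))
```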

Q-Q PLOTS

We present a Q-Q plot with a reference line through the two points given by the first and third quartiles of each sample. When a straight line fits the points well, the distributions are related by a linear transform (y = ax + b) and we say they belong to the same location-scale family of distributions. The value a changes the scale and the value b changes the location.
First, let's see a simple version:

set1 <- rgamma(70, shape = 3, rate = 2)
set2 <- rgamma(100, shape = 3, rate = 4)
ans <- qqplot(set1, set2, plot.it = FALSE)  # compute the Q-Q pairs without drawing them
# Now we plot them with more detail and add the reference line:
gPlot(ans$x, ans$y, xlab = "Rate = 2 sample", ylab = "Rate = 4 sample", main = "Q-Q plot")
p <- c(0.25, 0.75)
xQuartiles <- quantile(set1, p)
yQuartiles <- quantile(set2, p)
ans <- lm(yQuartiles ~ xQuartiles)  # fit a linear model through the two points
abline(coef = coef(ans), col = "blue", lwd = 3)  # draw the reference line

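
For these two gamma distributions the linear relation is known exactly: doubling the rate halves every quantile, so a = 1/2 and b = 0. The sketch below checks this with theoretical quantiles and computes the quartile reference line by hand; with random samples, as in the plot above, the fitted line will only be close to these values.

```r
p <- ppoints(100)
qRate2 <- qgamma(p, shape = 3, rate = 2)
qRate4 <- qgamma(p, shape = 3, rate = 4)

# Doubling the rate halves every quantile: y = a * x + b with a = 1/2, b = 0
print(all.equal(qRate4, 0.5 * qRate2))

# Reference line through the (first quartile, first quartile) and
# (third quartile, third quartile) points, computed by hand
xQ <- qgamma(c(0.25, 0.75), shape = 3, rate = 2)
yQ <- qgamma(c(0.25, 0.75), shape = 3, rate = 4)
slope <- diff(yQ) / diff(xQ)
intercept <- yQ[1] - slope * xQ[1]
print(c(intercept = intercept, slope = slope))
```

Fitting lm() through two points, as done above, gives the same slope and intercept as this direct calculation.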