Learning Objectives

Students will learn about …

  • Useful distributions for regression including
    • Normal Distribution
    • Chi-squared Distribution
    • T-Distribution
    • F-Distribution

Normal Distribution

The normal distribution is the most widely used distribution in statistics, in large part because of its convenient theoretical properties. We will explore these properties in the coming weeks as we prove results about our regression estimators and build up to inference.

A normal random variable \(X \sim N(\mu, \sigma^2)\) has the probability density function

\[f(x)=\frac{1}{\sqrt{2 \pi \sigma^2}}e^{-\frac{1}{2\sigma^2}(x-\mu)^2}, -\infty<x<\infty\]
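
As a quick check of the formula, the density can be written out by hand and compared with R's built-in `dnorm()`. This is only a minimal sketch: the helper name `normal_pdf` and the values \(\mu=9\), \(\sigma=2\) are chosen here for illustration.

```r
## DENSITY FORMULA WRITTEN OUT BY HAND
## normal_pdf IS A HYPOTHETICAL HELPER NAME, NOT PART OF BASE R
normal_pdf <- function(x, mu, sigma) {
  1 / sqrt(2 * pi * sigma^2) * exp(-(x - mu)^2 / (2 * sigma^2))
}

## EVALUATE AT A FEW POINTS (MU = 9, SIGMA = 2 ARE ILLUSTRATIVE VALUES)
pts      <- c(5, 7, 9, 11, 13)
by_hand  <- normal_pdf(pts, mu = 9, sigma = 2)
built_in <- dnorm(pts, mean = 9, sd = 2)

## THE TWO SHOULD AGREE UP TO FLOATING-POINT ERROR
all.equal(by_hand, built_in)
```

Note that `dnorm()` takes the standard deviation (`sd`), not the variance.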

Standard Normal Distribution

Let a random variable \(X\) be normally distributed with mean \(\mu\) and variance \(\sigma^2\). Then \(Z=\frac{X-\mu}{\sigma}\) has a Standard Normal distribution.

\[Z=\frac{X-\mu}{\sigma} \sim N(\mu=0, \sigma^2=1)\]

This process is known as standardizing.
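
To see where the mean of 0 and variance of 1 come from, apply the usual rules for the expectation and variance of a linear transformation of \(X\). (That \(Z\) is itself normal follows because a linear transformation of a normal random variable is again normal.)

\[E(Z)=\frac{E(X)-\mu}{\sigma}=\frac{\mu-\mu}{\sigma}=0, \qquad Var(Z)=\frac{Var(X)}{\sigma^2}=\frac{\sigma^2}{\sigma^2}=1\]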

R Example:

Step 1: Generate

## GENERATE 500 FROM NORMAL
## MEAN = 9 
## SD = 2 --> VAR = 4
x<-rnorm(500, mean=9, sd=2)

Step 2: Visualize and Summary Statistics

## HISTOGRAM
hist(x)

## SUMMARY STATS
mean(x)
## [1] 9.003504
sd(x)
## [1] 2.080578

Step 3: Standardize

## STANDARDIZE
## SAMPLE MEAN AND SD STAND IN FOR MU = 9 AND SIGMA = 2
z<-(x-mean(x))/sd(x)
hist(z)
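
The definition of \(Z\) uses the population values \(\mu\) and \(\sigma\). The sketch below contrasts standardizing with the known parameters against standardizing with sample estimates. The seed and the names `x_sim`, `z_true`, and `z_samp` are illustrative assumptions.

```r
## SELF-CONTAINED DRAW (ARBITRARY SEED FOR REPRODUCIBILITY)
set.seed(2024)
x_sim <- rnorm(500, mean = 9, sd = 2)

## STANDARDIZE WITH THE KNOWN PARAMETERS
z_true <- (x_sim - 9) / 2
## STANDARDIZE WITH SAMPLE ESTIMATES
z_samp <- (x_sim - mean(x_sim)) / sd(x_sim)

## z_samp HAS MEAN 0 AND SD 1 BY CONSTRUCTION (UP TO ROUNDING);
## z_true IS ONLY APPROXIMATELY MEAN 0, SD 1 IN ANY ONE SAMPLE
c(mean = mean(z_true), sd = sd(z_true))
c(mean = mean(z_samp), sd = sd(z_samp))
```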

BONUS:

library(tidyverse)
xz_df<-data.frame(rands=c(x, z),
                  vars=c(rep("X", 500), rep("Z", 500)))

ggplot(xz_df, aes(rands, fill=vars))+
  geom_density()+
  facet_grid(.~vars, scales="free_y")

Chi-Squared Distribution

Let a random variable \(Z\) have a standard normal distribution. Then \(Y=Z^2\) has a chi-squared distribution with 1 degree of freedom.
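
One way to check this relationship without simulation is through the CDFs. For \(q>0\), \(P(Z^2 \le q)=P(-\sqrt{q} \le Z \le \sqrt{q})=2\Phi(\sqrt{q})-1\), where \(\Phi\) is the standard normal CDF, and this should match the chi-squared CDF with 1 degree of freedom. A small sketch, with arbitrary illustrative values of `q`:

```r
## A FEW CUTOFFS (ARBITRARY ILLUSTRATIVE VALUES)
q <- c(0.5, 1, 2, 3.84)

## P(Z^2 <= q) COMPUTED FROM THE STANDARD NORMAL CDF
from_normal <- 2 * pnorm(sqrt(q)) - 1
## SAME PROBABILITY FROM THE CHI-SQUARED CDF WITH 1 DF
from_chisq  <- pchisq(q, df = 1)

## THE TWO SHOULD AGREE UP TO FLOATING-POINT ERROR
all.equal(from_normal, from_chisq)
```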

Step 1: Square Z

## USE Z FROM ABOVE
z2<-z^2

Step 2: Generate from Chi-Sqr

## CHISQR (1 DF)
## NOTE: THIS OBJECT IS NAMED c; CALLS TO c() STILL FIND THE BASE FUNCTION
c<-rchisq(n=500, df=1)

Step 3: Compare

library(tidyverse)
z2c_df<-data.frame(rands=c(z2, c),
                  vars=c(rep("Z2", 500), rep("C", 500)))

ggplot(z2c_df, aes(rands, fill=vars))+
  geom_density()+
  facet_grid(.~vars, scales="free_y")
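
As a numeric complement to the plot: a chi-squared distribution with \(\nu\) degrees of freedom has mean \(\nu\) and variance \(2\nu\), so squared standard normal draws should have a mean near 1 and a variance near 2. The sketch below is self-contained. The seed, the sample size, and the names `z_sim` and `y_sim` are illustrative assumptions.

```r
## SQUARED STANDARD NORMAL DRAWS (ARBITRARY SEED AND SAMPLE SIZE)
set.seed(2024)
z_sim <- rnorm(10000)
y_sim <- z_sim^2

## SHOULD BE CLOSE TO 1 AND 2, THE MEAN AND VARIANCE OF CHI-SQUARED(1)
mean(y_sim)
var(y_sim)
```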

Student’s T-Distribution

Let \(Z\) be a standard normal random variable. Let \(V\) be a \(\chi^2(df=\nu)\) random variable. If \(Z\) and \(V\) are independent, then

\[T=\frac{Z}{\sqrt{V/\nu}} \sim t_{df=\nu}\]

Step 1: Create T

### USE Z AND C FROM ABOVE
t_ratio<-z/sqrt(c/1)

Step 2: Generate from T

## T with df=1
t1<-rt(n=500, df=1)

Step 3: Compare

library(tidyverse)
t_df<-data.frame(rands=c(t_ratio, t1),
                  vars=c(rep("T Ratio", 500), rep("T df=1", 500)))

ggplot(t_df, aes(rands, fill=vars))+
  geom_density()+
  facet_grid(vars~., scales="free_y")
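
Because a t-distribution with 1 degree of freedom has very heavy tails, a handful of extreme draws can dominate the density plots. As a complement, one could compare sample quantiles of the ratio with the theoretical quantiles from `qt()`. The following is a self-contained sketch. The seed, the sample size, and the object names (`z_sim`, `v_sim`, `t_sim`) are illustrative assumptions.

```r
## SIMULATE THE RATIO Z / sqrt(V / nu) WITH nu = 1
set.seed(2024)                    # arbitrary seed for reproducibility
n_sim <- 10000
nu    <- 1
z_sim <- rnorm(n_sim)             # standard normal draws
v_sim <- rchisq(n_sim, df = nu)   # independent chi-squared draws
t_sim <- z_sim / sqrt(v_sim / nu)

## COMPARE CENTRAL QUANTILES (LESS SENSITIVE TO EXTREME DRAWS)
probs <- c(0.25, 0.5, 0.75)
rbind(simulated   = quantile(t_sim, probs),
      theoretical = qt(probs, df = nu))
```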

F-Distribution

Let \(U\) be a \(\chi^2(\nu_1)\) random variable. Let \(V\) be a \(\chi^2(\nu_2)\) random variable. If \(U\) and \(V\) are independent, then

\[F=\frac{U/\nu_1}{V/\nu_2} \sim F_{\nu_1, \nu_2}\]
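
This definition also ties back to the t-distribution. Squaring \(T=Z/\sqrt{V/\nu}\) gives \(T^2=\frac{Z^2/1}{V/\nu}\), a ratio of independent chi-squared variables each divided by its degrees of freedom, so \(T^2 \sim F_{1, \nu}\). A quick quantile check, with \(\nu=10\) chosen arbitrarily for illustration:

```r
## DEGREES OF FREEDOM (ARBITRARY ILLUSTRATIVE VALUE)
nu <- 10

## TWO-SIDED 5% CRITICAL VALUE FOR T, UPPER 5% CRITICAL VALUE FOR F(1, nu)
t_crit <- qt(0.975, df = nu)
f_crit <- qf(0.95, df1 = 1, df2 = nu)

## SQUARED T CRITICAL VALUE SHOULD MATCH THE F CRITICAL VALUE
c(t_squared = t_crit^2, f = f_crit)
all.equal(t_crit^2, f_crit)
```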

Step 1: Generate U and V

## GENERATE U and V
u <-rchisq(n=500, df=2)
v <-rchisq(n=500, df=3)

Step 2: Define F

## Create F
f23<-(u/2)/(v/3)

Step 3: Generate F

## GENERATE F
f<-rf(n=500, df1=2, df2=3)

Step 4: Compare

library(tidyverse)
f_df<-data.frame(rands=c(f23, f),
                  vars=c(rep("F Ratio", 500), rep("F df=(2, 3)", 500)))

ggplot(f_df, aes(rands, fill=vars))+
  geom_density()+
  facet_grid(vars~., scales="free_y")
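
Another way to compare, sketched here under illustrative assumptions (the seed, sample size, cutoff values, and the names `u_sim`, `v_sim`, and `f_sim` are all made up for this example), is to check that the proportion of simulated ratios below a cutoff is close to the theoretical CDF from `pf()`.

```r
## SIMULATE THE RATIO (U / 2) / (V / 3)
set.seed(2024)                   # arbitrary seed for reproducibility
n_sim <- 10000
u_sim <- rchisq(n_sim, df = 2)
v_sim <- rchisq(n_sim, df = 3)   # drawn independently of u_sim
f_sim <- (u_sim / 2) / (v_sim / 3)

## EMPIRICAL VS THEORETICAL P(F <= q) AT A FEW CUTOFFS
cutoffs     <- c(0.5, 1, 3)
empirical   <- sapply(cutoffs, function(q) mean(f_sim <= q))
theoretical <- pf(cutoffs, df1 = 2, df2 = 3)
round(cbind(cutoffs, empirical, theoretical), 3)
```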