An important characterization of a population is how spread out it is. One of the key measures of spread is variability. We measure population variability with the sample variance; more often we consider its square root, called the standard deviation. The reason for taking the standard deviation is that it has the same units as the population. So if our population is a length measurement in meters, the standard deviation is in meters (whereas the variance is in meters squared).
Variability has many important uses in statistics. First, the population variance is itself an intrinsically interesting quantity that we want to estimate. Second, variability in our estimates is what makes them imprecise. An important aspect of statistics is quantifying the variability in our estimates.
The Variance: the variance of a random variable \(X\) with mean \(\mu = E[X]\) is a measure of its spread:
\[Var(X)=E[(X-\mu)^2]\]
Here is a convenient computational shortcut: the variance is the expected value of \(X^2\) minus the square of the expected value of \(X\).
\[Var(X) = E[X^2]-E[X]^2\]
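As a quick illustration (our own example, not from the lecture), here is the shortcut applied in R to a fair six-sided die:
x <- 1:6                # the faces of a fair die
p <- rep(1/6, 6)        # each face is equally likely
EX <- sum(x * p)        # E[X] = 3.5
EX2 <- sum(x^2 * p)     # E[X^2] = 15.1667
EX2 - EX^2              # the variance, 35/12
## [1] 2.916667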
The square root of the variance is called the standard deviation.
Summarizing what we know about variances: the variance \(Var(X)=E[(X-\mu)^2]=E[X^2]-E[X]^2\) measures the spread of a random variable, and its square root, the standard deviation, has the same units as \(X\).
Some probability distributions are so important that we need to internalize their characteristics. In these lectures we cover the most important probability distributions.
The Bernoulli Distribution
This distribution arises as the result of a binary outcome, such as a coin flip. Bernoulli random variables take only the values 1 and 0, with probabilities (say) \(p\) and \(1-p\) respectively. Here is the Bernoulli mass function: \[P(X=x)=p^x(1-p)^{1-x}\]
The mean of a Bernoulli random variable is \(p\) and the variance is \(p(1-p)\).
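As an illustrative check (our own simulation, with an arbitrary \(p = 0.3\)), the sample mean and variance of simulated Bernoulli draws should land near \(p\) and \(p(1-p)\):
p <- 0.3
x <- rbinom(10000, size = 1, prob = p)  # 10,000 Bernoulli(0.3) draws
mean(x)  # should be close to p = 0.3
var(x)   # should be close to p * (1 - p) = 0.21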
The Binomial Distribution (sums of Bernoulli trials)
Specifically, let \(X_1,\ldots,X_n\) be iid Bernoulli\((p)\); then \(X=\sum^n_{i=1} X_i\) is a binomial random variable. Here is the binomial mass function: \[P(X=x)=\binom{n}{x}p^x(1-p)^{n-x}\]
Here is an example: if each gender has an independent 50% probability for each birth, what is the probability of getting 7 or more girls out of 8 births?
\[\binom{8}{7}0.5^7(1-0.5)^1+\binom{8}{8}0.5^8(1-0.5)^0\approx 0.04\]
choose(8, 7) * 0.5 ^ 8 + choose(8, 8) * 0.5 ^ 8  # note 0.5^7 * 0.5^1 = 0.5^8
## [1] 0.03515625
pbinom(6, size = 8, prob = 0.5, lower.tail = FALSE)  # P(X > 6) = P(X >= 7)
## [1] 0.03515625
The Normal Distribution
The Gaussian distribution with mean \(\mu\) and variance \(\sigma^2\) has density \[(2\pi\sigma^2)^{-1/2}e^{-(x-\mu)^2/2\sigma^2}\]
When \(\mu=0\) and \(\sigma=1\) the result is the standard normal distribution. Standard normal variables are often labeled \(z\).
# Plot the standard normal density over +/- 3 standard deviations
mean <- 0; sd <- 1
x <- seq(-3, 3, length = 100) * sd + mean
hx <- dnorm(x, mean, sd)
plot(x, hx, type = "l", xlab = "Standard Deviations from Mean of Zero",
     ylab = "Density",
     main = "Normal Distribution")
Facts about the normal density: it is symmetric about \(\mu\), and approximately 68%, 95%, and 99.7% of its mass lies within 1, 2, and 3 standard deviations of the mean, respectively.
Quantiles to Commit to Memory: \(-1.28\), \(-1.645\), \(-1.96\), and \(-2.33\) are the \(10^{th}\), \(5^{th}\), \(2.5^{th}\), and \(1^{st}\) percentiles of the standard normal, respectively; by symmetry, \(1.28\), \(1.645\), \(1.96\), and \(2.33\) are the \(90^{th}\), \(95^{th}\), \(97.5^{th}\), and \(99^{th}\) percentiles.
Question: What is the \(95^{th}\) percentile of a \(N(\mu, \sigma^2)\) distribution?
mu <- 0; sd <- 1
round(qnorm(0.95, mean = mu, sd = sd), 3)
## [1] 1.645
Here is the answer to our question in general form: the \(95^{th}\) percentile of a \(N(\mu, \sigma^2)\) distribution is \[\mu + 1.645\,\sigma\]
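To see the general formula in action, here is an illustrative check with arbitrary values \(\mu = 10\) and \(\sigma = 2\) (our own numbers):
mu <- 10; sigma <- 2
qnorm(0.95, mean = mu, sd = sigma)  # the 95th percentile directly
## [1] 13.28971
mu + sigma * 1.645                  # the formula gives (nearly) the same value
## [1] 13.29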
Question: What is the probability that a \(N(\mu, \sigma^2)\) RV is larger than \(x\)?
mu <- 0; sigma <- 1; x <- 0.95
round(pnorm(x, mean = mu, sd = sigma, lower.tail = FALSE), 3)
## [1] 0.171
Here is an easier way: convert \(x\) to units of standard deviations from the mean by subtracting \(\mu\) and dividing by the standard deviation, \[\frac{x - \mu}{\sigma}\] and then look the result up on the standard normal scale.
Assume that the number of daily ad clicks for a company is (approximately) normally distributed with a mean of 1020 and a standard deviation of 50. What's the probability of getting more than 1,160 clicks in a day?
# This is not very likely
round(pnorm(1160, mean = 1020, sd = 50, lower.tail = FALSE),4)
## [1] 0.0026
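Consistent with the standardization shortcut above, 1160 is \((1160 - 1020)/50 = 2.8\) standard deviations above the mean, so the standard normal gives the same answer:
round(pnorm(2.8, lower.tail = FALSE), 4)  # P(Z > 2.8)
## [1] 0.0026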
Assume that the number of daily ad clicks for a company is (approximately) normally distributed with a mean of 1020 and a standard deviation of 50. What number of daily ad clicks would represent the one where 75% of days have fewer clicks (assuming days are independent and identically distributed)?
qnorm(0.75, mean = 1020, sd = 50)
## [1] 1053.724
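As a quick sanity check (our own addition), feeding this quantile back into pnorm recovers the 75%:
pnorm(qnorm(0.75, mean = 1020, sd = 50), mean = 1020, sd = 50)
## [1] 0.75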
The Poisson Distribution
Used to model counts.
The normal distribution is by far the most commonly used distribution for modeling, but the Poisson distribution is a strong second.
The Poisson mass function is as follows: \[P(X=x;\lambda)=\frac{\lambda^xe^{-\lambda}}{x!}\] for \(x = 0, 1, 2, \ldots\)
The mean of a Poisson distribution is the \(\lambda\) parameter, and the variance is also \(\lambda\); the mean and the variance of a Poisson are always equal, as the short simulation below illustrates. The Poisson distribution is used to model counts and rates and, as shown in the next section, to approximate the binomial when \(n\) is large and \(p\) is small.
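Here is a minimal simulation sketch (our own, with an arbitrary \(\lambda = 4\)) of the mean-equals-variance fact:
x <- rpois(10000, lambda = 4)  # 10,000 Poisson(4) draws
mean(x)  # close to 4
var(x)   # also close to 4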
Example: the number of people that show up at a bus stop is Poisson with a mean of \(2.5\) per hour. If we watch the bus stop for \(4\) hours, what is the probability that \(3\) or fewer people show up in that time?
ppois(3, lambda = 2.5 * 4)  # a rate of 2.5 per hour times 4 hours
## [1] 0.01033605
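Equivalently (an illustrative alternative), we can sum the Poisson mass function directly over the counts \(0\) through \(3\):
sum(dpois(0:3, lambda = 10))  # P(X=0) + P(X=1) + P(X=2) + P(X=3)
## [1] 0.01033605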
Poisson Approximation to the Binomial
When \(n\) is large and \(p\) is small, the Poisson distribution is an accurate approximation to the binomial distribution. Formally, if \(X \sim \text{Binomial}(n, p)\) with large \(n\) and small \(p\), then \(X\) is approximately Poisson with \(\lambda = np\).
Example: We flip a coin with success probability \(0.01\) five hundred times. What is the probability of 2 or fewer successes?
pbinom(2, size = 500, prob = 0.01)  # the exact binomial probability
## [1] 0.1233858
ppois(2, lambda = 500 * 0.01)  # the Poisson approximation, lambda = np = 5
## [1] 0.124652
Here the \(n\) of \(500\) is large and the \(p\) of \(0.01\) is small. The exact binomial probability (about 12.3%) and the Poisson approximation (about 12.5%) are very close.
Asymptotics are an important topic in statistics. Asymptotics refers to the behavior of estimators as the sample size goes to infinity. Our very notion of probability depends on the idea of asymptotics. For example, many people define probability as the proportion of times an event would occur in infinite repetitions. That is, the probability of a head on a coin is 50% because we believe that if we were to flip it infinitely many times, the proportion of heads would converge to 50%.
We can use asymptotics to help us figure out things about distributions without knowing much about them to begin with. A profound idea along these lines is the Central Limit Theorem. It states that the distribution of averages is often normal, even if the distribution the data are sampled from is very non-normal. This helps us create robust strategies for statistical inference when we're not willing to assume much about the data-generating mechanism.
Definition: asymptotics is the term for the behavior of statistics as the sample size (or some other relevant quantity) limits to infinity (or some other relevant number).
Asymptotics are incredibly useful for simple statistical inference and approximations. They are like a statistical Swiss Army knife, letting us investigate the properties of many statistics without having to do much computing. Asymptotics also form the basis for the frequency interpretation of probabilities (the long-run proportion of times an event occurs).
Limits of Random Variables
These results allow us to talk about the large-sample distribution of sample means of a collection of \(iid\) observations. The first of these results, the Law of Large Numbers, says that the average limits to what it's estimating, the population mean.
The Law of Large Numbers in Action
n <- 1000
means1 <- cumsum(rnorm(n)) / (1:n)  # running means of the first 1, 2, ..., n standard normals
plot(means1, type = "l")
When you plot the cumulative means against the indexes \((1:n)\), you see a lot of variability early on; but as the number of simulations grows, we get closer and closer to the true population value, which is zero. Let's do this again, but this time we will flip a coin instead of generating standard normals.
Law of Large Numbers in Action: Coin Flip
means2 <- cumsum(sample(0:1, n, replace = TRUE)) / (1:n)  # running means of fair coin flips
plot(means2, type = "l", xlab = "Number of Coin Flips")
The Central Limit Theorem
This is the most important theorem in all of statistics. For our purposes, the CLT states that the distribution of averages of iid variables (properly normalized) becomes that of a standard normal as the sample size increases.
\[\frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} = \frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} = \frac{\text{Estimate} - \text{Mean of Estimate}}{\text{Std. Err. of Estimate}}\]
…has a distribution like that of a standard normal for a large \(n\).
Examples
First we will approximate a standard normal random variable by rolling \(n\) six-sided dice and standardizing the average of the rolls, as sketched below.
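Here is a minimal sketch of that simulation (our own code; nosim and n are illustrative choices). A fair die has mean \(3.5\) and variance \(35/12\), so the standardized average of \(n\) rolls should look approximately standard normal:
nosim <- 1000; n <- 10
dice_means <- replicate(nosim, mean(sample(1:6, n, replace = TRUE)))  # nosim averages of n rolls
z <- (dice_means - 3.5) / (sqrt(35 / 12) / sqrt(n))  # (xbar - mu) / (sigma / sqrt(n))
hist(z, breaks = 20, main = "Standardized dice means", xlab = "z")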
Example
library(UsingR)  # provides the father.son data set
data(father.son)
x <- father.son$sheight  # sons' heights, in inches
(mean(x) + c(-1, 1) * qnorm(0.975) * sd(x) / sqrt(length(x))) / 12  # 95% CI, converted to feet
## [1] 5.709670 5.737674
In this example we take the mean of x plus or minus the 0.975 normal quantile times the standard error of the mean (the standard deviation of x divided by the square root of n, where n is the length of the vector x). The result is divided by 12 so that the confidence interval is in feet rather than inches. The output is the confidence interval 5.710 to 5.738. So if we assume that the sons in this data set are an iid draw from a population of interest, then the confidence interval for the average height of the sons would be 5.71 to 5.74 feet.