An important characterization of a population is how spread out it is. One of the key measures of spread is variability. We measure population variability with the sample variance; more often we consider its square root, called the standard deviation. The reason for taking the standard deviation is that it has the same units as the population. So if our population is a length measurement in meters, the standard deviation is in meters (whereas the variance is in meters squared).
Variability has many important uses in statistics. First, the population variance is itself an intrinsically interesting quantity that we want to estimate. Second, variability in our estimates is what makes them imprecise. An important aspect of statistics is quantifying the variability in our estimates.
The Variance: the variance of a random variable \(X\) with mean \(\mu = E[X]\) is a measure of its spread:
\[Var(X)=E[(X-\mu)^2]\]
Here is a convenient computational shortcut: the variance is the expected value of \(X^2\) minus the square of the expected value of \(X\).
\[Var(X) = E[X^2]-E[X]^2\]
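As a quick illustration (our own example, not from the lecture), here is the shortcut applied in R to a fair six-sided die:
x <- 1:6                # the faces of a fair die
p <- rep(1/6, 6)        # each face is equally likely
EX <- sum(x * p)        # E[X] = 3.5
EX2 <- sum(x^2 * p)     # E[X^2] = 15.1667
EX2 - EX^2              # the variance, 35/12
## [1] 2.916667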
The square root of the variance is called the standard deviation.
Summarizing what we know about variances: the variance \(Var(X)=E[(X-\mu)^2]=E[X^2]-E[X]^2\) measures the spread of a random variable, and its square root, the standard deviation, has the same units as \(X\).
Some probability distributions are so important that we need to internalize their characteristics. In these lectures we cover the most important probability distributions.
The Bernoulli Distribution
This distribution arises as the result of a binary outcome, such as a coin flip. Bernoulli random variables take only the values 1 and 0, with probabilities (say) \(p\) and \(1-p\) respectively. Here is the Bernoulli mass function: \[P(X=x)=p^x(1-p)^{1-x}\]
The mean of a Bernoulli random variable is \(p\) and the variance is \(p(1-p)\).
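As an illustrative check (our own simulation, with an arbitrary \(p = 0.3\)), the sample mean and variance of simulated Bernoulli draws should land near \(p\) and \(p(1-p)\):
p <- 0.3
x <- rbinom(10000, size = 1, prob = p)  # 10,000 Bernoulli(0.3) draws
mean(x)  # should be close to p = 0.3
var(x)   # should be close to p * (1 - p) = 0.21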
The Binomial Distribution (sums of Bernoulli trials)
Specifically, let \(X_1,\ldots,X_n\) be iid Bernoulli\((p)\); then \(X=\sum^n_{i=1} X_i\) is a binomial random variable. Here is the binomial mass function: \[P(X=x)=\binom{n}{x}p^x(1-p)^{n-x}\]
Here is an example: if each gender has an independent 50% probability for each birth, what is the probability of getting 7 or more girls out of 8 births?
\[\binom{8}{7}0.5^7(1-0.5)^1+\binom{8}{8}0.5^8(1-0.5)^0\approx 0.04\]
choose(8, 7) * 0.5 ^ 8 + choose(8, 8) * 0.5 ^ 8  # note 0.5^7 * 0.5^1 = 0.5^8
## [1] 0.03515625
pbinom(6, size = 8, prob = 0.5, lower.tail = FALSE)  # P(X > 6) = P(X >= 7)
## [1] 0.03515625
The Normal Distribution
The Gaussian distribution with mean \(\mu\) and variance \(\sigma^2\) has density \[(2\pi\sigma^2)^{-1/2}e^{-(x-\mu)^2/2\sigma^2}\]
When \(\mu=0\) and \(\sigma=1\) the result is the standard normal distribution. Standard normal variables are often labeled \(z\).
# Plot the standard normal density over +/- 3 standard deviations
mean <- 0; sd <- 1
x <- seq(-3, 3, length = 100) * sd + mean
hx <- dnorm(x, mean, sd)
plot(x, hx, type = "l", xlab = "Standard Deviations from Mean of Zero",
     ylab = "Density",
     main = "Normal Distribution")
Facts about the normal density: it is symmetric about \(\mu\), and approximately 68%, 95%, and 99.7% of its mass lies within 1, 2, and 3 standard deviations of the mean, respectively.
Quantiles to Commit to Memory: \(-1.28\), \(-1.645\), \(-1.96\), and \(-2.33\) are the \(10^{th}\), \(5^{th}\), \(2.5^{th}\), and \(1^{st}\) percentiles of the standard normal, respectively; by symmetry, \(1.28\), \(1.645\), \(1.96\), and \(2.33\) are the \(90^{th}\), \(95^{th}\), \(97.5^{th}\), and \(99^{th}\) percentiles.
Question: What is the \(95^{th}\) percentile of a \(N(\mu, \sigma^2)\) distribution?
mu <- 0; sd <- 1
round(qnorm(0.95, mean = mu, sd = sd), 3)
## [1] 1.645
Here is the answer to our question in general form: the \(95^{th}\) percentile of a \(N(\mu, \sigma^2)\) distribution is \[\mu + 1.645\,\sigma\]
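To see the general formula in action, here is an illustrative check with arbitrary values \(\mu = 10\) and \(\sigma = 2\) (our own numbers):
mu <- 10; sigma <- 2
qnorm(0.95, mean = mu, sd = sigma)  # the 95th percentile directly
## [1] 13.28971
mu + sigma * 1.645                  # the formula gives (nearly) the same value
## [1] 13.29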
Question: What is the probability that a \(N(\mu, \sigma^2)\) RV is larger than \(x\)?
mu <- 0; sigma <- 1; x <- 0.95
round(pnorm(x, mean = mu, sd = sigma, lower.tail = FALSE), 3)
## [1] 0.171
Here is an easier way: convert \(x\) to units of standard deviations from the mean by subtracting \(\mu\) and dividing by the standard deviation, \[\frac{x - \mu}{\sigma}\] and then look the result up on the standard normal scale.
Assume that the number of daily ad clicks for a company is (approximately) normally distributed with a mean of 1020 and a standard deviation of 50. What's the probability of getting more than 1,160 clicks in a day?
# This is not very likely
round(pnorm(1160, mean = 1020, sd = 50, lower.tail = FALSE),4)
## [1] 0.0026
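Consistent with the standardization shortcut above, 1160 is \((1160 - 1020)/50 = 2.8\) standard deviations above the mean, so the standard normal gives the same answer:
round(pnorm(2.8, lower.tail = FALSE), 4)  # P(Z > 2.8)
## [1] 0.0026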
Assume that the number of daily ad clicks for a company is (approximately) normally distributed with a mean of 1020 and a standard deviation of 50. What number of daily ad clicks would represent the one where 75% of days have fewer clicks (assuming days are independent and identically distributed)?
qnorm(0.75, mean = 1020, sd = 50)
## [1] 1053.724
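As a quick sanity check (our own addition), feeding this quantile back into pnorm recovers the 75%:
pnorm(qnorm(0.75, mean = 1020, sd = 50), mean = 1020, sd = 50)
## [1] 0.75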
The Poisson Distribution
Used to model counts.
The normal distribution is by far the most commonly used distribution for modeling, but the Poisson distribution is a strong second.
The Poisson mass function is as follows: \[P(X=x;\lambda)=\frac{\lambda^xe^{-\lambda}}{x!}\] for \(x = 0, 1, 2, \ldots\)
The mean of a Poisson distribution is the \(\lambda\) parameter, and the variance is also \(\lambda\); the mean and the variance of a Poisson are always equal, as the short simulation below illustrates. The Poisson distribution is used to model counts and rates and, as shown in the next section, to approximate the binomial when \(n\) is large and \(p\) is small.
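Here is a minimal simulation sketch (our own, with an arbitrary \(\lambda = 4\)) of the mean-equals-variance fact:
x <- rpois(10000, lambda = 4)  # 10,000 Poisson(4) draws
mean(x)  # close to 4
var(x)   # also close to 4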
Example: the number of people that show up at a bus stop is Poisson with a mean of \(2.5\) per hour. If we watch the bus stop for \(4\) hours, what is the probability that \(3\) or fewer people show up in that time?
ppois(3, lambda = 2.5 * 4)  # a rate of 2.5 per hour times 4 hours
## [1] 0.01033605
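Equivalently (an illustrative alternative), we can sum the Poisson mass function directly over the counts \(0\) through \(3\):
sum(dpois(0:3, lambda = 10))  # P(X=0) + P(X=1) + P(X=2) + P(X=3)
## [1] 0.01033605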
Poisson Approximation to the Binomial
When \(n\) is large and \(p\) is small, the Poisson distribution is an accurate approximation to the binomial distribution. Formally, if \(X \sim \text{Binomial}(n, p)\) with large \(n\) and small \(p\), then \(X\) is approximately Poisson with \(\lambda = np\).
Example: We flip a coin with success probability \(0.01\) five hundred times. What is the probability of 2 or fewer successes?
pbinom(2, size = 500, prob = 0.01)  # the exact binomial probability
## [1] 0.1233858
ppois(2, lambda = 500 * 0.01)  # the Poisson approximation, lambda = np = 5
## [1] 0.124652
Here the \(n\) of \(500\) is large and the \(p\) of \(0.01\) is small. The exact binomial probability (about 12.3%) and the Poisson approximation (about 12.5%) are very close.
Asymptotics are an important topic in statistics. Asymptotics refers to the behavior of estimators as the sample size goes to infinity. Our very notion of probability depends on the idea of asymptotics. For example, many people define probability as the proportion of times an event would occur in infinite repetitions. That is, the probability of a head on a coin is 50% because we believe that if we were to flip it infinitely many times, the proportion of heads would converge to 50%.
We can use asymptotics to help us figure out things about distributions without knowing much about them to begin with. A profound idea along these lines is the Central Limit Theorem. It states that the distribution of averages is often normal, even if the distribution the data are sampled from is very non-normal. This helps us create robust strategies for statistical inference when we're not willing to assume much about the data-generating mechanism.
Definition: asymptotics is the term for the behavior of statistics as the sample size (or some other relevant quantity) limits to infinity (or some other relevant number).
Asymptotics are incredibly useful for simple statistical inference and approximations. They are like a statistical Swiss Army knife, letting us investigate the properties of many statistics without having to do much computing. Asymptotics also form the basis for the frequency interpretation of probabilities (the long-run proportion of times an event occurs).
Limits of Random Variables
These results allow us to talk about the large-sample distribution of sample means of a collection of \(iid\) observations. The first of these results, the Law of Large Numbers, says that the average limits to what it's estimating, the population mean.
The Law of Large Numbers in Action
n <- 1000
means1 <- cumsum(rnorm(n)) / (1:n)  # running means of the first 1, 2, ..., n standard normals
plot(means1, type = "l")
When you plot the cumulative means against the indexes \((1:n)\), you see a lot of variability early on; but as the number of simulations grows, we get closer and closer to the true population value, which is zero. Let's do this again, but this time we will flip a coin instead of generating standard normals.
Law of Large Numbers in Action: Coin Flip
means2 <- cumsum(sample(0:1, n, replace = TRUE)) / (1:n)  # running means of fair coin flips
plot(means2, type = "l", xlab = "Number of Coin Flips")
The Central Limit Theorem
This is the most important theorem in all of statistics. For our purposes, the CLT states that the distribution of averages of iid variables (properly normalized) becomes that of a standard normal as the sample size increases.
\[\frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} = \frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} = \frac{\text{Estimate} - \text{Mean of Estimate}}{\text{Std. Err. of Estimate}}\]
…has a distribution like that of a standard normal for a large \(n\).
Examples
First we will approximate a standard normal random variable by rolling \(n\) six-sided dice and standardizing the average of the rolls, as sketched below.
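Here is a minimal sketch of that simulation (our own code; nosim and n are illustrative choices). A fair die has mean \(3.5\) and variance \(35/12\), so the standardized average of \(n\) rolls should look approximately standard normal:
nosim <- 1000; n <- 10
dice_means <- replicate(nosim, mean(sample(1:6, n, replace = TRUE)))  # nosim averages of n rolls
z <- (dice_means - 3.5) / (sqrt(35 / 12) / sqrt(n))  # (xbar - mu) / (sigma / sqrt(n))
hist(z, breaks = 20, main = "Standardized dice means", xlab = "z")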
Example
library(UsingR)  # provides the father.son data set
data(father.son)
x <- father.son$sheight  # sons' heights, in inches
(mean(x) + c(-1, 1) * qnorm(0.975) * sd(x) / sqrt(length(x))) / 12  # 95% CI, converted to feet
## [1] 5.709670 5.737674
In this example we take the mean of x plus or minus the 0.975 normal quantile times the standard error of the mean (the standard deviation of x divided by the square root of n, where n is the length of the vector x). The result is divided by 12 so that the confidence interval is in feet rather than inches. The output is the confidence interval 5.710 to 5.738. So if we assume that the sons in this data set are an iid draw from a population of interest, then the confidence interval for the average height of the sons would be 5.71 to 5.74 feet.