
Introduction

A confidence interval is a range around the sample mean, constructed so that at a stated confidence level it captures the true population mean. For example, from a single sample of the population we can compute a 95% confidence interval. The 95% describes the procedure rather than any one interval: if we drew many samples and built an interval from each, about 95% of those intervals would contain the true population mean. A particular interval either contains the mean or it does not.

Definition

A confidence interval is given by:

\(\bar{x} \pm Z\frac{s}{\sqrt{n}}\)

where

  • \(\bar{x}\) = sample mean
  • Z = critical value of the standard normal distribution for the chosen confidence level (about 1.96 for 95%)
  • s = sample standard deviation
  • n = sample size
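
As a minimal sketch of the formula above, the following computes a 95% interval for a simulated sample. The sample `x`, the seed, and the variable names are illustrative assumptions rather than part of the original slides; `qnorm` supplies the Z value.

```r
# Illustrative sample (assumed): 200 draws from N(0, 2.5^2)
set.seed(1)
x <- rnorm(200, mean = 0, sd = 2.5)

conf_level <- 0.95
xbar <- mean(x)      # sample mean
s    <- sd(x)        # sample standard deviation
n    <- length(x)    # sample size

# Z is the standard normal quantile leaving (1 - conf_level) / 2 in each tail
Z <- qnorm(1 - (1 - conf_level) / 2)   # about 1.96 for a 95% level

margin <- Z * s / sqrt(n)              # half-width of the interval
ci <- c(lower = xbar - margin, upper = xbar + margin)
ci
```

Changing `conf_level` to 0.99 raises Z to about 2.58 and widens the interval accordingly.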

Confidence Intervals from Different Samples

We can plot how the confidence interval changes across different samples drawn from the same population, and compare each interval to the population mean.

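library(ggplot2)
library(plotly)  # provides plot_ly(), used for the error-bar plot

# Simulated population: 10,000 draws from N(0, 2.5^2)
z <- rnorm(10000, 0, 2.5)

# Ten samples of size 200, one sample per column
df <- replicate(10, sample(z, 200))

# Half-width (margin of error) of the 95% interval.
# Note: this uses the known population SD (2.5), not each sample's s.
xbarSD <- 1.96 * 2.5 / sqrt(200)

# Sample mean of each column, paired with the common half-width
a <- colMeans(df)
df2 <- data.frame(grp = 1:10, fit = a, se = xbarSD)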

Confidence Intervals for Different Samples

# Sample means with 95% error bars; the `se` column holds the interval half-width
plot_ly(data=df2, x=~grp, y=~fit, type='scatter', mode='markers',
        error_y = ~list(type="data", array=se))
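
Ten intervals give only a rough picture of how often the population mean is captured. The sketch below repeats the sampling many more times and counts the proportion of intervals that contain the population mean. The seed, the number of repetitions (`n_samples`), and the variable names are assumptions made for illustration.

```r
set.seed(42)             # assumed seed, only for reproducibility
sigma <- 2.5
n <- 200
population <- rnorm(10000, 0, sigma)
mu <- mean(population)   # actual mean of the simulated population

n_samples <- 1000        # assumed number of repeated samples
margin <- qnorm(0.975) * sigma / sqrt(n)

# For each repeated sample, record whether its interval contains mu
covered <- replicate(n_samples, {
  xbar <- mean(sample(population, n))
  abs(xbar - mu) <= margin
})

mean(covered)  # proportion of intervals that contain the population mean
```

The resulting proportion should come out close to 0.95, which is the sense in which the interval has "95% confidence".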

Distribution of sample means

If each observation satisfies \(x \sim \mathcal{N}(\mu, \sigma^{2})\), then the mean of n independent observations satisfies \(\bar{x} \sim \mathcal{N}(\mu, \frac{\sigma^{2}}{n})\).

Example with \(\mu = 0\) and \(\sigma = 2.5\)
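
This result can be checked by simulation. The following sketch draws many samples of size n = 200 from \(\mathcal{N}(0, 2.5^{2})\) and compares the standard deviation of the sample means with \(\sigma / \sqrt{n}\). The seed and the number of repetitions (`n_reps`) are assumptions chosen for illustration.

```r
set.seed(7)      # assumed seed, only for reproducibility
mu <- 0
sigma <- 2.5
n <- 200
n_reps <- 5000   # assumed number of simulated samples

# Each replicate draws a fresh sample of size n and keeps its mean
xbars <- replicate(n_reps, mean(rnorm(n, mu, sigma)))

# Spread of the simulated sample means against the theoretical value
c(simulated_sd = sd(xbars), theoretical_sd = sigma / sqrt(n))
```

The theoretical value is \(2.5 / \sqrt{200} \approx 0.177\), and the simulated standard deviation should land near it. A histogram of `xbars` (for example `hist(xbars)`) shows the corresponding bell shape centered on \(\mu\).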

Confidence Intervals on a Linear Plot

We can use a data set, here R's built-in mtcars, to find confidence intervals around a fitted regression line. Its first six rows are:

##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Confidence Intervals on a Linear Plot

For each x-value, the 95% confidence interval is for the expected (mean) value of y at that x, not for an individual observation. It signifies that about 95% of samples taken from the population would produce an interval that contains the true mean of y at that x-value. Increasing the confidence level produces a wider interval.

g + geom_smooth(method="lm", level=0.95)
## `geom_smooth()` using formula 'y ~ x'
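
The object `g` is not defined in the code shown here. The sketch below assumes it is a scatter plot of `mpg` against `wt` from mtcars; that choice of variables is hypothetical. It also computes the interval numerically with `predict()`, so the effect of the confidence level can be read off as numbers.

```r
library(ggplot2)

# Assumption: g is a scatter plot of mpg against wt (the original definition is not shown)
g <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# Fitted line with 95% and 99% confidence bands
p95 <- g + geom_smooth(method = "lm", formula = y ~ x, level = 0.95)
p99 <- g + geom_smooth(method = "lm", formula = y ~ x, level = 0.99)

# Confidence intervals for the mean response, computed from the fitted model
fit <- lm(mpg ~ wt, data = mtcars)
new_x <- data.frame(wt = c(2.5, 3.5))   # illustrative x-values
ci95 <- predict(fit, newdata = new_x, interval = "confidence", level = 0.95)
ci99 <- predict(fit, newdata = new_x, interval = "confidence", level = 0.99)

# Interval widths at each x-value; the 99% widths are larger
cbind(width95 = ci95[, "upr"] - ci95[, "lwr"],
      width99 = ci99[, "upr"] - ci99[, "lwr"])
```

Printing `p95` or `p99` draws the corresponding plot; the shaded band around the line is the confidence interval for the mean of y at each x.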