2026-04-09

Some Setup Before Diving In

Loading the plotly package prints a few masking messages: plotly's last_plot, filter, and layout mask the functions of the same name from ggplot2, stats, and graphics, respectively.

First Glance at Data, via Plot

  • Here we see a scatter plot of flipper length against body mass for the penguins in our data set.

Explaining the Process of Interval Estimation

We’ll be using R to find an interval estimate of a population mean when the population variance is unknown. A point estimate (here, the sample mean) gives a single best guess for the population mean; an interval estimate adds a range around that guess that tells us how precise it is.

  • We’ll denote the 100(1-\(\alpha/2\))th percentile of the Student t distribution with n-1 degrees of freedom as \(t_{\alpha/2}\). Given a random sample of size n that is sufficiently large (or drawn from a normal population), with sample mean \(\bar{x}\) and sample standard deviation s, the end points of the interval estimate at a confidence level of (1-\(\alpha\)) are: \[\bar{x}\pm t_{\alpha/2}\frac{s}{\sqrt{n}}\]
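To make the formula and the meaning of "confidence level" concrete, here is a minimal sketch in R. The helper t_interval() is hypothetical (it is not part of the original analysis), and the simulation draws from a made-up normal population with an arbitrarily chosen mean and standard deviation, so the only thing to read from it is the coverage rate.

```r
# Hypothetical helper (not from the original slides): t-based interval
# for a population mean when the variance is unknown.
t_interval <- function(x, conf = 0.95) {
  x <- x[!is.na(x)]                        # drop NA values before counting
  n <- length(x)
  alpha <- 1 - conf
  t_crit <- qt(1 - alpha / 2, df = n - 1)  # t_{alpha/2} with n - 1 degrees of freedom
  E <- t_crit * sd(x) / sqrt(n)            # margin of error
  mean(x) + c(-E, E)                       # lower and upper end points
}

# What "95% confidence" means: over many repeated samples from a population
# with a known mean, about 95% of the intervals should capture that mean.
# The population parameters below are arbitrary, chosen for illustration.
set.seed(1)
mu <- 200
covered <- replicate(2000, {
  ci <- t_interval(rnorm(30, mean = mu, sd = 14))
  ci[1] <= mu && mu <= ci[2]
})
mean(covered)  # proportion of intervals containing mu, close to 0.95
```

Any single interval either contains the true mean or it doesn't; the 95% describes how often the procedure succeeds across repeated samples.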

Process of Interval Estimation, Cont'd

Our n is the number of non-missing observations of our chosen variable, so we drop the pesky NA values before counting. Then we compute the sample standard deviation, which is not a challenge at all!

x <- df$flipper_len[!is.na(df$flipper_len)]  # keep only non-missing flipper lengths
n <- length(x)      # sample size after dropping NAs
s <- sd(x)          # this is our sample standard deviation,
se <- s/sqrt(n)     # this is our standard error estimate,
xbar <- mean(x)     # and this is our sample mean! :D
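Why the NA handling matters: length() counts missing entries, while mean() and sd() return NA by default, so a single missing flipper length would inflate n and break the other two quantities. The toy vector below is made up purely for illustration. If the data frame has already been filtered for missing values, dropping NAs changes nothing.

```r
# Toy vector (made-up values) with one missing observation
x <- c(181, 186, NA, 193)

length(x)               # counts the NA as an observation
sum(!is.na(x))          # number of usable observations
mean(x)                 # NA: one missing value makes the mean NA
mean(x, na.rm = TRUE)   # mean of the three observed values
sd(x, na.rm = TRUE)     # sd of the three observed values
```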

Second Glance at Data, via Plot

This histogram is a slightly cleaner way to view the distribution of the data shown in the first plot. I do love me a good histogram.

More Math Required for Interval Estimation

So we have our n, our standard deviation s, and our estimate of standard error se. A 95% confidence level means \(\alpha\) = 0.05, so \(\alpha/2\) = 0.025, and we want the 97.5th percentile of the t distribution with n-1 degrees of freedom (the point that leaves 2.5% of the area in the upper tail). In R, that is qt(.975, df=n-1), which gives our \(t_{\alpha/2}\).
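A quick sketch of how that critical value behaves: for small samples the t quantile is noticeably larger than the normal quantile 1.96, widening the interval to account for estimating the variance, and it approaches 1.96 as the degrees of freedom grow. The degrees of freedom below are arbitrary example values.

```r
# t critical values for a 95% interval at a few example degrees of freedom
dfs <- c(5, 30, 100, 10000)
t_crit <- qt(0.975, df = dfs)
round(t_crit, 3)

# The standard normal 97.5th percentile, which t_crit approaches as df grows
qnorm(0.975)   # 1.959964

# For a general confidence level (1 - alpha), the same call is
# qt(1 - alpha / 2, df = n - 1)
```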

Multiplying this critical value by our standard error estimate se gives the margin of error E. Then we subtract E from and add E to our sample mean, and we've found our interval estimate for the true mean.

E <- qt(.975,df=n-1)*se  # this is our margin of error :D
xbar+c(-E,E)
## [1] 199.4561 202.4778
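As a sanity check, base R's t.test() reports the same t-based interval, so the manual calculation can be verified against it. The sketch below uses a made-up toy vector so it runs on its own.

```r
# Toy values for illustration only (not the penguin data)
x <- c(181, 186, 195, 193, 190, 181, 195, 193)

# Manual t interval, following the steps above
n <- length(x)
se <- sd(x) / sqrt(n)
E <- qt(0.975, df = n - 1) * se
manual <- mean(x) + c(-E, E)

# Built-in version; as.numeric() drops the "conf.level" attribute
builtin <- as.numeric(t.test(x, conf.level = 0.95)$conf.int)

all.equal(manual, builtin)  # TRUE
```

Applied to the slides' data, t.test(df$flipper_len)$conf.int should reproduce the interval printed above, since t.test() also drops NA values before computing.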

Thank you for reading! :)

I do enjoy a good box plot for visualizing the spread of the data, especially for seeing where the first and third quartiles sit relative to the rest of the data.
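The box in a box plot spans the first and third quartiles, and the whiskers usually follow a 1.5 × IQR rule (the default for both ggplot2's geom_boxplot() and base R's boxplot()). Here is a minimal sketch of those numbers on made-up toy values; note that base boxplot() computes its hinges with fivenum(), which can differ slightly from quantile().

```r
# Toy values for illustration only
x <- c(172, 178, 181, 186, 190, 195, 197, 210, 231)

q <- quantile(x, c(0.25, 0.75))   # first and third quartiles (type 7 default)
iqr <- unname(q[2] - q[1])        # interquartile range, same as IQR(x)

# Common box-plot convention: whiskers reach the most extreme points
# within 1.5 * IQR of the box; anything beyond is drawn as an outlier
lower_fence <- unname(q[1] - 1.5 * iqr)
upper_fence <- unname(q[2] + 1.5 * iqr)
x[x < lower_fence | x > upper_fence]
```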