Point Estimation Using R’s Built-in Car Data

  • In this project, we are going to estimate the population mean \(\mu\) of miles-per-gallon.
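  • R ships with a built-in data set called ‘mtcars’, which contains 32 car models. We treat these 32 cars as our sample.
  • The point estimator \(\hat{\mu}\) is the sample mean, where \(X_i\) is the MPG of car \(i\) and \(n = 32\):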

\[\hat{\mu}\;=\;\frac{1}{n}\sum_{i=1}^{n}X_i\]
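As a quick check of the formula above, the sample mean can be computed by hand and compared with R's built-in mean(). This is a minimal sketch; the variable names x, n, and mu_hat are chosen here for illustration and do not come from the original slides.

```r
# MPG values for the 32 cars in the built-in mtcars data set
x <- mtcars$mpg
n <- length(x)

# Sample mean written out exactly as in the formula: (1/n) * sum of X_i
mu_hat <- sum(x) / n

# The hand-written version should agree with R's built-in mean()
stopifnot(isTRUE(all.equal(mu_hat, mean(x))))

print(round(mu_hat, 1))  # 20.1
```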

Visualization of the mtcars data set

  • Here’s the distribution of MPG for the 32 cars in the data set (count = number of cars; the dashed line marks the sample average).
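A histogram like the one described could be drawn roughly as follows. This is a sketch, not necessarily the code behind the original figure: the bin width, colours, axis labels, and the names mpg_mean and p_hist are all assumptions.

```r
library(ggplot2)

# Sample mean of MPG, used for the dashed reference line
mpg_mean <- mean(mtcars$mpg)

# Histogram of MPG; binwidth = 2 is an assumed value, not taken from the slides
p_hist <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2, fill = "grey70", colour = "white") +
  geom_vline(xintercept = mpg_mean, linetype = "dashed") +
  labs(x = "Miles per gallon", y = "Count (number of cars)")

print(p_hist)
```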

How much can μ̂ vary?

  • Now we draw 32 cars from the data set with replacement and repeat this 1000 times to mimic the sampling distribution of \(\hat{\mu}\). Each resample gives a new sample mean (plotted below).
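The resampling step can be sketched as below. The seed and the names n_boot and boot_means are assumptions made for reproducibility and illustration, and base R's hist() is used here only to keep the example dependency-free; the original plot may have been produced differently.

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible

n <- nrow(mtcars)  # 32 cars per resample
n_boot <- 1000     # number of resamples

# Each iteration draws 32 MPG values with replacement and records their mean
boot_means <- replicate(
  n_boot,
  mean(sample(mtcars$mpg, size = n, replace = TRUE))
)

# Histogram of the 1000 resampled means, with the original sample mean dashed
hist(boot_means, breaks = 30,
     main = "Resampled means of MPG", xlab = "Sample mean (MPG)")
abline(v = mean(mtcars$mpg), lty = 2)
```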

Graph analysis and explanation

  • The first plot shows the raw data: the 32 MPG values.
  • Their point estimate is the sample mean (20.1 MPG), which can be viewed as our initial guess for the average MPG of all cars.
  • A different sample of 32 cars would give a different average, which is why the second plot shows 1000 resampled means.
  • The spread of the second plot gives us the standard error, which tells us how precise our first guess is (explained further on the next slide).

Calculating standard error

  • The goal of the standard error is to summarize the spread of the 1000 resampled means in one number.

  • Standard error = standard deviation of those means, where \(\hat{\mu}_j\) is the mean of resample \(j\) and \(\overline{\hat{\mu}}\) is the average of all 1000 means: \[SE \;=\; \sqrt{\frac{1}{999}\sum_{j=1}^{1000}\bigl(\hat{\mu}_j-\overline{\hat{\mu}}\bigr)^2}\;\approx\; \boxed{\;1.0\ \text{MPG}\;}\]

  • The point estimate of 20.1 MPG will typically miss the true average by about 1 MPG.
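The calculation above amounts to calling sd() on the resampled means, since sd() already divides by the number of values minus one (999 here). The sketch below also compares the result with the textbook formula s / sqrt(n); that comparison is an addition for illustration and is not in the original slides, and the seed and variable names are assumptions.

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible

x <- mtcars$mpg
n <- length(x)

# 1000 resampled means, each from 32 draws with replacement
boot_means <- replicate(1000, mean(sample(x, size = n, replace = TRUE)))

# Resampling-based standard error: sd() uses the 1/(1000 - 1) divisor
se_boot <- sd(boot_means)

# Textbook standard error of the mean, for comparison
se_formula <- sd(x) / sqrt(n)

cat("Resampling SE:", round(se_boot, 2), "\n")
cat("Formula SE:   ", round(se_formula, 2), "\n")
```

The two numbers should land close to each other, at roughly 1 MPG; the exact resampling value depends on the seed.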

MPG vs car weight

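library(ggplot2)
# MPG against weight (wt, in 1000 lbs) with a least-squares line and no confidence band
ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point() + geom_smooth(method = "lm", se = FALSE)
## `geom_smooth()` using formula = 'y ~ x'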

MPG vs weight analysis

  • The 20.1 MPG estimate is the average over all cars, regardless of weight.

  • The downward trend in the scatter plot helps explain why MPG varies so much: heavier cars pull the average down and lighter cars pull it up.

  • The key takeaway is that the 20.1 MPG average is still a good overall estimate, but it doesn’t tell the whole story about car MPG.
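To put a number on the downward trend, the same straight line drawn by geom_smooth(method = "lm") can be fitted directly with lm(). This is a sketch added for illustration; the names fit, slope, and r_squared are assumptions, and the original slides do not report these values.

```r
# Least-squares line of MPG on weight (wt is measured in 1000 lbs)
fit <- lm(mpg ~ wt, data = mtcars)

# Slope: change in MPG per additional 1000 lbs of weight
slope <- unname(coef(fit)["wt"])

# Share of the variation in MPG accounted for by weight
r_squared <- summary(fit)$r.squared

cat("Slope (MPG per 1000 lbs):", round(slope, 2), "\n")
cat("R-squared:", round(r_squared, 2), "\n")
```

The slope comes out near -5.3 MPG per 1000 lbs, and weight alone accounts for about 75% of the variation in MPG, which is consistent with the point that a single average hides a lot of structure.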

Conclusion

  • The point estimate (in this context) is the average MPG across the 32 car models (20.1 MPG).
  • The standard error found from the resampling is about 1 MPG.
  • Much of the spread in MPG is associated with differences in car weight: heavier cars tend to get lower MPG.