Homework 3

2025-03-23

Introduction to Presentation

This presentation uses the built-in mtcars dataset. I will perform an interval estimation on it, and I will create at least two ggplots and one plotly plot to visualize the topic.

Slide with Bullets

Step 1: Loading data into R and exploring the data.
Step 2: Creating different graphs to see what would make a good simple regression model.
Step 3: Apply statistics to the data and generate graphs.

Slide with R Output

library(ggplot2)
library(ggdist)
library(modelr)
library(broom)

## 
## Attaching package: 'broom'

## The following object is masked from 'package:modelr':
## 
##     bootstrap

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(plotly)

## 
## Attaching package: 'plotly'

## The following object is masked from 'package:ggplot2':
## 
##     last_plot

## The following object is masked from 'package:stats':
## 
##     filter

## The following object is masked from 'package:graphics':
## 
##     layout

Slide visualizing data set.

##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Visualizing the data and looking at the packages used throughout the presentation.
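The slides do not show the call that produced this preview. A minimal sketch, assuming it was a plain `head()` call on the built-in `mtcars` data frame (the name `preview` is only illustrative):

```r
# Assumption: the preview above came from head(); the slides do not show the call.
data(mtcars)               # built-in dataset, no package needed
preview <- head(mtcars)    # first six rows
print(preview)
print(dim(mtcars))         # number of rows and columns
```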

Slide with cyl and hp plot

This graph shows that cars with more cylinders tend to have higher horsepower. There are some outliers among the 6-cylinder and 8-cylinder cars, which may be due to the weight of the car or the engine type.
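The code for this plot is not included in the slides. One way to draw it is sketched here, under the assumption that it was a box plot with one box per cylinder count (the mention of outliers suggests this, but the original plot type is not confirmed). The name `cyl_hp_plot` and the styling are illustrative:

```r
library(ggplot2)

# Assumption: a box plot per cylinder count; the original plot type is not shown.
# cyl is numeric in mtcars, so factor() turns it into discrete groups.
cyl_hp_plot <- ggplot(mtcars, aes(x = factor(cyl), y = hp)) +
  geom_boxplot(fill = "steelblue", alpha = 0.6) +
  labs(title = "Horsepower by Number of Cylinders",
       x = "Cylinders (cyl)",
       y = "Horsepower (hp)") +
  theme_minimal()

print(cyl_hp_plot)
```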

Plot for MPG vs HorsePower

## `geom_smooth()` using formula = 'y ~ x'

Slide for graph code

ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point(color = "steelblue", size = 3) +
  geom_smooth(method = "lm", se = TRUE, color = "darkred") +
  labs(title = "MPG vs Horsepower with Regression Line",
       x = "Horsepower (hp)",
       y = "Miles per Gallon (mpg)") +
  theme_minimal()

Summary of statistical data of the MPG vs Horsepower Graph

## # A tibble: 2 × 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)  30.1       1.63       18.4  6.64e-18
## 2 hp           -0.0682    0.0101     -6.74 1.79e- 7

## # A tibble: 1 × 12
##   r.squared adj.r.squared sigma statistic     p.value    df logLik   AIC   BIC
##       <dbl>         <dbl> <dbl>     <dbl>       <dbl> <dbl>  <dbl> <dbl> <dbl>
## 1     0.602         0.589  3.86      45.5 0.000000179     1  -87.6  181.  186.
## # ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>

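The stats show that the linear regression model is only a moderate fit: the R squared value is 0.60, so horsepower explains about 60 percent of the variation in mpg. The slope for hp is still clearly different from zero (p-value of 1.79e-7).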
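The two tibbles above have the shape of `broom::tidy()` and `broom::glance()` output. A minimal sketch of how they could be reproduced, assuming the model was fit with `lm(mpg ~ hp)` (the object names `model`, `coef_table`, and `fit_stats` are illustrative, not taken from the slides):

```r
library(broom)

# Fit the simple linear regression of mpg on hp.
model <- lm(mpg ~ hp, data = mtcars)

coef_table <- tidy(model)    # one row per coefficient: estimate, std.error, ...
fit_stats  <- glance(model)  # one row of model-level statistics: r.squared, sigma, ...

print(coef_table)
print(fit_stats)
```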

Linear Regression Equation

The simple linear regression model is:

\[ \hat{\text{mpg}} = 30.099 - 0.068 \cdot \text{hp} \]

Where:
- \(\hat{\text{mpg}}\): predicted miles per gallon
- \(\text{hp}\): horsepower
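As a worked example, the Mazda RX4 in the data preview has 110 horsepower, so the rounded equation gives:

\[ \hat{\text{mpg}} = 30.099 - 0.068 \cdot 110 = 30.099 - 7.480 = 22.619 \]

That is about 22.6 mpg against an observed 21.0 mpg, a residual of roughly -1.6 mpg.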

95 Percent Confidence Interval

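This interactive plotly plot overlays the observed data on the fitted regression line and its 95 percent confidence band, so you can see which data points fall inside the band. The band is a confidence interval for the mean mpg at each horsepower, not for individual cars, so many points can fall outside it even when the model fits reasonably well.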
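The plot code is not shown in the slides. A sketch of one way to build it, assuming the band comes from `predict()` with `interval = "confidence"` and the ggplot is converted with `ggplotly()`. The names `ci_data`, `inside`, and `interactive_plot` are illustrative:

```r
library(ggplot2)
library(plotly)

model <- lm(mpg ~ hp, data = mtcars)

# 95 percent confidence band for the mean mpg at each observed hp.
band <- as.data.frame(predict(model, interval = "confidence", level = 0.95))
ci_data <- cbind(mtcars[, c("hp", "mpg")], band)  # adds columns fit, lwr, upr

# Flag the cars whose observed mpg lies inside the band.
ci_data$inside <- ci_data$mpg >= ci_data$lwr & ci_data$mpg <= ci_data$upr

p <- ggplot(ci_data, aes(x = hp)) +
  geom_ribbon(aes(ymin = lwr, ymax = upr), fill = "darkred", alpha = 0.2) +
  geom_line(aes(y = fit), color = "darkred") +
  geom_point(aes(y = mpg, color = inside), size = 3) +
  labs(title = "MPG vs Horsepower with 95 Percent Confidence Band",
       x = "Horsepower (hp)",
       y = "Miles per Gallon (mpg)",
       color = "Inside band") +
  theme_minimal()

# Convert the static ggplot into an interactive plotly widget.
interactive_plot <- ggplotly(p)
interactive_plot
```

Coloring the points by `inside` is a design choice for this sketch; it makes the points that fall within the band visible without hovering over each one.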

Confidence interval equation

The 95 percent confidence interval for the mean response is given by:

\[ \hat{y} \pm t_{(1 - \alpha/2, \, n - 2)} \cdot SE(\hat{y}) \]

For this data set, the fitted value is:

\[ \hat{\text{mpg}} = \hat{\beta}_0 + \hat{\beta}_1 \cdot \text{hp} \]

so the interval is:

\[ \text{CI}_{95\%} = \hat{\text{mpg}} \pm t^* \cdot SE(\hat{\text{mpg}}) \]

where \(t^* = t_{(0.975, \, 30)}\), since \(\alpha = 0.05\) and \(n = 32\).
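To make the formula concrete, the interval can be computed by hand and checked against `predict()`. This is a sketch in base R; the horsepower value `x0 = 110` is a hypothetical input chosen for illustration, and the standard error uses the usual simple linear regression expression for the mean response, which the slides do not spell out:

```r
model <- lm(mpg ~ hp, data = mtcars)

n      <- nrow(mtcars)                   # 32 cars
alpha  <- 0.05
t_star <- qt(1 - alpha / 2, df = n - 2)  # critical value with n - 2 = 30 df
sigma  <- summary(model)$sigma           # residual standard error

x0   <- 110                              # hypothetical horsepower value
xbar <- mean(mtcars$hp)
sxx  <- sum((mtcars$hp - xbar)^2)

# Fitted value and its standard error at x0.
fit_x0 <- unname(coef(model)[1] + coef(model)[2] * x0)
se_fit <- sigma * sqrt(1 / n + (x0 - xbar)^2 / sxx)

manual_ci <- c(lwr = fit_x0 - t_star * se_fit,
               upr = fit_x0 + t_star * se_fit)

# The same interval from predict(), for comparison.
builtin_ci <- predict(model, newdata = data.frame(hp = x0),
                      interval = "confidence", level = 0.95)

print(manual_ci)
print(builtin_ci)
```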

Thank You

Thank you for making it to the end of the presentation!