This presentation uses R's built-in mtcars dataset. I will perform an interval estimation on it and create at least two ggplot2 plots and one plotly plot to visualize the topic.
2025-03-23
library(ggplot2)
library(ggdist)
library(modelr)
library(broom)
library(dplyr)
library(plotly)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
The packages used throughout the presentation are loaded above, followed by a preview of the first six rows of the data.
This graph shows that cars with more cylinders tend to have higher horsepower. There are some outliers among the 6- and 8-cylinder cars, which may be due to the weight of the car or the engine type.
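The code for this first graph is not shown in the presentation, so the following is only a sketch of how such a plot could be produced. The choice of a scatter plot with a fitted line, the object name `p_cyl`, and the titles and colors are all assumptions made for illustration.

```r
library(ggplot2)

# Assumed reconstruction: horsepower against cylinder count in mtcars.
# The plot type, labels, and colors are illustrative, not the original code.
p_cyl <- ggplot(mtcars, aes(x = cyl, y = hp)) +
  geom_point(color = "steelblue", size = 3) +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE, color = "darkred") +
  scale_x_continuous(breaks = c(4, 6, 8)) +
  labs(title = "Horsepower vs Number of Cylinders",
       x = "Cylinders (cyl)",
       y = "Horsepower (hp)") +
  theme_minimal()

p_cyl

# Mean horsepower per cylinder group, to back up the visual trend
hp_by_cyl <- tapply(mtcars$hp, mtcars$cyl, mean)
hp_by_cyl
```

A boxplot with `factor(cyl)` on the x axis would be a reasonable alternative if the goal is to highlight the outliers within each cylinder group.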
## `geom_smooth()` using formula = 'y ~ x'
ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point(color = "steelblue", size = 3) +
  geom_smooth(method = "lm", se = TRUE, color = "darkred") +
  labs(title = "MPG vs Horsepower with Regression Line",
       x = "Horsepower (hp)",
       y = "Miles per Gallon (mpg)") +
  theme_minimal()
## # A tibble: 2 × 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)  30.1       1.63       18.4  6.64e-18
## 2 hp           -0.0682    0.0101     -6.74 1.79e- 7
## # A tibble: 1 × 12
##   r.squared adj.r.squared sigma statistic     p.value    df logLik   AIC   BIC
##       <dbl>         <dbl> <dbl>     <dbl>       <dbl> <dbl>  <dbl> <dbl> <dbl>
## 1     0.602         0.589  3.86      45.5 0.000000179     1  -87.6  181.  186.
## # ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
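The model summary shows a moderate fit. The R-squared value of 0.60 means that horsepower explains about 60 percent of the variation in mpg, so the line captures the overall trend but leaves a fair amount of scatter unexplained. The slope for hp is negative and statistically significant (p = 1.79e-7).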
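The two tables above look like output from `tidy()` and `glance()` in the broom package, although the code that produced them is not shown. A minimal sketch, assuming the model is a simple `lm()` of mpg on hp (the object names `fit`, `r2`, and `r` are chosen here for illustration):

```r
library(broom)

# Assumed model: mpg regressed on horsepower
fit <- lm(mpg ~ hp, data = mtcars)

tidy(fit)    # one row per coefficient: estimate, std.error, statistic, p.value
glance(fit)  # one-row model summary, including r.squared and sigma

# In simple linear regression, R-squared is the squared correlation
# between the predictor and the response.
r2 <- summary(fit)$r.squared
r  <- cor(mtcars$hp, mtcars$mpg)
c(r.squared = r2, correlation = r)
```

The negative correlation matches the negative slope: as horsepower goes up, mpg tends to go down.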
The simple linear regression model is:
\[ \hat{\text{mpg}} = 30.099 - 0.068 \cdot \text{hp} \]
Where:
- \(\hat{\text{mpg}}\): predicted miles per gallon
- \(\text{hp}\): horsepower
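As a quick illustration of how the fitted equation is used, the sketch below predicts mpg for a hypothetical car with 110 horsepower, once by plugging into the equation and once with `predict()`. The value 110 and the names `new_car`, `by_hand`, and `by_predict` are assumptions made for this example.

```r
# Assumed model: mpg regressed on horsepower
fit <- lm(mpg ~ hp, data = mtcars)

# Hypothetical car with 110 horsepower (illustrative value)
new_car <- data.frame(hp = 110)

# Plug into mpg-hat = b0 + b1 * hp using the unrounded coefficients
by_hand <- unname(coef(fit)[1] + coef(fit)[2] * new_car$hp)

# Same prediction using predict()
by_predict <- unname(predict(fit, newdata = new_car))

c(by_hand = by_hand, by_predict = by_predict)
```

Both approaches give about 22.6 mpg. Using the rounded coefficients from the equation gives a slightly different value because of rounding.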
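This interactive plotly plot overlays the observed data on the fitted regression line and its 95 percent confidence band, making it easy to see which points fall inside the band. Because the band describes uncertainty in the mean mpg at each horsepower rather than in individual cars, points outside it are expected. The overall scatter around the line shows how strong the linear relationship is.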
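The plotly code is not included in the presentation, so the following is one possible way to build such a plot, not necessarily the original. It assumes the same `lm()` of mpg on hp; the names `band`, `inside`, and `p_ci` and the styling choices are illustrative.

```r
library(plotly)

# Assumed model: mpg regressed on horsepower
fit <- lm(mpg ~ hp, data = mtcars)

# Fitted values with 95% confidence limits for the mean response
ci <- predict(fit, interval = "confidence", level = 0.95)
band <- data.frame(hp = mtcars$hp, mpg = mtcars$mpg, ci)  # adds fit, lwr, upr
band <- band[order(band$hp), ]  # sort so the ribbon and line draw left to right

# Flag observations that fall inside the confidence band
inside <- band$mpg >= band$lwr & band$mpg <= band$upr

p_ci <- plot_ly(band, x = ~hp) %>%
  add_ribbons(ymin = ~lwr, ymax = ~upr, name = "95% confidence band",
              line = list(width = 0), fillcolor = "rgba(139, 0, 0, 0.2)") %>%
  add_lines(y = ~fit, name = "Fitted line", line = list(color = "darkred")) %>%
  add_markers(y = ~mpg, name = "Observed", marker = list(color = "steelblue")) %>%
  layout(title = "MPG vs Horsepower with 95% Confidence Band",
         xaxis = list(title = "Horsepower (hp)"),
         yaxis = list(title = "Miles per Gallon (mpg)"))

p_ci
```

Calling `sum(inside)` gives the number of observations that lie within the band. Wrapping the earlier ggplot in `ggplotly()` would be a shorter route to a similar interactive figure.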
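A 95 percent confidence interval for the mean response at a given predictor value is:
\[ \hat{y} \pm t_{(1 - \alpha/2, \, n - 2)} \cdot SE(\hat{y}) \]
For this dataset, the fitted model is:
\[ \hat{\text{mpg}} = \hat{\beta}_0 + \hat{\beta}_1 \cdot \text{hp} \]
so the interval becomes:
\[ \text{CI}_{95\%} = \hat{\text{mpg}} \pm t^* \cdot SE(\hat{\text{mpg}}) \]
where \(t^* = t_{(0.975, \, 30)} \approx 2.042\), since \(\alpha = 0.05\) and \(n = 32\).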
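To make the formula concrete, the sketch below computes the interval by hand at a hypothetical horsepower value and checks it against `predict()`. The standard error expression used here is the standard one for the mean response in simple linear regression; it is not spelled out in the presentation. The value `x0 = 110` and all object names are assumptions for illustration.

```r
# Assumed model: mpg regressed on horsepower
fit <- lm(mpg ~ hp, data = mtcars)

n  <- nrow(mtcars)
x  <- mtcars$hp
x0 <- 110  # hypothetical horsepower value at which to estimate mean mpg

# Standard error of the fitted mean at x0:
# sigma-hat * sqrt(1/n + (x0 - x-bar)^2 / Sxx)
sigma_hat <- summary(fit)$sigma
se_fit <- sigma_hat * sqrt(1 / n + (x0 - mean(x))^2 / sum((x - mean(x))^2))

# Critical value t* with n - 2 degrees of freedom
t_star <- qt(0.975, df = n - 2)

# Point estimate and interval limits
y_hat <- unname(coef(fit)[1] + coef(fit)[2] * x0)
ci_by_hand <- c(fit = y_hat,
                lwr = y_hat - t_star * se_fit,
                upr = y_hat + t_star * se_fit)

# The same interval from predict()
ci_predict <- predict(fit, newdata = data.frame(hp = x0),
                      interval = "confidence", level = 0.95)

ci_by_hand
ci_predict
```

The two results should agree. Passing `interval = "prediction"` to `predict()` instead would give the wider interval for an individual car rather than for the mean.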
Thank you for making it to the end of the presentation!