Introduction to Regression Modeling in R
2025-02-05
HW 4 is due 2/12/2025
Review of Simple Linear Regression
Function vs. Model
Examining Real Data
Creating a Model
Interpreting a Regression Model
In-class Polling (Session ID: bua345s25)
Many people think that the best movies come at the end of the year, but there are always summer blockbuster movies too.
Based on this scatterplot created from 2024 data, do you think there is a linear correlation between time of year and the daily gross from top 10 movies?
In this course we will use R and RStudio for the predictive analytics lectures.
You will access R and RStudio through Posit Cloud.
I will post R/RStudio files on Posit Cloud that you can access in provided links.
I will also provide demo videos that show how to access files and complete exercises.
NOTE: The free Posit Cloud account is limited to 25 hours per month.
We will also use Posit Cloud for quiz questions on predictive analytics skills.
For those who want to download R and RStudio (not required):
\[y = mx + b\]
m is the slope of the line
b is the y-intercept
Examples:
Positive slope: \(y = 2x + 3\)
Negative slope: \(y = -3x + 7\)
Notice the Y axis in each plot.
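As a minimal sketch in base R, the two example lines can be written as functions and evaluated at a few x values. The names `line_pos` and `line_neg` are illustrative and do not come from the course files.

```r
# Example lines from above, written as R functions.
# Function names are illustrative, not from the course materials.
line_pos <- function(x) 2 * x + 3    # slope m = 2, y-intercept b = 3
line_neg <- function(x) -3 * x + 7   # slope m = -3, y-intercept b = 7

x <- 0:3
pos_y <- line_pos(x)   # 3, 5, 7, 9
neg_y <- line_neg(x)   # 7, 4, 1, -2

# Each 1-unit increase in x changes y by exactly the slope.
diff(pos_y)   # 2 2 2
diff(neg_y)   # -3 -3 -3
```

Setting x = 0 returns the y-intercept in both cases, which is why b is read off where the line crosses the Y axis.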
Favorite Quote attributed to George Box:
“All models are wrong, but some are useful.”
Common student query:
If all models are wrong, why do we bother modeling?
Models are considered ‘wrong’ because they simplify the ‘messiness’ of the real world to a mathematical relationship.
Models can’t (and shouldn’t) include all the noise of real world data
The following is an example of a recipe for Russian Tea Cakes
To make Russian Tea Cake Cookies, you need 6 tablespoons of powdered sugar to make 3 dozen cookies.
Here is the full recipe.
Here is the equation (y-intercept = 0):
\(y = 6x\)
Is this a function or a model?
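One way to explore this question is to code the recipe equation directly. The sketch below assumes x is the number of batches (3 dozen cookies each) and y is tablespoons of powdered sugar; the slide does not define x explicitly, and `sugar_tbsp` is a hypothetical name.

```r
# Recipe equation y = 6x with a y-intercept of 0.
# Assumption: x = number of batches (3 dozen cookies each),
#             y = tablespoons of powdered sugar.
sugar_tbsp <- function(batches) 6 * batches

one_batch <- sugar_tbsp(1)         # 6 tablespoons for 3 dozen cookies
more      <- sugar_tbsp(c(2, 5))   # 12 and 30 tablespoons

# Repeating the call with the same input returns the same value.
same <- identical(sugar_tbsp(2), sugar_tbsp(2))
```

Compare this with real data, where two observations with the same x can have different y values.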
Star Wars Character Data Example
True Population Model
\[y_{i} = \beta_{0} + \beta_{1}x_{i} + e_{i}\]
\(\beta_{0}\) is the y-intercept
\(\beta_{1}\) is the slope
\(e_{i}\) is the unexplained variability in y
Estimated Sample Data Model
\[\hat{y} = b_{0} + b_{1}x\]
\(\hat{y}\) is model estimate of y from x
\(b_{0}\) is model estimate of y-intercept
\(b_{1}\) is model estimate of slope
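The link between the population model and the estimated sample model can be sketched with simulated data. Everything here is an assumption for illustration: the "true" values `beta0 = 10` and `beta1 = 2`, the sample size, the error standard deviation, and the seed are all made up and are not the lecture data.

```r
set.seed(345)  # arbitrary seed so the sketch is reproducible

# Hypothetical "true" population parameters (assumed for illustration)
beta0 <- 10
beta1 <- 2

n <- 200
x <- runif(n, min = 0, max = 10)
e <- rnorm(n, mean = 0, sd = 2)   # unexplained variability
y <- beta0 + beta1 * x + e        # population model

# Estimate b0 and b1 from the sample with ordinary least squares
fit <- lm(y ~ x)
b   <- coef(fit)
b0  <- unname(b["(Intercept)"])
b1  <- unname(b["x"])

c(b0 = b0, b1 = b1)   # sample estimates: near beta0 and beta1, not equal to them

# Fitted values follow the estimated model y-hat = b0 + b1 * x
y_hat <- b0 + b1 * x
all.equal(y_hat, unname(fitted(fit)))
```

In practice the true \(\beta_{0}\) and \(\beta_{1}\) are unknown; simulation is used here only because it makes the comparison visible.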
\[\hat{y} = b_{0} + b_{1}x\]
\(\hat{y}\) is the regression estimate of y
\(b_{0}\) is the estimated value of y when x = 0
\(b_{1}\) is the estimated change in y for a 1 unit increase in x
NOTE:
Model is only valid for the range of X values used to estimate it.
Using a model to predict outside of this range is extrapolation.
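A rough sketch of how the range restriction could be checked in R before predicting. It uses `mtcars`, which ships with base R, as a stand-in dataset (an assumption; it may not be the lecture data), and `predict_in_range` is a hypothetical helper, not a built-in function.

```r
# mtcars ships with base R; it stands in for the lecture data here (assumption).
fit <- lm(mpg ~ hp, data = mtcars)
x_range <- range(mtcars$hp)   # smallest and largest hp used to fit the model

# Hypothetical helper: predict only inside the observed range of x.
predict_in_range <- function(model, new_hp, x_range) {
  inside <- new_hp >= x_range[1] & new_hp <= x_range[2]
  if (!all(inside)) {
    warning("Some hp values are outside the fitted range (extrapolation).")
  }
  pred <- predict(model, newdata = data.frame(hp = new_hp))
  ifelse(inside, pred, NA_real_)   # return NA for extrapolated values
}

in_range  <- predict_in_range(fit, 150, x_range)                    # inside the range
out_range <- suppressWarnings(predict_in_range(fit, 500, x_range))  # outside: NA
```

Returning `NA` for out-of-range inputs is one design choice; `predict()` on its own will return a number for any x without complaint, which is why the check has to be done by the analyst.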
Regression Model:
\[\hat{y} = 33.8641 - 0.022417x\]
Question 4. Based on this model, if Horsepower (x) is increased by 1, what is the change in Highway MPG?
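The arithmetic behind this kind of question can be checked in R by predicting at two x values that differ by 1. The coefficients come from the model above; the horsepower values 200 and 201 are hypothetical and are assumed to fall within the range used to fit the model, which the slide does not state.

```r
b0 <- 33.8641     # intercept from the model above
b1 <- -0.022417   # slope from the model above

# Hypothetical helper that evaluates the estimated model
predict_mpg <- function(hp) b0 + b1 * hp

# 200 and 201 are hypothetical horsepower values, assumed to be in range
mpg_200 <- predict_mpg(200)
mpg_201 <- predict_mpg(201)

# Difference in predictions for a 1-unit increase in x;
# equals the slope b1, up to floating-point rounding
change <- mpg_201 - mpg_200
```

The same difference results for any pair of x values one unit apart, because the model is linear in x.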
Simple linear regression (SLR) models are similar in format to the function of a line.
The interpretation is very different because SLR models are simplifications of the real world.
Box said “All models are wrong, but some are useful.”
This refers to the inherent simplification of modeling that leaves out the noise of the real world.
Despite this simplification, models provide valuable insight.
A model is only valid for the range of data used to create it.
To submit an Engagement Question or Comment about material from Lecture 8: Submit it by midnight today (day of lecture).