Introduction to Linear Regression

A Practical Guide to Predictive Modeling

Statistical Methods for Data Analysis

What is Linear Regression Anyway?

Linear regression helps us answer questions like:

  • “If I study one more hour, how much will my grade improve?”
  • “How does temperature affect ice cream sales?”
  • “What’s the relationship between car weight and fuel efficiency?”

Basic idea: Find a straight line that best fits scattered data points

Real-world use: Predicting outcomes based on patterns in data

The Math Behind the Magic

A simple linear model looks like this:

\[Y_i = \alpha + \beta X_i + \varepsilon_i\]

Breaking it down:

  • \(Y_i\) = What we’re trying to predict (dependent variable)
  • \(X_i\) = What we’re using to make predictions (independent variable)
  • \(\alpha\) = Where the line crosses the Y-axis (intercept)
  • \(\beta\) = How steep the line is (slope)
  • \(\varepsilon_i\) = The part we can’t explain (error)

To find the best line, we minimize:

\[\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2\]

where \(\hat{Y}_i = \hat{\alpha} + \hat{\beta}X_i\) is the model's prediction for observation \(i\).
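
For the single-predictor case, this minimization has a well-known closed-form solution. As a quick reference (standard least-squares algebra, not specific to any dataset):

\[\hat{\beta} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}, \qquad \hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X}\]

where \(\bar{X}\) and \(\bar{Y}\) are the sample means. Software such as R computes these estimates for us.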

Let’s Look at Some Real Data

We’ll examine how a car’s horsepower relates to its miles per gallon (MPG).

(Figure slides: Understanding Residuals, Diagnostic Checks, Multiple Regression in 3D)

R Code Example: Building Your First Model

# Load and explore the data
data(mtcars)
head(mtcars)

# Create a scatter plot
plot(mtcars$hp, mtcars$mpg, 
     main = "Quick Scatter Plot",
     xlab = "Horsepower", 
     ylab = "MPG",
     pch = 19, 
     col = "blue")

# Build the linear model
my_model <- lm(mpg ~ hp, data = mtcars)

# View the results
summary(my_model)
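
A quick visual check of the fit is to overlay the estimated line on the scatter plot. The sketch below reuses the my_model object and the plotting settings from above:

# Redraw the scatter plot and overlay the fitted regression line
plot(mtcars$hp, mtcars$mpg,
     main = "MPG vs Horsepower with Fitted Line",
     xlab = "Horsepower",
     ylab = "MPG",
     pch = 19,
     col = "blue")
abline(my_model, col = "red", lwd = 2)

# Extract just the estimated intercept and slope
coef(my_model)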

Making Predictions

# Display model equation
cat("Model equation: MPG =", 
    round(coef(my_model)[1], 2), "+",
    round(coef(my_model)[2], 2), "* HP\n")

# Make predictions for new cars
new_cars <- data.frame(hp = c(100, 150, 200))
predictions <- predict(my_model, newdata = new_cars)
print(predictions)

# Confidence intervals for the mean response
conf_intervals <- predict(my_model, 
                         newdata = new_cars,
                         interval = "confidence")
print(conf_intervals)
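
Note that interval = "confidence" describes the uncertainty in the average MPG at each horsepower value. If the goal is the likely MPG range for an individual car, predict() also accepts interval = "prediction", which gives wider intervals; a minimal sketch using the same new_cars data frame:

# Prediction intervals for individual cars (wider than confidence intervals)
pred_intervals <- predict(my_model,
                          newdata = new_cars,
                          interval = "prediction")
print(pred_intervals)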

Model Diagnostics

# Create diagnostic plots
par(mfrow = c(2, 2))
plot(my_model)
par(mfrow = c(1, 1))

What to look for:

  • Residuals vs Fitted: Residuals should scatter randomly around zero with no visible pattern
  • Q-Q Plot: Points should follow the diagonal line, indicating roughly normal residuals
  • Scale-Location: Points should spread evenly across fitted values (constant variance)
  • Residuals vs Leverage: Flags points with outsized influence on the fitted line
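
The plots can be backed up with a couple of numeric checks using base R. In this sketch, the 4/n cutoff for Cook's distance is only a common rule of thumb, not a strict threshold:

# Normality of residuals (complements the Q-Q plot)
res <- residuals(my_model)
shapiro.test(res)

# Flag potentially influential observations via Cook's distance
cooks_d <- cooks.distance(my_model)
which(cooks_d > 4 / nrow(mtcars))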

Key Takeaways

What we learned:

  1. Linear regression finds the best-fitting line through data points
  2. The model equation is: \(Y = \alpha + \beta X + \varepsilon\)
  3. Residuals help us check if our model fits well
  4. Multiple regression extends the same model to several predictors (see the sketch after this list)
  5. Always check diagnostic plots before trusting your model
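
As a taste of point 4, the same lm() call handles several predictors at once; a minimal sketch adding car weight (wt) alongside horsepower:

# Multiple regression: MPG explained by horsepower and weight
multi_model <- lm(mpg ~ hp + wt, data = mtcars)
summary(multi_model)

# Compare the one-predictor and two-predictor models
AIC(my_model, multi_model)

Each coefficient now describes the effect of one predictor while the others are held fixed.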

Next steps:

  • Practice with different datasets
  • Learn about multiple regression in depth
  • Explore polynomial and non-linear models
  • Study model validation techniques

Questions?

Thank you for your attention!

Resources: R Documentation, “An Introduction to Statistical Learning”