Statistical Methods for Data Analysis
Linear regression helps us answer questions like:
Basic idea: Find a straight line that best fits scattered data points
Real-world use: Predicting outcomes based on patterns in data
A simple linear model looks like this:
\[Y_i = \alpha + \beta X_i + \varepsilon_i\]
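Breaking it down:
\(Y_i\): the response (outcome) for observation \(i\)
\(X_i\): the predictor for observation \(i\)
\(\alpha\): the intercept, the expected value of \(Y\) when \(X = 0\)
\(\beta\): the slope, the expected change in \(Y\) for a one-unit increase in \(X\)
\(\varepsilon_i\): the random error, the part of \(Y_i\) the line does not explain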
To find the best line, we minimize the sum of squared residuals:
\[\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2\]
where \(\hat{Y}_i = \hat{\alpha} + \hat{\beta}X_i\) is the predicted value for observation \(i\).
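Setting the derivatives of this sum with respect to \(\hat{\alpha}\) and \(\hat{\beta}\) to zero gives the standard closed-form least-squares estimates. The sample means \(\bar{X}\) and \(\bar{Y}\) are notation introduced here for the illustration:
\[\hat{\beta} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}, \qquad \hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X}\]
The second expression says the fitted line always passes through the point \((\bar{X}, \bar{Y})\).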
We’ll use R’s built-in mtcars dataset to examine how a car’s horsepower relates to its fuel economy in miles per gallon (MPG).
# Load and explore the data
data(mtcars)
head(mtcars)
# Create a scatter plot
plot(mtcars$hp, mtcars$mpg,
     main = "Quick Scatter Plot",
     xlab = "Horsepower",
     ylab = "MPG",
     pch = 19,
     col = "blue")
# Build the linear model
my_model <- lm(mpg ~ hp, data = mtcars)
# View the results
summary(my_model)
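To see where the coefficient estimates come from, the least-squares formulas can be applied by hand and compared with what lm() reports. This is a minimal sketch, not part of the original slides; the names x, y, beta_hat, alpha_hat, manual_coefs and lm_coefs are chosen here for illustration.

```r
# Sketch: reproduce the lm() estimates for mpg ~ hp by hand
data(mtcars)
x <- mtcars$hp
y <- mtcars$mpg

# Slope: sum of cross-deviations divided by sum of squared deviations of x
beta_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Intercept: chosen so the line passes through (mean(x), mean(y))
alpha_hat <- mean(y) - beta_hat * mean(x)

# Compare the hand-computed values with the fitted model's coefficients
manual_coefs <- c(alpha_hat, beta_hat)
lm_coefs <- unname(coef(lm(mpg ~ hp, data = mtcars)))
print(rbind(manual_coefs, lm_coefs))
```

The two rows should agree up to floating-point error, which is a quick way to check that the model is doing what the formulas say.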
# Display model equation
cat("Model equation: MPG =",
    round(coef(my_model)[1], 2), "+",
    round(coef(my_model)[2], 2), "* HP\n")
# Make predictions for new cars
new_cars <- data.frame(hp = c(100, 150, 200))
predictions <- predict(my_model, newdata = new_cars)
print(predictions)
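For a simple linear model, predict() just plugs each new horsepower value into the fitted equation. The sketch below illustrates this under the assumption that the model is refit exactly as above; fit, new_hp, by_hand and from_predict are names introduced here for illustration.

```r
# Sketch: predictions are intercept + slope * hp
data(mtcars)
fit <- lm(mpg ~ hp, data = mtcars)
new_hp <- c(100, 150, 200)

# Plug the new horsepower values into the fitted line by hand
by_hand <- coef(fit)[["(Intercept)"]] + coef(fit)[["hp"]] * new_hp

# Same values via predict(); unname() drops the row labels for comparison
from_predict <- unname(predict(fit, newdata = data.frame(hp = new_hp)))

print(cbind(hp = new_hp, by_hand, from_predict))
```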
# Get confidence intervals
conf_intervals <- predict(my_model,
                          newdata = new_cars,
                          interval = "confidence")
print(conf_intervals)
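A confidence interval describes uncertainty about the average MPG at a given horsepower. For the MPG of one individual car, predict() also accepts interval = "prediction", which is wider because it includes the car-to-car error as well. A minimal sketch comparing the two at the default 95% level, with conf_int, pred_int and the width variables as illustrative names:

```r
# Sketch: confidence vs. prediction intervals for the same new cars
data(mtcars)
fit <- lm(mpg ~ hp, data = mtcars)
new_cars <- data.frame(hp = c(100, 150, 200))

# Interval for the mean MPG at each horsepower value
conf_int <- predict(fit, newdata = new_cars, interval = "confidence")

# Interval for the MPG of a single new car at each horsepower value
pred_int <- predict(fit, newdata = new_cars, interval = "prediction")

# Compare interval widths (upper bound minus lower bound)
conf_width <- conf_int[, "upr"] - conf_int[, "lwr"]
pred_width <- pred_int[, "upr"] - pred_int[, "lwr"]
print(cbind(hp = new_cars$hp, conf_width, pred_width))
```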
# Create diagnostic plots
par(mfrow = c(2, 2))
plot(my_model)
par(mfrow = c(1, 1))
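What to look for:
Residuals vs Fitted: points scattered evenly around zero; a curve suggests the relationship is not linear
Normal Q-Q: points close to the diagonal line; strong departures suggest non-normal residuals
Scale-Location: a roughly flat trend; a funnel shape suggests non-constant variance
Residuals vs Leverage: points with both high leverage and a large residual (high Cook’s distance) may unduly influence the fit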
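The residuals behind the diagnostic plots can also be inspected numerically. The following is a rough sketch of a few such checks, not a substitute for the plots; fit and res are names introduced here for illustration.

```r
# Sketch: numeric checks on the residuals of mpg ~ hp
data(mtcars)
fit <- lm(mpg ~ hp, data = mtcars)
res <- residuals(fit)

# With an intercept in the model, least-squares residuals sum to
# zero up to floating-point error
print(sum(res))

# They are also uncorrelated with the predictor by construction
print(cor(res, mtcars$hp))

# Cars with the three largest absolute residuals, worth a closer look
print(head(sort(abs(res), decreasing = TRUE), 3))
```

Because the first two quantities are zero by construction, they say nothing about model quality; it is the pattern in the residuals, as seen in the plots, that reveals problems.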
What we learned:
Next steps:
Thank you for your attention!
Resources: R Documentation, “An Introduction to Statistical Learning”