Linear regression is a statistical method used to model the relationship between two variables by fitting a straight line through the data.
- Dependent variable (Y): What we want to predict
- Independent variable (X): What we use to make predictions
Two real-world examples using the ggplot2movies dataset:
Budget → Votes: Can a movie’s production budget predict how many audience votes it receives? More spend often means wider release and more viewers.
Budget → Rating: Does a bigger budget lead to a higher audience rating? Not always — this tests whether money buys quality.