In this presentation, we use linear regression to analyze the relationship between Old Faithful’s eruption duration and the waiting time between eruptions. We use R’s built-in faithful dataset, which contains 272 observations of Old Faithful eruptions.
We can define the linear regression model as:
\[Y_i = \beta_0 + \beta_1X_i + \epsilon_i\]
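Where:
\(Y_i\) is the waiting time to the next eruption (in minutes) for observation \(i\)
\(X_i\) is the eruption duration (in minutes) for observation \(i\)
\(\beta_0\) is the intercept
\(\beta_1\) is the slope, the expected change in waiting time per additional minute of eruption
\(\epsilon_i\) is the random error term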
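As a minimal sketch of how the model above could be fitted, the block below uses base R's lm() on the built-in faithful data, with waiting time as the response and eruption duration as the predictor. The object names fit, beta, and eps_hat and the 3.5-minute example eruption are illustrative choices, not part of the original slides.

```r
# Fit waiting time (Y) on eruption duration (X) by ordinary least squares.
fit <- lm(waiting ~ eruptions, data = faithful)

# Estimated intercept (beta_0) and slope (beta_1).
beta <- coef(fit)
print(beta)

# The residuals are the sample estimates of the error terms epsilon_i.
eps_hat <- residuals(fit)

# Predicted waiting time after a hypothetical 3.5-minute eruption.
predict(fit, newdata = data.frame(eruptions = 3.5))
```

The slope in beta estimates how many extra minutes of waiting are associated with one extra minute of eruption.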
Here’s the code for the previous scatter plot:
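library(ggplot2)

# Scatter plot of waiting time against eruption duration, with a fitted regression line
ggplot(faithful, aes(x = eruptions, y = waiting)) +
  geom_point(color = "green") +
  geom_smooth(method = "lm", se = TRUE, color = "red") +  # least-squares line with confidence band
  theme_minimal() +
  labs(
    title = "Eruption Duration vs. Waiting Time",
    x = "Eruption Duration (min)",
    y = "Waiting Time (min)"
  )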
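library(plotly)

# Scatter plot with 2D density contours, converted to an interactive plotly widget
p <- ggplot(faithful, aes(x = eruptions, y = waiting)) +
  geom_point(alpha = 0.5) +
  geom_density_2d(color = "red") +
  theme_minimal() +
  labs(
    title = "Interactive Density Contours of Eruption Patterns",
    x = "Eruption Duration (minutes)",
    y = "Waiting Time (minutes)"
  )

# No `text` aesthetic is mapped above, so use the x and y values for the hover labels
ggplotly(p, tooltip = c("x", "y"))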
The coefficient of determination, \(R^2\), measures how well eruption duration predicts waiting time:
\[R^2 = 1 - \frac{\sum(y_i - \hat{y}_i)^2}{\sum(y_i - \bar{y})^2}\]
For Old Faithful:
Our model’s \(R^2 = 0.811\), so eruption duration explains roughly 81.1% of the variance in waiting time.
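The \(R^2\) formula can be checked directly against the fitted model. The sketch below assumes the model is fitted with lm(waiting ~ eruptions, data = faithful). It computes the residual and total sums of squares by hand and compares the result with the value reported by summary(). The variable names (ss_res, ss_tot, r2_manual, and so on) are illustrative.

```r
# Refit the simple linear regression so this block is self-contained.
fit <- lm(waiting ~ eruptions, data = faithful)

y     <- faithful$waiting
y_hat <- fitted(fit)

# Numerator and denominator of the R^2 formula.
ss_res <- sum((y - y_hat)^2)    # residual sum of squares
ss_tot <- sum((y - mean(y))^2)  # total sum of squares

r2_manual <- 1 - ss_res / ss_tot

# Value reported by R for the same model.
r2_model <- summary(fit)$r.squared

# With a single predictor, R^2 equals the squared Pearson correlation.
r2_cor <- cor(faithful$eruptions, faithful$waiting)^2

round(c(manual = r2_manual, model = r2_model, cor = r2_cor), 3)
```

All three values should agree and round to the 0.811 quoted above.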