This presentation applies simple linear regression to a dataset containing SAT scores and GPA. We aim to predict a student’s GPA based on their SAT score.
The dataset contains two variables:
- SAT: Student SAT scores
- GPA: Corresponding GPA values

Below is a preview of the data:
# Load the dataset
sat_gpa <- read.csv("1.01. Simple linear regression.csv")
head(sat_gpa)
##    SAT  GPA
## 1 1714 2.40
## 2 1664 2.52
## 3 1760 2.54
## 4 1685 2.74
## 5 1693 2.83
## 6 1670 2.91
library(ggplot2)

# Scatter plot of GPA against SAT with the fitted least-squares line
ggplot(sat_gpa, aes(x = SAT, y = GPA)) +
  geom_point(color = "blue") +
  geom_smooth(method = "lm", se = FALSE, color = "red") +  # se = FALSE hides the confidence band
  ggtitle("Linear Regression: SAT vs GPA") +
  xlab("SAT Score") +
  ylab("GPA")
## `geom_smooth()` using formula = 'y ~ x'
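The linear regression model is given by:
\[ GPA = \beta_0 + \beta_1 \times SAT + \epsilon \]
where \(\beta_0\) is the intercept, \(\beta_1\) is the slope, and \(\epsilon\) is the error term. The fitted model predicts GPA from the estimated coefficients, with no error term:
\[ \hat{GPA} = \hat{\beta}_0 + \hat{\beta}_1 \times SAT \]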
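As a minimal sketch, the model can be fit with base R's `lm()`. To stay self-contained, the example uses only the six rows shown in the data preview, so the estimates are illustrative and will differ from a fit on the full dataset. The name `preview` and the SAT score of 1700 are assumptions made for this example.

```r
# Illustration only: the six rows shown in the data preview,
# not the full dataset.
preview <- data.frame(
  SAT = c(1714, 1664, 1760, 1685, 1693, 1670),
  GPA = c(2.40, 2.52, 2.54, 2.74, 2.83, 2.91)
)

# Fit GPA as a linear function of SAT by ordinary least squares
fit <- lm(GPA ~ SAT, data = preview)

# Estimated intercept and slope
coef(fit)

# Closed-form least-squares estimates, for comparison with coef(fit)
slope <- cov(preview$SAT, preview$GPA) / var(preview$SAT)
intercept <- mean(preview$GPA) - slope * mean(preview$SAT)

# Predicted GPA for a hypothetical SAT score of 1700
predict(fit, newdata = data.frame(SAT = 1700))
```

To fit the full data instead, replacing `preview` with `sat_gpa` in the `lm()` call should be enough, assuming the columns are named `SAT` and `GPA` as in the preview.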
The residuals, or the differences between the observed and predicted values, are calculated as:
\[ e_i = y_i - \hat{y}_i \]
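The residual formula can be checked directly in R. This sketch again uses only the six preview rows (the name `preview` is an assumption for this example), computes the residuals by hand as observed minus fitted values, and compares them with the built-in `resid()` accessor.

```r
# Illustration only: the six rows shown in the data preview,
# not the full dataset.
preview <- data.frame(
  SAT = c(1714, 1664, 1760, 1685, 1693, 1670),
  GPA = c(2.40, 2.52, 2.54, 2.74, 2.83, 2.91)
)
fit <- lm(GPA ~ SAT, data = preview)

# Residuals by hand: observed GPA minus fitted GPA
e <- preview$GPA - fitted(fit)

# Same values as the built-in accessor
all.equal(unname(e), unname(resid(fit)))

# With an intercept in the model, least-squares residuals
# sum to zero up to floating-point rounding
sum(e)
```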
library(plotly)

# Interactive 3D scatter plot. The dataset has only two variables,
# so SAT is reused on the z-axis.
plot_ly(
  sat_gpa, x = ~SAT, y = ~GPA, z = ~SAT,
  type = 'scatter3d', mode = 'markers') %>%
  layout(scene = list(xaxis = list(title = 'SAT'),
                      yaxis = list(title = 'GPA'),
                      zaxis = list(title = 'SAT')))