Simple Linear Regression: SAT vs GPA

Introduction

This presentation applies simple linear regression to a dataset containing SAT scores and GPA. We aim to predict a student’s GPA based on their SAT score.

The Dataset

The dataset contains two variables:

- SAT: student SAT scores
- GPA: corresponding GPA values

Below is a preview of the data:

# Load the dataset
sat_gpa <- read.csv("1.01. Simple linear regression.csv")
head(sat_gpa)

##    SAT  GPA
## 1 1714 2.40
## 2 1664 2.52
## 3 1760 2.54
## 4 1685 2.74
## 5 1693 2.83
## 6 1670 2.91

Scatter Plot SAT vs GPA

Simple Linear Regression Model

library(ggplot2)
ggplot(sat_gpa, aes(x = SAT, y = GPA)) + 
  geom_point(color = "blue") +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  ggtitle("Linear Regression: SAT vs GPA") +
  xlab("SAT Score") +
  ylab("GPA")

Figure: Simple Linear Regression SAT vs GPA

## `geom_smooth()` using formula = 'y ~ x'

Linear Regression Equation (LaTeX Math)

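The simple linear regression model is given by:

\[ GPA_i = \beta_0 + \beta_1 \times SAT_i + \epsilon_i \]

where \(\epsilon_i\) is the random error term. The fitted line replaces the unknown coefficients with their estimates and has no error term:

\[ \widehat{GPA}_i = \hat{\beta}_0 + \hat{\beta}_1 \times SAT_i \]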
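
As a minimal sketch of how the intercept and slope could be estimated in R, the block below fits the model with `lm()`. To stay self-contained it uses only the six rows shown in the data preview, so the estimates are illustrative and will differ from those of the full dataset. The names `sat_gpa_preview` and `new_student`, and the SAT score of 1700, are assumptions made for this example.

```r
# Six rows copied from the data preview (an assumption for illustration;
# in practice the full CSV loaded into sat_gpa would be used).
sat_gpa_preview <- data.frame(
  SAT = c(1714, 1664, 1760, 1685, 1693, 1670),
  GPA = c(2.40, 2.52, 2.54, 2.74, 2.83, 2.91)
)

# Fit GPA as a linear function of SAT by ordinary least squares.
fit <- lm(GPA ~ SAT, data = sat_gpa_preview)

# Estimated intercept (beta_0 hat) and slope (beta_1 hat).
b <- coef(fit)
print(b)

# Predicted GPA for a hypothetical student with an SAT score of 1700.
new_student <- data.frame(SAT = 1700)
pred <- predict(fit, newdata = new_student)
print(pred)
```

Replacing `sat_gpa_preview` with the full `sat_gpa` data frame would give the same fit that `geom_smooth(method = "lm")` draws in the scatter plot.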

The residuals, or the differences between the observed and predicted values, are calculated as:

\[ e_i = y_i - \hat{y}_i \]
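
The residual formula can be checked directly in R. The sketch below again assumes only the six preview rows (the hypothetical `sat_gpa_preview` data frame), computes each residual as observed minus fitted GPA, and compares the result with what `residuals()` returns.

```r
# Six rows copied from the data preview (an assumption for illustration).
sat_gpa_preview <- data.frame(
  SAT = c(1714, 1664, 1760, 1685, 1693, 1670),
  GPA = c(2.40, 2.52, 2.54, 2.74, 2.83, 2.91)
)

fit <- lm(GPA ~ SAT, data = sat_gpa_preview)

# Predicted values y_hat_i for each observation.
y_hat <- fitted(fit)

# Residuals computed by hand: e_i = y_i - y_hat_i.
e <- sat_gpa_preview$GPA - y_hat

# The hand-computed residuals should agree with residuals().
print(all.equal(unname(e), unname(residuals(fit))))

# With an intercept in the model, least-squares residuals sum to
# zero up to floating-point error.
print(sum(e))
```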

3D Plot (Plotly)

library(plotly)

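# Note: the dataset has only two variables, so SAT is reused on the
# z-axis; all points therefore lie on the plane x = z.
plot_ly(
  sat_gpa, x = ~SAT, y = ~GPA, z = ~SAT, 
  type = 'scatter3d', mode = 'markers') %>%
  # Label the three axes of the 3D scene.
  layout(scene = list(xaxis = list(title = 'SAT'),
                      yaxis = list(title = 'GPA'),
                      zaxis = list(title = 'SAT')))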