Introduction to Simple Linear Regression

2024-10-20

Slide 1: Introduction

Simple Linear Regression

In this presentation, we will explore the concept of Simple Linear Regression. We'll use a small example dataset to illustrate the process.

Slide 2: What is Simple Linear Regression?

Simple Linear Regression is a statistical method used to model the relationship between two variables:

Dependent Variable (Y): The variable we are trying to predict.
Independent Variable (X): The predictor variable.

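The simple linear regression model is:

\[ Y = \beta_0 + \beta_1 X + \epsilon \]

where:

- \(\beta_0\) is the intercept (the expected value of Y when X = 0)
- \(\beta_1\) is the slope (the change in the expected value of Y for a one-unit increase in X)
- \(\epsilon\) is the random error term

The fitted regression line uses the estimated coefficients and has no error term: \(\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X\).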
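With a single predictor, the ordinary least-squares estimates that `lm()` reports have a closed form: \(\hat{\beta}_1 = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2\) and \(\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}\). The short sketch below computes them directly on the study-hours data used in this presentation; the names `x`, `y`, `beta0_hat` and `beta1_hat` are illustrative choices, not part of the original code.

```r
# Study hours (x) and exam scores (y), copied from the example data
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(55, 57, 61, 64, 66, 70, 74, 78, 81, 85)

# Closed-form least-squares estimates for one predictor
beta1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
beta0_hat <- mean(y) - beta1_hat * mean(x)

print(c(intercept = beta0_hat, slope = beta1_hat))  # approx. 50.533 and 3.376

# Cross-check against R's built-in least-squares fit
print(coef(lm(y ~ x)))
```

Both lines print the same intercept and slope that appear in the `summary(model)` output.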

Slide 3: Example Data

Let's assume we have data on study hours (X) and exam scores (Y) for ten observations. The code that creates this data is shown below:

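# Sample data: hours studied (X) and exam score (Y) for ten observations
set.seed(123)  # not strictly needed: the values below are typed in, not randomly generated
study_hours <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
exam_scores <- c(55, 57, 61, 64, 66, 70, 74, 78, 81, 85)
data <- data.frame(study_hours, exam_scores)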

Slide 4: Plotting the Data

We will now plot the relationship between study hours and exam scores.

Scatter plot with regression line

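The code behind the scatter plot is not shown in the slides (a `geom_smooth()` message suggests ggplot2 was used). As a dependency-free alternative, here is a base-R sketch that draws the observations and overlays the least-squares line; the name `fit` and the axis labels are illustrative assumptions.

```r
# Example data (same values as in the Example Data slide)
study_hours <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
exam_scores <- c(55, 57, 61, 64, 66, 70, 74, 78, 81, 85)
data <- data.frame(study_hours, exam_scores)

# Fit the least-squares line that the plot will overlay
fit <- lm(exam_scores ~ study_hours, data = data)

# Scatter plot of the observations
plot(exam_scores ~ study_hours, data = data, pch = 19,
     xlab = "Study hours", ylab = "Exam score",
     main = "Exam score vs. study hours")

# Add the fitted regression line (abline() reads the intercept and slope from the lm object)
abline(fit, col = "blue", lwd = 2)
```

With ggplot2, `geom_point()` combined with `geom_smooth(method = "lm")` produces the same kind of figure, plus a shaded confidence band around the line.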

Interactive 3D plot created with plotly.

# Fit a simple linear regression of exam_scores on study_hours

model <- lm(exam_scores ~ study_hours, data = data)
summary(model)  # coefficient estimates, standard errors, t-tests and R-squared

## 
## Call:
## lm(formula = exam_scores ~ study_hours, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.41212 -0.25455  0.02424  0.43030  1.09091 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 50.53333    0.52647   95.98 1.55e-13 ***
## study_hours  3.37576    0.08485   39.79 1.75e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7707 on 8 degrees of freedom
## Multiple R-squared:  0.995,  Adjusted R-squared:  0.9943 
## F-statistic:  1583 on 1 and 8 DF,  p-value: 1.752e-10
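Several numbers in the `summary()` output can be reproduced from their definitions, which makes them easier to interpret. The sketch below recomputes Multiple R-squared, the residual standard error and the slope's t value, then makes a prediction; the variable names (`sse`, `sst`, `rse`, `t_slope`, `pred`) and the 5.5-hour input are illustrative choices, not from the original analysis.

```r
# Rebuild the example data and model
study_hours <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
exam_scores <- c(55, 57, 61, 64, 66, 70, 74, 78, 81, 85)
data <- data.frame(study_hours, exam_scores)
model <- lm(exam_scores ~ study_hours, data = data)

# Multiple R-squared = 1 - SSE / SST
sse <- sum(residuals(model)^2)
sst <- sum((data$exam_scores - mean(data$exam_scores))^2)
r_squared <- 1 - sse / sst  # approx. 0.995

# Residual standard error = sqrt(SSE / (n - 2)), with n - 2 = 8 degrees of freedom here
rse <- sqrt(sse / df.residual(model))  # approx. 0.771

# t value for the slope = estimate / standard error
coefs <- coef(summary(model))
t_slope <- coefs["study_hours", "Estimate"] / coefs["study_hours", "Std. Error"]  # approx. 39.79

# Predicted exam score (with a 95% prediction interval) for 5.5 study hours
pred <- predict(model, newdata = data.frame(study_hours = 5.5), interval = "prediction")
print(pred)
```

Because 5.5 is the mean of `study_hours`, the fitted value equals the mean exam score (69.1): the least-squares line always passes through \((\bar{X}, \bar{Y})\).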