Introduction

Simple linear regression is a statistical method that models the relationship between a dependent variable and a single independent variable by fitting a straight line to the observed data.
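In symbols, the model is conventionally written as below, where $y_i$ is the dependent variable, $x_i$ the independent variable, $\beta_0$ the intercept, $\beta_1$ the slope, and $\varepsilon_i$ a random error term. The notation is the standard textbook one rather than anything specific to this example; the second and third lines give the ordinary least-squares estimates of the slope and intercept.

```latex
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, \dots, n

\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}

\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
```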

Example

Consider a dataset where we want to predict a student’s score based on the number of hours studied.

Data

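# Simulated data: true intercept 10, true slope 5, Gaussian noise with SD 3
set.seed(42)  # fix the random seed so the simulation is reproducible
hours <- 1:10
scores <- 10 + 5 * hours + rnorm(10, mean = 0, sd = 3)
data <- data.frame(hours, scores)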

Scatter Plot

library(ggplot2)
ggplot(data, aes(x = hours, y = scores)) +
    geom_point() +
    labs(title = "Hours Studied vs. Scores", x = "Hours Studied", y = "Scores")
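
A straight-line trend can also be overlaid on the scatter plot. The sketch below rebuilds the simulated data so that it runs on its own and adds `geom_smooth(method = "lm")`, which draws a least-squares line through the points. The object name `p` and the choice `se = FALSE` (no confidence band) are assumptions made for illustration.

```r
library(ggplot2)

# Recreate the simulated data so this block is self-contained
set.seed(42)
hours <- 1:10
scores <- 10 + 5 * hours + rnorm(10, 0, 3)
data <- data.frame(hours, scores)

# Scatter plot with a least-squares line overlaid
p <- ggplot(data, aes(x = hours, y = scores)) +
    geom_point() +
    geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
    labs(title = "Hours Studied vs. Scores", x = "Hours Studied", y = "Scores")
p
```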

Fitting the Model

model <- lm(scores ~ hours, data = data)
summary(model)
## 
## Call:
## lm(formula = scores ~ hours, data = data)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -3.017 -2.117 -0.354  2.165  4.094 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  11.1403     1.8050   6.172 0.000267 ***
## hours         5.0912     0.2909  17.502 1.16e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.642 on 8 degrees of freedom
## Multiple R-squared:  0.9745, Adjusted R-squared:  0.9714 
## F-statistic: 306.3 on 1 and 8 DF,  p-value: 1.159e-07
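
The slope estimate of 5.0912 in the coefficient table means that each additional hour of study is associated with roughly 5.09 more points on average. The same quantities can be pulled out of the model object directly. As a minimal sketch, `coef()` returns the estimated intercept and slope and `confint()` returns confidence intervals for them; the data are rebuilt here so the block runs on its own, and the names `estimates` and `intervals` are illustrative.

```r
# Recreate the simulated data and the fitted model
set.seed(42)
hours <- 1:10
scores <- 10 + 5 * hours + rnorm(10, 0, 3)
data <- data.frame(hours, scores)
model <- lm(scores ~ hours, data = data)

# Point estimates: intercept and slope
estimates <- coef(model)
estimates

# 95% confidence intervals for both coefficients
intervals <- confint(model, level = 0.95)
intervals

# Proportion of the variance in scores explained by hours
summary(model)$r.squared
```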

Predictions

new_data <- data.frame(hours = 8)
predicted_score <- predict(model, new_data)
predicted_score
##        1 
## 51.86987
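
A point prediction on its own says nothing about its uncertainty. As a hedged sketch, `predict()` for `lm` objects also accepts `interval = "prediction"`, which returns the fitted value together with lower and upper bounds for a single new observation. The 95% level and the name `predicted_range` are assumptions for illustration.

```r
# Recreate the simulated data and the fitted model
set.seed(42)
hours <- 1:10
scores <- 10 + 5 * hours + rnorm(10, 0, 3)
data <- data.frame(hours, scores)
model <- lm(scores ~ hours, data = data)

# A student who studied for 8 hours
new_data <- data.frame(hours = 8)

# Fitted value plus a 95% prediction interval (columns: fit, lwr, upr)
predicted_range <- predict(model, new_data, interval = "prediction", level = 0.95)
predicted_range
```

The interval is wider than a confidence interval for the mean score at 8 hours because it also accounts for the scatter of individual students around the line.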

3D Plot

library(plotly)
plot_ly(data, x = ~hours, y = ~scores, z = ~fitted(model), type = "scatter3d", mode = "markers")