2024-01-29

Defining Simple Linear Regression

Simple linear regression is a statistical method used to model the relationship between a single independent variable and a dependent variable. The single independent variable is known as the predictor or explanatory variable and the dependent one as the response variable.

The purpose of this method is to predict the value of the dependent variable from the value of the independent variable. The end goal is the straight line that best describes the linear relationship between the two variables.
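Here is a minimal sketch of that prediction step in R. It assumes simulated data whose true intercept (1) and slope (2) were chosen only for illustration. The snippet fits the line with `lm()` and then uses `predict()` to estimate the response at predictor values that were not in the data.

```r
# Simulated data: the true intercept (1) and slope (2) are illustrative assumptions
set.seed(42)
x <- runif(50, min = 0, max = 10)
y <- 1 + 2 * x + rnorm(50, sd = 1)

# Fit the simple linear regression y = a + b * x by least squares
model <- lm(y ~ x)

# Predict the response at new, unseen values of the predictor
new_x <- data.frame(x = c(2.5, 5, 7.5))
predictions <- predict(model, newdata = new_x)
print(round(predictions, 2))
```

The column name in `newdata` has to match the predictor name used in the model formula (`x` here). Otherwise `predict()` cannot find the variable.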

The Regression Model Equation

Expected change in Y per unit X

\(\hat{y}=a+bX\)

\(\hat{y}\) represents the predicted value of the dependent variable, that is, the estimated average of Y at a given X

\(X\) represents the independent variable

\(a\) represents the line’s intercept

\(b\) represents the line’s slope (the change in Y for one-unit change in X)
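The sketch below ties these symbols to R output. It uses simulated data, and the true intercept (3) and slope (0.5) are illustrative assumptions. `coef()` returns \(a\) and \(b\). Computing \(a + bX\) by hand reproduces `fitted()`, and raising X by one unit shifts the prediction by exactly \(b\).

```r
# Simulated data; the true intercept 3 and slope 0.5 are illustrative assumptions
set.seed(1)
x <- rnorm(100, mean = 20, sd = 5)
y <- 3 + 0.5 * x + rnorm(100)

model <- lm(y ~ x)
a <- coef(model)[["(Intercept)"]]  # estimated intercept
b <- coef(model)[["x"]]            # estimated slope

# y_hat = a + b * X reproduces the fitted values computed by lm()
y_hat <- a + b * x
stopifnot(isTRUE(all.equal(unname(fitted(model)), y_hat)))

# Raising X by one unit changes the prediction by exactly b
step <- predict(model, data.frame(x = 11)) - predict(model, data.frame(x = 10))
print(c(slope = b, change_per_unit = unname(step)))
```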

The Least Squares Method

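The least squares method chooses \(a\) and \(b\) to minimize the sum of squared residuals, \(\sum_{i}(y_i-\hat{y}_i)^2\), where each residual is the difference between an observed value \(y_i\) and its predicted value \(\hat{y}_i\).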
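To make the minimization concrete, the sketch below writes the sum of squared residuals as an explicit function of a candidate intercept and slope. It passes that function to R's general-purpose optimizer `optim()` and checks that the minimizer matches the coefficients from `lm()`. The numerical optimizer is only for illustration. `lm()` solves the least squares problem directly (via a QR decomposition) rather than by iterative search.

```r
set.seed(123)
x <- rnorm(100)
y <- 2 * x + rnorm(100)

# Sum of squared residuals for a candidate line y = a + b * x
ssr <- function(par) {
  a <- par[1]
  b <- par[2]
  sum((y - (a + b * x))^2)
}

# Numerically search for the (a, b) pair that minimizes the SSR
fit_optim <- optim(par = c(0, 0), fn = ssr, method = "BFGS")
fit_lm <- lm(y ~ x)

result <- rbind(optim = fit_optim$par, lm = unname(coef(fit_lm)))
colnames(result) <- c("a", "b")
print(result)
```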

\(b=\frac{SS_{xy}}{SS_{xx}}\), where \(SS_{xy}=\sum_{i}(x_i-\bar{x})(y_i-\bar{y})\) and \(SS_{xx}=\sum_{i}(x_i-\bar{x})^2\)

\(a=\bar{y}-b\bar{x}\)
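The two formulas can be checked by hand. The sketch below reuses the simulated data from the plotting examples and computes the slope and intercept directly from the sums of squares. It then confirms that they agree with `lm()`.

```r
set.seed(123)
x <- rnorm(100)
y <- 2 * x + rnorm(100)

# SS_xy: sum of cross-products of deviations from the means
ss_xy <- sum((x - mean(x)) * (y - mean(y)))
# SS_xx: sum of squared deviations of x from its mean
ss_xx <- sum((x - mean(x))^2)

b <- ss_xy / ss_xx            # slope
a <- mean(y) - b * mean(x)    # intercept

# Compare the hand-computed estimates with lm()
manual <- c(a = a, b = b)
from_lm <- unname(coef(lm(y ~ x)))
print(rbind(manual = manual, lm = from_lm))
```

Because \(SS_{xy}\) and \(SS_{xx}\) share the same \(n-1\) divisor when turned into a covariance and a variance, the slope can equivalently be written as `cov(x, y) / var(x)`.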

Creating a Scatter Plot with the Regression Line - Code

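library(ggplot2)

# Simulate 100 points with a true slope of 2 plus standard normal noise
set.seed(123)
x <- rnorm(100)
y <- 2 * x + rnorm(100)
df <- data.frame(x, y)

# Scatter plot with a fitted linear regression line (no confidence band)
p <- ggplot(df, aes(x, y)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "red") +
  labs(title = "Scatter Plot with Regression Line",
       x = "Independent Variable (X)",
       y = "Dependent Variable (Y)")
print(p)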

A Scatter Plot with the Regression Line


Creating a Scatter Plot with the Least Squares Regression Line - Code

library(ggplot2)

# Simulate the same data as before
set.seed(123)
x <- rnorm(100)
y <- 2 * x + rnorm(100)
df <- data.frame(x, y)

# Fit the least squares line explicitly and draw it from its coefficients
model <- lm(y ~ x, data = df)
p <- ggplot(df, aes(x, y)) +
  geom_point() +
  geom_abline(intercept = coef(model)[["(Intercept)"]],
              slope = coef(model)[["x"]],
              color = "blue") +
  labs(title = "Scatter Plot with Least Squares Regression Line",
       x = "Independent Variable (X)",
       y = "Dependent Variable (Y)")
print(p)

A Scatter Plot with the Least Squares Regression Line


Linear and Quadratic Least Squares Fits - Code

library(plotly)  # also re-exports the %>% pipe

# Simulate data from a curved (quadratic) relationship
set.seed(123)
df <- data.frame(x = seq(-3, 3, length.out = 100))
df$y <- 2 * df$x^2 + rnorm(100)

# Both models are fitted by ordinary least squares
linear_model <- lm(y ~ x, data = df)
quadratic_model <- lm(y ~ poly(x, 2, raw = TRUE), data = df)
df$linear_fit <- fitted(linear_model)
df$quadratic_fit <- fitted(quadratic_model)

plot_ly(df, x = ~x) %>%
  add_markers(y = ~y, name = "Observed",
              marker = list(color = "grey", opacity = 0.5)) %>%
  add_lines(y = ~linear_fit, name = "Simple Linear Regression",
            line = list(color = "red")) %>%
  add_lines(y = ~quadratic_fit, name = "Quadratic Least Squares Fit (Degree 2)",
            line = list(color = "blue")) %>%
  layout(title = "Scatter Plot with Linear and Quadratic Least Squares Fits",
         xaxis = list(title = "Independent Variable"),
         yaxis = list(title = "Dependent Variable"))

Linear and Quadratic Least Squares Fits - Plot
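Both curves in this comparison are least squares fits. One way to quantify the visual gap is therefore to compare the quantity each fit minimizes, the residual sum of squares (RSS). The sketch below regenerates the same simulated quadratic data and prints both values. The quadratic model contains the linear one, so its in-sample RSS can never be larger. The informative part is the size of the gap, which shows how badly a straight line misses a curved relationship.

```r
set.seed(123)
df <- data.frame(x = seq(-3, 3, length.out = 100))
df$y <- 2 * df$x^2 + rnorm(100)

linear_model <- lm(y ~ x, data = df)
quadratic_model <- lm(y ~ poly(x, 2, raw = TRUE), data = df)

# Residual sum of squares: the quantity least squares minimizes
rss <- c(linear = sum(residuals(linear_model)^2),
         quadratic = sum(residuals(quadratic_model)^2))
print(round(rss, 1))
```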

Assumptions of Simple Linear Regression

  • Linearity: The mean of Y is a linear function of X.
  • Independence: Observations (and therefore residuals) are independent of one another.
  • Normality: Residuals are normally distributed.
  • Homoscedasticity: Residuals have constant variance across all values of X.
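These assumptions are usually checked informally through the residuals of the fitted model. The sketch below uses only base R. It produces a residuals-versus-fitted plot for linearity and constant variance, and a Q-Q plot plus `shapiro.test()` for normality. It also computes a lag-1 residual correlation, a rough stand-in (our assumption, not a formal test) for independence checks such as the Durbin-Watson test. That correlation is only meaningful when the observations have a natural order, such as time.

```r
set.seed(123)
x <- rnorm(100)
y <- 2 * x + rnorm(100)
model <- lm(y ~ x)
res <- residuals(model)

# Linearity and homoscedasticity: residuals vs fitted values should show
# no curve and no funnel shape
plot(fitted(model), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normality: Q-Q plot of residuals and a Shapiro-Wilk test
qqnorm(res)
qqline(res)
normality <- shapiro.test(res)
print(normality$p.value)

# Independence (ordered data only): lag-1 autocorrelation of the residuals,
# a rough informal check rather than a formal test
lag1_cor <- cor(res[-1], res[-length(res)])
print(lag1_cor)
```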

Conclusion

Simple linear regression models the relationship between a single independent variable and a dependent variable with a straight line fitted by least squares. The slope quantifies how much the response is expected to change per unit change in the predictor, and the fitted line can be used to predict new values. Typical applications include predicting house prices from a single feature such as floor area, relating the number of study hours to exam scores, and estimating the effect of advertising spending on company sales. By identifying and quantifying such relationships, simple linear regression supports informed decisions and predictions, provided its assumptions hold.