Simple Linear Regression

2023-10-15

Topic Choice

The topic I have chosen for this presentation is simple linear regression.

I have chosen this topic because it is at the root of statistical analysis and is one of the most commonly used techniques, so it is worth everyone learning a bit about it.

What is Linear Regression?

Linear regression is a statistical technique that uses an independent variable to predict a dependent variable. Simple linear regression involves two variables: one predictor and one response. There is also multiple linear regression, which uses several independent variables to predict one dependent variable.

It is often used to predict a numeric value, such as a stock price or a quantity, from another variable. It can also help us understand the relationship between two variables.

How is Linear Regression Calculated?

Simple linear regression models are represented by the equation: \[Y = \beta_0 + \beta_1X + \epsilon\]

This equation contains a few parts:

\(Y\) is the dependent variable
\(\beta_0\) is the intercept of the regression line
\(\beta_1\) is the slope of the regression line
\(X\) is the independent variable
\(\epsilon\) is the random error term, the part of \(Y\) that the line does not explain
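To make the calculation concrete, here is a minimal sketch of how least squares estimates the two coefficients: the slope is the covariance of \(X\) and \(Y\) divided by the variance of \(X\), and the intercept is the mean of \(Y\) minus the slope times the mean of \(X\). It assumes the Boston data from the MASS package (the same data used for the model in this presentation); the names `slope`, `intercept`, `manual`, and `fitted_by_lm` are only illustrative.

```r
library(MASS)   # provides the Boston data set

data("Boston")

x <- Boston$rm    # independent variable: average number of rooms per dwelling
y <- Boston$medv  # dependent variable: median home value (in $1000s)

# Closed-form least-squares estimates for simple linear regression
slope <- cov(x, y) / var(x)
intercept <- mean(y) - slope * mean(x)

# Compare the hand-computed estimates with the ones lm() returns
manual <- c(intercept, slope)
fitted_by_lm <- unname(coef(lm(medv ~ rm, data = Boston)))

isTRUE(all.equal(manual, fitted_by_lm))
```

If the two sets of estimates agree, the last line evaluates to TRUE, which shows that `lm()` is doing this same calculation for the one-predictor case.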

R Code for Regression Model

library(MASS)  # package that ships the Boston housing data

data("Boston")

lmModel <- lm(medv ~ rm, data = Boston)  # predict medv from rm

Here is the code used to create the linear regression model. I used the “Boston” data set from the MASS package to get housing data for this regression. The model predicts the median home value (medv) from the average number of rooms per dwelling (rm). This model will then be plotted as a linear regression plot.
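Once the model is fitted, a few base R functions let us look at what it learned. The sketch below refits the same model so that it runs on its own; the 6-room home and the names `r_squared`, `new_home`, and `predicted_value` are hypothetical and only there for illustration.

```r
library(MASS)

data("Boston")
lmModel <- lm(medv ~ rm, data = Boston)

# Estimated intercept (beta_0) and slope (beta_1)
coef(lmModel)

# Proportion of the variance in medv that rm explains
r_squared <- summary(lmModel)$r.squared
r_squared

# Predicted medv for a hypothetical home with 6 rooms
new_home <- data.frame(rm = 6)
predicted_value <- predict(lmModel, newdata = new_home)
predicted_value
```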

Regression Plot

Linear Regression on Boston Housing
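A plot like this one could be produced roughly as follows. This is a sketch rather than the exact code behind the slide: it assumes the ggplot2 package is installed and draws the fitted line with `geom_smooth(method = "lm")`. The object name `regression_plot` and the axis labels are my own choices.

```r
library(MASS)
library(ggplot2)

data("Boston")

# Scatter plot of the data with the least-squares line drawn on top
regression_plot <- ggplot(Boston, aes(x = rm, y = medv)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(title = "Linear Regression on Boston Housing",
       x = "Average rooms per dwelling (rm)",
       y = "Median home value in $1000s (medv)")

regression_plot
```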

Plotly Plot

This is an interactive plotly plot showing predicted versus observed values from the regression model.
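One way to build a plot like this is sketched below. It assumes the plotly package is installed and is not necessarily how the plot on this slide was made; the names `prediction_data` and `fig` and the axis titles are illustrative.

```r
library(MASS)
library(plotly)

data("Boston")
lmModel <- lm(medv ~ rm, data = Boston)

# One row per observation: the observed medv and the model's prediction
prediction_data <- data.frame(Observed = Boston$medv,
                              Predicted = unname(predict(lmModel)))

# Interactive scatter plot of predicted against observed values
fig <- plot_ly(prediction_data, x = ~Observed, y = ~Predicted,
               type = "scatter", mode = "markers")
fig <- layout(fig,
              title = "Predicted vs Observed medv",
              xaxis = list(title = "Observed medv"),
              yaxis = list(title = "Predicted medv"))

fig
```

Points close to the diagonal are observations the model predicts well.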

Code for Residuals

library(ggplot2)

# Residuals: observed minus predicted medv for each observation
linearResiduals <- resid(lmModel)

residual_data <- data.frame(Predicted = predict(lmModel), 
                            Residuals = linearResiduals)

# Scatter plot of residuals against predicted values
ggplot(data = residual_data, aes(x = Predicted, y = Residuals)) +
  geom_point() +
  labs(title = "Residual Plot for Linear Regression Model",
       x = "Predicted Values", y = "Residuals")

This code plots the residuals of the linear regression model. A residual is the difference between an observed value and the value the model predicts for it.

Residual Plot

Here is the plot of the model's residuals against its predicted values.

Residual Equation

The equation for a residual is:

\[ \epsilon_i = y_i - \hat{y}_i \]

That is, the residual for data point \(i\) is the observed value \(y_i\) minus the predicted value \(\hat{y}_i\).
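As a quick check of this equation, the sketch below computes observed minus predicted by hand and compares it with what `resid()` returns. It refits the same model so that it runs on its own; the variable names are illustrative.

```r
library(MASS)

data("Boston")
lmModel <- lm(medv ~ rm, data = Boston)

observed <- Boston$medv                # y_i
predicted <- unname(fitted(lmModel))   # y-hat_i

# Residuals computed directly from the equation
manual_residuals <- observed - predicted

# Residuals as reported by R
model_residuals <- unname(resid(lmModel))

residuals_match <- isTRUE(all.equal(manual_residuals, model_residuals))
residuals_match

# Because the model includes an intercept, the residuals sum to
# zero up to floating-point rounding
sum(model_residuals)
```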

Thank You for Your Attention

Thanks for going through the presentation. I hope you learned something interesting.