2023-10-15

Topic Choice

The topic I have chosen for this presentation is simple linear regression.

I chose this topic because it is at the root of statistical analysis and is one of the most commonly used techniques, one that everyone should learn a bit about.

What is Linear Regression?

Linear regression is a statistical technique that uses an independent variable to predict a dependent variable. Simple linear regression involves two variables, one predictor and one response. There is also multiple linear regression, which uses several independent variables to predict the dependent variable.

It is often used to predict a numeric value, such as a stock price or the amount of something, given another variable. It can also help us understand the relationship between two variables.

How is Linear Regression Calculated?

Linear Regression models are represented by the equation: \[Y = \beta_0 + \beta_1X + \epsilon\]

This equation contains a few parts:

  • \(Y\) is the dependent variable
  • \(\beta_0\) is the intercept of the regression line
  • \(\beta_1\) is the slope of the regression line
  • \(X\) is the independent variable
  • \(\epsilon\) is the error term, the part of \(Y\) that the line does not explain
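To make the calculation concrete, here is a minimal sketch of how the intercept and slope can be estimated by least squares, which is the method lm() uses. The vectors x and y are toy values I made up for illustration, and the names b0, b1, and fit are my own.

```r
# Toy data, invented for illustration (not from a real data set)
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Least-squares slope: covariance of x and y divided by the variance of x
b1 <- cov(x, y) / var(x)

# Least-squares intercept: the fitted line passes through (mean(x), mean(y))
b0 <- mean(y) - b1 * mean(x)

# lm() should report the same two numbers
fit <- lm(y ~ x)
c(intercept = b0, slope = b1)
coef(fit)
```

The hand-computed values and the lm() coefficients should agree, which shows that the fitted line is just these two formulas applied to the data.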

R Code for Regression Model

library(MASS)

data("Boston")

lmModel <- lm(medv ~ rm, data = Boston)

Here is the code used to create the linear regression model. I used the “Boston” data set from the MASS package to get housing data for this regression. The model predicts the median home value (medv) from the average number of rooms per dwelling (rm). The fitted model is then drawn as a regression plot.
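Once the model is fitted, it can be inspected and used for prediction. The following is a small sketch of how that might look. The data frame new_home and its value of 6 rooms are hypothetical, chosen only to show the call.

```r
library(MASS)

data("Boston")
lmModel <- lm(medv ~ rm, data = Boston)

# Estimated intercept (beta_0) and slope (beta_1)
coef(lmModel)

# Fuller report: standard errors, p-values, R-squared
summary(lmModel)

# Predicted medv (in $1000s) for a hypothetical home with 6 rooms
new_home <- data.frame(rm = 6)
predicted_value <- predict(lmModel, newdata = new_home)
predicted_value
```

The prediction is simply the intercept plus the slope times 6, so it ties directly back to the regression equation.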

Regression Plot

Figure: Linear Regression on Boston Housing
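The plotting code is not shown on this slide, so here is a sketch of how a plot like this could be drawn with ggplot2. The axis labels and the name regression_plot are my own assumptions. Only the title comes from the slide.

```r
library(MASS)
library(ggplot2)

data("Boston")

# Scatter plot of the data with a fitted straight line on top.
# method = "lm" asks geom_smooth() for a linear fit; writing the
# formula explicitly avoids the informational message that
# geom_smooth() prints when the formula is left out.
regression_plot <- ggplot(Boston, aes(x = rm, y = medv)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(title = "Linear Regression on Boston Housing",
       x = "Average rooms per dwelling (rm)",
       y = "Median home value (medv)")

regression_plot
```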

Plotly Plot

This is an interactive plotly plot of the predicted versus observed values from the regression model.
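The code behind the interactive plot is not shown, so the following is one possible way to build it with plot_ly(), not necessarily how the original was made. The data frame comparison, the object fig, and the axis titles are assumptions of mine.

```r
library(MASS)
library(plotly)

data("Boston")
lmModel <- lm(medv ~ rm, data = Boston)

# Pair each observed medv with the model's prediction for that row
comparison <- data.frame(Observed = Boston$medv,
                         Predicted = predict(lmModel))

# Interactive scatter plot; hovering shows the values for each point
fig <- plot_ly(comparison, x = ~Observed, y = ~Predicted,
               type = "scatter", mode = "markers")
fig <- layout(fig,
              title = "Predicted vs Observed medv",
              xaxis = list(title = "Observed"),
              yaxis = list(title = "Predicted"))

fig
```

Points close to the diagonal are homes the model predicts well. Points far from it correspond to large residuals.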

Code for Residuals

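library(ggplot2)

# Residuals: observed medv minus the value predicted by the model
linearResiduals <- resid(lmModel)

residual_data <- data.frame(Predicted = predict(lmModel), 
                            Residuals = linearResiduals)

# Each "+" ends a line so that R keeps reading the plot expression
ggplot(data = residual_data, aes(x = Predicted, y = Residuals)) +
  geom_point() +
  labs(title = "Residual Plot for Linear Regression Model",
       x = "Predicted Values", y = "Residuals")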
  • This code plots the residuals of the linear regression model, which are the differences between the observed and predicted values.

Residual Plot

  • Here is the residual plot for the model, showing each residual against its predicted value.

Residual Equation

The equation used for residuals is this:

\[ \epsilon_i = y_i - \hat{y}_i \]

In other words, the residual for data point \(i\) is the observed value \(y_i\) minus the predicted value \(\hat{y}_i\).
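As a quick check of this equation, here is a small sketch that computes the residuals by hand and compares them with what resid() returns. The name manual_residuals is my own.

```r
library(MASS)

data("Boston")
lmModel <- lm(medv ~ rm, data = Boston)

# Observed minus predicted, one value per row of Boston
manual_residuals <- Boston$medv - predict(lmModel)

# Should match what resid() returns
all.equal(unname(manual_residuals), unname(resid(lmModel)))

# With an intercept in the model, least-squares residuals sum to
# (numerically) zero
sum(manual_residuals)
```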

Thank You for Your Attention

Thanks for going through the presentation. I hope you learned something interesting.