2025-02-25

1) Introduction

Linear regression is a widely used statistical technique for modeling the relationship between a dependent variable and one or more independent variables. In this presentation I will:

  • Present the mathematical formula for simple linear regression
  • Simulate and fit a simple linear regression model in R
  • Visualize the fitted data and model using ggplot2
  • Demonstrate a 3D visualization of a multiple linear regression using plotly

2) Simple Linear Regression Formula

The formula for simple linear regression is:

\[ y = \beta_0 + \beta_1 x + \epsilon \]

where:

  • \(y\) is the dependent variable
  • \(x\) is the independent variable
  • \(\beta_0\) is the intercept
  • \(\beta_1\) is the slope
  • \(\epsilon\) is the random error term
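
To make the role of \(\beta_0\) and \(\beta_1\) concrete, here is a short sketch of how they are usually estimated by ordinary least squares, the criterion that lm() in R uses. The observation index \(i\), the sample size \(n\), the sample means \(\bar{x}\) and \(\bar{y}\), and the hats marking estimates are notation introduced here and do not appear in the slides.

```latex
% Least-squares criterion: choose the line that minimizes the sum of squared errors
\[ (\hat{\beta}_0, \hat{\beta}_1) = \arg\min_{\beta_0,\, \beta_1} \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right)^2 \]

% Closed-form solution for the slope and the intercept
\[ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
   \qquad
   \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]
```

The slope estimate is the sample covariance of \(x\) and \(y\) divided by the sample variance of \(x\), and the intercept estimate forces the fitted line through the point \((\bar{x}, \bar{y})\).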

3) Example Dataset for Linear Regression

Here we create a simple example dataset to illustrate how simple linear regression works. We will:

  • Generate \(x\) from a normal distribution with mean 50 and standard deviation 10
  • Add random noise \(\epsilon\) from a normal distribution with mean 0 and standard deviation 5
  • Combine them to form \(y = 6 + 0.7 \times x + \epsilon\)

The first six rows of the simulated dataset are:

##          x        y
## 1 44.39524 33.52464
## 2 47.69823 40.67318
## 3 65.58708 50.67750
## 4 50.70508 39.75585
## 5 51.29288 37.14692
## 6 67.15065 52.78032
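
The code that produced these rows is not shown on the slide. The three steps listed above could be sketched as follows. The seed value and the order of the two rnorm() calls are assumptions, so the printed rows will only match the ones above if the original used the same choices. The sample size of 100 is inferred from the 98 residual degrees of freedom reported in the model summary.

```r
# Minimal sketch of the simulation; the seed is an assumption, not taken from the slides
set.seed(123)

n <- 100                                   # 98 residual df + 2 estimated coefficients
x <- rnorm(n, mean = 50, sd = 10)          # predictor: normal with mean 50, sd 10
epsilon <- rnorm(n, mean = 0, sd = 5)      # noise: normal with mean 0, sd 5
y <- 6 + 0.7 * x + epsilon                 # true intercept 6, true slope 0.7

data <- data.frame(x = x, y = y)
head(data)                                 # first six rows
```

Fixing the seed makes the example reproducible: rerunning the block gives the same dataset each time.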

4) R Code for Model Fitting

Next, we fit a simple linear regression model to the dataset we generated. Fitting the model means finding the line that best describes the relationship between the independent variable \(x\) and the dependent variable \(y\), in the sense of minimizing the sum of squared residuals. In this model, \(y\) is the response variable and \(x\) is the predictor. We fit the model with the lm() function and then display its summary to view the estimated coefficients and diagnostic statistics.

model <- lm(y ~ x, data = data)
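
As a hedged illustration of what lm() computes here, the sketch below rebuilds a dataset like the one described earlier (the seed is an assumption) and compares the lm() coefficients with the textbook least-squares expressions: slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x). The names slope_hat and intercept_hat are introduced only for this example.

```r
# Recreate a dataset like the one in the slides (the seed is an assumption)
set.seed(123)
x <- rnorm(100, mean = 50, sd = 10)
y <- 6 + 0.7 * x + rnorm(100, mean = 0, sd = 5)
data <- data.frame(x = x, y = y)

# Fit with lm(), as in the slides
model <- lm(y ~ x, data = data)

# Closed-form least-squares estimates for simple linear regression
slope_hat <- cov(data$x, data$y) / var(data$x)
intercept_hat <- mean(data$y) - slope_hat * mean(data$x)

# The two approaches should agree up to floating-point error
all.equal(unname(coef(model)), c(intercept_hat, slope_hat))
```

Internally lm() solves the problem with a QR decomposition rather than these formulas, but for a single predictor both routes give the same estimates.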

5) Model Summary

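Below is the summary of the model we just fit. The estimated intercept (6.80) and slope (0.67) are close to the true values of 6 and 0.7 used in the simulation, and the model explains about 62% of the variance in \(y\).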

## 
## Call:
## lm(formula = y ~ x, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.5367 -3.4175 -0.4375  2.9032 16.4520 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  6.79778    2.76324    2.46   0.0156 *  
## x            0.67376    0.05344   12.61   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.854 on 98 degrees of freedom
## Multiple R-squared:  0.6186, Adjusted R-squared:  0.6147 
## F-statistic:   159 on 1 and 98 DF,  p-value: < 2.2e-16
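
The quantities printed in the summary can also be pulled out programmatically. The sketch below is one way to do it, assuming the same simulated setup (the seed is an assumption). The object names fit_summary, coef_table, ci, r_squared, and sigma_hat are introduced only for this example.

```r
# Recreate a dataset like the one in the slides (the seed is an assumption)
set.seed(123)
x <- rnorm(100, mean = 50, sd = 10)
y <- 6 + 0.7 * x + rnorm(100, mean = 0, sd = 5)
data <- data.frame(x = x, y = y)
model <- lm(y ~ x, data = data)

fit_summary <- summary(model)

coef_table <- fit_summary$coefficients   # estimates, standard errors, t values, p-values
ci <- confint(model, level = 0.95)       # 95% confidence intervals for both coefficients
r_squared <- fit_summary$r.squared       # proportion of variance in y explained by x
sigma_hat <- fit_summary$sigma           # residual standard error

print(coef_table)
print(ci)
```

Comparing the confidence intervals with the true values 6 and 0.7 is a quick check that the fit has recovered the parameters used to generate the data.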

6) Scatterplot with Regression Line using ggplot

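library(ggplot2) # provides ggplot(), geom_point(), geom_smooth(), and labs()

ggplot(data, aes(x = x, y = y)) +
  geom_point(color = "darkblue") + # observed data as dark blue points
  geom_smooth(method = "lm", formula = y ~ x, color = "darkred", se = FALSE) + # fitted regression line in dark red, without a confidence band
  labs(
    title = "Scatterplot with Regression Line using ggplot", # plot title
    x = "Independent Variable: x", # x-axis label
    y = "Dependent Variable: y" # y-axis label
  )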

7) Residual Analysis using ggplot

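Now that we have fitted our model, we plot its residuals with ggplot2 and check them for obvious patterns, such as curvature or non-constant spread, that would suggest the linear model is inadequate.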
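
The residual plot itself is not reproduced here. A common choice for this check is a residuals-versus-fitted plot, sketched below. The plot type, the seed, and the names resid_df and p_resid are assumptions made for this example and are not taken from the slides.

```r
library(ggplot2)

# Recreate a dataset like the one in the slides (the seed is an assumption)
set.seed(123)
x <- rnorm(100, mean = 50, sd = 10)
y <- 6 + 0.7 * x + rnorm(100, mean = 0, sd = 5)
data <- data.frame(x = x, y = y)
model <- lm(y ~ x, data = data)

# Collect fitted values and residuals in one data frame for plotting
resid_df <- data.frame(
  fitted = fitted(model),
  residual = resid(model)
)

# Residuals vs fitted values, with a dashed reference line at zero
p_resid <- ggplot(resid_df, aes(x = fitted, y = residual)) +
  geom_point(color = "darkblue") +
  geom_hline(yintercept = 0, linetype = "dashed", color = "darkred") +
  labs(
    title = "Residuals vs Fitted Values",
    x = "Fitted values",
    y = "Residuals"
  )

p_resid
```

If the model is adequate, the points should scatter evenly around the zero line with no visible trend or funnel shape.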

8) 3D Plot with Plotly

We can extend our analysis to multiple predictors. Here, we add a second independent variable \(x_2\) as another predictor, simulate a new dataset, and visualize the fitted regression plane in 3D.
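
The slide's code and plot are not reproduced here, so the following is only a sketch of how such a figure might be built. The distribution of the second predictor, the true coefficients (6, 0.7, 1.2), the seed, and the names sim, fit, x1_seq, x2_seq, grid, and z_mat are all hypothetical choices made for this example.

```r
library(plotly)

# Hypothetical two-predictor dataset; all parameter values are assumptions
set.seed(123)
n <- 100
x1 <- rnorm(n, mean = 50, sd = 10)
x2 <- rnorm(n, mean = 30, sd = 5)
y <- 6 + 0.7 * x1 + 1.2 * x2 + rnorm(n, mean = 0, sd = 5)
sim <- data.frame(x1 = x1, x2 = x2, y = y)

# Multiple linear regression with two predictors
fit <- lm(y ~ x1 + x2, data = sim)

# Evaluate the fitted plane on a regular grid
x1_seq <- seq(min(sim$x1), max(sim$x1), length.out = 30)
x2_seq <- seq(min(sim$x2), max(sim$x2), length.out = 30)
grid <- expand.grid(x1 = x1_seq, x2 = x2_seq)   # x1 varies fastest

# plotly surfaces expect z[i, j] at (x = x[j], y = y[i]): rows follow x2, columns follow x1
z_mat <- matrix(
  predict(fit, newdata = grid),
  nrow = length(x2_seq),
  ncol = length(x1_seq),
  byrow = TRUE
)

# Observed points plus the fitted regression plane
p <- plot_ly()
p <- add_markers(p, data = sim, x = ~x1, y = ~x2, z = ~y,
                 marker = list(size = 3, color = "darkblue"), name = "Observed")
p <- add_surface(p, x = x1_seq, y = x2_seq, z = z_mat,
                 opacity = 0.5, showscale = FALSE, name = "Fitted plane")
p <- layout(p, scene = list(
  xaxis = list(title = "x1"),
  yaxis = list(title = "x2"),
  zaxis = list(title = "y")
))
```

Typing p at the console (or leaving it as the last line of an R Markdown chunk) renders the interactive plot, which can be rotated to see how the points sit around the plane.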

9) Conclusion

In this presentation, we explored:

  • Simple Linear Regression:
    • Formula: \(y = \beta_0 + \beta_1 x + \epsilon\)
    • Simulation of data and fitting with lm()
    • Visualizing results with ggplot2
  • Residual Analysis:
    • Checking for patterns or issues in the model fit
  • Multiple Linear Regression:
    • Extending to two predictors and visualizing in 3D with plotly

Linear regression is a powerful yet relatively simple tool for exploring relationships between variables, and it is used in many fields, including economics and engineering. Thank you for listening to my presentation!