2025-03-13

Slide 1: Introduction

  • This presentation walks through a simple linear regression on a simulated dataset of house prices by year
  • A second variable, house size, is included so the data can also be shown as a 3D plot with plotly
  • We will cover:
    1. The basic concepts and formula of simple linear regression
    2. Data Simulation and Overview
    3. Model fitting in R
    4. Plots with ggplot2 and plotly

Slide 2: Concept of Simple Linear Regression

  • Simple linear regression models the relationship between two variables \((X, Y)\)
  • \(X\) is the predictor (the independent variable) and \(Y\) is the response (the dependent variable)
  • We assume a model of the form

\[ Y = \beta_0 + \beta_1 X + \varepsilon \]

where
- \(\beta_0\) is the intercept
- \(\beta_1\) is the slope
- \(\varepsilon\) is the random error term

Slide 3: Estimating Model Parameters

  • We estimate \(\beta_0\) and \(\beta_1\) by least squares, which minimizes the sum of squared residuals:

\[ \min_{\beta_0, \beta_1} \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2 \]

  • The solutions in simple linear regression are given by:
    \[ \hat{\beta}_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2} \quad\text{and}\quad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}. \]
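
As a minimal sketch, the closed-form estimates can be computed by hand and compared with `lm()`. The data are only the six Year and Price rows displayed on Slide 4, so the estimates will not match the 21-observation fit on Slide 5. The variable names (`year`, `price`, `beta0_hat`, `beta1_hat`) are illustrative choices, not taken from the original analysis.

```r
# First six rows displayed on Slide 4 (Year and Price columns only)
year  <- c(2000, 2001, 2002, 2003, 2004, 2005)
price <- c(80129, 115752, 103306, 103614, 72451, 96411)

# Closed-form least-squares estimates from the formulas above
x_bar <- mean(year)
y_bar <- mean(price)
beta1_hat <- sum((year - x_bar) * (price - y_bar)) / sum((year - x_bar)^2)
beta0_hat <- y_bar - beta1_hat * x_bar

# lm() should agree with the hand computation up to floating-point error
fit <- lm(price ~ year)
print(c(intercept = beta0_hat, slope = beta1_hat))
print(coef(fit))
```

Both printouts should show the same intercept and slope, which is a quick way to confirm that `lm()` solves the same minimization problem.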

Slide 4: Data Simulation and Overview

The data are simulated: house prices by year, together with house size.

First 6 rows

Year   Size (sq ft)   Price (USD)
2000   2177           $80,129
2001   2676           $115,752
2002   1795           $103,306
2003   2714           $103,614
2004   1481           $72,451
2005   1481           $96,411
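
The simulation code itself is not shown on the slide. The sketch below is one plausible way to generate a dataset with the same columns. The seed, the size range, the coefficients, and the noise level are all assumptions made for illustration, so it will not reproduce the values in the table above. The 21 yearly observations are inferred from the 19 residual degrees of freedom reported on Slide 5.

```r
# Hypothetical simulation: seed, ranges, and coefficients are illustrative
# assumptions, not the values behind the table shown on the slide.
set.seed(123)

n_years <- 21  # 2000 to 2020; 21 observations give 19 residual df
house_data <- data.frame(
  Year = 2000:2020,
  Size = round(runif(n_years, min = 1000, max = 3000))
)

# Price as a linear function of Year and Size plus Gaussian noise
house_data$Price <- round(
  -1500000 + 800 * house_data$Year + 10 * house_data$Size +
    rnorm(n_years, mean = 0, sd = 15000)
)

head(house_data)
```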

Slide 5: Fitting a Linear Model in R

Call:
lm(formula = Price ~ Year, data = house_data)

Residuals:
   Min     1Q Median     3Q    Max 
-32140 -18009   4497  12412  38992 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -1439277.9  1424271.7  -1.011    0.325
Year             767.2      708.6   1.083    0.292

Residual standard error: 19660 on 19 degrees of freedom
Multiple R-squared:  0.05812,   Adjusted R-squared:  0.008542 
F-statistic: 1.172 on 1 and 19 DF,  p-value: 0.2925
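
As a quick check on how the numbers in this summary fit together, the t value, F-statistic, and R-squared can be recovered from the reported estimates (rounded as printed):

\[ t = \frac{\hat{\beta}_1}{\mathrm{SE}(\hat{\beta}_1)} = \frac{767.2}{708.6} \approx 1.083, \qquad F = t^2 \approx 1.172, \qquad R^2 = \frac{F}{F + 19} = \frac{1.172}{20.172} \approx 0.058 \]

The 19 residual degrees of freedom imply \(n = 19 + 2 = 21\) observations. With a p-value of about 0.29, the slope is not statistically distinguishable from zero at conventional significance levels, and Year alone explains roughly 6% of the variation in Price.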

Slide 6: First ggplot (Scatter + Regression Line)

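library(ggplot2)

# Scatter plot of Price against Year with the fitted least-squares line
ggplot(house_data, aes(x = Year, y = Price)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +  # se = FALSE hides the confidence band
  ggtitle("Scatter Plot of House Price Over the Years with Regression Line")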

Slide 7: Second ggplot (Density of Residuals)
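
The plotting code for this slide is not shown. A density plot of the residuals might be produced roughly as follows. This is a sketch: the simulated `house_data` is a hypothetical stand-in with assumed coefficients, and the object names `fit`, `resid_df`, and `p_resid` are illustrative.

```r
library(ggplot2)

# Hypothetical stand-in for house_data (assumed coefficients and noise)
set.seed(123)
house_data <- data.frame(Year = 2000:2020)
house_data$Price <- -1500000 + 800 * house_data$Year + rnorm(21, sd = 15000)

# Fit the simple linear model and collect its residuals
fit <- lm(Price ~ Year, data = house_data)
resid_df <- data.frame(Residual = resid(fit))

# Kernel density of the residuals, with a reference line at zero
p_resid <- ggplot(resid_df, aes(x = Residual)) +
  geom_density(fill = "steelblue", alpha = 0.4) +
  geom_vline(xintercept = 0, linetype = "dashed") +
  ggtitle("Density of Residuals from lm(Price ~ Year)")
p_resid
```

A roughly symmetric, bell-shaped density centered near zero is consistent with the usual assumption of normally distributed errors. With only 21 points, the shape should be read loosely.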

Slide 8: Plotly (3D Plot with an Extra Predictor)

  • A 3D plot using plotly to visualize how Price relates to Year and Size (Price ~ Year + Size)
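
The plotly code is not shown on the slide. A 3D scatter plot of this kind could be sketched as below, assuming a data frame with `Year`, `Size`, and `Price` columns. The simulated `house_data` uses assumed coefficients and is only a placeholder for the deck's actual data.

```r
library(plotly)

# Hypothetical stand-in for house_data (assumed ranges and coefficients)
set.seed(123)
house_data <- data.frame(
  Year = 2000:2020,
  Size = round(runif(21, min = 1000, max = 3000))
)
house_data$Price <- -1500000 + 800 * house_data$Year +
  10 * house_data$Size + rnorm(21, sd = 15000)

# 3D scatter: Year and Size on the horizontal axes, Price on the vertical axis
fig <- plot_ly(
  house_data,
  x = ~Year, y = ~Size, z = ~Price,
  type = "scatter3d", mode = "markers",
  marker = list(size = 4)
)
fig <- layout(fig, scene = list(
  xaxis = list(title = "Year"),
  yaxis = list(title = "Size (sq ft)"),
  zaxis = list(title = "Price (USD)")
))
fig
```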

Slide 9: Conclusion

  • We demonstrated simple linear regression on a house price dataset, including:
    • Model assumptions and parameter estimation
    • Fitting a linear model to see how price changes with year
    • Visualizing the residuals and their distribution with ggplot2
    • Exploring a 3D scatter plot with plotly that includes house size