2025-04-13

Overview

  • What is Simple Linear Regression?
  • Usage of the diamonds dataset
  • Fitting a simple linear regression model
  • Visuals using ggplot and plotly
  • Examples of code and applications

What is Simple Linear Regression?

Simple linear regression is used to model the linear relationship between two quantitative variables:

  • 1 independent variable [X]
  • 1 dependent variable [Y]

Math behind the model: \[ \widehat{y} = \widehat{\beta}_0 + \widehat{\beta}_1 x \] \[ \widehat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum(x_i - \bar{x})^2} \quad \text{and} \quad \widehat{\beta}_0 = \bar{y} - \widehat{\beta}_1 \bar{x} \]
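As a quick check of the formulas above, the sketch below computes the two estimates by hand and compares them with what `lm()` returns. The `x` and `y` vectors are made-up toy values chosen only for illustration; they are not taken from the diamonds dataset.

```r
# Toy data, made up purely for illustration (not from the diamonds dataset)
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

x_bar <- mean(x)
y_bar <- mean(y)

# Slope: sum of cross-deviations divided by sum of squared x-deviations
beta1_hat <- sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar)^2)

# Intercept: forces the fitted line through the point (x_bar, y_bar)
beta0_hat <- y_bar - beta1_hat * x_bar

# Compare the hand-computed estimates with R's built-in least-squares fit
fit <- lm(y ~ x)
c(intercept = beta0_hat, slope = beta1_hat)
coef(fit)
```

The two printed vectors should agree up to floating-point rounding, since `lm()` solves the same least-squares problem.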

Test Using Slope

We often test whether the slope \(\beta_1\) is zero: \[ H_0: \beta_1 = 0 \quad \text{vs.} \quad H_a: \beta_1 \neq 0. \]

To test this, we calculate the test statistic: \[ t = \frac{\widehat{\beta}_1 - 0}{\mathrm{SE}(\widehat{\beta}_1)}. \]

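Under \(H_0\), this statistic follows a t distribution with \(n - 2\) degrees of freedom. If the p-value is below the chosen significance level (commonly 0.05), we reject \(H_0\) and conclude that \(\beta_1 \neq 0\), i.e., that there is evidence of a linear relationship between X and Y.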
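The slope test can be reproduced step by step in R. The sketch below uses simulated data (the sample size, true coefficients, and noise level are arbitrary choices for illustration) and assumes the usual least-squares standard error, \(\mathrm{SE}(\widehat{\beta}_1) = \sqrt{s^2 / \sum (x_i - \bar{x})^2}\) with \(s^2\) the residual sum of squares divided by \(n - 2\), which the slides do not spell out.

```r
# Simulated data: arbitrary values chosen only to illustrate the test
set.seed(1)
n <- 50
x <- runif(n, 0, 10)
y <- 3 + 2 * x + rnorm(n, sd = 2)

fit <- lm(y ~ x)
beta1_hat <- unname(coef(fit)["x"])

# Residual variance estimate with n - 2 degrees of freedom
s2 <- sum(resid(fit)^2) / (n - 2)

# Standard error of the slope (assumed standard OLS formula)
se_beta1 <- sqrt(s2 / sum((x - mean(x))^2))

# Test statistic for H0: beta1 = 0 and its two-sided p-value
t_stat <- (beta1_hat - 0) / se_beta1
p_value <- 2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)

# The same numbers as reported by summary()
summary(fit)$coefficients["x", c("t value", "Pr(>|t|)")]
```

In practice there is no need to compute these by hand; the `t value` and `Pr(>|t|)` columns of `summary(fit)` report the same test.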

Example with Diamonds

The diamonds dataset (from the ggplot2 package) contains information about diamonds. Here we analyze how a diamond's price relates to its carat weight:

  • X = carat (diamond weight)
  • Y = price (diamond price)
# Showing R code:
library(ggplot2)  # provides the diamonds dataset
head(diamonds[, c("carat", "price")], 5)
# A tibble: 5 × 2
  carat price
  <dbl> <int>
1  0.23   326
2  0.21   326
3  0.23   327
4  0.29   334
5  0.31   335
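With X and Y chosen, the model itself can be fitted in a few lines. This is a minimal sketch of one way to do it with base R's `lm()`; the object name `fit` and the 1-carat prediction are illustrative choices, not something taken from the original slides.

```r
library(ggplot2)  # provides the diamonds dataset

# Fit price as a linear function of carat
fit <- lm(price ~ carat, data = diamonds)

# Coefficient table, including the t statistic and p-value for the slope
summary(fit)

# Estimated intercept and slope
coef(fit)

# Predicted price for a hypothetical 1-carat diamond
predict(fit, newdata = data.frame(carat = 1))
```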

Price vs. Carat using ggplot

`geom_smooth()` using formula = 'y ~ x'
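The message above is what ggplot2 prints when `geom_smooth()` is called without an explicit formula. A plot of this kind could be produced along the following lines; the transparency, colors, and labels are assumptions, since the original plotting code is not shown.

```r
library(ggplot2)

# Scatter plot of price against carat with a fitted least-squares line
p <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.1) +                                  # semi-transparent points reduce overplotting
  geom_smooth(method = "lm", se = FALSE, color = "red") +   # straight-line fit, no confidence band
  labs(title = "Price vs. Carat", x = "Carat", y = "Price (USD)")

p
```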

Residual Plot using ggplot
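A residual plot helps check whether a straight line is an adequate description of the data. The sketch below plots residuals against fitted values, which is one common choice; whether the original slide used fitted values or carat on the horizontal axis is not known, and the styling is an assumption.

```r
library(ggplot2)

# Fit the model and collect fitted values and residuals in a data frame
fit <- lm(price ~ carat, data = diamonds)
resid_df <- data.frame(fitted = fitted(fit), residual = resid(fit))

# Residuals vs. fitted values, with a reference line at zero
p_resid <- ggplot(resid_df, aes(x = fitted, y = residual)) +
  geom_point(alpha = 0.1) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  labs(title = "Residuals vs. Fitted Values",
       x = "Fitted price", y = "Residual")

p_resid
```

If the linear model fits well, the points should scatter evenly around the dashed line with no clear pattern; a curve or funnel shape suggests the linearity or constant-variance assumption is questionable.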

Visual using Plotly

library(ggplot2)  # provides the diamonds dataset
library(plotly)
plot_ly(data = diamonds, x = ~carat, y = ~price,
        type = "scatter", mode = "markers")

Applications of Simple Linear Regression

There are several possible applications of Simple Linear Regression, such as:

  • Risk Analysis: Evaluating how one variable can affect potential losses or gains
  • Economics: Modeling the relationship between supply and demand
  • Forecasting: Using historical trends to predict future outcomes