- What is Simple Linear Regression?
- Usage of the `diamonds` dataset
- Fitting a simple linear regression model
- Visuals using ggplot and plotly
- Examples of code and applications
2025-04-13
Simple linear regression models the relationship between two quantitative variables: a predictor \(x\) and a response \(y\).
Math behind the model: \[ \widehat{y} = \widehat{\beta}_0 + \widehat{\beta}_1 x \] \[ \widehat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum(x_i - \bar{x})^2} \quad \text{and} \quad \widehat{\beta}_0 = \bar{y} - \widehat{\beta}_1 \bar{x} \]
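As a quick check of these formulas, the sketch below computes \(\widehat{\beta}_1\) and \(\widehat{\beta}_0\) by hand for the five diamonds (carat and price) printed later in this section, then compares them with R's built-in `lm()`. The variable names are illustrative.

```r
# Carat (x) and price (y) of the first five rows of ggplot2::diamonds
x <- c(0.23, 0.21, 0.23, 0.29, 0.31)
y <- c(326, 326, 327, 334, 335)

# Least-squares estimates from the closed-form formulas
x_bar <- mean(x)
y_bar <- mean(y)
beta1_hat <- sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar)^2)
beta0_hat <- y_bar - beta1_hat * x_bar

# The same fit using lm(); coefficients agree up to floating-point error
fit_small <- lm(y ~ x)
c(manual_intercept = beta0_hat, manual_slope = beta1_hat)
coef(fit_small)
```

With only five nearly identical stones, this toy slope should not be read as the estimate for the full dataset; it only illustrates that the formulas and `lm()` agree.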
We often test whether the slope \(\beta_1\) is zero: \[ H_0: \beta_1 = 0 \quad \text{vs.} \quad H_a: \beta_1 \neq 0. \]
To test this, we calculate the test statistic: \[ t = \frac{\widehat{\beta}_1 - 0}{\mathrm{SE}(\widehat{\beta}_1)}. \]
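The p-value comes from a \(t\)-distribution with \(n - 2\) degrees of freedom. If it is below the chosen significance level \(\alpha\) (commonly 0.05), we reject \(H_0\) and conclude that \(\beta_1 \neq 0\), i.e., there is evidence of a linear relationship between \(x\) and \(y\).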
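The test can also be carried out by hand. Here \(\mathrm{SE}(\widehat{\beta}_1) = \sqrt{\widehat{\sigma}^2 / \sum (x_i - \bar{x})^2}\), where \(\widehat{\sigma}^2\) is the residual sum of squares divided by \(n - 2\). This minimal sketch reuses the first five diamonds as toy data and checks the result against the coefficient table from `summary(lm())`; the variable names are illustrative.

```r
# Toy data: carat (x) and price (y) of the first five diamonds
x <- c(0.23, 0.21, 0.23, 0.29, 0.31)
y <- c(326, 326, 327, 334, 335)
n <- length(x)

fit_small <- lm(y ~ x)
beta1_hat <- unname(coef(fit_small)["x"])

# Residual variance estimate (RSS / (n - 2)) and standard error of the slope
rss <- sum(residuals(fit_small)^2)
sigma2_hat <- rss / (n - 2)
se_beta1 <- sqrt(sigma2_hat / sum((x - mean(x))^2))

# t statistic for H0: beta1 = 0 and its two-sided p-value with n - 2 df
t_stat <- (beta1_hat - 0) / se_beta1
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

# Compare with the coefficient table reported by summary()
coef(summary(fit_small))["x", ]
c(t = t_stat, p = p_value)
```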
The `diamonds` dataset (from the ggplot2 package) contains the price, carat weight, and other attributes of about 54,000 diamonds. Here we analyze how a diamond's price relates to its carat weight:
# Showing R code:
library(ggplot2)  # provides the diamonds dataset
head(diamonds[, c("carat", "price")], 5)
# A tibble: 5 × 2
  carat price
  <dbl> <int>
1  0.23   326
2  0.21   326
3  0.23   327
4  0.29   334
5  0.31   335
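Fitting the simple linear regression of price on carat for the full dataset takes a single call to `lm()`. This is a minimal sketch, assuming ggplot2 is installed; the object name `fit` and the 1-carat example value are illustrative choices.

```r
library(ggplot2)  # provides the diamonds dataset

# Regress price (response) on carat (predictor)
fit <- lm(price ~ carat, data = diamonds)

# Estimates, standard errors, t statistics and p-values for both coefficients
summary(fit)

# Predicted price for a hypothetical 1-carat diamond
predict(fit, newdata = data.frame(carat = 1))
```

The `carat` row of `summary(fit)` reports the \(t\) statistic and p-value for \(H_0: \beta_1 = 0\).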
[Figure: ggplot2 scatter plot of price vs. carat with a fitted line from `geom_smooth()`, formula `y ~ x`]
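A scatter plot of price against carat with a least-squares line can be drawn with ggplot2. This is a minimal sketch: the point transparency, labels, and title are assumptions, not the original slide's settings.

```r
library(ggplot2)

# Scatter plot of price vs. carat with a least-squares line.
# method = "lm" with formula = y ~ x fits the same model as lm(price ~ carat).
p <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.1) +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  labs(x = "Carat", y = "Price (USD)",
       title = "Price vs. carat with fitted regression line")
p
```

Passing `formula = y ~ x` explicitly also suppresses the "`geom_smooth()` using formula = 'y ~ x'" message that ggplot2 prints when the formula is left unspecified.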
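library(ggplot2)  # provides the diamonds dataset
library(plotly)   # interactive plots via plot_ly()
plot_ly(data = diamonds, x = ~carat, y = ~price,
        type = "scatter", mode = "markers")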
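One way to overlay the fitted regression line on the interactive plot is to compute predictions with `lm()` and add them as a line trace. This is a minimal sketch, assuming the plotly and ggplot2 packages are installed; `line_df` and `p_ly` are illustrative names. An alternative is to build the figure in ggplot2 and convert it with `ggplotly()`.

```r
library(ggplot2)  # provides the diamonds dataset
library(plotly)   # plot_ly(), add_lines(), and the %>% pipe

fit <- lm(price ~ carat, data = diamonds)

# Two endpoints are enough to draw the fitted straight line
line_df <- data.frame(carat = range(diamonds$carat))
line_df$price <- predict(fit, newdata = line_df)

# Markers for the data, plus a separate (non-inherited) line trace for the fit
p_ly <- plot_ly(data = diamonds, x = ~carat, y = ~price,
                type = "scatter", mode = "markers", name = "Diamonds") %>%
  add_lines(data = line_df, x = ~carat, y = ~price,
            name = "Least-squares fit", inherit = FALSE)
p_ly
```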
There are several possible applications of Simple Linear Regression, such as: