2024-09-21

Introduction to Simple Linear Regression

Simple Linear Regression is a statistical method that allows us to summarize and study relationships between two continuous variables:

  • One variable (X) is considered to be the predictor, explanatory, or independent variable.
  • The other variable (Y) is considered to be the response, outcome, or dependent variable.

This relationship is modeled using a straight line.

The Linear Model

The simple linear regression model can be expressed as:

\[Y = \beta_0 + \beta_1X + \epsilon\]

Where:

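  • \(\beta_0\) is the y-intercept, the expected value of Y when X = 0

  • \(\beta_1\) is the slope, the expected change in Y for a one-unit increase in X

  • \(\epsilon\) is the error term, the random deviation of an observation from the line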
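
The text does not say how the coefficients are estimated. The lm() function used in the fitting section estimates them by ordinary least squares, which picks the line that minimizes the sum of squared residuals. Under that assumption, and writing \(\bar{X}\) and \(\bar{Y}\) for the sample means over n observations (notation introduced here for illustration), the estimates have the closed form:

\[\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n}(X_i - \bar{X})^2}, \qquad \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{X}\]

The hats mark estimates computed from data, as opposed to the unknown true parameters \(\beta_0\) and \(\beta_1\).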

Assumptions of Simple Linear Regression

  1. Linearity: The relationship between X and Y is linear.
  2. Independence: Observations are independent of each other.
  3. Homoscedasticity: The variance of the residuals is constant across all values of X.
  4. Normality: The residuals are normally distributed.

Example Dataset

We will create and visualize a sample dataset to illustrate simple linear regression:

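# Fix the random seed so the simulated noise is reproducible
set.seed(123)
# Predictor: odd integers 1, 3, ..., 99 (50 values)
X <- seq(1, 100, by = 2)
# Response: true intercept 2, true slope 0.5, Gaussian noise with sd = 5
Y <- 2 + 0.5 * X + rnorm(length(X), mean = 0, sd = 5)
data <- data.frame(X = X, Y = Y)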

Visualizing the Data
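
The original figure is not preserved in this text. As a stand-in, the following is a minimal base-graphics sketch of a scatter plot of the simulated data. The plot title, colour, and point style are illustrative choices and not taken from the source.

```r
# Recreate the sample dataset so this block runs on its own
set.seed(123)
X <- seq(1, 100, by = 2)
Y <- 2 + 0.5 * X + rnorm(length(X), mean = 0, sd = 5)
data <- data.frame(X = X, Y = Y)

# Scatter plot of the raw observations (styling is an assumption)
plot(data$X, data$Y,
     xlab = "X", ylab = "Y",
     main = "Simulated sample data",
     pch = 19, col = "steelblue")
```

A roughly straight, upward-sloping cloud of points is what one would expect to see before fitting a linear model.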

Fitting the Model

We will fit a simple linear regression model below:

model <- lm(Y ~ X, data = data)
summary(model)
## 
## Call:
## lm(formula = Y ~ X, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.0560  -3.1111  -0.4097   3.3295  10.7983 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.34169    1.32284    1.77    0.083 .  
## X            0.49661    0.02291   21.67   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.676 on 48 degrees of freedom
## Multiple R-squared:  0.9073, Adjusted R-squared:  0.9054 
## F-statistic: 469.7 on 1 and 48 DF,  p-value: < 2.2e-16
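
The quantities in the printed summary can also be pulled out programmatically. The sketch below rebuilds the same dataset and model so that it runs on its own, extracts the coefficient table, confidence intervals, and fit statistics, and cross-checks the lm() estimates against the closed-form least-squares formulas. The variable names b0 and b1 are introduced here for illustration.

```r
# Recreate the dataset and model so this block is self-contained
set.seed(123)
X <- seq(1, 100, by = 2)
Y <- 2 + 0.5 * X + rnorm(length(X), mean = 0, sd = 5)
data <- data.frame(X = X, Y = Y)
model <- lm(Y ~ X, data = data)

# Coefficient table: estimate, standard error, t value, p-value
coef(summary(model))

# 95% confidence intervals for the intercept and slope
confint(model, level = 0.95)

# Residual standard error and R-squared as plain numbers
sigma(model)
summary(model)$r.squared

# Cross-check: closed-form least-squares estimates
b1 <- sum((X - mean(X)) * (Y - mean(Y))) / sum((X - mean(X))^2)
b0 <- mean(Y) - b1 * mean(X)

# TRUE if lm() and the closed form agree up to floating-point error
all.equal(unname(coef(model)), c(b0, b1))
```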

Visualizing the Fitted Line

## `geom_smooth()` using formula = 'y ~ x'
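
The figure itself is missing here. The message above is what ggplot2 prints when geom_smooth() is called without an explicit formula, which suggests the original plot was drawn with ggplot2. The following is a sketch of such a plot, assuming the ggplot2 package is installed. The colours and title are illustrative and not taken from the source.

```r
library(ggplot2)  # assumption: ggplot2 is installed

# Recreate the dataset so this block is self-contained
set.seed(123)
X <- seq(1, 100, by = 2)
Y <- 2 + 0.5 * X + rnorm(length(X), mean = 0, sd = 5)
data <- data.frame(X = X, Y = Y)

# Scatter plot with the least-squares line and its confidence band
p <- ggplot(data, aes(x = X, y = Y)) +
  geom_point(color = "steelblue") +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE, color = "firebrick") +
  labs(title = "Fitted regression line", x = "X", y = "Y")

print(p)
```

Passing formula = y ~ x explicitly suppresses the informational message; leaving it out gives the same plot with the message shown.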

3D Visualization of Residuals
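
The original 3D figure is not preserved in this text, and the code that produced it is unknown. As a simpler two-dimensional stand-in (not the author's plot), the sketch below extracts the residuals and checks them against the homoscedasticity and normality assumptions listed earlier, using base R only.

```r
# Recreate the dataset and model so this block is self-contained
set.seed(123)
X <- seq(1, 100, by = 2)
Y <- 2 + 0.5 * X + rnorm(length(X), mean = 0, sd = 5)
data <- data.frame(X = X, Y = Y)
model <- lm(Y ~ X, data = data)

res <- resid(model)   # observed Y minus fitted Y
fit <- fitted(model)  # values on the regression line

# Residuals vs fitted: look for a patternless band of roughly constant width
plot(fit, res,
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs fitted")
abline(h = 0, lty = 2)

# Normal Q-Q plot: points close to the line suggest approximate normality
qqnorm(res)
qqline(res)

# Shapiro-Wilk test of normality (a large p-value gives no evidence against it)
shapiro.test(res)
```

A funnel shape in the first plot would point to non-constant variance, and a curved trend would point to a non-linear relationship.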

Interpreting the Results

From our model summary:

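  • The estimated intercept (\(\hat{\beta}_0\)) is approximately 2.34, close to the true value of 2 used to simulate the data
  • The estimated slope (\(\hat{\beta}_1\)) is approximately 0.50, close to the true value of 0.5 used to simulate the data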

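This means:

  • When X = 0, the predicted value of Y is 2.34
  • For each one-unit increase in X, Y is expected to increase by about 0.5 units

The slope is highly significant (p < 2e-16), and the model explains about 91% of the variance in Y (R-squared = 0.9073). The intercept is not significantly different from zero at the 5% level (p = 0.083).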
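
Both statements can be checked directly with predict(). The sketch below rebuilds the dataset and model so that it runs on its own. The evaluation points X = 10 and X = 11 are arbitrary choices made for illustration.

```r
# Recreate the dataset and model so this block is self-contained
set.seed(123)
X <- seq(1, 100, by = 2)
Y <- 2 + 0.5 * X + rnorm(length(X), mean = 0, sd = 5)
data <- data.frame(X = X, Y = Y)
model <- lm(Y ~ X, data = data)

# The prediction at X = 0 is the estimated intercept
p0 <- predict(model, newdata = data.frame(X = 0))
all.equal(unname(p0), unname(coef(model)["(Intercept)"]))

# Predictions one unit apart differ by exactly the estimated slope
p <- predict(model, newdata = data.frame(X = c(10, 11)))
all.equal(unname(diff(p)), unname(coef(model)["X"]))
```

Note that X = 0 lies just outside the observed range of X (1 to 99), so the intercept is a mild extrapolation.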

Conclusion

Simple linear regression is a powerful tool for:

  1. Understanding relationships between variables
  2. Making predictions
  3. Assessing the strength of relationships
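
As a closing sketch of the last two points, the code below makes predictions with 95% prediction intervals and confirms that, in simple linear regression, R-squared equals the squared correlation between X and Y. The new X values (20, 50, 80) are hypothetical and chosen only for illustration.

```r
# Recreate the dataset and model so this block is self-contained
set.seed(123)
X <- seq(1, 100, by = 2)
Y <- 2 + 0.5 * X + rnorm(length(X), mean = 0, sd = 5)
data <- data.frame(X = X, Y = Y)
model <- lm(Y ~ X, data = data)

# Making predictions: point estimates with 95% prediction intervals
new_x <- data.frame(X = c(20, 50, 80))  # hypothetical new observations
pred <- predict(model, newdata = new_x, interval = "prediction", level = 0.95)
cbind(new_x, pred)  # columns: X, fit, lwr, upr

# Assessing strength: R-squared equals the squared Pearson correlation
r2 <- summary(model)$r.squared
all.equal(r2, cor(data$X, data$Y)^2)
```

Prediction intervals describe where a single new observation is likely to fall, so they are wider than confidence intervals for the mean response (interval = "confidence").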