2023-10-16

Introduction

Simple Linear Regression is a statistical method for modeling the relationship between two variables: a dependent variable (Y) and an independent variable (X).

It is the simplest form of regression analysis and is used to predict the value of the dependent variable based on the value of the independent variable.

Mathematical Formulation

The equation for a Simple Linear Regression model is:

\[ Y = \beta_0 + \beta_1 X + \varepsilon \]

where:

\(Y\) is the dependent variable.

\(X\) is the independent variable.

\(\beta_0\) is the intercept.

\(\beta_1\) is the slope.

\(\varepsilon\) is the error term.
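To make each term concrete, the sketch below simulates data directly from the model and then fits it with `lm()`. The parameter values (\(\beta_0 = 0\), \(\beta_1 = 2\), \(\sigma = 1\)), the sample size, the seed, and the distribution of \(X\) are assumptions chosen for illustration. They are not the data behind the output shown later in this deck.

```r
# Illustrative simulation of Y = beta0 + beta1 * X + epsilon.
# All values below are assumptions, not the deck's original data.
set.seed(42)
n     <- 125                        # assumed sample size
beta0 <- 0                          # assumed true intercept
beta1 <- 2                          # assumed true slope
sigma <- 1                          # assumed error standard deviation

x       <- rnorm(n)                 # independent variable X
epsilon <- rnorm(n, sd = sigma)     # error term with mean 0
y       <- beta0 + beta1 * x + epsilon   # dependent variable Y

# Least squares should recover estimates close to beta0 and beta1
fit <- lm(y ~ x)
coef(fit)
```

Because the errors are random, the estimates will differ slightly from the true values; with 125 observations they should land within a few tenths of them.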

Scatterplot

Let’s create a scatterplot to visualize the relationship between the variables.
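The original plotting code and data are not shown here. A minimal base-graphics sketch, assuming hypothetical simulated `x` and `y` in place of the real variables, looks like this:

```r
# Hypothetical data; the deck's actual x and y are not shown.
set.seed(1)
x <- rnorm(125)
y <- 2 * x + rnorm(125)

# Scatterplot of Y against X
plot(x, y,
     xlab = "X (independent variable)",
     ylab = "Y (dependent variable)",
     main = "Scatterplot of Y vs X",
     pch = 19, col = "steelblue")

# Overlay the least-squares regression line
abline(lm(y ~ x), col = "red", lwd = 2)
```

Passing the fitted `lm` object to `abline()` draws the line using its intercept and slope.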

Linear Model for Scatterplot

## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.67150 -0.66829  0.03513  0.63645  2.74429 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.04255    0.09169  -0.464    0.643    
## x            2.06376    0.09691  21.297   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.025 on 123 degrees of freedom
## Multiple R-squared:  0.7867, Adjusted R-squared:  0.7849 
## F-statistic: 453.5 on 1 and 123 DF,  p-value: < 2.2e-16
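Several numbers in this summary follow from one another. The sketch below recomputes them using only the printed estimates and standard errors, so no data are needed; small differences come from rounding in the printed table. It uses two facts: the sample size is \(n = 125\) (123 residual degrees of freedom plus 2 estimated coefficients), and with a single predictor the overall F statistic equals the square of the slope's t statistic.

```r
# Values copied from the printed lm() summary
est_int <- -0.04255; se_int <- 0.09169
est_x   <-  2.06376; se_x   <- 0.09691
df_res  <- 123                 # residual degrees of freedom
n       <- df_res + 2          # 2 coefficients estimated, so n = 125

# t statistic = estimate / standard error
t_int <- est_int / se_int      # about -0.464
t_x   <- est_x / se_x          # about 21.30 (printed 21.297)

# Two-sided p-value from the t distribution with df_res degrees of freedom
p_int <- 2 * pt(-abs(t_int), df = df_res)   # about 0.643

# With one predictor, the overall F statistic equals t^2 for the slope
F_stat <- t_x^2                # about 453.5

# R-squared and adjusted R-squared recovered from F
r2     <- F_stat / (F_stat + df_res)          # about 0.7867
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_res     # about 0.7849

round(c(t_int = t_int, t_x = t_x, p_int = p_int,
        F = F_stat, r2 = r2, adj_r2 = adj_r2), 4)
```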

Coefficients

The Linear Regression model provides coefficients:

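\[ \hat{\beta}_0 \approx -0.0426, \quad \hat{\beta}_1 \approx 2.0638 \]

These are the estimated intercept and slope, respectively: at \(X = 0\) the predicted \(Y\) is about -0.04, and each one-unit increase in \(X\) raises the predicted \(Y\) by about 2.06.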
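For simple linear regression the least-squares estimates have a closed form:

\[ \hat{\beta}_1 = \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(X)}, \qquad \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} \]

The sketch below checks these formulas against `coef()` from `lm()`. The data are hypothetical and simulated, not the deck's dataset.

```r
# Hypothetical data; not the deck's dataset.
set.seed(123)
x <- runif(125, min = -2, max = 2)
y <- 0.5 + 2 * x + rnorm(125)

# Closed-form least-squares estimates
b1 <- cov(x, y) / var(x)          # slope
b0 <- mean(y) - b1 * mean(x)      # intercept

# Same estimates from lm()
lm_model <- lm(y ~ x)
fit_coef <- coef(lm_model)

cbind(closed_form = c(b0, b1), lm = unname(fit_coef))
```

Both columns agree up to floating-point error, because `lm()` solves the same least-squares problem.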

Evaluation

Let’s evaluate the performance of our linear regression model.

Mean Squared Error (MSE)

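The Mean Squared Error (MSE) is the average squared difference between the observed values \(y_i\) and the predicted values \(\hat{y}_i\):

\[ \text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \]

It is measured in squared units of \(Y\); a lower MSE means the fitted line lies closer to the observed data.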

# Mean Squared Error (MSE)
mse <- mean(lm_model$residuals^2)
mse
## [1] 1.033605
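This MSE divides the residual sum of squares by \(n\). The residual standard error (RSE) in the summary divides it by the \(n - 2\) residual degrees of freedom instead, so \(\text{MSE} = \text{RSE}^2 \times (n - 2)/n\). With the printed values, \(1.025^2 \times 123/125 \approx 1.0338\), which matches 1.033605 up to the rounding of the RSE. The sketch below shows both computations on hypothetical simulated data.

```r
# Hypothetical data standing in for the deck's x and y
set.seed(7)
x <- rnorm(125)
y <- 2 * x + rnorm(125)
lm_model <- lm(y ~ x)

n <- length(y)
# MSE from observed minus predicted values (divides by n)
mse_manual <- mean((y - fitted(lm_model))^2)

# Residual standard error: sqrt(RSS / (n - 2))
rse <- sigma(lm_model)
mse_from_rse <- rse^2 * df.residual(lm_model) / n

c(mse_manual = mse_manual, mse_from_rse = mse_from_rse)

# Same relation using the printed summary values
1.025^2 * 123 / 125
```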

Evaluation (Continued)

R-Squared

The R-squared (R²) statistic is the proportion of the variance in the dependent variable that is explained by the independent variable. It ranges from 0 to 1, and higher values indicate a closer fit; here R² ≈ 0.787, so the model explains about 79% of the variance in \(Y\).

# Calculate R-squared (R²)
rsq <- summary(lm_model)$r.squared
rsq
## [1] 0.7866591
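R² can also be computed directly as \(1 - \text{SS}_{\text{res}} / \text{SS}_{\text{tot}}\), and with a single predictor it equals the squared Pearson correlation between \(X\) and \(Y\). A quick check on hypothetical simulated data:

```r
# Hypothetical data; not the deck's dataset.
set.seed(99)
x <- rnorm(125)
y <- 2 * x + rnorm(125)
lm_model <- lm(y ~ x)

ss_res <- sum(residuals(lm_model)^2)   # residual sum of squares
ss_tot <- sum((y - mean(y))^2)         # total sum of squares
rsq_manual <- 1 - ss_res / ss_tot

c(manual  = rsq_manual,
  summary = summary(lm_model)$r.squared,
  cor_sq  = cor(x, y)^2)
```

All three values agree, which is a useful sanity check when reading a summary table.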

Residual Plot
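A residual plot shows the residuals against the fitted values. A minimal base-graphics sketch, on hypothetical simulated data, with a dashed reference line at zero; if the model's assumptions hold, the points should scatter evenly around zero with no curve or funnel shape.

```r
# Hypothetical data; not the deck's dataset.
set.seed(2023)
x <- rnorm(125)
y <- 2 * x + rnorm(125)
lm_model <- lm(y ~ x)

res  <- residuals(lm_model)
fits <- fitted(lm_model)

plot(fits, res,
     xlab = "Fitted values",
     ylab = "Residuals",
     main = "Residuals vs Fitted",
     pch = 19, col = "grey40")
abline(h = 0, col = "red", lty = 2)   # reference line at zero
```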

Prediction

We can use the regression model to make predictions. The formula is:

\[ \hat{Y} = \hat{\beta}_0 + \hat{\beta}_1X \]

For a given value of \(X\), we can predict \(\hat{Y}\).
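Plugging the estimates from the summary output into this formula gives, for \(X = 1\), \(\hat{Y} = -0.04255 + 2.06376 \times 1 = 2.02121\). The sketch below does this by hand (`predict_manual` is a hypothetical helper), then shows on a hypothetical fitted model that R's `predict()` with a `newdata` data frame returns the same values as the formula.

```r
# Estimates copied from the printed lm() summary
b0 <- -0.04255
b1 <-  2.06376
predict_manual <- function(x_new) b0 + b1 * x_new   # hypothetical helper
predict_manual(c(0, 1, 2))   # -0.04255, 2.02121, 4.08497

# predict() on a fitted model (hypothetical data) matches the formula
set.seed(5)
x <- rnorm(125)
y <- 2 * x + rnorm(125)
lm_model <- lm(y ~ x)

new_x  <- data.frame(x = c(-1, 0, 1))   # column name must match the predictor
pred   <- predict(lm_model, newdata = new_x)
manual <- coef(lm_model)[1] + coef(lm_model)[2] * new_x$x
cbind(predict = unname(pred), formula = unname(manual))
```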

Assumptions

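Simple Linear Regression relies on several assumptions about the data, including a linear relationship between \(X\) and \(Y\), independent errors, constant error variance (homoscedasticity), and, for the t- and F-based inference in the model summary, approximately normally distributed errors.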
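A minimal sketch of quick assumption checks, on hypothetical simulated data: `plot()` on an `lm` object draws four standard diagnostics (residuals vs fitted for linearity, normal Q-Q for normality, scale-location for homoscedasticity, residuals vs leverage for influential points), and `shapiro.test()` tests the residuals for normality. Independence of errors usually depends on how the data were collected (for example, observations ordered in time) and cannot be confirmed from these plots alone.

```r
# Hypothetical data; not the deck's dataset.
set.seed(11)
x <- rnorm(125)
y <- 2 * x + rnorm(125)
lm_model <- lm(y ~ x)

# Four standard diagnostic plots in a 2 x 2 grid
op <- par(mfrow = c(2, 2))
plot(lm_model)   # residuals vs fitted, Q-Q, scale-location, residuals vs leverage
par(op)          # restore the previous plotting layout

# Shapiro-Wilk test: a small p-value suggests non-normal residuals
sw <- shapiro.test(residuals(lm_model))
sw$p.value
```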

Conclusion

Simple Linear Regression is a simple, interpretable tool for modeling the relationship between two variables.

It quantifies the direction (the sign of \(\hat{\beta}_1\)) and the strength (for example, R²) of a linear relationship.

It can be used to describe, predict, and quantify how the dependent variable changes, on average, with the independent variable.