2024-09-10

What is Simple Linear Regression?

Simple Linear Regression is used to predict the value of a dependent variable from the value of a single independent variable. The model assumes a linear relationship between the two variables.

Equation: \(Y = \beta_0 + \beta_1 X + \epsilon\)

Where:

- \(Y\): Dependent variable
- \(X\): Independent variable
- \(\beta_0\): Intercept
- \(\beta_1\): Slope
- \(\epsilon\): Error term
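To make the role of each term concrete, here is a minimal sketch that simulates data from this model. The parameter values (`beta0 = 2`, `beta1 = 0.5`), the noise standard deviation, the sample size, and the variable names are all assumptions chosen for illustration; they do not come from any data set in this document.

```r
# Simulate data from Y = beta0 + beta1 * X + epsilon.
# All numeric values below are hypothetical, chosen only for illustration.
set.seed(42)                              # make the random draws reproducible

n     <- 50                               # number of observations (assumed)
beta0 <- 2                                # intercept (assumed)
beta1 <- 0.5                              # slope (assumed)

x_sim   <- runif(n, min = 0, max = 100)   # independent variable
epsilon <- rnorm(n, mean = 0, sd = 5)     # error term
y_sim   <- beta0 + beta1 * x_sim + epsilon

# Fitting a line to the simulated data should give estimates
# close to, but not exactly equal to, beta0 and beta1
coef(lm(y_sim ~ x_sim))
```

Because of the error term, the fitted coefficients differ slightly from the values used to generate the data.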

A Plotly scatter plot with a regression line

Plot 1: Scatter Plot with Regression Line

This plot shows the relationship between the predictor \(X\) and the response \(Y\), with a fitted regression line.

Plot 2: Residuals Plot

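This plot shows the residuals (observed values minus fitted values) from the linear regression model. Residuals scattered randomly around zero, with no visible pattern, suggest that a straight line fits the data adequately.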
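A residuals plot like the one described above could be produced along the following lines. This is a sketch, not necessarily the code behind the original plot: it reuses the sample data from the R Code section of this document, and the names `fit`, `diagnostics`, and `residual_plot` are illustrative.

```r
library(ggplot2)

# Sample data (same values as in the R Code section of this document)
data <- data.frame(
  x = c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100),
  y = c(15, 25, 35, 45, 55, 70, 85, 100, 120, 130)
)

# Fit the simple linear regression model
fit <- lm(y ~ x, data = data)

# Collect fitted values and residuals (observed minus fitted)
diagnostics <- data.frame(
  fitted   = fitted(fit),
  residual = resid(fit)
)

# Plot residuals against fitted values with a reference line at zero
residual_plot <- ggplot(diagnostics, aes(x = fitted, y = residual)) +
  geom_point(color = "blue") +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  labs(title = "Residuals Plot", x = "Fitted values", y = "Residuals")

print(residual_plot)
```

For this sample data the residuals are positive at both ends of the range and negative in the middle. A curved pattern of this kind hints that the relationship is not perfectly linear.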

Sum of Squared Errors (SSE)

The Sum of Squared Errors (SSE) measures the total squared deviation of the observed response values from the values predicted by the model:

\[ \text{SSE} = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 \]

Where:

- \(Y_i\) is the observed value of the response variable.
- \(\hat{Y}_i\) is the predicted value of the response variable using the regression model.
- \(n\) is the number of observations.

The SSE is used to assess the goodness of fit of the regression model; for models fitted to the same data, a smaller SSE indicates a better fit.
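The SSE formula translates directly into R. The sketch below applies it to the sample data from the R Code section of this document; the names `fit`, `y_hat`, and `sse` are illustrative.

```r
# Sample data (same values as in the R Code section of this document)
x <- c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)
y <- c(15, 25, 35, 45, 55, 70, 85, 100, 120, 130)

# Fit the model and obtain the predicted values
fit   <- lm(y ~ x)
y_hat <- fitted(fit)

# SSE: sum of squared differences between observed and predicted values
sse <- sum((y - y_hat)^2)
sse

# For an lm fit, deviance() returns the same residual sum of squares
deviance(fit)
```

For this data the SSE is approximately 171.82.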

Estimating Parameters

The parameters \(\beta_0\) and \(\beta_1\) are estimated using the least squares method, which chooses the values that minimize the SSE:

\[ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \]

\[ \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} \]

Where:

- \(\hat{\beta}_1\) is the estimated slope.
- \(\hat{\beta}_0\) is the estimated intercept.
- \(\bar{X}\) and \(\bar{Y}\) are the means of the predictor and response variables, respectively.
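As a check on the formulas above, the estimates can be computed by hand and compared with the output of `lm()`. This sketch uses the sample data from the R Code section of this document; the names `beta0_hat` and `beta1_hat` are illustrative.

```r
# Sample data (same values as in the R Code section of this document)
x <- c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)
y <- c(15, 25, 35, 45, 55, 70, 85, 100, 120, 130)

# Means of the predictor and the response
x_bar <- mean(x)
y_bar <- mean(y)

# Least squares estimates, following the two formulas directly
beta1_hat <- sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar)^2)
beta0_hat <- y_bar - beta1_hat * x_bar

c(intercept = beta0_hat, slope = beta1_hat)

# lm() should report the same coefficients
coef(lm(y ~ x))
```

For this data the estimated slope is 72/55 (about 1.309) and the estimated intercept is -4, so the fitted line is \(\hat{Y} = -4 + 1.309 X\).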

R Code

```r
# Load the necessary library
library(ggplot2)

# Sample data
x <- c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)
y <- c(15, 25, 35, 45, 55, 70, 85, 100, 120, 130)
data <- data.frame(x = x, y = y)

# Create the scatter plot with a regression line
ggplot(data, aes(x = x, y = y)) +
  geom_point(color = "blue") +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(
    title = "Scatter Plot with Regression Line",
    x = "Predictor (X)",
    y = "Response (Y)"
  )
```