2024-10-20

Introduction to Simple Linear Regression

  • Simple linear regression models a dependent variable as a linear function of a single independent variable plus a random error term.

  • The equation of a simple linear regression model is given by:

    \[y = \beta_0 + \beta_1 x + \epsilon\]

  • Where:

    • \(y\) is the dependent variable.
    • \(x\) is the independent variable.
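    • \(\beta_0\) is the y-intercept: the expected value of \(y\) when \(x = 0\).
    • \(\beta_1\) is the slope: the expected change in \(y\) for a one-unit increase in \(x\).
    • \(\epsilon\) is the random error term, capturing variation in \(y\) not explained by \(x\).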
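To make the role of each term concrete, here is a small R sketch that simulates data from the model with known parameters and checks that `lm()` recovers them approximately. The values \(\beta_0 = 2\), \(\beta_1 = 0.5\), an error standard deviation of 1, and 200 observations are arbitrary assumptions chosen only for illustration.

```r
# Simulate data from y = beta0 + beta1 * x + epsilon with known parameters.
# All values below are illustrative assumptions, not taken from any dataset.
set.seed(42)
n     <- 200
beta0 <- 2      # true intercept
beta1 <- 0.5    # true slope

x   <- runif(n, min = 0, max = 10)   # independent variable
eps <- rnorm(n, mean = 0, sd = 1)    # random error term
y   <- beta0 + beta1 * x + eps       # dependent variable

# Fit the simple linear regression by least squares
sim_fit <- lm(y ~ x)
coef(sim_fit)   # estimates should land near 2 and 0.5
```

Because of the random error, the estimates will not equal the true values exactly; they get closer as `n` grows or the error variance shrinks.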

Example Dataset

  • We use R's built-in mtcars dataset, which records fuel consumption and 10 aspects of design and performance for 32 car models (data from the 1974 Motor Trend US magazine).
  • We model miles per gallon (mpg) as the dependent variable and horsepower (hp) as the independent variable.
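A minimal sketch of this setup in R: mtcars ships with base R's datasets package, so nothing needs to be downloaded. The block fits the model with `lm()`, which uses ordinary least squares, and cross-checks the result against the standard closed-form estimates \(\hat\beta_1 = \sum (x_i - \bar x)(y_i - \bar y) / \sum (x_i - \bar x)^2\) and \(\hat\beta_0 = \bar y - \hat\beta_1 \bar x\). The variable names `fit`, `b0`, and `b1` are our own.

```r
# mtcars is part of base R's datasets package
data(mtcars)
head(mtcars[, c("mpg", "hp")])   # the two columns used in this example

# Fit mpg = beta0 + beta1 * hp + epsilon by ordinary least squares
fit <- lm(mpg ~ hp, data = mtcars)
coef(fit)

# Cross-check with the closed-form least-squares estimates
x  <- mtcars$hp
y  <- mtcars$mpg
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)
c(intercept = b0, slope = b1)
```

The slope comes out negative: cars with more horsepower tend to get fewer miles per gallon.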

Mathematical Concept 1: Mean and Variance

  • The mean of a dataset is defined as:

    \[\mu = \frac{1}{N} \sum_{i=1}^{N} x_i\]

  • The population variance (the average squared deviation from the mean) is defined as:

    \[\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2\]

  • Where:

    • \(N\) is the number of observations.
    • \(x_i\) represents each individual observation in the dataset.
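These formulas translate directly into R. The sketch below applies them to the mpg column. Note that R's built-in `var()` computes the sample variance, which divides by \(N - 1\) rather than \(N\), so it differs slightly from \(\sigma^2\) as defined here.

```r
# Mean and population variance of mpg, written out from the formulas
mpg <- mtcars$mpg
N   <- length(mpg)

mu     <- sum(mpg) / N              # mean: (1/N) * sum of x_i
sigma2 <- sum((mpg - mu)^2) / N     # population variance: divides by N

# Built-in var() returns the sample variance, which divides by N - 1
s2 <- var(mpg)

c(mean = mu, pop_var = sigma2, sample_var = s2)
```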

Mathematical Concept 2: Coefficient of Determination (R²)

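  • The coefficient of determination (R²) measures the proportion of the variability in the dependent variable that is explained by the regression on the independent variable; for a least-squares fit with an intercept it lies between 0 and 1: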

    \[R^2 = 1 - \frac{SS_{res}}{SS_{tot}}\]

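  • Where:

    • \(SS_{res} = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2\) is the sum of squared residuals, with \(\hat{y}_i\) the fitted value for observation \(i\).
    • \(SS_{tot} = \sum_{i=1}^{N} (y_i - \bar{y})^2\) is the total sum of squares, with \(\bar{y}\) the mean of \(y\).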
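As a sketch, R² for the mpg-on-hp model can be computed straight from its definition and compared with the value that `summary()` reports as "Multiple R-squared". The intermediate variable names are our own.

```r
# Fit the model, then compute R^2 from its definition
fit <- lm(mpg ~ hp, data = mtcars)

y      <- mtcars$mpg
y_hat  <- fitted(fit)               # fitted (predicted) values
ss_res <- sum((y - y_hat)^2)        # residual sum of squares
ss_tot <- sum((y - mean(y))^2)      # total sum of squares
r2     <- 1 - ss_res / ss_tot

# summary() of an lm fit reports the same quantity
c(manual = r2, from_summary = summary(fit)$r.squared)
```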

Scatter Plot of MPG vs Horsepower

Regression Line for MPG vs Horsepower

Histogram of Miles Per Gallon (mpg)
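A histogram like this one can be drawn with base R graphics alone. This sketch assumes the default Sturges binning of `hist()`; the binning and styling of the original figure are not known.

```r
# Histogram of mpg using base R graphics (no extra packages needed)
h <- hist(mtcars$mpg,
          main = "Histogram of Miles Per Gallon (mpg)",
          xlab = "Miles Per Gallon (mpg)",
          col  = "lightgray")
h$counts   # number of cars falling in each bin
```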

3D Plot of MPG Predictions

R Code Example

```r
# Create scatter plot with regression line
library(ggplot2)

ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +                                  # one point per car model
  geom_smooth(method = "lm", formula = y ~ x,     # least-squares line of mpg on hp
              se = FALSE, color = "blue") +       # no confidence band
  labs(title = "Regression Line for MPG vs Horsepower",
       x = "Horsepower (hp)",
       y = "Miles Per Gallon (mpg)") +
  theme_minimal()
```