2026-06-07

What is linear regression?

Linear regression is a statistical method used to model the relationship between a dependent variable (the outcome) and one or more independent variables (the predictors) by fitting a linear equation to the data. With a single predictor, that equation is a straight line.

Examples:

  • Car weight -> Fuel efficiency
  • Study hours -> Exam score
  • Advertising spending -> Sales

Simple Linear Regression

The simple regression model is:

\[ y = \beta_0 + \beta_1 x + \epsilon \]

where:

  • \(y\) = response variable
  • \(x\) = predictor variable
  • \(\beta_0\) = intercept
  • \(\beta_1\) = slope
  • \(\epsilon\) = random error
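The least-squares estimates of \(\beta_0\) and \(\beta_1\) minimize \(\sum_i (y_i - \beta_0 - \beta_1 x_i)^2\). They have the closed form \(\hat\beta_1 = \mathrm{Cov}(x, y)/\mathrm{Var}(x)\) and \(\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}\). The sketch below computes them by hand for the mtcars weight and mpg columns, which are used later in this document, and compares them with lm(). It only illustrates the formulas. lm() itself uses a QR decomposition rather than this formula.

```r
# Illustrative data: weight (x, in 1000 lbs) and fuel efficiency (y, mpg)
x <- mtcars$wt
y <- mtcars$mpg

# Closed-form least-squares estimates for y = beta0 + beta1 * x + error
beta1_hat <- cov(x, y) / var(x)             # slope
beta0_hat <- mean(y) - beta1_hat * mean(x)  # intercept: the line passes through (mean x, mean y)
by_hand <- c(intercept = beta0_hat, slope = beta1_hat)
print(by_hand)

# The same estimates from R's built-in fitting function
fit <- lm(y ~ x)
print(coef(fit))
```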

Assumptions

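The standard linear regression model assumes:

  1. Linearity: the mean of \(y\) is a linear function of \(x\)
  2. Independence: the observations (and their errors) are independent
  3. Homoscedasticity: the errors have constant variance
  4. Normality: the errors are normally distributed (mainly needed for valid tests and intervals)

The constant-variance assumption is written as:

\[ \mathrm{Var}(\epsilon) = \sigma^2 \]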
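These assumptions are usually checked informally through the residuals. The following minimal sketch uses the mpg ~ wt model that is fitted later in this document. It estimates \(\sigma^2\) as the residual sum of squares divided by \(n - 2\), compares the result with sigma(), and draws the standard diagnostic plots. The plots cannot confirm independence, which depends on how the data were collected.

```r
# Fit the simple regression of mpg on weight (wt, in 1000 lbs)
model <- lm(mpg ~ wt, data = mtcars)
res <- residuals(model)

# Estimate sigma^2: residual sum of squares over n - 2 degrees of freedom
n <- nrow(mtcars)
sigma2_hat <- sum(res^2) / (n - 2)
sigma_hat <- sqrt(sigma2_hat)

# sigma() returns the same value, reported by summary() as "Residual standard error"
print(c(by_hand = sigma_hat, from_sigma = sigma(model)))

# Residuals vs fitted: curvature hints at non-linearity, a funnel shape at non-constant variance
plot(fitted(model), res, xlab = "Fitted mpg", ylab = "Residual",
     main = "Residuals vs fitted")
abline(h = 0, lty = 2)

# Normal Q-Q plot: points close to the line suggest roughly normal residuals
qqnorm(res)
qqline(res)

# Shapiro-Wilk test of normality for the residuals
print(shapiro.test(res))
```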

Example Dataset

We use mtcars, a built-in R data frame containing fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–1974 models). The two variables used here are:

  • mpg = miles per (US) gallon
  • wt = vehicle weight (1000 lbs)

Goal: Predict fuel efficiency from vehicle weight.

head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
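Before fitting, it helps to measure how strongly the two variables move together. This sketch computes the Pearson correlation between wt and mpg. A value near -1 indicates a strong negative linear association. The name dat is only illustrative.

```r
# Keep only the two variables used in this example
dat <- mtcars[, c("wt", "mpg")]
print(summary(dat))

# Pearson correlation between weight and fuel efficiency
r <- cor(dat$wt, dat$mpg)
print(r)
print(r^2)  # in simple regression this equals R-squared
```

In simple regression, the square of this correlation equals the Multiple R-squared that summary(model) reports.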

Scatterplot of Weight vs MPG

Regression Line

## `geom_smooth()` using formula = 'y ~ x'
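The code for the two plots in this section is not included here. The geom_smooth() message suggests they were drawn with ggplot2, roughly as ggplot(mtcars, aes(wt, mpg)) + geom_point() + geom_smooth(method = "lm"), but that exact call is an assumption. The following base-R sketch draws the scatterplot and overlays the least-squares line without any extra packages:

```r
# Scatterplot of weight against fuel efficiency
plot(mpg ~ wt, data = mtcars,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "Weight vs MPG")

# Fit the least-squares line and draw it over the points
fit <- lm(mpg ~ wt, data = mtcars)
abline(fit, col = "blue", lwd = 2)
```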

Fitting the Model

model <- lm(mpg ~ wt, data = mtcars)
summary(model)
## 
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446 
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10
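The slope estimate of -5.3445 means that predicted mpg falls by about 5.34 for every additional 1000 lbs of weight. The intercept (37.29 mpg at wt = 0) is an extrapolation with no practical meaning, since no car weighs zero. The sketch below extracts the estimates, computes 95% confidence intervals, and predicts mpg for a new car. The 3000 lb car (wt = 3) is a hypothetical example.

```r
model <- lm(mpg ~ wt, data = mtcars)

# Intercept and slope (change in predicted mpg per additional 1000 lbs)
print(coef(model))

# 95% confidence intervals for both coefficients
print(confint(model))

# Predicted mpg for a hypothetical 3000 lb car, with a 95% prediction interval
new_car <- data.frame(wt = 3)
pred <- predict(model, newdata = new_car, interval = "prediction")
print(pred)  # columns: fit, lwr, upr
```

For wt = 3, the point prediction is about 21.25 mpg (37.2851 - 5.3445 × 3). The prediction interval is wider than a confidence interval for the mean because it also accounts for the scatter of individual cars around the line.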

Interactive Plot

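library(plotly)

plot_ly(
  data = mtcars,
  x = ~wt,
  y = ~mpg,
  type = "scatter",
  mode = "markers"
) %>%
  layout(
    xaxis = list(title = "Weight (1000 lbs)"),
    yaxis = list(title = "Miles per gallon")
  )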

Conclusion

Key observations:

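  • Heavier cars tend to have lower mpg: each additional 1000 lbs is associated with about 5.34 fewer mpg on average
  • The relationship appears approximately linear
  • Weight alone explains about 75% of the variation in mpg (R-squared = 0.7528)
  • Visualization helps in understanding the relationship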