Linear Models and Least Squares

The Linear Regression Model

Least Squares (LS) Model Fitting

MLE and LS
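
As a brief sketch of why the two coincide, assume the simple linear model with independent normal errors of constant variance. That is the standard assumption, but it is not spelled out here, and the symbols $\beta_0$, $\beta_1$, $\sigma^2$, $x_i$, $y_i$ are notation introduced for this illustration. Under that assumption the log-likelihood is

```latex
% Assumed model: Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i,
% with \varepsilon_i independent N(0, \sigma^2), i = 1, ..., n
\ell(\beta_0, \beta_1, \sigma^2)
  = -\frac{n}{2}\log\!\left(2\pi\sigma^2\right)
    - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - \beta_0 - \beta_1 x_i\right)^2
```

For any fixed $\sigma^2$, maximizing $\ell$ over $(\beta_0, \beta_1)$ is the same as minimizing the sum of squared residuals, so under normal errors the ML estimates of the coefficients are the least squares estimates.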

Example: Linear Model for Scottish Hill Races

Races <- read.table("http://stat4ds.rwth-aachen.de/data/ScotsRaces.dat", header=TRUE)
head(Races, 3) # timeM: times for men, timeW: times for women
              race distance climb  timeM  timeW
1       AnTeallach     10.6 1.062  74.68  89.72
2     ArrocharAlps     25.0 2.400 187.32 222.03
3 BaddinsgillRound     16.4 0.650  87.18 102.48
# Row 1: means, row 2: standard deviations; columns: timeW, climb, distance
matrix(cbind(mean(Races$timeW), sd(Races$timeW), mean(Races$climb), sd(Races$climb), mean(Races$distance), sd(Races$distance)), nrow=2)
          [,1]      [,2]     [,3]
[1,] 100.72221 0.8872353 16.63382
[2,]  72.58686 0.5376686 11.81921
pairs(~timeW+distance+climb, data=Races) # scatterplot matrix

fit.d <- lm(timeW ~ distance, data=Races) # model E(timeW) as a linear function of distance
summary(fit.d)

Call:
lm(formula = timeW ~ distance, data = Races)

Residuals:
    Min      1Q  Median      3Q     Max 
-42.821 -11.945  -3.642   8.618  72.211 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   3.1076     4.5367   0.685    0.496    
distance      5.8684     0.2229  26.330   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 21.56 on 66 degrees of freedom
Multiple R-squared:  0.9131,    Adjusted R-squared:  0.9118 
F-statistic: 693.3 on 1 and 66 DF,  p-value: < 2.2e-16
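
The fitted line can be used directly for prediction and interval estimation. The following is a minimal sketch that hard-codes the estimates printed in the summary above, so it runs without downloading the data. The variable names (`b0`, `b1`, `se_b1`, `pred_10`, `ci_b1`) and the distance value 10 are hypothetical choices made for illustration.

```r
# Values copied from the summary(fit.d) output above
b0    <- 3.1076   # intercept estimate
b1    <- 5.8684   # slope estimate for distance
se_b1 <- 0.2229   # standard error of the slope
df    <- 66       # residual degrees of freedom

# Fitted mean of timeW at a hypothetical distance of 10 (in the data's units)
# With the fit object: predict(fit.d, data.frame(distance = 10))
pred_10 <- b0 + b1 * 10
pred_10

# 95% confidence interval for the slope: estimate +/- t quantile * std. error
# With the fit object: confint(fit.d, "distance")
ci_b1 <- b1 + c(-1, 1) * qt(0.975, df) * se_b1
ci_b1
```

With the fitted object available, `predict()` and `confint()` give the same quantities from unrounded estimates, so small differences in the last digits are to be expected.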

The Correlation

Correlation Game (shinyapps.io)

Alternatively, compute the correlation matrix directly in R:

cor(Races[,c("timeW","distance","climb")]) # correlation matrix
             timeW  distance     climb
timeW    1.0000000 0.9555494 0.6852924
distance 0.9555494 1.0000000 0.5144714
climb    0.6852924 0.5144714 1.0000000
# Same correlations with observation 41 excluded; columns: timeW, distance, climb
cor(cbind(Races[-41,]$timeW, Races[-41,]$distance, Races[-41,]$climb))
          [,1]      [,2]      [,3]
[1,] 1.0000000 0.9205392 0.8515990
[2,] 0.9205392 1.0000000 0.6617137
[3,] 0.8515990 0.6617137 1.0000000
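
In simple linear regression the correlation and the fit are tied together: the squared correlation between response and predictor equals R-squared, and the squared t statistic of the slope equals the F statistic. The sketch below checks this using only numbers printed earlier (the full-data correlation of timeW with distance and the slope's t value). The variable names are hypothetical.

```r
# Values copied from the outputs above (all 68 races)
r     <- 0.9555494  # cor(timeW, distance)
t_val <- 26.330     # t value of the distance coefficient in summary(fit.d)

# Squared correlation: compare with "Multiple R-squared: 0.9131"
r_squared <- r^2
round(r_squared, 4)

# Squared t statistic: compare with "F-statistic: 693.3" (up to rounding of t)
f_stat <- t_val^2
round(f_stat, 1)
```

The same identities hold for the fit without observation 41, with the correlation 0.9205392 in place of `r`.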

Regression!

Why are regression problems called “regression” problems?

Linear Models and Reality

Exercises