The Linear Regression Model



Least Squares (LS) Model Fitting



MLE and LS

Example: Linear Model for Scottish Hill Races

Races<-read.table("http://stat4ds.rwth-aachen.de/data/ScotsRaces.dat", header=TRUE)
head(Races,3) # timeM for men, timeW for women
race distance climb timeM timeW
1 AnTeallach 10.6 1.062 74.68 89.72
2 ArrocharAlps 25.0 2.400 187.32 222.03
3 BaddinsgillRound 16.4 0.650 87.18 102.48
matrix(cbind(mean(Races$timeW),sd(Races$timeW),mean(Races$climb),sd(Races$climb),mean(Races$distance), sd(Races$distance)),nrow=2)
[,1] [,2] [,3]
[1,] 100.72221 0.8872353 16.63382
[2,] 72.58686 0.5376686 11.81921
pairs(~timeW+distance+climb, data=Races) #scatter plot matrix
fit.d<-lm(timeW~distance,data=Races)# model E(timeW) as a linear function of distance
summary(fit.d)
Call:
lm(formula = timeW ~ distance, data = Races)
Residuals:
Min 1Q Median 3Q Max
-42.821 -11.945 -3.642 8.618 72.211
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.1076 4.5367 0.685 0.496
distance 5.8684 0.2229 26.330 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 21.56 on 66 degrees of freedom
Multiple R-squared: 0.9131, Adjusted R-squared: 0.9118
F-statistic: 693.3 on 1 and 66 DF, p-value: < 2.2e-16
The Correlation
Correlation Game (shinyapps.io)
Alternatively,


cor(Races[,c("timeW","distance","climb")]) # correlation matrix
timeW distance climb
timeW 1.0000000 0.9555494 0.6852924
distance 0.9555494 1.0000000 0.5144714
climb 0.6852924 0.5144714 1.0000000
cor(cbind(Races[-41,]$timeW,Races[-41,]$distance,Races[-41,]$climb))
[,1] [,2] [,3]
[1,] 1.0000000 0.9205392 0.8515990
[2,] 0.9205392 1.0000000 0.6617137
[3,] 0.8515990 0.6617137 1.0000000
Regression!
Why are regression problems called “regression” problems?


Linear Models and Reality
