Birds Regression

Author

Andrew Dalby

Introduction

This is a set of data from Statistical Methods in Biology Second Edition that describes the relationship between altitude and tail length. The file was digitized using plot digitizer 2.6.8 and saved to a csv file.

data <- read.csv("bird regression.csv")
library(ggplot2)
ggplot(data, aes(x=Altitude, y=Tail)) +
  geom_point(size=2, shape=23) +
  geom_smooth(method=lm) +
  labs(y="Tail Length (mm)", x="Altitude (1000m)")
`geom_smooth()` using formula = 'y ~ x'

Linear Regression

You can then fit the linear regression model to the data and carry out the model diagnostics.

The general equation for a straight line is:

\[y=\alpha+\beta{x}\]

This sample can then be used to generate estimates for \(\alpha\) and \(\beta\) that are \(a\) and \(b\).

\[ b=\dfrac{\sum(x-\bar{x})(y-\bar{y})}{\sum(x-\bar{x})^{2}} \]

\[ a=\bar{y}-b \bar{x} \]

We can use R to calculate the values according to the formula. This is simple when you carry out the operations on the vectors.

x <- data$Altitude
y <- data$Tail
xbar <- mean(x)
ybar <- mean(y)
xbar
[1] 1.37244
ybar
[1] 41.53142
x1 <- x-xbar
y1 <- y-ybar
n <- x1*y1
d <- x1^2
sum(n)
[1] 127.0051
sum(d)
[1] 35.41282
b <- sum(n)/sum(d)
b
[1] 3.586416
a <- ybar - b*xbar
a
[1] 36.60928

Alternatively you can use the built in linear model function in R which also produces a full set of diagnostic plots as well as a data object containing the residuals.

model <- lm(Tail~Altitude, data)
summary(model)

Call:
lm(formula = Tail ~ Altitude, data = data)

Residuals:
    Min      1Q  Median      3Q     Max 
-9.9688 -2.4378 -0.0019  2.4307  9.3799 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  36.6093     0.9946  36.809  < 2e-16 ***
Altitude      3.5864     0.6434   5.574 4.64e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.829 on 68 degrees of freedom
Multiple R-squared:  0.3136,    Adjusted R-squared:  0.3035 
F-statistic: 31.07 on 1 and 68 DF,  p-value: 4.64e-07
plot(model,1)

plot(model, 2)

plot(model, 3)

plot(model, 4)

plot(model, 5)

plot(model, 6)

More Diagnostic Plots

You can also calculate extra diagnostic plots of you wish. For example the residual plot and the standardised residual plot

plot(model$residuals, main="Plot of the Residuals", ylab="Residual (mm)")

res <- model$residuals
m1 <- mean(res)
s1 <- sd(res)
std_res <- (res-m1)/s1
plot(std_res, main="Plot of the Standardised Residuals", ylab="Standard Deviations")

lev <- hatvalues(model)
plot(lev, main="Plot of the Leverage", ylab="Leverage")