Illustrate simple linear regression using the crabs data from Deborah Nolan’s StatLabs website.
setwd("~/Desktop/R/regression") # Set the working directory to the folder containing the data
crabs <- read.table("crabs.data.txt", header=TRUE) # Path is relative to the working directory
head(crabs)
## presz postsz inc year lf
## 1 113.6 127.7 14.1 NA 0
## 2 118.1 133.2 15.1 NA 0
## 3 119.9 135.3 15.4 NA 0
## 4 126.2 143.3 17.1 NA 0
## 5 126.7 139.3 12.6 NA 0
## 6 127.3 140.2 12.9 NA 0
str(crabs)
## 'data.frame': 472 obs. of 5 variables:
## $ presz : num 114 118 120 126 127 ...
## $ postsz: num 128 133 135 143 139 ...
## $ inc : num 14.1 15.1 15.4 17.1 12.6 12.9 15.6 15.1 17.1 13.2 ...
## $ year : int NA NA NA NA NA NA NA NA NA NA ...
## $ lf : int 0 0 0 0 0 0 0 0 0 0 ...
par(mfrow=c(2,1))
hist(crabs$presz)
hist(crabs$postsz)
Next, look at a scatter plot of premolt size against postmolt size.
with(crabs, plot(postsz, presz))
title(main="Premolt carapace size in millimeters vs postmolt size")
3. Fitting a Simple Linear Regression
# Fit the regression using the function lm()
fit.lm <- lm(presz~postsz, data=crabs)
# Use summary() to inspect the results
summary(fit.lm)
##
## Call:
## lm(formula = presz ~ postsz, data = crabs)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.1557 -1.3052 0.0564 1.3174 14.6750
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -25.21370 1.00089 -25.19 <2e-16 ***
## postsz 1.07316 0.00692 155.08 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.199 on 470 degrees of freedom
## Multiple R-squared: 0.9808, Adjusted R-squared: 0.9808
## F-statistic: 2.405e+04 on 1 and 470 DF, p-value: < 2.2e-16
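The quantities printed by summary() can also be pulled out of the fitted object and used for prediction. The sketch below is only an illustration: it assumes a simulated data frame (here called sim, a hypothetical name) in place of the real crabs data, with an intercept, slope and noise level chosen to loosely resemble the fit above. coef(), confint() and predict() are standard functions for lm objects.

```r
# Simulated stand-in for the crabs data (NOT the real measurements);
# the intercept, slope and noise level loosely mimic the fit above.
set.seed(1)
postsz <- runif(200, min = 130, max = 170)
presz  <- -25 + 1.07 * postsz + rnorm(200, sd = 2.2)
sim <- data.frame(presz = presz, postsz = postsz)

fit.sim <- lm(presz ~ postsz, data = sim)

coef(fit.sim)                    # intercept and slope estimates
confint(fit.sim, level = 0.95)   # 95% confidence intervals for both

# Predicted premolt size for a crab with postmolt size 150 mm,
# with a 95% prediction interval for a single new observation
new.crab <- data.frame(postsz = 150)
pred <- predict(fit.sim, newdata = new.crab, interval = "prediction")
pred
```

With the real data, the same calls would be made on fit.lm instead of fit.sim. The slope is read as the estimated change in premolt size, in millimeters, per one-millimeter increase in postmolt size.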
Add the best-fit line (regression line) to the scatter plot:
par(mfrow=c(1,1))
with(crabs, plot(presz~postsz))
abline(fit.lm) # Draw the fitted line using the coefficients stored in fit.lm
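The summary output reports a maximum residual of 14.675, much larger in magnitude than the minimum of -6.156, so a residual plot may be a reasonable next check. The following is a minimal sketch, again assuming a simulated stand-in (sim and fit.sim are hypothetical names) rather than the real crabs data.

```r
# Simulated stand-in for the crabs data (NOT the real measurements)
set.seed(2)
postsz <- runif(200, min = 130, max = 170)
presz  <- -25 + 1.07 * postsz + rnorm(200, sd = 2.2)
sim <- data.frame(presz = presz, postsz = postsz)
fit.sim <- lm(presz ~ postsz, data = sim)

res <- resid(fit.sim)            # observed minus fitted premolt size
fitted.vals <- fitted(fit.sim)   # fitted premolt size for each crab

# Residuals vs fitted values: look for curvature, funnels, or isolated points
plot(fitted.vals, res, xlab = "Fitted premolt size", ylab = "Residual")
abline(h = 0, lty = 2)

# Row with the largest absolute residual
worst <- which.max(abs(res))
sim[worst, ]
```

Applied to fit.lm, the last two lines would show which crab produces the largest residual, which can then be checked against the raw measurements.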