Doing Regression Analysis in R

Synopsis

Illustrate simple linear regression using the crabs data from Deborah Nolan’s StatLabs website.

1. Load Data

setwd("~/Desktop/R/regression")            # Set wd to directory containing the data
# The working directory is already set, so a relative path is enough
crabs <- read.table("crabs.data.txt", header=TRUE)
head(crabs)
##   presz postsz  inc year lf
## 1 113.6  127.7 14.1   NA  0
## 2 118.1  133.2 15.1   NA  0
## 3 119.9  135.3 15.4   NA  0
## 4 126.2  143.3 17.1   NA  0
## 5 126.7  139.3 12.6   NA  0
## 6 127.3  140.2 12.9   NA  0
str(crabs)
## 'data.frame':    472 obs. of  5 variables:
##  $ presz : num  114 118 120 126 127 ...
##  $ postsz: num  128 133 135 143 139 ...
##  $ inc   : num  14.1 15.1 15.4 17.1 12.6 12.9 15.6 15.1 17.1 13.2 ...
##  $ year  : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ lf    : int  0 0 0 0 0 0 0 0 0 0 ...

2. Inspect the Data

After loading the data, check histograms of the columns of interest:

par(mfrow=c(2,1))
hist(crabs$presz)
hist(crabs$postsz)

Next, look at a scatter plot of premolt size against postmolt size.

with(crabs, plot(postsz, presz))
title(main="Premolt carapace size in millimeters vs Postmolt size")

3. Fitting a Simple Linear Regression

# Fit the regression using the function lm()
fit.lm <- lm(presz~postsz, data=crabs)
# Use summary() to inspect the results
summary(fit.lm)
## 
## Call:
## lm(formula = presz ~ postsz, data = crabs)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.1557 -1.3052  0.0564  1.3174 14.6750 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -25.21370    1.00089  -25.19   <2e-16 ***
## postsz        1.07316    0.00692  155.08   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.199 on 470 degrees of freedom
## Multiple R-squared:  0.9808, Adjusted R-squared:  0.9808 
## F-statistic: 2.405e+04 on 1 and 470 DF,  p-value: < 2.2e-16
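
The fitted line reported above is presz = -25.21 + 1.073 * postsz, so a crab with a postmolt size of 150 mm would be predicted to have had a premolt size of roughly 135.8 mm. The sketch below shows one way to pull these quantities out of a fitted model instead of reading them off the printed summary. It uses simulated stand-in data, not the real crabs measurements, so that it runs on its own; the names `sim`, `fit` and `new_crab` are illustrative and do not appear in the original analysis.

```r
# Simulated stand-in for the crabs data (assumption: NOT the real measurements).
# The true intercept, slope and noise level are chosen to resemble the fit above.
set.seed(1)
postsz <- runif(200, min = 120, max = 170)
presz  <- -25.2 + 1.07 * postsz + rnorm(200, sd = 2.2)
sim    <- data.frame(presz, postsz)

fit <- lm(presz ~ postsz, data = sim)

coef(fit)                     # intercept and slope as a named numeric vector
confint(fit, level = 0.95)    # 95% confidence intervals for both coefficients
summary(fit)$r.squared        # proportion of variance explained

# Predicted premolt size, with a 95% prediction interval,
# for a hypothetical crab whose postmolt size is 150 mm
new_crab <- data.frame(postsz = 150)
predict(fit, newdata = new_crab, interval = "prediction")
```

With the real data, the same calls would be applied to `fit.lm` in place of `fit`.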

Add the best-fit line (regression line) to the scatter plot:

par(mfrow=c(1,1))
with(crabs, plot(presz~postsz))
# Passing the fitted model lets abline() take the intercept and slope from it
abline(fit.lm)
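
The residual summary printed earlier has a maximum of 14.675 against a third quartile of about 1.32, which suggests at least one observation sits far from the line. A common follow-up is to plot the residuals. The sketch below is one possible way to do that; it again uses simulated stand-in data (an assumption, not the real crabs measurements) so that it is self-contained, and the names `fit` and `res` are illustrative.

```r
# Simulated stand-in for the crabs data (assumption: NOT the real measurements)
set.seed(2)
postsz <- runif(200, min = 120, max = 170)
presz  <- -25.2 + 1.07 * postsz + rnorm(200, sd = 2.2)
fit    <- lm(presz ~ postsz)

res <- resid(fit)             # observed minus fitted values

par(mfrow = c(1, 2))
# Residuals vs fitted values: look for curvature or a changing spread
plot(fitted(fit), res,
     xlab = "Fitted premolt size", ylab = "Residual",
     main = "Residuals vs fitted")
abline(h = 0, lty = 2)
# Normal Q-Q plot: points far off the line indicate non-normal residuals
qqnorm(res)
qqline(res)
par(mfrow = c(1, 1))

# Position of the observation with the largest absolute residual
which.max(abs(res))
```

With the real data, replacing `fit` by `fit.lm` would show which crab produces the large residual.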