Width of Horseshoe Crabs

This dataset comes from a study of female horseshoe crabs on an island in the Gulf of Mexico. I seek to predict the carapace width ‘W’ in (cm) based on a quadratic of the weight in kg, and the spine condition (1, both good; 2, one worn or broken; 3, both worn or broken), as well as an interaction between the spine conditions and weights.

Crabs <- read.table("Crabs.dat", header=T)
attach(Crabs)
head(Crabs)
##   crab y weight width color spine
## 1    1 8   3.05  28.3     2     3
## 2    2 0   1.55  22.5     3     3
## 3    3 9   2.30  26.0     1     1
## 4    4 0   2.10  24.8     3     3
## 5    5 4   2.60  26.0     3     3
## 6    6 0   2.10  23.8     2     3
str(Crabs)
## 'data.frame':    173 obs. of  6 variables:
##  $ crab  : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ y     : int  8 0 9 0 4 0 0 0 0 0 ...
##  $ weight: num  3.05 1.55 2.3 2.1 2.6 2.1 2.35 1.9 1.95 2.15 ...
##  $ width : num  28.3 22.5 26 24.8 26 23.8 26.5 24.7 23.7 25.6 ...
##  $ color : int  2 3 1 3 3 2 1 3 2 3 ...
##  $ spine : int  3 3 1 3 3 3 1 2 1 3 ...
mod1 <- lm(width~I(weight^2) + weight*factor(spine), Crabs)
summary(mod1)
## 
## Call:
## lm(formula = width ~ I(weight^2) + weight * factor(spine), data = Crabs)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.6999 -0.5675 -0.0188  0.4717  3.4986 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            20.3001     1.3742  14.772   <2e-16 ***
## I(weight^2)             0.1224     0.1452   0.843   0.4007    
## weight                  2.1905     0.9045   2.422   0.0165 *  
## factor(spine)2         -2.2341     1.4360  -1.556   0.1217    
## factor(spine)3         -1.3739     0.8599  -1.598   0.1120    
## weight:factor(spine)2   0.6284     0.6275   1.002   0.3180    
## weight:factor(spine)3   0.5544     0.3329   1.665   0.0977 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9613 on 166 degrees of freedom
## Multiple R-squared:  0.7995, Adjusted R-squared:  0.7923 
## F-statistic: 110.3 on 6 and 166 DF,  p-value: < 2.2e-16

From the model the weight itself is the only explanatory variable with any statistical significance.

cor(width, weight)
## [1] 0.8868715

Indeed there is a strong positive correlation between weight and width of 0.8868715, as would be expected.

The dichotomous spine condition predictors have a negative coefficient, also expected as spine = 2 and =3 refer to poor condition and lesser health.

The quadratic has the highest p-value, and is thus the least significant, as it likely exhibits multicollinearity with weight.

The Multiple R-squared of this HW model is 0.7995, which is slightly better than a univariate model that considers weight only (Multiple R-squared: 0.7865). Nevertheless a parsimonious linear model with only weight is still much more reasonable:

mod2 <- lm(width~weight, Crabs)
summary(mod2)
## 
## Call:
## lm(formula = width ~ weight, data = Crabs)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.3954 -0.5817 -0.0370  0.4942  3.8874 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  18.3985     0.3234   56.89   <2e-16 ***
## weight        3.2416     0.1291   25.10   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9773 on 171 degrees of freedom
## Multiple R-squared:  0.7865, Adjusted R-squared:  0.7853 
## F-statistic: 630.1 on 1 and 171 DF,  p-value: < 2.2e-16

The residuals appear good but for three outlying narrow crabs.

hist(resid(mod1))

The residuals indeed seem more or less normal despite the arbitrary nature of this model.