CUNY DATA605 Wk12

Width of Horseshoe Crabs

This dataset comes from a study of female horseshoe crabs on an island in the Gulf of Mexico. I seek to predict the carapace width (in cm) from a quadratic in weight (in kg), the spine condition (1 = both good; 2 = one worn or broken; 3 = both worn or broken), and an interaction between spine condition and weight.

Crabs <- read.table("Crabs.dat", header = TRUE)
attach(Crabs)
head(Crabs)

##   crab y weight width color spine
## 1    1 8   3.05  28.3     2     3
## 2    2 0   1.55  22.5     3     3
## 3    3 9   2.30  26.0     1     1
## 4    4 0   2.10  24.8     3     3
## 5    5 4   2.60  26.0     3     3
## 6    6 0   2.10  23.8     2     3

str(Crabs)

## 'data.frame':    173 obs. of  6 variables:
##  $ crab  : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ y     : int  8 0 9 0 4 0 0 0 0 0 ...
##  $ weight: num  3.05 1.55 2.3 2.1 2.6 2.1 2.35 1.9 1.95 2.15 ...
##  $ width : num  28.3 22.5 26 24.8 26 23.8 26.5 24.7 23.7 25.6 ...
##  $ color : int  2 3 1 3 3 2 1 3 2 3 ...
##  $ spine : int  3 3 1 3 3 3 1 2 1 3 ...

mod1 <- lm(width~I(weight^2) + weight*factor(spine), Crabs)
summary(mod1)

## 
## Call:
## lm(formula = width ~ I(weight^2) + weight * factor(spine), data = Crabs)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.6999 -0.5675 -0.0188  0.4717  3.4986 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            20.3001     1.3742  14.772   <2e-16 ***
## I(weight^2)             0.1224     0.1452   0.843   0.4007    
## weight                  2.1905     0.9045   2.422   0.0165 *  
## factor(spine)2         -2.2341     1.4360  -1.556   0.1217    
## factor(spine)3         -1.3739     0.8599  -1.598   0.1120    
## weight:factor(spine)2   0.6284     0.6275   1.002   0.3180    
## weight:factor(spine)3   0.5544     0.3329   1.665   0.0977 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9613 on 166 degrees of freedom
## Multiple R-squared:  0.7995, Adjusted R-squared:  0.7923 
## F-statistic: 110.3 on 6 and 166 DF,  p-value: < 2.2e-16

In this model, weight is the only explanatory variable that is statistically significant at the 0.05 level (p = 0.0165). The interaction between weight and spine condition 3 is marginal (p = 0.0977).

cor(width, weight)

## [1] 0.8868715

Indeed, there is a strong positive correlation between weight and width (r = 0.887), as would be expected.

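The dummy-coded spine condition predictors (levels 2 and 3, relative to level 1) have negative coefficients, which is consistent with worn or broken spines indicating poorer condition and lesser health. Neither is significant at the 0.05 level, however, and because the model also includes weight-by-spine interactions, these coefficients describe the shift at a weight of zero rather than an overall effect.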
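To make the interpretation of the spine terms concrete, the following is a minimal sketch that combines the spine main effects and interaction coefficients reported in the summary above. The reference weight `w0` of 2.2 kg is an illustrative assumption, not a value taken from the analysis, and the variable names are hypothetical.

```r
# Coefficients copied from the summary(mod1) output above.
b <- c(spine2 = -2.2341, spine3 = -1.3739,
       weight_spine2 = 0.6284, weight_spine3 = 0.5544)

# Hypothetical reference weight (kg), chosen for illustration only.
w0 <- 2.2

# Fitted width difference relative to spine = 1 at weight w0.
# The intercept, weight, and weight^2 terms cancel because all levels share them.
diff_spine2 <- b[["spine2"]] + b[["weight_spine2"]] * w0
diff_spine3 <- b[["spine3"]] + b[["weight_spine3"]] * w0

# Weight (kg) at which each fitted difference changes sign.
cross_spine2 <- -b[["spine2"]] / b[["weight_spine2"]]
cross_spine3 <- -b[["spine3"]] / b[["weight_spine3"]]

print(round(c(diff_spine2 = diff_spine2, diff_spine3 = diff_spine3), 3))
print(round(c(cross_spine2 = cross_spine2, cross_spine3 = cross_spine3), 3))
```

Because the interaction coefficients are positive, the fitted gap between spine conditions shrinks as weight increases, so the sign of the main effects alone says little about crabs of typical weight.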

The quadratic term has the highest p-value (0.4007) and is thus the least significant, likely because weight^2 is highly collinear with weight.
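The collinearity between weight and its square can be checked directly. The sketch below uses only the first ten weights shown in the `str(Crabs)` output, so the numbers are indicative rather than a result for all 173 crabs; with the data loaded, `Crabs$weight` could be substituted. Centering the weight before squaring is a common remedy that I add here as an assumption, not something the original model does.

```r
# First ten weights (kg), copied from the str(Crabs) output above.
w <- c(3.05, 1.55, 2.30, 2.10, 2.60, 2.10, 2.35, 1.90, 1.95, 2.15)

# Correlation between the raw linear and quadratic terms.
cor_raw <- cor(w, w^2)

# Same correlation after centering the weight at its sample mean.
wc <- w - mean(w)
cor_centered <- cor(wc, wc^2)

print(c(raw = cor_raw, centered = cor_centered))
```

If the centered correlation is much smaller in magnitude, a model using `I((weight - mean(weight))^2)` would give a more stable estimate of the curvature without changing the fitted values.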

The multiple R-squared of this model is 0.7995, only slightly higher than that of a simple regression on weight alone (0.7865), and the adjusted R-squared values are closer still (0.7923 versus 0.7853). A parsimonious linear model with weight as the only predictor is therefore the more reasonable choice:

mod2 <- lm(width~weight, Crabs)
summary(mod2)

## 
## Call:
## lm(formula = width ~ weight, data = Crabs)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.3954 -0.5817 -0.0370  0.4942  3.8874 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  18.3985     0.3234   56.89   <2e-16 ***
## weight        3.2416     0.1291   25.10   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9773 on 171 degrees of freedom
## Multiple R-squared:  0.7865, Adjusted R-squared:  0.7853 
## F-statistic: 630.1 on 1 and 171 DF,  p-value: < 2.2e-16
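The comparison between the two models can be formalized with a partial F-test for nested models. The sketch below reconstructs the test from the rounded R-squared values and residual degrees of freedom printed in the two summaries, so the result is approximate. The variable names are hypothetical.

```r
# Values copied from the summary() outputs of mod1 (full) and mod2 (reduced).
r2_full    <- 0.7995
df_full    <- 166
r2_reduced <- 0.7865
df_reduced <- 171

# Number of extra parameters in the full model.
q <- df_reduced - df_full

# Partial F statistic for H0: all extra coefficients are zero.
f_stat <- ((r2_full - r2_reduced) / q) / ((1 - r2_full) / df_full)

# Upper-tail p-value from the F distribution with (q, df_full) degrees of freedom.
p_value <- pf(f_stat, q, df_full, lower.tail = FALSE)

print(c(F = f_stat, p = p_value))
```

With the fitted objects available, `anova(mod2, mod1)` performs the same test on unrounded quantities. A p-value above 0.05 would support keeping the simpler model.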

Apart from three outlying narrow crabs, the residuals look well behaved.

hist(resid(mod1))

The residuals of mod1 indeed appear approximately normal, despite the somewhat arbitrary specification of this model.
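A histogram gives only a rough impression of normality. As a hedged supplement, the sketch below defines a small helper that draws a normal Q-Q plot (in interactive sessions only), runs a Shapiro-Wilk test, and flags observations with large standardized residuals. The helper name `residual_check`, the cutoff of 3, and the use of the built-in `cars` data as a stand-in are all assumptions for illustration; they are not part of the original analysis.

```r
# Hypothetical helper: basic normality and outlier checks for an lm fit.
residual_check <- function(model, cutoff = 3) {
  r <- resid(model)        # raw residuals
  z <- rstandard(model)    # internally standardized residuals
  if (interactive()) {
    qqnorm(r)              # points near the line suggest normality
    qqline(r)
  }
  list(
    shapiro = shapiro.test(r),          # H0: residuals are normal
    flagged = which(abs(z) > cutoff)    # indices of potential outliers
  )
}

# Stand-in demonstration on a built-in dataset, not on the crab data.
demo_fit <- lm(dist ~ speed, data = cars)
out <- residual_check(demo_fit)
print(out$shapiro$p.value)
print(out$flagged)
```

With the crab models fitted, `residual_check(mod1)` or `residual_check(mod2)` would apply the same checks and list any crabs whose standardized residuals exceed the cutoff.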

Zachary Herold

April 22, 2019
