crabdat <-read.csv("http://www.cknudson.com/data/crabs.csv")
attach(crabdat)
library(faraway)
female<- subset(crabdat, y==1)
hist(satell)

The number of satellite crabs surrounding a female crab is skewed right, with a mean of around 3.

boxplot(satell~color)

The darker the crab, the fewer satellite crabs surround it.

boxplot(satell~spine)

Crabs with a middle spine are unlikely to have satellite crabs surrounding them, and crabs with good spines attract the most satellite crabs on average. Interestingly, though, crabs with bad spines attract more satellite crabs than crabs with middle spines and can on occasion attract more satellite crabs than crabs with good spines.

plot(satell,log(width))

There appears to be at most a weak relationship between log(width) and satell: satell increases slightly as log(width) increases, but not by much.

1e. The plots show that Poisson regression is appropriate because satell is a non-negative count whose distribution is skewed right.
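
A quick numeric check can complement the plots. For a Poisson variable the mean and variance are equal, so a variance-to-mean ratio well above 1 hints at overdispersion. The sketch below uses a hypothetical helper, `mean_var_check`, and a small toy vector rather than the crab data; in this analysis it could be called as `mean_var_check(crabdat$satell)`.

```r
# Hypothetical helper (not part of the original analysis):
# compare the mean and variance of a vector of counts.
mean_var_check <- function(counts) {
  m <- mean(counts)
  v <- var(counts)
  c(mean = m, variance = v, ratio = v / m)
}

# Toy counts for illustration only (not the crab data)
toy_counts <- c(0, 0, 1, 3, 6)
mean_var_check(toy_counts)
```

A ratio near 1 is consistent with the Poisson assumption; a much larger ratio suggests checking the dispersion of the fitted model as well.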

1f. Color and spine appear to be good predictors because their boxplots show different typical satellite counts across their categories.

colormod<-glm(satell~color,family = 'poisson')
summary(colormod)
## 
## Call:
## glm(formula = satell ~ color, family = "poisson")
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.8577  -2.1106  -0.1649   0.8721   4.7491  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  0.80078    0.10102   7.927 2.24e-15 ***
## colordarker -0.08516    0.18007  -0.473 0.636279    
## colorlight   0.60614    0.17496   3.464 0.000532 ***
## colormedium  0.39155    0.11575   3.383 0.000718 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 632.79  on 172  degrees of freedom
## Residual deviance: 609.14  on 169  degrees of freedom
## AIC: 972.44
## 
## Number of Fisher Scoring iterations: 6
spinemod<-glm(satell~spine,family = 'poisson')
summary(spinemod)
## 
## Call:
## glm(formula = satell ~ spine, family = "poisson")
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.7014  -2.3706  -0.5097   1.1252   5.0859  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  1.03316    0.05423  19.051   <2e-16 ***
## spinegood    0.26120    0.10173   2.568   0.0102 *  
## spinemiddle -0.34001    0.19045  -1.785   0.0742 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 632.79  on 172  degrees of freedom
## Residual deviance: 621.16  on 170  degrees of freedom
## AIC: 982.46
## 
## Number of Fisher Scoring iterations: 5
widthmod<-glm(crabdat$satell~crabdat$width, family = 'poisson')
summary(widthmod)
## 
## Call:
## glm(formula = crabdat$satell ~ crabdat$width, family = "poisson")
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.8526  -1.9884  -0.4933   1.0970   4.9221  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   -3.30476    0.54224  -6.095  1.1e-09 ***
## crabdat$width  0.16405    0.01997   8.216  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 632.79  on 172  degrees of freedom
## Residual deviance: 567.88  on 171  degrees of freedom
## AIC: 927.18
## 
## Number of Fisher Scoring iterations: 6
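
Because the Poisson model uses a log link, the width coefficient is easier to read after exponentiating. The sketch below copies the estimate and standard error from the summary(widthmod) output above and computes the multiplicative change in the expected satellite count per one-unit increase in width, with an approximate 95% Wald interval. The variable names are illustrative.

```r
# Estimate and standard error copied from the summary(widthmod) output
beta_width <- 0.16405
se_width <- 0.01997

# Multiplicative effect on the expected count per one-unit increase in width
rate_ratio <- exp(beta_width)

# Approximate 95% Wald interval, computed on the log scale and then exponentiated
wald_ci <- exp(beta_width + c(-1, 1) * qnorm(0.975) * se_width)

rate_ratio
wald_ci
```

The rate ratio comes out at roughly 1.18, that is, about an 18% increase in expected satellites per unit of width under this model.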

The boxplot of width shows one outlier. The residual plot shows that this model overestimates points with fitted values around 2-4. Our half-normal plot shows that entries 56 and 15 are outliers. Our dispersion factor shows a little overdispersion.
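
The code that produced these diagnostics is not shown, so the following is only a sketch of one way to obtain them, not necessarily what was run. The helper name `poisson_diagnostics` is hypothetical. It draws a residual plot and, if the faraway package is installed, a half-normal plot, then returns the Pearson dispersion estimate.

```r
# Hypothetical helper (an assumption, not the original code):
# basic diagnostics for a fitted Poisson glm.
poisson_diagnostics <- function(model) {
  dev_res <- residuals(model, type = "deviance")

  # Deviance residuals against fitted values
  plot(fitted(model), dev_res,
       xlab = "Fitted values", ylab = "Deviance residuals")
  abline(h = 0, lty = 2)

  # Half-normal plot of the residuals, only if faraway is available
  if (requireNamespace("faraway", quietly = TRUE)) {
    faraway::halfnorm(dev_res)
  }

  # Pearson chi-square statistic divided by the residual degrees of freedom
  sum(residuals(model, type = "pearson")^2) / df.residual(model)
}
```

In this analysis it could be called as `poisson_diagnostics(widthmod)`, with `boxplot(crabdat$width)` giving the boxplot of width.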

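# Add one predictor at a time to the width-only model; the family must be Poisson to match widthmod
mod1<-glm(crabdat$satell~crabdat$width+crabdat$color, family = 'poisson')
mod2<-glm(crabdat$satell~crabdat$width+crabdat$spine, family = 'poisson')
mod3<-glm(crabdat$satell~crabdat$width+crabdat$weight, family = 'poisson')
# Row 2, column 5 of each anova table holds the likelihood ratio test p-value
pvals<-c(anova(widthmod,mod1,test="Chisq")[2,5], anova(widthmod,mod2,test="Chisq")[2,5], anova(widthmod,mod3,test="Chisq")[2,5])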
finalmod<-glm(satell~weight + width +color, family = poisson,data=crabdat)
mode<-widthmod
pmod<-glm(satell~width+weight,family=poisson,data=crabdat)
teststat<-deviance(widthmod)-deviance(pmod)
pchisq(teststat,df=1,lower.tail=FALSE)
## [1] 0.004734998
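
The test above hard-codes df=1, which is correct when the larger model adds a single parameter. As a sketch, the hypothetical helper `lrt_pvalue` below takes the degrees of freedom from the two fitted models instead, so the same comparison also works when a multi-level factor such as color is added.

```r
# Hypothetical helper (not part of the original analysis):
# likelihood ratio test between two nested Poisson glms.
lrt_pvalue <- function(reduced, full) {
  # Drop in deviance when moving from the reduced to the full model
  stat <- deviance(reduced) - deviance(full)
  # Number of extra parameters in the full model
  df <- df.residual(reduced) - df.residual(full)
  pchisq(stat, df = df, lower.tail = FALSE)
}
```

Here `lrt_pvalue(widthmod, pmod)` should match the p-value printed above, since both models were fit to the same observations.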
halfnorm(residuals(finalmod))

dp2<-sum(residuals(finalmod, type="pearson")^2/finalmod$df.residual)
dp2
## [1] 3.204836
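
A dispersion estimate of about 3.2 is well above 1, so the Poisson standard errors are likely too small. One common adjustment, sketched here as an assumption rather than as part of the original analysis, is to rescale the standard errors by the square root of the estimated dispersion, which is what a quasi-Poisson fit does. The helper name `quasi_summary` is hypothetical.

```r
# Hypothetical helper (not part of the original analysis):
# coefficient table with standard errors rescaled by the
# Pearson dispersion estimate.
quasi_summary <- function(model) {
  phi <- sum(residuals(model, type = "pearson")^2) / df.residual(model)
  # summary.glm accepts a user-supplied dispersion value
  coef(summary(model, dispersion = phi))
}
```

Calling `quasi_summary(finalmod)` would give standard errors about sqrt(3.2), or roughly 1.8, times larger than the Poisson ones, with correspondingly smaller z values.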