SATELLITE CRABS

crabby <- read.csv("http://www.cknudson.com/data/crabs.csv")
summary(crabby)

##     color              spine               width          satell      
##  Length:173         Length:173         Min.   :21.0   Min.   : 0.000  
##  Class :character   Class :character   1st Qu.:24.9   1st Qu.: 0.000  
##  Mode  :character   Mode  :character   Median :26.1   Median : 2.000  
##                                        Mean   :26.3   Mean   : 2.919  
##                                        3rd Qu.:27.7   3rd Qu.: 5.000  
##                                        Max.   :33.5   Max.   :15.000  
##      weight           y         
##  Min.   :1200   Min.   :0.0000  
##  1st Qu.:2000   1st Qu.:0.0000  
##  Median :2350   Median :1.0000  
##  Mean   :2437   Mean   :0.6416  
##  3rd Qu.:2850   3rd Qu.:1.0000  
##  Max.   :5200   Max.   :1.0000

library(ggplot2)
attach(crabby)
library(remotes)
library(cats)

Question 1:

As you can see from the plot below, the histogram of the number of satellite crabs is right skewed, with the majority of the data contained below 5, looking like a poisson distribution. This means that most female crabs have very few satellite crabs, less that 2.5.

ggplot(data = crabby, aes(satell)) + add_cat() + geom_histogram(bins = 30)

As you can see from the plots below, there isn’t too much variance of satellite crabs based off color. The biggest difference is between the light and darker crabs, with more satellite crabs going to light and less going to darker. Even satellite crabs are colorist.

crabby$color <- factor(crabby$color, levels = c("darker", "dark", "medium", "light"))
ggplot(data = crabby, aes(x = satell, y = color, color = color)) + add_cat() + geom_boxplot(outlier.color = "red", outlier.size = 4)

As you can see from the plots below, satellite crabs prefer female crabs with a good spine. Surprisingly, bad spined female crabs are there next choice, which seems not correct.

crabby$spine <- factor(crabby$spine, levels = c("bad", "middle", "good"))
ggplot(data = crabby, aes(x = satell, y = spine, color = spine), fill = spine) + add_cat() + geom_boxplot(outlier.color = "red", outlier.size = 4)

As you can see below, it appears that there is a weak connection between satellite crabs choosing wider female crabs. It also looks like there is heteroscedasticity, because the farther you go out, the more spread apart the points are. This points to poisson.

ggplot(data = crabby, aes(x = satell, y = width, color = width)) + add_cat() + geom_point(outlier.color = "red", outlier.size = 4)

## Warning: Ignoring unknown parameters: outlier.colour, outlier.size

Our first indicator of poisson regression is that the data we are dealing with uses counts. Seen in part a, the data is right skewed has the general shape of a low lambda poisson distribution. Another argument for poission is from the heteroscedasticity seen in the plot in part d. Our final indicator is that the homework is called “Poisson Regression”… low hanging fruit, come on. Hahaha.
Based on our graphs, we think that color will have the strongest relationship with the number of satellite crabs. Then we think width will be a good predictor somehow because of the variance relationship.

Question 2:

From our summary of our model, we can see that the color “darker” adds the least to the number of satellite crabs, and it increases up to “light.” The term for “dark” appears to not be very significant either since it has a low p-value. Maybe satellite crabs color discernment is not great between dark and darker. Perhaps they could be grouped together. There is a very large deviance, so it looks like we might have to change the dispersion parameter.

mod1 <- glm(satell ~ factor(color), data = crabby, family = "poisson")
summary(mod1)

## 
## Call:
## glm(formula = satell ~ factor(color), family = "poisson", data = crabby)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.8577  -2.1106  -0.1649   0.8721   4.7491  
## 
## Coefficients:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)          0.71562    0.14907   4.801 1.58e-06 ***
## factor(color)dark    0.08516    0.18007   0.473 0.636279    
## factor(color)medium  0.47671    0.15943   2.990 0.002789 ** 
## factor(color)light   0.69129    0.20647   3.348 0.000814 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 632.79  on 172  degrees of freedom
## Residual deviance: 609.14  on 169  degrees of freedom
## AIC: 972.44
## 
## Number of Fisher Scoring iterations: 6

Surprisingly, satellite crabs least like the crabs with middle decent spines since it has a negative coefficient, and they like the good spines the best, unsurprisingly. Both variables are fairly significant, but not super strong. There is a very large deviance, so it looks like we might have to change the dispersion parameter.

mod2 <- glm(satell ~ factor(spine), data = crabby, family = "poisson")
summary(mod2)

## 
## Call:
## glm(formula = satell ~ factor(spine), family = "poisson", data = crabby)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.7014  -2.3706  -0.5097   1.1252   5.0859  
## 
## Coefficients:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)          1.03316    0.05423  19.051   <2e-16 ***
## factor(spine)middle -0.34001    0.19045  -1.785   0.0742 .  
## factor(spine)good    0.26120    0.10173   2.568   0.0102 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 632.79  on 172  degrees of freedom
## Residual deviance: 621.16  on 170  degrees of freedom
## AIC: 982.46
## 
## Number of Fisher Scoring iterations: 5

Width is very significant to predicting number of satellite crabs. The bigger the width, the more satellite crabs there are since it has a positive coefficient. There is a very large deviance, so it looks like we might have to change the dispersion parameter.

mod3 <- glm(satell ~ width, data = crabby, family = "poisson")
summary(mod3)

## 
## Call:
## glm(formula = satell ~ width, family = "poisson", data = crabby)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.8526  -1.9884  -0.4933   1.0970   4.9221  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -3.30476    0.54224  -6.095  1.1e-09 ***
## width        0.16405    0.01997   8.216  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 632.79  on 172  degrees of freedom
## Residual deviance: 567.88  on 171  degrees of freedom
## AIC: 927.18
## 
## Number of Fisher Scoring iterations: 6

Question 3:

If we were to choose models based on AIC, the model with the width only would be substantially the best, while the model with the spines would be the worst. We will have to see how this changes when we alter the dispersion parameter. We were correct that the model with the spine would be the worst, but we thought color would be the best. This might be because the AIC for color is penalized by having more variables since color is a factor.

Question 4:

As you can see below, the residuals for our model have a pattern, which is not good. There also are a few outliers at 149, 15, and 56 which could be having a bad effect. Also, the qqnorm plot is not great at all :/ Our dispersion factor is only about 3.18, so that is not that bad. This makes sense since the variance increases as the numbers of crabs increases, so variance may match the mean. The relationship looks fairly linear, which isn’t too bad.

plot(mod3)

sum(residuals(mod3, type = "pearson")^2/mod3$df.residual)

## [1] 3.182205

xvar <- 0:18
ypreds <- -3.30476 + .16405*xvar
with(crabby, (plot(satell, width)))

## NULL

lines(xvar, ypreds)

Question 5:

This first part finds the best model using forward selection. It chooses the model with width and color.

library(MASS)
mod0 <- glm(satell~1, data = crabby, family = "poisson")
mod4 <- stepAIC(mod3, direction = "forward", scope = list(upper = ~width + factor(spine) + factor(color), lower = ~1))

## Start:  AIC=927.18
## satell ~ width
## 
##                 Df Deviance    AIC
## + factor(color)  3   559.34 924.64
## <none>               567.88 927.18
## + factor(spine)  2   566.60 929.90
## 
## Step:  AIC=924.64
## satell ~ width + factor(color)
## 
##                 Df Deviance    AIC
## <none>               559.34 924.64
## + factor(spine)  2   558.63 927.93

summary(mod4)

## 
## Call:
## glm(formula = satell ~ width + factor(color), family = "poisson", 
##     data = crabby)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.0415  -1.9581  -0.5575   0.9830   4.7523  
## 
## Coefficients:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)         -3.09740    0.55755  -5.555 2.77e-08 ***
## width                0.14934    0.02084   7.166 7.73e-13 ***
## factor(color)dark    0.01100    0.18041   0.061   0.9514    
## factor(color)medium  0.24767    0.16316   1.518   0.1290    
## factor(color)light   0.44736    0.20912   2.139   0.0324 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 632.79  on 172  degrees of freedom
## Residual deviance: 559.34  on 168  degrees of freedom
## AIC: 924.64
## 
## Number of Fisher Scoring iterations: 6

Below we do the drop in deviance test with a model adding the variable weight and we find the pvalue to be .8279, so it is not worth it to add weight

mod5 <- glm(satell~width + factor(color) + weight, data = crabby, family = "poisson")
summary(mod5)

## 
## Call:
## glm(formula = satell ~ width + factor(color) + weight, family = "poisson", 
##     data = crabby)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.9806  -1.8987  -0.5392   0.9015   4.8274  
## 
## Coefficients:
##                       Estimate Std. Error z value Pr(>|z|)   
## (Intercept)         -1.0787808  0.9106215  -1.185  0.23615   
## width                0.0309119  0.0473407   0.653  0.51378   
## factor(color)dark   -0.0001622  0.1805138  -0.001  0.99928   
## factor(color)medium  0.2375227  0.1635298   1.452  0.14637   
## factor(color)light   0.4420867  0.2090142   2.115  0.03442 * 
## weight               0.0004516  0.0001602   2.818  0.00483 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 632.79  on 172  degrees of freedom
## Residual deviance: 551.38  on 167  degrees of freedom
## AIC: 918.68
## 
## Number of Fisher Scoring iterations: 6

teststat <- (559.34 - 551.38)/(nrow(crabby)- 5)
pf(teststat, df1 = 1, df2 = (nrow(crabby)- 5), lower.tail = FALSE)

## [1] 0.8279491

Below we do the drop in deviance test with a model adding the variable spine and we find the pvalue to be .9481, so it is not worth it to add spine

mod6 <- glm(satell~width + factor(color) + factor(spine), data = crabby, family = "poisson")
summary(mod6)

## 
## Call:
## glm(formula = satell ~ width + factor(color) + factor(spine), 
##     family = "poisson", data = crabby)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.0591  -1.9468  -0.4864   0.9620   4.7608  
## 
## Coefficients:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)         -3.00566    0.58360  -5.150 2.60e-07 ***
## width                0.14596    0.02189   6.669 2.58e-11 ***
## factor(color)dark    0.02508    0.18106   0.139   0.8898    
## factor(color)medium  0.26386    0.16513   1.598   0.1101    
## factor(color)light   0.48544    0.22824   2.127   0.0334 *  
## factor(spine)middle -0.16241    0.19655  -0.826   0.4086    
## factor(spine)good   -0.02363    0.11729  -0.201   0.8403    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 632.79  on 172  degrees of freedom
## Residual deviance: 558.63  on 166  degrees of freedom
## AIC: 927.93
## 
## Number of Fisher Scoring iterations: 6

teststat <- (559.34 - 558.63)/(nrow(crabby)- 6)
pf(teststat, df1 = 1, df2 = (nrow(crabby)- 6), lower.tail = FALSE)

## [1] 0.94809

After doing the drop-in deviance test, we stick with our initial model from forward selection. Our regression equation is: sattel = -3.0864 + .14934(width) - .011(darker) + .43636(light) + .23668(medium) This equation tells us that if the female crab is dark and has a width of 0 (which is impossible), it would attract -3.0864 sattelite crabs. Obviously our width will be much higher than zero as a minimum, so it will never be negative. An increase of one centimeter of width a female crab is, about an additional .15 crabs are attracted, holding all else constant. Additionally, if a crab is darker, the log mean number of crabs decreases by .011. If the crab is medium, the log mean number of satellite crabs increases by .23668. If a crab is light, this increases the log mean number of satellite crabs the most (excusing width) by .43636. Yes, satellite crabs are racist and this is not okay >:(

Question 6:

As we can see from the plot below, our residuals are not in a line, but isn’t what we look for. Unfortunately, our biggest residuals do not follow the line. Before we had outliers, 149, 15, and 56, but now 56 follows the trend, so this is an improvement. But still, not great.

library(faraway)
halfnorm(residuals(mod4))

Question 7:

The dispersion parameter is calculated below and is found to be 3.233625

sum(residuals(mod4, type = "pearson")^2/mod4$df.residual)

## [1] 3.233625

–ELEPHANT MATING–

sexyelephants <- read.csv("http://www.cknudson.com/data/elephant.csv")
summary(sexyelephants)

##       AGE           MATINGS     
##  Min.   :27.00   Min.   :0.000  
##  1st Qu.:29.00   1st Qu.:1.000  
##  Median :34.00   Median :2.000  
##  Mean   :35.85   Mean   :2.683  
##  3rd Qu.:42.00   3rd Qu.:3.000  
##  Max.   :52.00   Max.   :9.000

attach(sexyelephants)

Part a:

From looking at the histogram below, there is a potential that this could be modeled as a Poisson response. It uses a count, is right skewed, and the peak of the histogram is at approximately the mean. The variance is about double the mean, which is not great evidence, but mayeb everything else outweighs it.

ggplot(sexyelephants, aes(x = MATINGS, fill = "red")) + add_cat() + geom_histogram(bins = 30)

mean(sexyelephants$MATINGS)

## [1] 2.682927

var(sexyelephants$MATINGS)

## [1] 5.071951

Part b:

As we can see by our plots below, it doesn’t look like linear regression will be the way to go. There is heteroscedasticity in out plot, and the variance increases and AGE increases, which points to poisson. Additionally, our residual plots follow a smiley shape, which points to poisson with a quadratic factor.

mod <- lm(MATINGS ~ AGE)
plot(AGE, MATINGS)
abline(coefficients(mod), col = "red", lwd = 2, lty = 2)

plot(mod)

Part c:

This plot has a more even variance throughout opposed to our previous plot. We do not see any evidence of a quadratic trend, it seems to be fairly linear.

arrmeans <- c()
ages <- unique(AGE)
for(age in ages)
{
  arrmeans <- c(arrmeans, mean(subset(sexyelephants, subset = AGE == age)$MATINGS))
}
logmeans <- log(arrmeans)
plot(ages, logmeans)

Part d:

After we exponentiate, we can interpret that for each unit increase of one year in AGE, the mean number of MATINGS increases by 1.071104. This makes sense, because of course older elephants have mated more often, but we would expect this to level off and head to zero the older the elephant gets.

mod1 <- glm(MATINGS ~ AGE, data = sexyelephants, family = "poisson")
summary(mod1)

## 
## Call:
## glm(formula = MATINGS ~ AGE, family = "poisson", data = sexyelephants)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -2.80798  -0.86137  -0.08629   0.60087   2.17777  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -1.58201    0.54462  -2.905  0.00368 ** 
## AGE          0.06869    0.01375   4.997 5.81e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 75.372  on 40  degrees of freedom
## Residual deviance: 51.012  on 39  degrees of freedom
## AIC: 156.46
## 
## Number of Fisher Scoring iterations: 5

exp(.06869)

## [1] 1.071104

Part e:

Our confidence interval is pretty tight, ranging from about 1.04 to 1.1 for our coefficient prediction of AGE.

exp(.06869 + c(-1, 1)*.01375*1.96)

## [1] 1.042623 1.100363

Part f:

From looking at our summary of the model, the p-value of AGE is very statistically significant at 5.81e-07. This would indicate matings are related to age.
Conducting the drop in deviance test finds a pvalue of 7.99e-07 which is very statistically significant. This indicates that matings are significantly related to age.

summary(mod1)

## 
## Call:
## glm(formula = MATINGS ~ AGE, family = "poisson", data = sexyelephants)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -2.80798  -0.86137  -0.08629   0.60087   2.17777  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -1.58201    0.54462  -2.905  0.00368 ** 
## AGE          0.06869    0.01375   4.997 5.81e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 75.372  on 40  degrees of freedom
## Residual deviance: 51.012  on 39  degrees of freedom
## AIC: 156.46
## 
## Number of Fisher Scoring iterations: 5

mod0 <- glm(MATINGS ~ 1, data = sexyelephants, family = "poisson")
summary(mod0)

## 
## Call:
## glm(formula = MATINGS ~ 1, family = "poisson", data = sexyelephants)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.3164  -1.1799  -0.4368   0.1899   3.0252  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  0.98691    0.09535   10.35   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 75.372  on 40  degrees of freedom
## Residual deviance: 75.372  on 40  degrees of freedom
## AIC: 178.82
## 
## Number of Fisher Scoring iterations: 5

teststat <- 75.372 - 51.012
pchisq(teststat, 1, lower.tail = FALSE)

## [1] 7.991084e-07

Part g:

It does not appear that a quadratic model is preffered over a linear model. Let’s run some tests. i) From the Wald test, we find the pvalue for our quadratic term to be .669 (nice), which means it is NOT significant all. ii) The drop in deviance test finds a pvalue of .666 (nice), which indicates that the quadratic term pretty much sucks.

agesq <- AGE^2
mod2 <- glm(MATINGS ~ AGE + agesq, data = sexyelephants, family = "poisson")
summary(mod2)

## 
## Call:
## glm(formula = MATINGS ~ AGE + agesq, family = "poisson", data = sexyelephants)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.8470  -0.8848  -0.1122   0.6580   2.1134  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.8574060  3.0356383  -0.941    0.347
## AGE          0.1359544  0.1580095   0.860    0.390
## agesq       -0.0008595  0.0020124  -0.427    0.669
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 75.372  on 40  degrees of freedom
## Residual deviance: 50.826  on 38  degrees of freedom
## AIC: 158.27
## 
## Number of Fisher Scoring iterations: 5

teststat <- 51.012 - 50.826
pchisq(teststat, 1, lower.tail = FALSE)

## [1] 0.6662668

Part h:

Honestly, this model with age as the only predictor seems pretty good. It has a low pvalue and a decent AIC, comparatively.

We find our dispersion parameter below to be about 1.157334

dp <- sum(residuals(mod1, type = "pearson")^2/mod1$df.residual)

Part f dispersion:

From looking at our summary of the model, the p-value of AGE is very statistically significant at 3.4e-06. The dispersion parameter did cause the pvalue to increase, but not by much
Conducting the drop in deviance test finds a pvalue of 7.99e-07 which is very statistically significant. The dispersion parameter didn’t cause much of a difference. This indicates that matings are significantly related to age.

mod3 <- glm(MATINGS ~ AGE, data = sexyelephants, family = "poisson")
mod4 <- glm(MATINGS ~ 1, data = sexyelephants, family = "poisson")
summary(mod3, dispersion = dp)

## 
## Call:
## glm(formula = MATINGS ~ AGE, family = "poisson", data = sexyelephants)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -2.80798  -0.86137  -0.08629   0.60087   2.17777  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -1.58201    0.58590  -2.700  0.00693 ** 
## AGE          0.06869    0.01479   4.645  3.4e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1.157334)
## 
##     Null deviance: 75.372  on 40  degrees of freedom
## Residual deviance: 51.012  on 39  degrees of freedom
## AIC: 156.46
## 
## Number of Fisher Scoring iterations: 5

summary(mod4, dispersion = dp)

## 
## Call:
## glm(formula = MATINGS ~ 1, family = "poisson", data = sexyelephants)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.3164  -1.1799  -0.4368   0.1899   3.0252  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   0.9869     0.1026   9.622   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1.157334)
## 
##     Null deviance: 75.372  on 40  degrees of freedom
## Residual deviance: 75.372  on 40  degrees of freedom
## AIC: 178.82
## 
## Number of Fisher Scoring iterations: 5

teststat <- 75.372 - 51.012
pchisq(teststat, 1, lower.tail = FALSE)

## [1] 7.991084e-07

Part g dispersion:

It does not appear that a quadratic model is preffered over a linear model. Let’s run some tests. i) From the Wald test, we find the pvalue for our quadratic term to be .691 (nice), which means it is NOT significant all. It increased from before, but not by much. ii) The drop in deviance test finds a pvalue of .666 (nice), which indicates that the quadratic term pretty much sucks. The dispersion parameter caused very little change.

agesq <- AGE^2
mod5 <- glm(MATINGS ~ AGE + agesq, data = sexyelephants, family = "poisson")
summary(mod5, dispersion = dp)

## 
## Call:
## glm(formula = MATINGS ~ AGE + agesq, family = "poisson", data = sexyelephants)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.8470  -0.8848  -0.1122   0.6580   2.1134  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.8574060  3.2657237  -0.875    0.382
## AGE          0.1359544  0.1699858   0.800    0.424
## agesq       -0.0008595  0.0021649  -0.397    0.691
## 
## (Dispersion parameter for poisson family taken to be 1.157334)
## 
##     Null deviance: 75.372  on 40  degrees of freedom
## Residual deviance: 50.826  on 38  degrees of freedom
## AIC: 158.27
## 
## Number of Fisher Scoring iterations: 5

teststat <- 51.012 - 50.826
pchisq(teststat, 1, lower.tail = FALSE)

## [1] 0.6662668

Part h dispersion:

The dispersion parameter was small and caused pretty much no difference. The model with age as the only predictor seems pretty good. It has a low pvalue and a decent AIC, comparatively.

Poisson Regression Homework

StatsWithSarah

2/25/2021

SATELLITE CRABS

–ELEPHANT MATING–