HW05_OCN683.knit

Sarah Pennington

Imagine that we are interested in the habitat characteristics that favor Ctenochaetus strigosus, known as kole in Hawaiian, and with the English common name spotted surgeonfish in this dataset. We will focus on the role of different types of benthic substrate, and the most common substrate types in this dataset are turf algae (column ‘ta’), hard coral (column ‘hard_coral’), sand (column ‘sand’), and crustose coralline algae (column ‘cca’). I have included a dataset that only includes this species, and only the sampling period during which turf algae percent cover was measured.

(1) Start by exploring the distributions of the four predictor variables, and how they are correlated with one another. The function ggpairs() in the package GGally is particularly nice for this.

crcp <- read_csv("CRCP_Reef_Fish_Surveys_Hawaii_kole.csv")

## Rows: 310 Columns: 47
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (19): taxonname, region_name, island, site, reef_zone, depth_bin, rep, ...
## dbl  (25): replicateid, count, latitude, longitude, sitevisitid, obs_year, d...
## lgl   (1): other_type
## dttm  (2): time, date_
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

names(crcp)

##  [1] "taxonname"           "replicateid"         "count"              
##  [4] "time"                "latitude"            "longitude"          
##  [7] "region_name"         "island"              "site"               
## [10] "reef_zone"           "depth_bin"           "sitevisitid"        
## [13] "date_"               "obs_year"            "rep"                
## [16] "depth"               "hard_coral"          "soft_coral"         
## [19] "ma"                  "cca"                 "ta"                 
## [22] "sand"                "tunicate"            "zoanthid"           
## [25] "corallimorph"        "clam"                "cyano"              
## [28] "sponge"              "other"               "other_type"         
## [31] "habitat_code"        "habitat_type"        "current_strength"   
## [34] "visibility"          "max_height"          "urchin_dacor"       
## [37] "boring_urchin_dacor" "min_depth"           "max_depth"          
## [40] "objectid"            "species"             "commonname"         
## [43] "commonfamilyall"     "family"              "scientific_name"    
## [46] "rank"                "trophic"

#turf algae (column ‘ta’),
#hard coral (column ‘hard_coral’)
#sand (column ‘sand’)
#crustose coralline algae (column ‘cca’)

library(GGally)

## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2

ggpairs(crcp[, c("ta", "hard_coral", "sand", "cca")])

(2) For each of the four predictors, make single-predictor models where the response variable is kole counts. Consider which of the predictors could be transformed to reduce skew along the x-axis. For educational purposes, start by using a poisson distribution for the counts.

library(ggResidpanel)

#make single-predictor models where the response variable is kole counts.

#ta
tamodel <- glm(count~ ta, family= poisson, data = crcp)
summary(tamodel)

## 
## Call:
## glm(formula = count ~ ta, family = poisson, data = crcp)
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  3.5784202  0.0352931  101.39   <2e-16 ***
## ta          -0.0177069  0.0006581  -26.91   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 7573.1  on 309  degrees of freedom
## Residual deviance: 6862.4  on 308  degrees of freedom
## AIC: 7783.4
## 
## Number of Fisher Scoring iterations: 6

Anova(tamodel)

## Analysis of Deviance Table (Type II tests)
## 
## Response: count
##    LR Chisq Df Pr(>Chisq)    
## ta   710.68  1  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

resid_xpanel(tamodel, type = "deviance")

#hard_coral
hardmodel <- glm(count~ hard_coral, family= poisson, data = crcp)
summary(hardmodel)

## 
## Call:
## glm(formula = count ~ hard_coral, family = poisson, data = crcp)
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) 2.0379324  0.0257021   79.29   <2e-16 ***
## hard_coral  0.0243714  0.0006953   35.05   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 7573.1  on 309  degrees of freedom
## Residual deviance: 6481.7  on 308  degrees of freedom
## AIC: 7402.7
## 
## Number of Fisher Scoring iterations: 6

Anova(hardmodel)

## Analysis of Deviance Table (Type II tests)
## 
## Response: count
##            LR Chisq Df Pr(>Chisq)    
## hard_coral   1091.4  1  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

resid_xpanel(hardmodel, type = "deviance")

#sand
sandmodel <- glm(count~ sand, family= poisson, data = crcp)
summary(sandmodel)

## 
## Call:
## glm(formula = count ~ sand, family = poisson, data = crcp)
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  2.946727   0.018964  155.39   <2e-16 ***
## sand        -0.035770   0.001685  -21.22   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 7573.1  on 309  degrees of freedom
## Residual deviance: 6998.7  on 308  degrees of freedom
## AIC: 7919.6
## 
## Number of Fisher Scoring iterations: 6

Anova(sandmodel)

## Analysis of Deviance Table (Type II tests)
## 
## Response: count
##      LR Chisq Df Pr(>Chisq)    
## sand   574.42  1  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

resid_xpanel(sandmodel, type = "deviance") ##Left skew

sandmodel <- glm(count~ sqrt(sand), family= poisson, data = crcp)
summary(sandmodel)

## 
## Call:
## glm(formula = count ~ sqrt(sand), family = poisson, data = crcp)
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  3.101758   0.021722   142.8   <2e-16 ***
## sqrt(sand)  -0.207975   0.008384   -24.8   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 7573.1  on 309  degrees of freedom
## Residual deviance: 6904.5  on 308  degrees of freedom
## AIC: 7825.4
## 
## Number of Fisher Scoring iterations: 6

Anova(sandmodel)

## Analysis of Deviance Table (Type II tests)
## 
## Response: count
##            LR Chisq Df Pr(>Chisq)    
## sqrt(sand)   668.66  1  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

resid_xpanel(sandmodel, type = "deviance")

#cca
ccamodel <- glm(count~ cca, family= poisson, data = crcp)
summary(ccamodel)

## 
## Call:
## glm(formula = count ~ cca, family = poisson, data = crcp)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) 2.270516   0.021449  105.85   <2e-16 ***
## cca         0.034430   0.001153   29.85   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 7573.1  on 309  degrees of freedom
## Residual deviance: 6846.7  on 308  degrees of freedom
## AIC: 7767.6
## 
## Number of Fisher Scoring iterations: 6

Anova(ccamodel)

## Analysis of Deviance Table (Type II tests)
## 
## Response: count
##     LR Chisq Df Pr(>Chisq)    
## cca   726.46  1  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

resid_xpanel(ccamodel, type = "deviance") ###This model has left skewed deviance residuals

ccamodel <- glm(count~ sqrt(cca), family= poisson, data = crcp)
summary(ccamodel)

## 
## Call:
## glm(formula = count ~ sqrt(cca), family = poisson, data = crcp)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) 1.762848   0.032938   53.52   <2e-16 ***
## sqrt(cca)   0.302834   0.008778   34.50   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 7573.1  on 309  degrees of freedom
## Residual deviance: 6426.7  on 308  degrees of freedom
## AIC: 7347.6
## 
## Number of Fisher Scoring iterations: 6

Anova(ccamodel)

## Analysis of Deviance Table (Type II tests)
## 
## Response: count
##           LR Chisq Df Pr(>Chisq)    
## sqrt(cca)   1146.4  1  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

resid_xpanel(ccamodel, type = "deviance")

?resid_panel

## starting httpd help server ... done

Quantify the degree of overdispersion in these models, and describe what it means.

#ta
dispersionta <- sum(residuals(tamodel, type = "pearson")^2)/(length(tamodel$y)- length(tamodel$coefficients))
print(dispersionta)

## [1] 30.10215

    ##Ideally, overdispersion would be <1. This means that there is extra variance in the model, and we could end up with type 1 error if we don't correct for overdispersion.


#hard_coral
dispersionhard <- sum(residuals(hardmodel, type = "pearson")^2)/(length(hardmodel$y)- length(hardmodel$coefficients))
print(dispersionhard)

## [1] 28.13073

### Like the model with predictor variable ta, the data has high overdispersion that, left uncorrected, would result in type 1 error. 


#sand 
dispersionsand <- sum(residuals(sandmodel, type = "pearson")^2)/(length(sandmodel$y)- length(sandmodel$coefficients))
print(dispersionsand)

## [1] 30.64876

### Like the previous two models, this has high overdispersion. 

#cca
dispersioncca <- sum(residuals(ccamodel, type = "pearson")^2)/(length(ccamodel$y)- length(ccamodel$coefficients))
print(dispersioncca)

## [1] 29.15104

### Like the previous three models, this has high overdispersion.

Based on likelihood ratio tests and effects plots, what are the relationships between kole and the four substrate types?

#Based on likelihood ratio tests and effects plots, what are the relationships between kole and the four substrate types? 

reduced_model <- glm(count~ 1, family= poisson, data = crcp)
summary(reduced_model)

## 
## Call:
## glm(formula = count ~ 1, family = poisson, data = crcp)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  2.63929    0.01518   173.9   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 7573.1  on 309  degrees of freedom
## Residual deviance: 7573.1  on 309  degrees of freedom
## AIC: 8492.1
## 
## Number of Fisher Scoring iterations: 6

#ta 
anova(reduced_model, tamodel, test ="LRT")

## Analysis of Deviance Table
## 
## Model 1: count ~ 1
## Model 2: count ~ ta
##   Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
## 1       309     7573.1                          
## 2       308     6862.4  1   710.68 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#hard_coral
anova(reduced_model, hardmodel, test ="LRT")

## Analysis of Deviance Table
## 
## Model 1: count ~ 1
## Model 2: count ~ hard_coral
##   Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
## 1       309     7573.1                          
## 2       308     6481.7  1   1091.4 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#sand 
anova(reduced_model, sandmodel, test ="LRT")

## Analysis of Deviance Table
## 
## Model 1: count ~ 1
## Model 2: count ~ sqrt(sand)
##   Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
## 1       309     7573.1                          
## 2       308     6904.5  1   668.66 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#cca
anova(reduced_model, ccamodel, test ="LRT")

## Analysis of Deviance Table
## 
## Model 1: count ~ 1
## Model 2: count ~ sqrt(cca)
##   Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
## 1       309     7573.1                          
## 2       308     6426.7  1   1146.4 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

plot(ggeffect(tamodel)) #kole decline as turf algae increases

plot(ggeffect(hardmodel)) #kole increase as hard coral increases

plot(ggeffect(sandmodel)) #kole decline as sand increases

plot(ggeffect(ccamodel)) #kole increase as crustose coralline algae increases

##### (3) Now make new single-predictor models that account for overdispersion (we discussed two different ways to do this in lecture, you can do one or both).

#ta
tamodel_quassi <- glm(count~ ta, family= quasipoisson, data = crcp)
summary(tamodel_quassi)

## 
## Call:
## glm(formula = count ~ ta, family = quasipoisson, data = crcp)
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.578420   0.193637  18.480  < 2e-16 ***
## ta          -0.017707   0.003611  -4.904 1.52e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasipoisson family taken to be 30.1022)
## 
##     Null deviance: 7573.1  on 309  degrees of freedom
## Residual deviance: 6862.4  on 308  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 6

Anova(tamodel_quassi, test= 'F') # p= 1.883e-06

## Analysis of Deviance Table (Type II tests)
## 
## Response: count
## Error estimate based on Pearson residuals 
## 
##           Sum Sq  Df F value    Pr(>F)    
## ta         710.7   1  23.609 1.883e-06 ***
## Residuals 9271.5 308                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

tamodel_negbin <- glm.nb(count~ ta, data = crcp)
summary(tamodel_negbin)

## 
## Call:
## glm.nb(formula = count ~ ta, data = crcp, init.theta = 0.3750484251, 
##     link = log)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  3.64697    0.26206   13.92  < 2e-16 ***
## ta          -0.01900    0.00427   -4.45 8.57e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(0.375) family taken to be 1)
## 
##     Null deviance: 365.50  on 309  degrees of freedom
## Residual deviance: 346.15  on 308  degrees of freedom
## AIC: 2085.5
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  0.3750 
##           Std. Err.:  0.0325 
## 
##  2 x log-likelihood:  -2079.4990

Anova(tamodel_negbin) # p= 1.093e-05

## Analysis of Deviance Table (Type II tests)
## 
## Response: count
##    LR Chisq Df Pr(>Chisq)    
## ta   19.342  1  1.093e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#hard_coral 
hardmodel_quassi <- glm(count~ hard_coral, family= quasipoisson, data = crcp)
summary(hardmodel_quassi)

## 
## Call:
## glm(formula = count ~ hard_coral, family = quasipoisson, data = crcp)
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 2.037932   0.136320  14.950  < 2e-16 ***
## hard_coral  0.024371   0.003688   6.609  1.7e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasipoisson family taken to be 28.13075)
## 
##     Null deviance: 7573.1  on 309  degrees of freedom
## Residual deviance: 6481.7  on 308  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 6

Anova(hardmodel_quassi, test= 'F') # p= 1.541e-09

## Analysis of Deviance Table (Type II tests)
## 
## Response: count
## Error estimate based on Pearson residuals 
## 
##            Sum Sq  Df F value    Pr(>F)    
## hard_coral 1091.4   1  38.797 1.541e-09 ***
## Residuals  8664.3 308                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

hardmodel_negbin <- glm.nb(count~ hard_coral, data = crcp)
summary(hardmodel_negbin)

## 
## Call:
## glm.nb(formula = count ~ hard_coral, data = crcp, init.theta = 0.3931303705, 
##     link = log)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) 1.858032   0.140369  13.237  < 2e-16 ***
## hard_coral  0.031720   0.005176   6.129 8.86e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(0.3931) family taken to be 1)
## 
##     Null deviance: 379.65  on 309  degrees of freedom
## Residual deviance: 346.22  on 308  degrees of freedom
## AIC: 2072.6
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  0.3931 
##           Std. Err.:  0.0344 
## 
##  2 x log-likelihood:  -2066.6070

Anova(hardmodel_negbin) # p= 7.363e-09

## Analysis of Deviance Table (Type II tests)
## 
## Response: count
##            LR Chisq Df Pr(>Chisq)    
## hard_coral   33.437  1  7.363e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#sand 
sandmodel_quassi <- glm(count~ sand, family= quasipoisson, data = crcp)
summary(sandmodel_quassi)

## 
## Call:
## glm(formula = count ~ sand, family = quasipoisson, data = crcp)
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.946727   0.105546  27.919  < 2e-16 ***
## sand        -0.035770   0.009381  -3.813 0.000166 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasipoisson family taken to be 30.97734)
## 
##     Null deviance: 7573.1  on 309  degrees of freedom
## Residual deviance: 6998.7  on 308  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 6

Anova(sandmodel_quassi, test= 'F') # p= 2.235e-05

## Analysis of Deviance Table (Type II tests)
## 
## Response: count
## Error estimate based on Pearson residuals 
## 
##           Sum Sq  Df F value    Pr(>F)    
## sand       574.4   1  18.543 2.235e-05 ***
## Residuals 9540.9 308                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

sandmodel_negbin <- glm.nb(count~ sand, data = crcp)
summary(sandmodel_negbin)

## 
## Call:
## glm.nb(formula = count ~ sand, data = crcp, init.theta = 0.3729581071, 
##     link = log)
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  2.941783   0.124216   23.68  < 2e-16 ***
## sand        -0.035008   0.007544   -4.64 3.48e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(0.373) family taken to be 1)
## 
##     Null deviance: 363.85  on 309  degrees of freedom
## Residual deviance: 345.96  on 308  degrees of freedom
## AIC: 2086.9
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  0.3730 
##           Std. Err.:  0.0323 
## 
##  2 x log-likelihood:  -2080.8530

Anova(sandmodel_negbin) # p= 2.343e-05

## Analysis of Deviance Table (Type II tests)
## 
## Response: count
##      LR Chisq Df Pr(>Chisq)    
## sand   17.888  1  2.343e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#cca 
ccamodel_quassi <- glm(count~ cca, family= quasipoisson, data = crcp)
summary(ccamodel_quassi)

## 
## Call:
## glm(formula = count ~ cca, family = quasipoisson, data = crcp)
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 2.270516   0.119096  19.065  < 2e-16 ***
## cca         0.034430   0.006404   5.376  1.5e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for quasipoisson family taken to be 30.82933)
## 
##     Null deviance: 7573.1  on 309  degrees of freedom
## Residual deviance: 6846.7  on 308  degrees of freedom
## AIC: NA
## 
## Number of Fisher Scoring iterations: 6

Anova(ccamodel_quassi, test= 'F') # p= 1.924e-06

## Analysis of Deviance Table (Type II tests)
## 
## Response: count
## Error estimate based on Pearson residuals 
## 
##           Sum Sq  Df F value    Pr(>F)    
## cca        726.5   1  23.564 1.924e-06 ***
## Residuals 9495.4 308                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

ccamodel_negbin <- glm.nb(count~ cca, data = crcp)
summary(ccamodel_negbin)

## 
## Call:
## glm.nb(formula = count ~ cca, data = crcp, init.theta = 0.3839858353, 
##     link = log)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) 1.999507   0.126584  15.796  < 2e-16 ***
## cca         0.060098   0.009659   6.222 4.91e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(0.384) family taken to be 1)
## 
##     Null deviance: 372.51  on 309  degrees of freedom
## Residual deviance: 345.99  on 308  degrees of freedom
## AIC: 2078.8
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  0.3840 
##           Std. Err.:  0.0334 
## 
##  2 x log-likelihood:  -2072.8340

Anova(ccamodel_negbin) # p= 2.597e-07

## Analysis of Deviance Table (Type II tests)
## 
## Response: count
##     LR Chisq Df Pr(>Chisq)    
## cca   26.529  1  2.597e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

###PLOTS### 

plot(ggeffect(tamodel_quassi))

plot(ggeffect(tamodel_negbin))

plot(ggeffect(hardmodel_quassi))

plot(ggeffect(hardmodel_negbin))

plot(ggeffect(sandmodel_quassi))

plot(ggeffect(sandmodel_negbin))

plot(ggeffect(ccamodel_quassi))

plot(ggeffect(ccamodel_negbin))

How have the results changed compared to #2, and why?

#All of the single-predictor models that didn't account for overdispersion showed each benthic substrate had a highly significant effect on kole count. All of the single predictor models that did account for overdispersion, using either a quassipoission or negative binomial model, also showed each benthic substrate had a highly significant effect on kole counts. The main difference was the p-values were lower in the models that account for overdispersion. Additionally, the effects plots accounting for overdispersion showed increased variability in confidence intervals as each benthic substrate increased.

(4) Although the single-predictor models are useful, we may get a more accurate picture of the role of each substrate type if we include all the substrate types in one model. Create such a model.

model <- glm(count~ ta + hard_coral + sand + cca, family= poisson, data = crcp)
summary(model)

## 
## Call:
## glm(formula = count ~ ta + hard_coral + sand + cca, family = poisson, 
##     data = crcp)
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.853031   0.360774  -2.364   0.0181 *  
## ta           0.029757   0.003686   8.074  6.8e-16 ***
## hard_coral   0.048936   0.003659  13.373  < 2e-16 ***
## sand         0.010145   0.003997   2.538   0.0111 *  
## cca          0.055212   0.003973  13.897  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 7573.1  on 309  degrees of freedom
## Residual deviance: 5817.6  on 305  degrees of freedom
## AIC: 6744.5
## 
## Number of Fisher Scoring iterations: 6

Anova(model)  ##All predictors are significant

## Analysis of Deviance Table (Type II tests)
## 
## Response: count
##            LR Chisq Df Pr(>Chisq)    
## ta           75.529  1  < 2.2e-16 ***
## hard_coral  230.581  1  < 2.2e-16 ***
## sand          6.645  1   0.009942 ** 
## cca         231.100  1  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

dispersion <- sum(residuals(model, type = "pearson")^2)/(length(model$y)- length(model$coefficients))
print(dispersion) #26.93165

## [1] 26.93165

model_quassi <- glm(count~ ta + hard_coral + sand + cca, family= quasipoisson, data = crcp)
Anova(model_quassi, test= 'F')  #only hard_coral and cca have significant pvalues

## Analysis of Deviance Table (Type II tests)
## 
## Response: count
## Error estimate based on Pearson residuals 
## 
##            Sum Sq  Df F value   Pr(>F)   
## ta           75.5   1  2.8045 0.095026 . 
## hard_coral  230.6   1  8.5617 0.003691 **
## sand          6.6   1  0.2467 0.619733   
## cca         231.1   1  8.5810 0.003653 **
## Residuals  8214.2 305                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Based on likelihood ratio tests and effects plots, how do the results change when all predictors are included simultaneously? Why do you think the results have changed? Don’t forget to make some residual diagnostic plots as well.

#Residual panels 
resid_xpanel(model, type = "deviance")

resid_xpanel(model, type = "pearson")

#ta: If removing ta results in a significant LRT, than adding ta as a layer of complexity has a significant impact on model fit
ta_model <- glm(count~ hard_coral + sand + cca, family= quasipoisson, data = crcp)
anova(ta_model, model_quassi, test ="LRT") #p =0.094

## Analysis of Deviance Table
## 
## Model 1: count ~ hard_coral + sand + cca
## Model 2: count ~ ta + hard_coral + sand + cca
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)  
## 1       306     5893.1                       
## 2       305     5817.6  1   75.529    0.094 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#hard_coral:
hard_model <- glm(count~ ta+ sand + cca, family= quasipoisson, data = crcp)
anova(hard_model, model_quassi, test ="LRT") # p=0.003433 **

## Analysis of Deviance Table
## 
## Model 1: count ~ ta + sand + cca
## Model 2: count ~ ta + hard_coral + sand + cca
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)   
## 1       306     6048.2                        
## 2       305     5817.6  1   230.58 0.003433 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#sand:
sand_model <- glm(count~ ta+ hard_coral + cca, family= quasipoisson, data = crcp)
anova(sand_model, model_quassi, test ="LRT") # p=0.6194

## Analysis of Deviance Table
## 
## Model 1: count ~ ta + hard_coral + cca
## Model 2: count ~ ta + hard_coral + sand + cca
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1       306     5824.2                     
## 2       305     5817.6  1   6.6453   0.6194

#cca 
cca_model <- glm(count~ ta+ hard_coral + sand, family= quasipoisson, data = crcp)
anova(cca_model, model_quassi, test ="LRT") # p = 0.003397 **

## Analysis of Deviance Table
## 
## Model 1: count ~ ta + hard_coral + sand
## Model 2: count ~ ta + hard_coral + sand + cca
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)   
## 1       306     6048.7                        
## 2       305     5817.6  1    231.1 0.003397 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#Effects plot 
plot(ggeffect(model_quassi))

## $ta

## 
## $hard_coral

## 
## $sand

## 
## $cca

#Why have the results changed? 
###The effect of each individual predictor value was lower in the model which included all of the predictor values. The model with all the predictor values was testing the impact of "w" given "x, y, z", so it was taking into account the variation explained by "x, y, z" when considering the impact of "x". Taking into account the other causes of variation in kole counts, each predictor had a less of an effect on kole count. In the model with all of the precitors and overdispersion accounted for, only hard coral and cca hard a significant impact on kole counts. We saw this same result in the ratio tests, and the effects plot showed the dampened impact of each individual predictor, given all the variation explained by the other predictors.

Finally, what are your overall conclusions about the substrate associations of kole, from this look at the data? Are there any additional analyses you would want to do that we have not done in this assignment?

Based all all of the analysis we did, there is a significant positive interaction between kole and hard coral and crustose coralline algae. Kole are positively associated with these two substrates. The dataset we used has a lot of additional information, and I would like to do another analysis of kole association with substrate that includes other measured factors, liek depth potentially. I work in agriculture, so I don’t know if that would be a good additional factor to include, but it seems to vary a lot in the dataset and could be useful to include.

OCN 683 - Homework 5

Sarah Pennington

(1) Start by exploring the distributions of the four predictor variables, and how they are correlated with one another. The function ggpairs() in the package GGally is particularly nice for this.

(2) For each of the four predictors, make single-predictor models where the response variable is kole counts. Consider which of the predictors could be transformed to reduce skew along the x-axis. For educational purposes, start by using a poisson distribution for the counts.

Quantify the degree of overdispersion in these models, and describe what it means.

Based on likelihood ratio tests and effects plots, what are the relationships between kole and the four substrate types?

How have the results changed compared to #2, and why?

(4) Although the single-predictor models are useful, we may get a more accurate picture of the role of each substrate type if we include all the substrate types in one model. Create such a model.

Based on likelihood ratio tests and effects plots, how do the results change when all predictors are included simultaneously? Why do you think the results have changed? Don’t forget to make some residual diagnostic plots as well.

Finally, what are your overall conclusions about the substrate associations of kole, from this look at the data? Are there any additional analyses you would want to do that we have not done in this assignment?