OCN 683 - Homework 14

Sarah Pennington

‘high nucleic acid’ (HNA) and ‘low nucleic acid’ (LNA). LNA bacteria tend to be smaller in terms of cell size, and tend to be dominated by the widespread SAR11 clade. Moran et al. (2015) used a ten year, monthly time series from the southern Bay of Biscay to consider seasonal and longer-term patterns in the abundance, biomass, and cell size of these two groups of bacteria. They were particularly interested in whether patterns of abundance and cell size may be driven by temperature.

‘LNA ab uml’ = LNA abundance in cells per mL,

####‘LNA bv’ = LNA mean cell size in units of cubic microns, ####‘LNA B’ = LNA total biomass in units of µg C per L, and ####‘%LNA BB’ = percent of total bacterial biomass in the LNA fraction. ####Other important variables are ####year, ####month, ####date, ####day of year, and ####‘temp 5 m E2’ = temperature at 5 meters depth.

All of these predictors could be important for species diversity on islands, and the point of the analysis is to figure out which ones actually have support, using a comparative approach. We will focus on island-level richness:
bacteria <-read_csv("bacteria_size_temperature_edit.csv", 
    col_types = cols(`day of year` = col_number(), 
        `Z uml E2` = col_number(), `temp 5 m E2` = col_number(), 
        `LNA ab uml` = col_number(), `LNA bv` = col_number(), 
        `LNA B` = col_number(), `%LNA BB` = col_number()))
## New names:
## • `` -> `...32`
## • `` -> `...33`
## • `` -> `...34`
## • `` -> `...35`
## • `` -> `...36`
## • `` -> `...37`
## • `` -> `...38`
## • `` -> `...39`
## • `` -> `...40`
## • `` -> `...41`
## • `` -> `...42`
## • `` -> `...43`
## • `` -> `...44`
## • `` -> `...45`
## • `` -> `...46`
## • `` -> `...47`
## • `` -> `...48`
## • `` -> `...49`
## • `` -> `...50`
## • `` -> `...51`
## • `` -> `...52`
## • `` -> `...53`
## • `` -> `...54`
## • `` -> `...55`
## • `` -> `...56`
## • `` -> `...57`
## • `` -> `...58`
## • `` -> `...59`
## • `` -> `...60`
## • `` -> `...61`
## • `` -> `...62`
## • `` -> `...63`
## • `` -> `...64`
## • `` -> `...65`
## • `` -> `...66`
bacteria$date <- dmy(bacteria$date)

bacteria <- bacteria %>% filter(!is.na(`LNA ab uml`))
1. Begin by exploring seasonal patterns in LNA abundance, cell size, biomass, and percent biomass, as well as temperature. I.e., make exploratory plots of these variables that focus on seasonality. What have you learned so far?
#LNA abundance 
ggplot(bacteria, aes(x = factor(season), y = `LNA ab uml`)) +
  geom_boxplot()

    ##Abundance appears to increase in seasons 3-4, or from Summer into autumn/early winter. 

#LNA cell size 
ggplot(bacteria, aes(x = factor(season), y = `LNA bv`)) +
  geom_boxplot()

    ##Size might decrease from Summer into autumn/early winter. 

#LNA biomass
ggplot(bacteria, aes(x = factor(season), y = `LNA B`)) +
  geom_boxplot()

    ##Biomass is highest in season 3, Summer 

#LNA % biomass
ggplot(bacteria, aes(x = factor(season), y = `%LNA BB`)) +
  geom_boxplot()

    ##The LNA % of biomass increases from Summer into autumn/early winter. 

#Temperature
ggplot(bacteria, aes(x = factor(season), y = `temp 5 m E2`)) +
  geom_boxplot()

    ##Temperature at 5 meters deep is highest in season 3, July, August, and September. 

#LNA abundance and temperature
ggplot(bacteria, aes(x = `temp uml`, y = `LNA ab uml`, color = factor(season))) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", se = FALSE) 
## `geom_smooth()` using formula = 'y ~ x'

  ##Abundance appears to be positively correlated with temeperature  
Now make new exploratory plots of the same five variables, focusing on longer term trends. How do you interpret these plots?
#LNA abundance 
ggplot(bacteria, aes(x = date, y = `LNA ab uml`))+ geom_point() + geom_line()

    ##Abundance does vary by year 

#LNA cell size 
ggplot(bacteria, aes(x = date, y = `LNA bv`)) + geom_point() + geom_line() + geom_smooth(method = "loess")
## `geom_smooth()` using formula = 'y ~ x'

    ##Size also varies by year with a possible downward trend overtime

#LNA biomass
ggplot(bacteria, aes(x = date, y = `LNA B`)) + geom_point() + geom_line()

    ##Biomass also varies by year. The fluctuations within each year is probably because of season.  

#LNA % biomass
ggplot(bacteria, aes(x = date, y = `%LNA BB`)) + geom_point() + geom_line()

    ##The LNA % of biomass I think changes more by season, then by year.  

#Temperature
ggplot(bacteria, aes(x = date, y = `temp 5 m E2`)) + geom_point() + geom_line()

    ##Temperature fluctuates by season throughout the year 
4. Perform a similar set of analyses using temperature as the predictor, to test whether temperature can explain variation in the four LNA metrics, and whether autocorrelation needs to be accounted for when testing these relationships. Make appropriate plots of the results and discuss the magnitude of the relationships. Considering the whole set of analyses, what are your interpretations of the dynamics and drivers of LNA bacteria in this system?
#LNA abundance 
abund = lm(LNA_ab_uml~ temp_5m_E2, data = bacteria)
summary(abund)
## 
## Call:
## lm(formula = LNA_ab_uml ~ temp_5m_E2, data = bacteria)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -285341 -127694  -49946   79080 1062318 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   -71294      99480  -0.717    0.475    
## temp_5m_E2     27312       6190   4.412 2.36e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 205300 on 112 degrees of freedom
## Multiple R-squared:  0.1481, Adjusted R-squared:  0.1405 
## F-statistic: 19.47 on 1 and 112 DF,  p-value: 2.363e-05
bacteria$resid.rainpredictor.raw = resid(abund) 
ggAcf(bacteria$resid.rainpredictor.raw) + labs(title = 'raw residuals, no predictor and no autocorrelation for time')

plot(ggeffect(abund))

abund1 = gls(LNA_ab_uml~ temp_5m_E2, data = bacteria, correlation = corAR1(form =~ sample_number), method = "ML") 
summary(abund1)
## Generalized least squares fit by maximum likelihood
##   Model: LNA_ab_uml ~ temp_5m_E2 
##   Data: bacteria 
##        AIC      BIC    logLik
##   3114.499 3125.444 -1553.249
## 
## Correlation Structure: AR(1)
##  Formula: ~sample_number 
##  Parameter estimate(s):
##       Phi 
## 0.1912381 
## 
## Coefficients:
##                 Value Std.Error   t-value p-value
## (Intercept) -25089.47 112956.30 -0.222117  0.8246
## temp_5m_E2   24327.48   7014.93  3.467958  0.0007
## 
##  Correlation: 
##            (Intr)
## temp_5m_E2 -0.978
## 
## Standardized residuals:
##        Min         Q1        Med         Q3        Max 
## -1.3919184 -0.6500211 -0.2209597  0.3769490  5.1909826 
## 
## Residual standard error: 203724.2 
## Degrees of freedom: 114 total; 112 residual
ggAcf(resid(abund1, type = 'normalized')) + labs(title = 'normalized resi
 duals, no predictor')

plot(ggeffect(abund1)) 
## Can't compute adjusted predictions, `effects::Effect()` returned an error.
## 
## Reason: the following predictor is not in the model: sample_number
## You may try `ggpredict()` or `ggemmeans()`.

AICc(abund) #3116.686
## [1] 3116.686
AICc(abund1) # [1] 3114.866
## [1] 3114.866
#LNA cell size 
size = lm(LNA_bv~ temp_5m_E2, data = bacteria)
summary(size)
## 
## Call:
## lm(formula = LNA_bv ~ temp_5m_E2, data = bacteria)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.018101 -0.003579 -0.000305  0.002547  0.033770 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.058286   0.003086  18.889   <2e-16 ***
## temp_5m_E2  -0.000422   0.000192  -2.198     0.03 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.006369 on 112 degrees of freedom
## Multiple R-squared:  0.04135,    Adjusted R-squared:  0.03279 
## F-statistic: 4.831 on 1 and 112 DF,  p-value: 0.03001
bacteria$resid.rainpredictor.raw = resid(size) 
ggAcf(bacteria$resid.rainpredictor.raw) + labs(title = 'raw residuals, no predictor and no autocorrelation for time')

size1 = gls(LNA_bv~ temp_5m_E2, data = bacteria, correlation = corAR1(form =~ sample_number), method = "ML") 
summary(size1)
## Generalized least squares fit by maximum likelihood
##   Model: LNA_bv ~ temp_5m_E2 
##   Data: bacteria 
##        AIC       BIC   logLik
##   -837.799 -826.8542 422.8995
## 
## Correlation Structure: AR(1)
##  Formula: ~sample_number 
##  Parameter estimate(s):
##       Phi 
## 0.3493379 
## 
## Coefficients:
##                   Value   Std.Error   t-value p-value
## (Intercept)  0.05843996 0.003756441 15.557270  0.0000
## temp_5m_E2  -0.00042879 0.000232385 -1.845173  0.0677
## 
##  Correlation: 
##            (Intr)
## temp_5m_E2 -0.974
## 
## Standardized residuals:
##         Min          Q1         Med          Q3         Max 
## -2.87543510 -0.57222425 -0.05359929  0.39527644  5.33467976 
## 
## Residual standard error: 0.00631973 
## Degrees of freedom: 114 total; 112 residual
ggAcf(resid(size1, type = 'normalized')) + labs(title = 'normalized resi
 duals, no predictor')

plot(ggeffect(size1)) 
## Can't compute adjusted predictions, `effects::Effect()` returned an error.
## 
## Reason: the following predictor is not in the model: sample_number
## You may try `ggpredict()` or `ggemmeans()`.

AICc(size) #-825.1361
## [1] -825.1361
AICc(size1) # [1] -837.432
## [1] -837.432
#LNA biomass
mass = lm(LNA_B~ temp_5m_E2, data = bacteria)
summary(mass)
## 
## Call:
## lm(formula = LNA_B ~ temp_5m_E2, data = bacteria)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.1996 -2.0080 -0.7228  1.2707 16.2048 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.61170    1.47640  -0.414    0.679    
## temp_5m_E2   0.37741    0.09187   4.108 7.61e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.047 on 112 degrees of freedom
## Multiple R-squared:  0.131,  Adjusted R-squared:  0.1232 
## F-statistic: 16.88 on 1 and 112 DF,  p-value: 7.612e-05
bacteria$resid.rainpredictor.raw = resid(mass) 
ggAcf(bacteria$resid.rainpredictor.raw) + labs(title = 'raw residuals, no predictor and no autocorrelation for time')

mass1 = gls(LNA_B~ temp_5m_E2, data = bacteria, correlation = corAR1(form =~ sample_number), method = "ML") 
summary(mass1)
## Generalized least squares fit by maximum likelihood
##   Model: LNA_B ~ temp_5m_E2 
##   Data: bacteria 
##        AIC      BIC    logLik
##   579.8249 590.7697 -285.9124
## 
## Correlation Structure: AR(1)
##  Formula: ~sample_number 
##  Parameter estimate(s):
##       Phi 
## 0.1852965 
## 
## Coefficients:
##                 Value Std.Error  t-value p-value
## (Intercept) 0.0621473   1.67078 0.037197  0.9704
## temp_5m_E2  0.3339085   0.10377 3.217774  0.0017
## 
##  Correlation: 
##            (Intr)
## temp_5m_E2 -0.979
## 
## Standardized residuals:
##        Min         Q1        Med         Q3        Max 
## -1.3289655 -0.6728622 -0.2107094  0.4031903  5.3365923 
## 
## Residual standard error: 3.023397 
## Degrees of freedom: 114 total; 112 residual
ggAcf(resid(mass1, type = 'normalized')) + labs(title = 'normalized resi
 duals, no predictor')

plot(ggeffect(mass1))
## Can't compute adjusted predictions, `effects::Effect()` returned an error.
## 
## Reason: the following predictor is not in the model: sample_number
## You may try `ggpredict()` or `ggemmeans()`.

AICc(mass) #581.7598
## [1] 581.7598
AICc(mass1) #580.1918
## [1] 580.1918
#LNA % biomass
Bmass = lm(LNA_percent_BB~ temp_5m_E2, data = bacteria)
summary(Bmass)
## 
## Call:
## lm(formula = LNA_percent_BB ~ temp_5m_E2, data = bacteria)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.31774 -0.05272  0.00403  0.07151  0.19718 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.196786   0.047875   4.110 7.55e-05 ***
## temp_5m_E2  0.015569   0.002979   5.226 8.09e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.09881 on 112 degrees of freedom
## Multiple R-squared:  0.1961, Adjusted R-squared:  0.1889 
## F-statistic: 27.31 on 1 and 112 DF,  p-value: 8.092e-07
bacteria$resid.rainpredictor.raw = resid(Bmass) 
ggAcf(bacteria$resid.rainpredictor.raw) + labs(title = 'raw residuals, no predictor and no autocorrelation for time')

Bmass1 = gls(LNA_percent_BB~ temp_5m_E2, data = bacteria, correlation = corAR1(form =~ sample_number), method = "ML") 
summary(Bmass1)
## Generalized least squares fit by maximum likelihood
##   Model: LNA_percent_BB ~ temp_5m_E2 
##   Data: bacteria 
##         AIC       BIC  logLik
##   -204.9979 -194.0531 106.499
## 
## Correlation Structure: AR(1)
##  Formula: ~sample_number 
##  Parameter estimate(s):
##       Phi 
## 0.2482872 
## 
## Coefficients:
##                  Value  Std.Error  t-value p-value
## (Intercept) 0.21155723 0.05603314 3.775573   3e-04
## temp_5m_E2  0.01456914 0.00347621 4.191096   1e-04
## 
##  Correlation: 
##            (Intr)
## temp_5m_E2 -0.977
## 
## Standardized residuals:
##         Min          Q1         Med          Q3         Max 
## -3.25743954 -0.50862611  0.06507804  0.72092525  2.00054770 
## 
## Residual standard error: 0.098116 
## Degrees of freedom: 114 total; 112 residual
ggAcf(resid(Bmass1, type = 'normalized')) + labs(title = 'normalized resi
 duals, no predictor')

plot(ggeffect(Bmass1))
## Can't compute adjusted predictions, `effects::Effect()` returned an error.
## 
## Reason: the following predictor is not in the model: sample_number
## You may try `ggpredict()` or `ggemmeans()`.

AICc(Bmass) #-199.999
## [1] -199.999
AICc(Bmass1) #-204.631
## [1] -204.631
#Based on the models, abundance of LNA significantly increased with temperature, with the model that included time autocorrelation being the best. LNA size was negatively correlated with temperature, with the model not including time autocorrelation being the best. Total biomass increased with temperature, with or without taking into account time autocorrelation. Percent biomass increased with temperature, with the model not including time autocorrelation being the best.So, time autocorrelation ended up being most important when considering LNA abundance, and temperature changes did drive abundance, size, total biomass, and percent biomass difference witht he notable change that as abundance increased, size decreased for LNA.