Fish Growth and Temperature Relationships

EPSCoR SCTC Aquatic Ecology

Analysis in progress

R Session Info

sessionInfo()
## R version 3.5.1 (2018-07-02)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS High Sierra 10.13.6
## 
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] compiler_3.5.1  backports_1.1.2 magrittr_1.5    rprojroot_1.3-2
##  [5] tools_3.5.1     htmltools_0.3.6 yaml_2.1.19     Rcpp_0.12.17   
##  [9] stringi_1.2.4   rmarkdown_1.10  knitr_1.20      stringr_1.3.1  
## [13] digest_0.6.15   evaluate_0.11

Load packages

This document explores the spatial and temporal scales at which growth trends of juvenile Chinook/Coho salmon in the Kenai River (2015/2016) may be compared and understood. It also begins exploring GLME models for understanding if water temperature metrics are related to growth trends.

The most granular level of segregation in to which fish size data may be organized is as follows:

Year (2015/2016)
River (Beaver Creek, Russian River, Ptarmigan Creek, Kenai River)
Fish Sampling Site (Lower, Middle, and Upper for each study drainage [Kenai River and Ptarmigan Creek do not have upper sites])
Fish Age (0,1 [All Age 2 Coho and Age 1 Chinook excluded from growth analyses])
Species (Coho, Chinook)
Season (Interval of time between site visits [Early Summer, Mid-Summer, Late Summer, Fall])

Set custom plotting themes

Import results from “season_temp_stats.Rmd”, googlesheets file “sim_extents_by_site” <>

This sheet contains values for season temporal extent and basic temperature metrics associated with each fish sampling site.

season_site_temps <- gs_title("sim_extents_by_site") %>%
  gs_read(ws = "Sheet1")

When did sampling occur?

Define temporal extents of seasons at chosen spatial/temporal scale.

A “season” is defined as the period of time between visits to a fish sampling site; generally about ~1 month.

# Define temporal extents of proposed simulations

# NOTE 1/22/18: this code also exists as a script at "/Users/bmeyer/Google Drive/Thesis/R Analysis/Thesis Analyses R Project/Overall_scripts_2015_2016/sampling_periods.R".  Attempted to just source("\path") this script in to the markdown document but, 'source' function prevents knitting through markdown for some reason.  

############
#
# 2015
#
###########

# Verify that sheet is present, create object of sheet with all tabs
SCTC2015 <- gs_title("2015 EPSCoR SCTC.xlsx")

#read in specific worksheet containing fishing data
effort2015 <- SCTC2015 %>% 
  gs_read(ws = "B Fishing Data",col_names=TRUE, skip=2) %>%
  #rename needed columns  
  rename(site.id=Site_ID,
         sample.event=Sample_Event) %>%
  
  #select desired columns
  select(site.id,River,sample.event,Reach,sample.event.num,deploy_dt,collect_dt) %>%
  
  #filter out blanks
  filter(!is.na(sample.event)) %>%
  
  # transform columns to desired classes
  transform(site.id=as.factor(site.id),
            sample.event=as.factor(sample.event),
            sample.event.num=as.factor(sample.event.num)) 


###################
#
# 2016
#
###################

# Verify that sheet is present, create object of sheet with all tabs
SCTC2016 <- gs_title("2016_EPSCoR_Aquatic_Ecology_Database")

#read in specific worksheet containing fishing data
effort2016 <- SCTC2016 %>% 
  gs_read(ws = "B Fishing Data",col_names=TRUE, skip=2) %>%
  
  #rename needed columns  
  rename(site.id=Site_ID,
         sample.event=Sample_Event) %>%
  
  #select desired columns
  select(site.id,River,sample.event,sample.event.num,Reach,deploy_dt,collect_dt) %>%
  
  #filter our blanks
  filter(!is.na(sample.event)) %>%
  
  # transform columns to desired classes
  transform(site.id=as.factor(site.id),
            sample.event=as.factor(sample.event),
            sample.event.num=as.factor(sample.event.num)) 


# bind 2015 & 2016 data; format datetimes
effort <- bind_rows(effort2015,effort2016) %>%
  transform(collect_dt = anytime(collect_dt),
            deploy_dt = anytime(deploy_dt)) %>%
  mutate(year = year(collect_dt)) 



# Define Temporal Extents of proposed simulations 

# "Simulation" temporal extent
# Time period from first date of each season to first date of following season

# prepare data to tibble form
sim_extents <- effort %>% 
 # select(-site.id) %>%
  group_by(sample.event,Reach,River, year, sample.event.num) %>% 
  nest() %>% 
  mutate(season.start = map_dbl(data, ~min(.$deploy_dt)),
         season.end = map_dbl(data, ~max(.$deploy_dt))
         ) %>%
  transform(season.start = anytime(season.start),
            season.end = anytime(season.end)) %>%
  arrange(River,Reach,year,sample.event.num)

# prepare data such that end of one season is beginning of next
t <- sim_extents %>%
  mutate(season.end_x = lead(season.end)) %>%
  select(-season.end) %>%
  arrange(River,year,sample.event.num)

# prepare data such that final season of each year has duration of 30 days.
# 2015 had 3 seasons, 2016 had 4 seasons.

t_end15 <- t %>%
  filter(sample.event.num == 3,
         year == 2015) %>%
  mutate(season.end = season.start + days(30)) %>%
  select(-season.end_x)

t_end16 <- t %>%
  filter(sample.event.num == 4,
         year == 2016) %>%
  mutate(season.end = season.start + days(30)) %>%
  select(-season.end_x)

t_end15_1 <- t %>%
  filter(sample.event.num != 3,
         year != 2016) %>%
  rename(season.end = season.end_x)

t_end16_1 <- t %>%
  filter(sample.event.num != 4,
         year != 2015) %>%
  rename(season.end = season.end_x)

# final version of simulation extents
sim_extents <- bind_rows(t_end15_1,t_end15,t_end16_1,t_end16) %>%
  rename(Stream_Name = River,
         sim.start = season.start,
         sim.end = season.end) %>%
  transform(year = as.factor(year)) %>% 
  mutate(start_doy = yday(sim.start),
         end_doy = yday(sim.end)) %>%
  mutate(season = fct_recode(as.factor(sample.event.num),         #code sampling events as seasons
                             "Early Summer"="1", 
                             "Mid-Summer" = "2",
                             "Late Summer" ="3",
                             "Fall" = "4")) %>%
  mutate(river_reach = paste(Reach,Stream_Name),
         days = end_doy - start_doy) %>%
  filter(sample.event != c("UPC-2015-1","MPC-2016-3"))    # we want to exclude both of these non-events

# remove temporary objects
rm(t,t_end15,t_end15_1,t_end16,t_end16_1,effort2015,effort2016)

Fish sampling periods


#  make plot that visualizes extent of each simulation period faceted by year and river

# order facets
sim_extents$river_reach <- factor(sim_extents$river_reach, levels = 
                                    c("Lower Beaver Creek",
                                      "Middle Beaver Creek",
                                      "Upper Beaver Creek",
                                      "Lower Russian River",
                                      "Middle Russian River",
                                      "Upper Russian River",
                                      "Lower Ptarmigan Creek",
                                      "Middle Ptarmigan Creek",
                                      "Lower Kenai River",
                                      "Middle Kenai River"))

# color plot
 p <- ggplot(sim_extents) +
  geom_segment(aes(x = start_doy, xend= end_doy,
                   y = year, yend = year, color = season),
               size = 3) + 
  facet_grid(river_reach~.) +
  scale_x_continuous(breaks = c(152,182,213,244),
                     labels = c("June","July","August","Sept")) +
  theme_bw() +
  ggtitle("Fish Sampling Periods") +
  xlab("") +
  ylab("") +
  theme(strip.text.y = element_text(angle = 0, size = 14)) +
  theme(plot.title = element_text(size= 30, face = "bold", hjust = 0.3)) +
  theme(legend.title=element_text(size=24, face = "bold")) +
  theme(axis.text.x = element_text(size = 14),
        axis.text.y = element_text(size = 14),
        legend.text = element_text(size = 14))
 
# print color plot
p


# print black and white plot
#p + scale_color_grey()

What is the range of lengths for seasons?

range(sim_extents$days)
## [1] 21 48

The range of days is within a season is 21 to 48.

Calculate catch summary data

# Fish ages used here were assigned by visual inspection of thresholds in length-frequency distribution charts as described in the script "fish_age_structure.R".  As time permits, these age assignment results will be compared with bayesian methods described in Sethi et al. 

# Prior to this approach, age-length key methods as described in Ch. 5 of Ogle 2016 ("Introductory fisheries analyses with R") were used. This appraoch was deemed to be unsuccessful in Spring 2018.


# Verify that sheets are present, create object of sheet with all tabs
SCTC2015 <- gs_title("2015 EPSCoR SCTC.xlsx")
SCTC2016 <- gs_title("2016_EPSCoR_Aquatic_Ecology_Database")

# read in 2015 length and weight data
fish_ages15 <- SCTC2015 %>% 
  gs_read(ws = "C 2015 Diet & LW Data",col_names=TRUE) %>%
  select(river,sample.id,sample.event,spp,Weight_g,Length_mm,Age_manual2,sample.event.num,Date,`Site ID`,reach_name, DS, Count,DOY,`Gear ID`) %>%
  mutate(season = fct_recode(as.factor(sample.event.num),         #code sampling events as seasons
                             "Early Summer"="1", 
                             "Mid-Summer" = "2",
                             "Late Summer" ="3")) %>%
  rename(wt_g = Weight_g,
         len_mm = Length_mm,
         age = Age_manual2,
         Hydrologic.Segment = reach_name,
         diet_sample = DS,
         Gear_ID = `Gear ID`) %>%
  transform(Date = anydate(Date)) %>%
  filter(spp %in% c("Coho", "Chinook")) %>% # filter out all species except Chinook and Coho
  filter(Gear_ID != "Aq. Net")  # filter out fish caught opportunistically with aquarium net for size evaluations

# read in length and weight worksheet 2016
fish_ages16 <- SCTC2016 %>% 
  gs_read(ws = "C 2016 Diet & LW Data",col_names=TRUE) %>%
  select(river,sample.id,sample.event,spp,Weight_g,Length_mm,Age_manual2,sample.event.num,Date,`Site ID`,reach_name,DS,Count,DOY) %>%
  mutate(season = fct_recode(as.factor(sample.event.num),         #code sampling events as seasons
                             "Early Summer"="1", 
                             "Mid-Summer" = "2",
                             "Late Summer" ="3",
                             "Fall" ="4")) %>%
  rename(wt_g = Weight_g,
         len_mm = Length_mm,
         age = Age_manual2,
         Hydrologic.Segment = reach_name,
         diet_sample = DS) %>%
  transform(Date = anydate(Date),
            wt_g = as.numeric(wt_g)) %>%
  filter(spp %in% c("Coho", "Chinook")) # filter out all species except Chinook and Coho


#join 2015 and 2016 data
fish_ages <- bind_rows(fish_ages15,fish_ages16)

#prep dataframe containing age data
dat <- fish_ages %>% 
  separate(sample.event, c("reach","year","sample.event.num"), sep = "-", remove = F) %>%
  unite(river_spp, river, spp, remove = F, sep = " ") %>%
  # manually order appearance of seasons and rivers for plots
  transform(year = as.factor(year),
            river = factor(river, levels=c("Beaver Creek",
                                              'Russian River',
                                              'Ptarmigan Creek',
                                              'Kenai River')),
            season = factor(season, levels = c("Early Summer",
                                     "Mid-Summer",
                                     "Late Summer",
                                     "Fall")), 
            river_spp = as.factor(river_spp),
            age = as.factor(age)) %>%
  mutate(river_spp = paste(river,"\n",spp)) %>%
  mutate(river_reach = paste(Hydrologic.Segment,river)) %>%
  mutate(river_spp_yr = paste(year,river,"\n",spp)) %>%
  mutate(doy = yday(Date)) %>%
  # filter out blanks where fish age/length/weight data is missing
  filter(!is.na(age),
         !is.na(wt_g),
         !is.na(len_mm)) %>%
 # age 2 fish: this cohort is sparse and likely departing for the ocean.
  filter(age != 2) %>%
    select(river_spp,river,river_reach,sample.id, sample.event,reach,year,sample.event.num,spp,wt_g,len_mm,
           age,season,Date,Site.ID,Hydrologic.Segment,Count,diet_sample,river_spp_yr,doy)

#housekeeping: remove Age 1 Chinook; not considered for growth models
age1chnk <- dat %>% filter(spp == "Chinook",
                           age == 1)
dat <- anti_join(dat,age1chnk)
rm(age1chnk)

# catch summary table; with age 1 chnk removed and all age 2 coho removed
catch_summary <- dat %>%
  group_by(year,river,season,river_reach,spp,age,diet_sample) %>%
  summarise(ct = sum(Count)) %>%
  spread(diet_sample,ct) %>%
  replace(is.na(.), 0) %>%
  mutate(total_fish = sum(N,Y)) %>%
  rename(diet_sample = Y,
         len_wt_only = N)

rm(fish_ages15,fish_ages16)

Fish catch data table


datatable(catch_summary, caption = "Juvenile Chinook and Coho Catch Data", 
          filter = 'top', options = list(
  pageLength = 5, autoWidth = TRUE)) #%>%

  #formatSignif(c("log.len_slope_annual","log.len_intercept_annual"), digits = 2) 


# 2015 & 2016 catch data

# Create bar graph(s) showing proportions (by spp, year, river, diet sample (y/n))  
# 1.) By overall drainage and year
#   a.) 2015
#   b.) 2016
#   c.) both years juxtaposed

# read in desired worksheet
catch15 <- SCTC2015 %>% 
  gs_read(ws = "C 2015 Diet & LW Data",col_names=TRUE) %>%
  select(Year,river,spp,Count,sample.event.num,DS) %>%
  mutate(season = fct_recode(as.factor(sample.event.num),         #code sampling events as seasons
                             "Early Summer"="1", 
                             "Mid-Summer" = "2",
                             "Late Summer" ="3")) %>%
  rename(diet_sample = DS) %>%
  filter(spp %in% c("Coho", "Chinook")) # filter out all species except Chinook and Coho

# create summary of catch by drainage, species, and lavaged vs non-lavaged
catch_summary15 <- catch15 %>%
  group_by(Year,river,spp,diet_sample) %>%
  summarise(ct = sum(Count)) %>%
  mutate(count_label = paste("( n=", as.character(ct),")")) #create count labels

# specify order of drainages
catch_summary15$river <- factor(catch_summary15$river, levels = c("Beaver Creek", 
                                                              "Russian River", 
                                                              "Ptarmigan Creek", 
                                                              "Kenai River"))

# create theme to plot 2015 catch data
c15 <- ggplot(catch_summary15, aes(x = spp, y = ct, fill = diet_sample)) +
  geom_bar(stat = "identity", color = "white") +
  facet_wrap(~river,nrow=1) +
  theme_bw() +
  scale_fill_discrete(name="Diet Sample\nCollected") +
  theme(legend.key.size = unit(1.7, 'lines')) +
  theme(plot.title = element_text(size = 26, face = "bold"),
        axis.text.x = element_text(size=14, face="bold"),
        axis.text.y = element_text(size=16, face="bold"),
        axis.title.x = element_text(size=22, face="bold"),
        axis.title.y = element_text(size = 22, face = "bold")) +
  theme(strip.text.x = element_text(size=16, face="bold")) +
  theme(legend.title = element_text(size=20, face="bold")) + 
  theme(legend.text = element_text(size = 16, face = "bold")) +
  xlab("Species") +
  ylab("Count") 


# plot and title for 2015 data
# c15 + ggtitle("2015 Juvenile Chinook and Coho\n Catch Data") # can un-hashtag to show 2015 plot


# 2016 

# read in desired worksheet
catch16 <- SCTC2016 %>% 
  gs_read(ws = "C 2016 Diet & LW Data",col_names=TRUE) %>%
  select(Year,river,spp,Count,sample.event.num,DS) %>%
  mutate(season = fct_recode(as.factor(sample.event.num),         #code sampling events as seasons
                             "Early Summer"="1", 
                             "Mid-Summer" = "2",
                             "Late Summer" ="3",
                             "Fall" = "4")) %>%
  rename(diet_sample = DS) %>%
  filter(spp %in% c("Coho", "Chinook")) # filter out all species except Chinook and Coho

# create summary of catch by drainage, species, and lavaged vs non-lavaged
catch_summary16 <- catch16 %>%
  group_by(Year,river,spp,diet_sample) %>%
  summarise(ct = sum(Count)) %>%
  mutate(count_label = paste("( n=", as.character(ct),")")) #create count labels

# specify order of drainages
catch_summary16$river <- factor(catch_summary16$river, levels = c("Beaver Creek", 
                                                              "Russian River", 
                                                              "Ptarmigan Creek", 
                                                              "Kenai River"))

# create theme to plot catch data
c16 <- ggplot(catch_summary16, aes(x = spp, y = ct, fill = diet_sample)) +
  geom_bar(stat = "identity", color = "white") +
  facet_wrap(~river,nrow=1) +
  theme_bw() +
  scale_fill_discrete(name="Diet Sample\nCollected") +
  theme(legend.key.size = unit(1.7, 'lines')) +
  theme(plot.title = element_text(size = 26, face = "bold"),
        axis.text.x = element_text(size=14, face="bold"),
        axis.text.y = element_text(size=16, face="bold"),
        axis.title.x = element_text(size=22, face="bold"),
        axis.title.y = element_text(size = 22, face = "bold")) +
  theme(strip.text.x = element_text(size=16, face="bold")) +
  theme(legend.title = element_text(size=20, face="bold")) + 
  theme(legend.text = element_text(size = 16, face = "bold")) +
  xlab("Species") +
  ylab("Count") 


# plot and title for 2016 data
# c16 + ggtitle("2016 Juvenile Chinook and Coho\n Catch Data") # can un-hashtag to show 2016 plot

# combined plot for 2015 and 2016 data

# create combined table
all <- bind_rows(catch_summary15,catch_summary16)

allplot <- ggplot(all, aes(x = spp, y = ct, fill = diet_sample, color = spp)) +
  geom_bar(stat = "identity", color = "white") +
  facet_wrap(Year~river,nrow=2) +
  theme_bw() +
  scale_fill_discrete(name="Diet Sample\nCollected") +
  theme(legend.key.size = unit(1.7, 'lines')) +
  theme(plot.title = element_text(size = 26, face = "bold"),
        axis.text.x = element_text(size=14, face="bold"),
        axis.text.y = element_text(size=16, face="bold"),
        axis.title.x = element_text(size=22, face="bold"),
        axis.title.y = element_text(size = 22, face = "bold")) +
  theme(strip.text.x = element_text(size=14, face="bold")) +
  theme(legend.title = element_text(size=20, face="bold")) + 
  theme(legend.text = element_text(size = 16, face = "bold")) +
  xlab("Species") +
  ylab("Count") 

allplot

Growth Trends

Discussion:

What value do we want to use as our response variable for exploring growth trends? E.g., fork length? Weight? A log-transformed version of either?

For now, we will use log-transformed fork length or our unit of growth. See “fish_weight_length.Rmd” for further rationale on this choice <>. Often it can be more appropriate to model weight or length trends in relationship to another factor as log-transformed for two reasons (Ogle 2016, p 133):

a.) Relationship between weight and length is often non-linear (linear amount of fish length but 3D volume of mass). b.) Weights of shorter fish are often less variable than lengths of larger fish.

We will ultimately back-transform our log-length values back to a predicted fish weight, particularly for purposes of comparing bionergetics output with empirical results. See “fish_weight_length.Rmd” more details of this process.

Create column for log-transformed fork length values

dat <- dat %>% mutate(log.len_mm = log(len_mm))

Linear mixed models

To explore spatial/temporal scale of growth

GLME exploration of size data (following example on https://ourcodingclub.github.io/2017/03/15/mixed-models.html)

At what temporal/spatial scale may we consider and compare growth rates?

Our hypothesis involves an assumption that growth rates differ in different places and times. Is this true? We are looking to control for the effects of year, age, spp. (We are antcipating that different rivers/sites have different growth rates.)
Is there an association between fork length and day (for each of our species/age/year cohorts) after controlling for variation in rivers and sites within rivers?

Calculate overall annual slopes of length and weight growth trends for grouping level: year ~ river ~ reach ~ species ~ age

Overall, do our fish grow? Plot doy (day of year) vs. fork length.

## Fit basic models
# Age 0 
basic.lm.0 <- lm(log.len_mm ~ doy, data = subset(dat, age == 0))
summary(basic.lm.0)
## 
## Call:
## lm(formula = log.len_mm ~ doy, data = subset(dat, age == 0))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.37488 -0.09614 -0.00580  0.09173  0.40794 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 3.566476   0.022351  159.57   <2e-16 ***
## doy         0.002038   0.000106   19.24   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1427 on 2619 degrees of freedom
## Multiple R-squared:  0.1238, Adjusted R-squared:  0.1235 
## F-statistic:   370 on 1 and 2619 DF,  p-value: < 2.2e-16

# Age 1
basic.lm.1 <- lm(log.len_mm ~ doy, data = subset(dat, age == 1))
summary(basic.lm.1)
## 
## Call:
## lm(formula = log.len_mm ~ doy, data = subset(dat, age == 1))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.46671 -0.09084  0.00805  0.09329  0.41574 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 3.9385889  0.0224209   175.7   <2e-16 ***
## doy         0.0023709  0.0001162    20.4   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1319 on 1652 degrees of freedom
## Multiple R-squared:  0.2013, Adjusted R-squared:  0.2008 
## F-statistic: 416.3 on 1 and 1652 DF,  p-value: < 2.2e-16


## Plot

# all together
dat %>%
ggplot(aes(doy,log.len_mm)) +
  geom_jitter(width = 1)+
  geom_smooth(method = "lm") +
  ggtitle("All Fish")



# segeregate ages
dat %>%
ggplot(aes(doy,log.len_mm)) +
  facet_grid(~age) +
  geom_jitter(width = 1)+
  geom_smooth(method = "lm") +
  ggtitle("All Age 0 and Age 1 Fish")

Age 0 plot diagnotsics

layout(matrix(c(1,2,3,4),2,2)) # optional 4 graphs/page 
plot(basic.lm.0)

Age 1 plot diagnostics

layout(matrix(c(1,2,3,4),2,2)) # optional 4 graphs/page 
plot(basic.lm.1)

Discussion: there is clearly a relationship between doy (Day of Year) and fork length in both years, but the basic model of doy + age doesn’t explain much of the variation.

What drives the variation in the relationship between doy and size? Let’s consider growth slopes at smaller temporal and spatial scales.

Calculate annual growth trends at temporal/spatial scale of year ~ spp ~ age ~ river ~ river_reach


# calc slope of each size trend for len and wt...??
all_slopes <- dat %>%
  transform(river_spp_yr = as.factor(river_spp_yr),
            year = as.factor(year),
            spp = as.factor(spp),
            log.len_mm = as.numeric(log.len_mm)) %>%
  mutate(river_reach = paste(Hydrologic.Segment,river)) %>%
  select(river_spp_yr,Hydrologic.Segment,river_reach,river,year,spp,age,log.len_mm,doy) 

# length model formulation for growth trend
len_growth_model <- function(df) {
  lm(log.len_mm ~ doy, data = df)
}


# create table of length and weight trend slopes for each annual sequence 
size_age_spp_year <- all_slopes %>% 
  group_by(river,river_reach, spp, age, year) %>% 
  nest() %>% 
  # calculate slopes & intercept for len_mm ~ day of year
  mutate(
    age_len = map(data, len_growth_model),
    tidy_age_len = map(age_len, broom::tidy),
    log.len_slope_annual = map_dbl(tidy_age_len, ~.$estimate[2]),
    log.len_intercept_annual = map_dbl(tidy_age_len, ~.$estimate[[1]])) %>%
  select(year,river,river_reach, spp, age, log.len_slope_annual, log.len_intercept_annual)

Show annual linear growth rate summary table


datatable(size_age_spp_year, caption = "Juvenile Chinook and Coho Catch and Size Trend Summary Data", 
          filter = 'top', options = list(
  pageLength = 5, autoWidth = TRUE)) %>%
  formatSignif(c("log.len_slope_annual","log.len_intercept_annual"), digits = 2)

Minimum standards for including size data from a site visit

We want to apply some minimum standards woth regards to site vists for including data as part of growth rate analyses/comparisons. We will exclude fish size data if:

a.) There were n < 3 fish obersations in a site visit (segregated by species/age)
b.) The overall annual growth relationship at the site is < 0 (likely indicative of data paucity rather than fish “shrinking” in size).

Apply minimum data standards

# filter out site visits with < 3 fish per analysis category
catch_summary_mod <- catch_summary %>%
  filter(total_fish >= 3)

# filter out negative/absent annual slopes
size_age_spp_year <- size_age_spp_year %>%
  filter(log.len_slope_annual > 0)

# join reduced datasets
size_age_spp_year <- inner_join(catch_summary_mod,size_age_spp_year, by = c("year","river_reach","river","spp","age"))


# apply filter to overall dataset
dat <- inner_join(dat, size_age_spp_year, by = c("year","river_reach","season","river","spp","age")) %>%
  select(-diet_sample.x) %>%
  rename(diet_sample = diet_sample.y)

# recreate catch summary with filter applied
catch_summary_filt <- dat %>%
  group_by(year,river,season,river_reach,spp,age,diet_sample) %>%
  summarise(ct = sum(Count)) %>%
  mutate(diff = ct-diet_sample) %>%
  rename(total_fish = ct,
         len_wt_only = diff) %>%
  select(year,river,season,river_reach,spp,age,len_wt_only,diet_sample,total_fish)

When data is segregated by year/site/season/species/age, 72% (108/150) of potential groups by site visit remain after applying these standards. 36 lost because fish count by site visit was n < 3 per spp/age; 6 lost becuase overall annual growth slope was m < 0.


datatable(catch_summary_filt, caption = "Selected Subset of Juvenile Chinook and Coho Catch Data", 
          filter = 'top', options = list(
  pageLength = 5, autoWidth = TRUE))

With minimum data inclusion standards applied, what does our empirical fork length distribution look like?

# Overall Coho lengths
dat %>%
  filter(spp == "Coho") %>% 
ggplot(aes(season,len_mm, fill = as.factor(age))) +
  geom_boxplot(outlier.size = 0) +
  facet_grid(.~year) +
  ggtitle("Overall Coho Lengths") +
  bm_theme


# Overall Chinook lengths
dat %>%
  filter(spp == "Chinook") %>% 
ggplot(aes(season,len_mm, fill = as.factor(age))) +
  geom_boxplot(outlier.size = 0) +
  facet_grid(.~year) +
  ggtitle("Overall Chinook Lengths") +
  bm_theme

For more detailed plots and segregation of fish size distributions, see http://rpubs.com/bmeyer/352129.

Visualize slopes of annual linear raw size trends from selected subset.

# Note: color switches between species in Age 0 vs Age 1 charts.  How to fix colors?

# Age 0 fish lengths
ggplot(data = subset(dat, age == 0),aes(doy,len_mm, color = spp, shape = Hydrologic.Segment)) + 
  facet_grid(year~river) +
  geom_jitter(width = 0.06) +
  geom_smooth(method = "lm") +
  ggtitle("Age 0 Fish Fork Lengths") + 
  theme_bw()


# Age 1 fish lengths
ggplot(data = subset(dat, age == 1),aes(doy,len_mm, color = spp, shape = Hydrologic.Segment)) + 
  facet_grid(year~river) +
  geom_jitter(width = 0.06) +
  geom_smooth(method = "lm") +
  ggtitle("Age 1 Fish Fork Lengths (Coho only)") + 
  theme_bw()


# Age 0 fish weights
#ggplot(data = subset(dat, age == 0),aes(doy,wt_g, color = spp, shape = Hydrologic.Segment)) + 
#  facet_grid(year~river) +
#  geom_jitter(width = 0.06) +
#  geom_smooth(method = "lm") +
#  ggtitle("Age 0 Fish Weights") + 
#  theme_bw()

# Age 1 fish weights
#ggplot(data = subset(dat, age == 1),aes(doy,wt_g, color = spp, shape = Hydrologic.Segment)) + 
#  facet_grid(year~river) +
#  geom_jitter(width = 0.06) +
#  geom_smooth(method = "lm") +
#  ggtitle("Age 1 Fish Weights (Coho only)") + 
 # theme_bw()

Looks like the Chinook almost always grow larger than the coho. Is this an artifact of small sample size or is there a possible mechanism?

Question: is there a way to check for linearity assumptions on all of these relationships, then just include the ones that “pass” ?

GLME Models

Other than the passage of time within a growing season, what else has noteworthy effects in predicting fish size? Use general linear mixed-models (GLME) to explore further.

Define fixed vs. random effects for GLME models.

Fixed: factors we anticipate will have an effect on the dependent/response variable (fish size). Random: factors we want to control for; we are not specifically interested in their impact on fish size.

NOTE: random effects generally must have > 2 levels.

FIXED EFFECTS * year (fixed, 2 levels) * species (fixed, 2 levels) * age (fixed, 2 levels) * river (fixed, 4 levels)

RANDOM EFFECTS * site, nested within river (random, 10 levels)

Notes:

“log.len_mm_scl” is the response variable; log-transformed fork length; scaled values (per recommendation in GLME vignette at https://ourcodingclub.github.io/2017/03/15/mixed-models.html )
“doy” = “day of year”

Consider linear mixed effect models for log.length ~ doy + year + river + spp + age + (1|river_reach))

# load lme4
require(lme4)

# scale response variables
dat_scl <- dat %>%
  mutate(log.len_mm_scl = scale(log.len_mm))

### --> Do I also need to log-transform these response variables???

# fork length glme
fit.len <- lmer(log.len_mm_scl ~ doy + year + river + spp + age + (1|river_reach), data = dat_scl)
summary(fit.len)
## Linear mixed model fit by REML ['lmerMod']
## Formula: 
## log.len_mm_scl ~ doy + year + river + spp + age + (1 | river_reach)
##    Data: dat_scl
## 
## REML criterion at convergence: 5874.1
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.4133 -0.6831 -0.0457  0.6690  3.6117 
## 
## Random effects:
##  Groups      Name        Variance Std.Dev.
##  river_reach (Intercept) 0.03518  0.1876  
##  Residual                0.24532  0.4953  
## Number of obs: 4049, groups:  river_reach, 10
## 
## Fixed effects:
##                        Estimate Std. Error t value
## (Intercept)          -2.3990313  0.1303917 -18.399
## doy                   0.0102780  0.0003314  31.012
## year2016              0.2662377  0.0182980  14.550
## riverRussian River   -0.0646612  0.1552262  -0.417
## riverPtarmigan Creek  0.0362479  0.1740294   0.208
## riverKenai River     -0.4676067  0.1736228  -2.693
## sppCoho              -0.5280689  0.0268281 -19.683
## age1                  1.8223222  0.0271785  67.050
## 
## Correlation of Fixed Effects:
##             (Intr) doy    yr2016 rvrRsR rvrPtC rvrKnR sppCoh
## doy         -0.494                                          
## year2016    -0.119 -0.071                                   
## rivrRssnRvr -0.615  0.034  0.020                            
## rvrPtrmgnCr -0.509 -0.040  0.037  0.441                     
## riverKenRvr -0.547 -0.007  0.032  0.449  0.395              
## sppCoho      0.048 -0.381  0.242 -0.040  0.014  0.059       
## age1        -0.313  0.474 -0.030  0.094 -0.018  0.065 -0.430
plot(fit.len)

Discussion: The nested random effect of river reach (site) accounts for 12.54% [Random Effects: Variances; (0.03518 / (0.03518 + 0.24532)) x 100] of the variation in fork length.

After controlling for reach within river; the magnitude of t-values for the fixed effects of age, year, and species are all notably larger than the others; indicating that degree to which these factors are predictive of size.

We can visualize this with a figure of effect size from the “jtools”" package:

Effect size results for model log.len_mm_scl ~ doy + year + river + spp + age + (1|river_reach)

# load jtools package
require(jtools)

# fork length, glme model summary
summ(fit.len)
## MODEL INFO:
## Observations: 4049
## Dependent Variable: log.len_mm_scl
## Type: Mixed effects linear regression 
## 
## MODEL FIT:
## AIC = 5894.09, BIC = 5957.15
## Pseudo-R² (fixed effects) = 0.72
## Pseudo-R² (total) = 0.75 
## 
## FIXED EFFECTS:
##                       Est. S.E. t val.    p    
## (Intercept)          -2.40 0.13 -18.40 0.00 ***
## doy                   0.01 0.00  31.01 0.00 ***
## year2016              0.27 0.02  14.55 0.00 ***
## riverRussian River   -0.06 0.16  -0.42 0.68    
## riverPtarmigan Creek  0.04 0.17   0.21 0.84    
## riverKenai River     -0.47 0.17  -2.69 0.01   *
## sppCoho              -0.53 0.03 -19.68 0.00 ***
## age1                  1.82 0.03  67.05 0.00 ***
## 
## p values calculated using Kenward-Roger d.f. = 20.79 
## 
## RANDOM EFFECTS:
##        Group   Parameter Std. Dev.
##  river_reach (Intercept)      0.19
##     Residual                  0.50
## 
## Grouping variables:
##        Group # groups  ICC
##  river_reach       10 0.13
plot_summs(fit.len)

Notes: age has the greatest magnitude of effect size on fish fork length. Species and year have smaller effects. River has the greatest level of uncertainty.

Decision: we will consider fish growth at the scale of segregated year ~ spp ~ age ~ reach ~ river. A priori hypothesis is still that growth differs among environments (rivers and/or river reaches) and is driven by diet & temperatrure.

Size trends by season

Plot raw size trends by season


# length and weight summary values
all_summ <- dat %>%
  filter(!is.na(len_mm),
         !is.na(wt_g)) %>%
  # Group by sample.event rather than doy because two sample events occurred over two successive days (e.g. LRR-2015-1; LBC-2015-2).
  # To simplify both of these cases; a single day of year is assigned to the sample event; the first of the two days.
  group_by(year,spp,season,river,river_reach,age,Hydrologic.Segment,total_fish,sample.event,
           log.len_slope_annual,log.len_intercept_annual) %>%
  summarise(len_mean = mean(len_mm),
            len_se = std.error(len_mm),
            len.log_mean = mean(log.len_mm),
            len.log_se = std.error(log.len_mm)) %>%
  transform(year = as.factor(year)) %>%
  rename(River = river) 

# create & join column for doy back on to summary table
yday <- dat %>%
  distinct(sample.event,doy) %>%
  # include only first of days in sample events that occurred over successive two-day events
  filter(doy != 156,
         doy != 160,
         doy != 176,
         doy != 181)
all_summ <- left_join(all_summ,yday, by = "sample.event")


# reorder levels of "River" for plot legend
river_order <- c("Beaver Creek","Russian River","Ptarmigan Creek","Kenai River")
all_summ$River  <-factor(all_summ$River, levels = river_order)


# plotting

# The errorbars overlapped, so use position_dodge to move them horizontally
pd <- position_dodge(0.5) # move them .05 to the left and right

#create general plot object
p_fl <- ggplot(all_summ, aes(doy,len_mean,color = River,shape = Hydrologic.Segment)) +
  theme_bw() +
  geom_errorbar(aes(ymin=len_mean-len_se, ymax=len_mean+len_se), colour="black", width=3, position=pd,size=1.1) +
  geom_line(position=pd, size = 2) +
  #geom_smooth(method = "lm", se = FALSE) +
  geom_point(fill = "white",position=pd, size=3) +
  xlab("Date") +
  ylab("Fork Length (mm)") +
  bm_theme_2 +
  theme(panel.spacing = unit(1.3, "lines")) +
  scale_x_continuous(breaks = c(166,196,227),
                     labels = c("June 15","July 15","August 15")) +
  theme(legend.title=element_text(size=24),
        legend.text = element_text(size = 16)) +
  guides(color = guide_legend(override.aes = list(size=8))) 

# rename coho age facet labels
age_facets <- c(`0` = "Age 0", `1` = "Age 1")

#coho plot
p_fl %+% subset(all_summ, spp %in% c("Coho")) +
  facet_grid(age~year, scales = "free", space = "free", labeller=labeller(age= age_facets)) +
  ggtitle("Coho Seasonal Fork Lengths 2015-2016")


# age 0 only plot object
all_summ_0 <- all_summ %>%
  filter(age == 0) 
  
# chinook age 0 plots
p_fl %+% subset(all_summ_0, spp %in% c("Chinook")) +
  facet_grid(~year, scales = "free_x", space = "free_x") +
  ggtitle("Chinook Seasonal Fork Lengths 2015-2016")

Further segregate and facet charts of growth patterns.


#create general plot object
p_fl <- ggplot(all_summ, aes(doy,len_mean,color = Hydrologic.Segment,shape = Hydrologic.Segment)) +
  theme_bw() +
  geom_errorbar(aes(ymin=len_mean-len_se, ymax=len_mean+len_se), colour="black", width=3, position=pd,size=1.1) +
  geom_line(position=pd, size = 2) +
  #geom_smooth(method = "auto", se = FALSE) +
  geom_point(fill = "white",position=pd, size=3) +
  xlab("Date") +
  ylab("Fork Length (mm)") +
  bm_theme_2 +
  theme(panel.spacing = unit(1.3, "lines")) +
  scale_x_continuous(breaks = c(166,196,227),
                     labels = c("June 15","July 15","August 15")) +
  theme(legend.title=element_text(size=24),
        legend.text = element_text(size = 16)) +
  guides(color = guide_legend(override.aes = list(size=8))) 

# rename coho age facet labels
age_facets <- c(`0` = "Age 0", `1` = "Age 1")

#coho plot 2015
p_fl %+% subset(all_summ, spp %in% c("Coho") & year %in% c("2015")) +
  facet_grid(age~River, scales = "free_y",space = "free", labeller=labeller(age= age_facets)) +
  ggtitle("Coho Seasonal Fork Lengths 2015")


#coho plot 2016
p_fl %+% subset(all_summ, spp %in% c("Coho") & year %in% c("2016")) +
  facet_grid(age~River, scales = "free_y",space = "free", labeller=labeller(age= age_facets)) +
  ggtitle("Coho Seasonal Fork Lengths 2016")


#chinook plot 2015
p_fl %+% subset(all_summ, spp %in% c("Chinook") & year %in% c("2015")) +
  facet_grid(~River, scales = "free_y",space = "free", labeller=labeller(age= age_facets)) +
  ggtitle("Chinook Seasonal Fork Lengths 2015")


#chinook plot 2016
p_fl %+% subset(all_summ, spp %in% c("Chinook") & year %in% c("2016")) +
  facet_grid(~River, scales = "free_y",space = "free", labeller=labeller(age= age_facets)) +
  ggtitle("Chinook Seasonal Fork Lengths 2016")

Further segregate and facet charts of growth patterns.


#create general plot object
p_fl <- ggplot(all_summ, aes(doy,len_mean,color = Hydrologic.Segment,shape = Hydrologic.Segment)) +
  theme_bw() +
  geom_errorbar(aes(ymin=len_mean-len_se, ymax=len_mean+len_se), colour="black", width=3, position=pd,size=1.1) +
  geom_line(position=pd, size = 2) +
  #geom_smooth(method = "auto", se = FALSE) +
  geom_point(fill = "white",position=pd, size=3) +
  xlab("Date") +
  ylab("Fork Length (mm)") +
  bm_theme_2 +
  theme(panel.spacing = unit(1.3, "lines")) +
  scale_x_continuous(breaks = c(166,196,227),
                     labels = c("June 15","July 15","August 15")) +
  theme(legend.title=element_text(size=24),
        legend.text = element_text(size = 16)) +
  guides(color = guide_legend(override.aes = list(size=8))) 

# rename coho age facet labels
age_facets <- c(`0` = "Age 0", `1` = "Age 1")

#coho plot 2015
p_fl %+% subset(all_summ, spp %in% c("Coho") & year %in% c("2015")) +
  facet_grid(age~River, scales = "free_y",space = "free", labeller=labeller(age= age_facets)) +
  ggtitle("Coho Seasonal Fork Lengths 2015")


#coho plot 2016
p_fl %+% subset(all_summ, spp %in% c("Coho") & year %in% c("2016")) +
  facet_grid(age~River, scales = "free_y",space = "free", labeller=labeller(age= age_facets)) +
  ggtitle("Coho Seasonal Fork Lengths 2016")


#chinook plot 2015
p_fl %+% subset(all_summ, spp %in% c("Chinook") & year %in% c("2015")) +
  facet_grid(~River, scales = "free_y",space = "free", labeller=labeller(age= age_facets)) +
  ggtitle("Chinook Seasonal Fork Lengths 2015")


#chinook plot 2016
p_fl %+% subset(all_summ, spp %in% c("Chinook") & year %in% c("2016")) +
  facet_grid(~River, scales = "free_y",space = "free", labeller=labeller(age= age_facets)) +
  ggtitle("Chinook Seasonal Fork Lengths 2016")

Size-specific growth trends

Specific growth = mean growth rate divided by mean fish size.

Size-Specific Data

Are growth rates significantly different among seasons and among groups? How can we explore this question?

Approach: Size-specific growth = Linear trendline between two seasons of log.length (slope (mean log.length) / # of days between)

Current Issues to consider/resolve:

Growth trends through the summer season may not meet assumptions for linearity. Maybe season-to-season trends do though?
Availability of fish size data is not temporally consistent between seasons
“Season” designation not temporally consistent among sites & years

Rambling thoughts on method & procedure: it would be better to have DOY as the temporal independent variable rather than season; since “season” is not temporally consistent among sites/years. It makes sense that most of the othe potential predictors are not emerging with strong effects in initial GLME explorations…

Or, it could make sense to consider the two yearly datasets seperately for GLME/AICc procedures.

NOTE: We ultimately want to convert our log-fork length values back to fish weights so that we’ll eventually be able to compare them to output from bioenergetic simulations.

Import weights that were back-calculated from length-weight. See “fish_weight_length.Rmd” for details on how weight values are acquired.

# import back-calculated weights and associated confidence intervals
#backcal_wts <- gs_read(gs_title("fish_backcal_wts")) %>%
#  transform(year = as.factor(year),
#            sample.event.num = as.character(sample.event.num),
#            age = as.factor(age)
#            ) %>%
#  select(-diet_sample,-X1) %>%
#  mutate(sim = paste0(year,"_",season,"_",Hydrologic.Segment,"_",river,"_","Age","_",age,"_#",spp)) #%>%
  #select(sim,backcal.wts,lwrCImean,uprCImean,lwrCIindv,uprCIindv)

# reduce backcal_wts dataset to same rows as overall dataset
#datx <- dat %>%
#  mutate(sim = paste0(year,"_",season,"_",Hydrologic.Segment,"_",river,"_","Age","_",age,"_#",spp)) %>%
#  #select("year","season","Hydrologic.Segment","river","age","age","spp")
#  select(sim,len_mm,wt_g)



### to be able to properly join these; we need to assign and retain a row number in the original "dat" object, keep through on the backcal_wts object, then inner_join the two back by row number!  Use "arrange" for same factors on object exported from "fish_weight_length.Rmd" and 


##### HERE ::: figure out how to properly join this ....
#  z <- merge(datx,backcal_wts,by = "sim")


#str(backcal_wts)

# OK.  the giant dataset shit happens even when sim in the ONLY column... so maybe join by indv. componets

7/22/2018: having a technical hurdle with converting log.length values back to weight. For now, we will just consider the procedure for exploring & comparing size-specific growth rates in terms of raw length. Will ultimately plan to use weight values for comparing to bioenergetic output.

Calculate length-specific growth values (season-to-season, mm / mm * day (mm growth per mm fish per day))

# Prepare values for length-specific growth (change in mean len divided by number of days passed between site visits)

# need number of days for each season in this data table
season_site_temps <- season_site_temps %>%
  rename(river = Stream_Name,
         Hydrologic.Segment = Reach) %>%
  transform(year = as.factor(year))

# get dataframe with only number of days per season/site/year
ss <- season_site_temps %>%
  select(season,Hydrologic.Segment,river,year,days) %>%
  unite(col = "sim_1",year,Hydrologic.Segment,river) %>%
  spread(season,days) %>%
  select(sim_1,`Early Summer`,`Mid-Summer`,`Late Summer`) 


# summarise seasonal length-specific growth values (change in size divided by number of days)
z <- dat %>%
  group_by(year,season,spp,river,age,Hydrologic.Segment) %>%
  summarise(
    ct = n(),
    len_mean = mean(len_mm),
    se = std.error(len_mm)) %>%
  filter(!is.na(len_mean)) %>%
  ungroup(year,season,spp,river,age,Hydrologic.Segment,season) %>%
  gather(key,val,ct:se) %>%
  unite(col = "sim",year,spp,river,age,Hydrologic.Segment, sep = "_") %>%
  unite(col = "sim_1",sim,key,sep = "__") %>% 
  spread(season, val) %>%
  separate(sim_1,sep = "__", into = c("sim_1","value")) %>%
  separate(sim_1,sep = "_", into = c("year","spp","river","age","Hydrologic.Segment")) %>%
  unite(col = "sim_1",year,Hydrologic.Segment,river)
             
# calculate length-specific growth values for change in mean length between seasons
season_len_change <- z %>%
  filter(value == "len_mean") %>%
  mutate(`1_2` = `Mid-Summer` - `Early Summer`,
         `2_3` = `Late Summer` - `Mid-Summer`,
         `3_4` = `Fall` - `Late Summer`) %>%
  select(-`Early Summer`,-`Mid-Summer`,-`Late Summer`,-Fall) 

# get seasonal mean fork lengths
season_mean_lens <- z %>%
  filter(value == "len_mean") 

# calculate length-specific growth values (Lm2 - Lm1 / Lm1)
season_len_change <- left_join(season_len_change,season_mean_lens) %>%
  mutate(`1_2` = `1_2` / `Early Summer`,
         `2_3` = `2_3` / `Mid-Summer`,
         `3_4` = `3_4` / `Late Summer`) %>%
  select(-`Early Summer`, -`Mid-Summer`,-`Late Summer`, -`Fall`)


# divide length-specfic growth values by time intervals
# lsc = "length-specific chnage"; #_# = interval between seasons
season_len_change <- left_join(season_len_change,ss,by = "sim_1") %>%
  mutate(`1_2_lsc` = `1_2` / `Early Summer`,
         `2_3_lsc` = `2_3` / `Mid-Summer`,
         `3_4_lsc` = `3_4` / `Late Summer`) %>%
  select(-one_of(c("1_2","2_3","3_4","Early Summer","Mid-Summer","Late Summer","Fall","value"))) %>%
  gather(season_interval,val,`1_2_lsc`:`3_4_lsc`) %>%
  separate(sim_1,sep = "_",into = c("year","Hydrologic.Segment","river")) %>%
  unite(col = river_reach, Hydrologic.Segment,river,sep = " ", remove = F) %>%
  rename(ls_growth = val)

Plot length-specific growth values

# Length-specific growth plots
 p <- ggplot(season_len_change,aes(season_interval,ls_growth, color = river,shape = Hydrologic.Segment)) +
  geom_jitter(width = 0.2, size = 3) +
  theme_bw() + 
  xlab("Season Interval") + 
  ylab("Seasonal Growth Rate") + 
  ggtitle("Seasonal Growth Rates\n(Length)")
p


p + facet_grid(year~.)


p + facet_grid(year~spp)


p1 <- ggplot(season_len_change,aes(season_interval,ls_growth, color = river)) +
  geom_jitter(width = 0.2, size = 3) +
  theme_bw() + 
  xlab("Season Interval") + 
  ylab("Seasonal Growth Rate") + 
  ggtitle("Seasonal Growth Rates\n(Length)")

p1 + facet_grid(year~`Hydrologic.Segment`)

Join temperature metrics to growth rate values

# create table that has temperature metrics paired with size-specific growth values

# fork length
season_len_change <- season_len_change %>%
  mutate(season = fct_recode(as.factor(season_interval),         
                             "Early Summer"="1_2_lsc", 
                             "Mid-Summer" = "2_3_lsc",
                             "Late Summer" ="3_4_lsc")) 

# join seasonal size change values to site/season-level temperature metrics
growth_temp_tbl <- left_join(season_len_change,season_site_temps)

AICc exploration of seasonal water temperature metrics and growth

Purpose: explore GLME models to understand the major influences on growth rates. Do any temperature metrics have significant influences?

Explore two different model set-ups:

1.) Season and mean seasonal temperature as additive effects on fork length; also potential interactions with season for year/spp/river/age.

2.) Season and seasonal maximum temperature as an additive effects on fork length; potential interactions with season for year/spp/river/age.

Others: –> non-linear models? (nlme)

Take Note:

From Burnham and Anderson 2002: " We suggest presenting a table to show the value of the maximized log-likelihood [log(L)], number of estimable parameters (K), the appropriate information criterion used (e.g., QAICc), the differences (Ai), and the Akaike weights (wi) for each model in the Results section."

1.) Is mean water temperature supported as a predictive factor in a GLME model?


# response variable:: growth rate (weight or length).  what factors are predictive of it?  are any temperature metrics predictive?

require(AICcmodavg)

# scale response variables
growth_temp_tbl_scl <- growth_temp_tbl %>%
  mutate(ls_growth_scl = scale(ls_growth))

### --> Do I also need to log-transform these response variables???

### see methods in https://rdrr.io/cran/AICcmodavg/man/AICcmodavg-package.html

require(lme4)

##set up  fork length glmm candidate models     
len.mods <- list()

# Develop list of candidate models based on a priori hypotheses

# 1.) Full model
len.mods[[1]] <- lmer(ls_growth_scl ~ season  # scaled predictor variable
                           + season_avg_temp
                           + season*season_avg_temp
                           + season*river   
                           + season*spp
                           + season*age
                           + (1|river_reach), data = growth_temp_tbl_scl)


# 2.) Reduced model 1; mean seasonal temp not a predictor variable
len.mods[[2]] <- lmer(ls_growth_scl ~ season  # scaled predictor variable
                           #+ season_avg_temp
                           + season*season_avg_temp
                           + season*river   
                           + season*spp
                           + season*age
                           + (1|river_reach), data = growth_temp_tbl_scl)

# 3.) Reduced model 2; test
len.mods[[3]] <- lmer(ls_growth_scl ~ season  # scaled predictor variable
                           #+ season_avg_temp
                           #+ season*season_avg_temp
                           + season*river   
                           + season*spp
                           + season*age
                           + (1|river_reach), data = growth_temp_tbl_scl)

# 4.) Reduced model 3; test
len.mods[[4]] <- lmer(ls_growth_scl ~ season  # scaled predictor variable
                          # + season_avg_temp
                          # + season*season_avg_temp
                          # + season*river   
                           + season*spp
                           + season*age
                           + (1|river_reach), data = growth_temp_tbl_scl)

# 6.) Reduced model 4; test
len.mods[[5]] <- lmer(ls_growth_scl ~ season  # scaled predictor variable
                          # + season_avg_temp
                          # + season*season_avg_temp
                          # + season*river   
                          # + season*spp
                           + season*age
                           + (1|river_reach), data = growth_temp_tbl_scl)

# 7.) Reduced model 5; test
len.mods[[6]] <- lmer(ls_growth_scl ~ season  # scaled predictor variable
                          # + season_avg_temp
                          # + season*season_avg_temp
                          # + season*river   
                          # + season*spp
                          # + season*age
                           + (1|river_reach), data = growth_temp_tbl_scl)

##assign names to each model
Modnames <- c("full","reduced1","reduced2","reduced3","reduced4","reduced5") 

##model selection table based on AICc
aictab(cand.set = len.mods, modnames = Modnames)


# question: what is "Res.LL (residual log-liklihood)"?

##compute evidence ratio
evidence(aictab(cand.set = len.mods, modnames = Modnames))
## 
## Evidence ratio between models 'reduced5' and 'reduced4':
## 11.21

2.) Is seasonal max water temperature supported as a predictive factor in a GLME model?

##set up  fork length glmm candidate models     
len.mods <- list()

# Develop list of candidate models based on a priori hypotheses

# 1.) Full model
len.mods[[1]] <- lmer(ls_growth_scl ~ season  # scaled predictor variable
                           + max_temp
                           + season*max_temp
                           + season*river   
                           + season*spp
                           + season*age
                           + (1|river_reach), data = growth_temp_tbl_scl)


# 2.) Reduced model 1; mean seasonal temp not a predictor variable
len.mods[[2]] <- lmer(ls_growth_scl ~ season  # scaled predictor variable
                           #+ max_temp
                           + season*max_temp
                           + season*river   
                           + season*spp
                           + season*age
                           + (1|river_reach), data = growth_temp_tbl_scl)

# 3.) Reduced model 2; test
len.mods[[3]] <- lmer(ls_growth_scl ~ season  # scaled predictor variable
                           #+ max_temp
                           #+ season*max_temp
                           + season*river   
                           + season*spp
                           + season*age
                           + (1|river_reach), data = growth_temp_tbl_scl)

# 4.) Reduced model 3; test
len.mods[[4]] <- lmer(ls_growth_scl ~ season  # scaled predictor variable
                          # + max_temp
                          # + season*max_temp
                          # + season*river   
                           + season*spp
                           + season*age
                           + (1|river_reach), data = growth_temp_tbl_scl)

# 6.) Reduced model 4; test
len.mods[[5]] <- lmer(ls_growth_scl ~ season  # scaled predictor variable
                          # + max_temp
                          # + season*max_temp
                          # + season*river   
                          # + season*spp
                           + season*age
                           + (1|river_reach), data = growth_temp_tbl_scl)

# 7.) Reduced model 5; test
len.mods[[6]] <- lmer(ls_growth_scl ~ season  # scaled predictor variable
                          # + max_temp
                          # + season*max_temp
                          # + season*river   
                          # + season*spp
                          # + season*age
                           + (1|river_reach), data = growth_temp_tbl_scl)


##assign names to each model
Modnames <- c("full","reduced1","reduced2","reduced3","reduced4","reduced5") 

##model selection table based on AICc
aictab(cand.set = len.mods, modnames = Modnames)


# question: what is "Res.LL (residual log-liklihood)"?

##compute evidence ratio
evidence(aictab(cand.set = len.mods, modnames = Modnames))
## 
## Evidence ratio between models 'reduced5' and 'reduced4':
## 11.21

The models including either season mean temp or season max temp are not the highest ranked using AICc selection.

What about food? Do fish with more food grow faster? What if we include a “fullness” predictor; total calories per fish divided by mean fish length.

Create column that has a value of calories/fish weight for each fish with a diet sample

# import diet calorie info from "fish_diet_content.Rmd" results
diet_cal <- gs_read(gs_title("fish_diet_calories_by_event"), ws = "diet") %>%
  rename(Hydrologic.Segment = section) %>%
  transform(year = as.character(year),
            age = as.character(age))

growth_temp_tbl_scl <- inner_join(growth_temp_tbl_scl,diet_cal, 
                                  by = c("year","river","age","spp","Hydrologic.Segment","season","sample.event")) %>%
   # calculate "fullness"; average calories divided by average fish size
   mutate(fullness_len = avg_len / mean_j,
          fullness_wt = avg_wt / mean_j)

What does the overall relationship between growth rate and “fullness” look like?

ggplot(growth_temp_tbl_scl, aes(fullness_len,ls_growth)) +
  geom_jitter(width = 1) +
  xlim(0,100)

What does the overall relationship between seasonal temperature and growth look like?

# order facets
growth_temp_tbl_scl$season <- factor(growth_temp_tbl_scl$season, levels = 
                                    c("Early Summer",
                                      "Mid-Summer",
                                      "Late Summer"))

ggplot(growth_temp_tbl_scl, aes(season_avg_temp,ls_growth)) +
  facet_wrap(~season) +
  geom_boxplot() +
  geom_point(aes(color = river)) +
  xlim(10,17) +
  ggtitle("Season MEAN Water Temp vs. Size-Specific Growth Rate") +
  ylab("Length-Specific Growth\n (mm/mm * day")


ggplot(growth_temp_tbl_scl, aes(max_temp,ls_growth)) +
  facet_wrap(~season) +
  geom_boxplot() +
  geom_point(aes(color = river)) +
  xlim(10,17) +
  ggtitle("Season MAX Water Temp vs. Size-Specific Growth Rate") +
  ylab("Length-Specific Growth\n (mm/mm * day")

Try AICc model selection when the new “fullness” value is included as a predictor


##set up  fork length glmm candidate models     
len.mods <- list()

# Develop list of candidate models based on a priori hypotheses

# 1.) Full model
len.mods[[1]] <- lmer(ls_growth_scl ~ season  # scaled predictor variable
                           + season_avg_temp
                           + fullness_len
                           + season*season_avg_temp
                           + season*river   
                           + season*spp
                           + season*age
                           + (1|river_reach), data = growth_temp_tbl_scl)


# 2.) Reduced model 1; mean seasonal temp not a predictor variable
len.mods[[2]] <- lmer(ls_growth_scl ~ season  # scaled predictor variable
                           #+ season_avg_temp
                           + fullness_len
                           + season*season_avg_temp
                           + season*river   
                           + season*spp
                           + season*age
                           + (1|river_reach), data = growth_temp_tbl_scl)

# 3.) Reduced model 2; test
len.mods[[3]] <- lmer(ls_growth_scl ~ season  # scaled predictor variable
                           #+ season_avg_temp
                           #+ fullness_len
                           + season*season_avg_temp
                           + season*river   
                           + season*spp
                           + season*age
                           + (1|river_reach), data = growth_temp_tbl_scl)

# 4.) Reduced model 3; test
len.mods[[4]] <- lmer(ls_growth_scl ~ season  # scaled predictor variable
                           #+ season_avg_temp
                           #+ fullness_len
                           #+ season*season_avg_temp
                           + season*river   
                           + season*spp
                           + season*age
                           + (1|river_reach), data = growth_temp_tbl_scl)

# 6.) Reduced model 4; test
len.mods[[5]] <- lmer(ls_growth_scl ~ season  # scaled predictor variable
                           #+ season_avg_temp
                           #+ fullness_len
                           #+ season*season_avg_temp
                           #+ season*river   
                           + season*spp
                           + season*age
                           + (1|river_reach), data = growth_temp_tbl_scl)

# 7.) Reduced model 5; test
len.mods[[6]] <- lmer(ls_growth_scl ~ season  # scaled predictor variable
                           #+ season_avg_temp
                           #+ fullness_len
                           #+ season*season_avg_temp
                           #+ season*river   
                           #+ season*spp
                           + season*age
                           + (1|river_reach), data = growth_temp_tbl_scl)

# 7.) Reduced model 6; test
len.mods[[7]] <- lmer(ls_growth_scl ~ season  # scaled predictor variable
                           #+ season_avg_temp
                           #+ fullness_len
                           #+ season*season_avg_temp
                           #+ season*river   
                           #+ season*spp
                           #+ season*age
                           + (1|river_reach), data = growth_temp_tbl_scl)

##assign names to each model
Modnames <- c("full","reduced1","reduced2","reduced3","reduced4","reduced5","reduced6") 

##model selection table based on AICc
aictab(cand.set = len.mods, modnames = Modnames)


# question: what is "Res.LL (residual log-liklihood)"?

##compute evidence ratio
evidence(aictab(cand.set = len.mods, modnames = Modnames))
## 
## Evidence ratio between models 'reduced6' and 'reduced5':
## NaN

7/21/2018::: Something isn’t quite working right here yet !

7/23:: this document is in the process of being transferred from “bioenerg_data_input_selection_v2” and can successfully knit up to this point.