EPSCoR SCTC Aquatic Ecology

Analysis in progress

R Session Info

sessionInfo()
## R version 3.5.1 (2018-07-02)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS High Sierra 10.13.6
## 
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] compiler_3.5.1  backports_1.1.2 magrittr_1.5    rprojroot_1.3-2
##  [5] tools_3.5.1     htmltools_0.3.6 yaml_2.1.19     Rcpp_0.12.17   
##  [9] stringi_1.2.4   rmarkdown_1.10  knitr_1.20      stringr_1.3.1  
## [13] digest_0.6.15   evaluate_0.11


Load packages




This document explores the spatial and temporal scales at which growth trends of juvenile Chinook/Coho salmon in the Kenai River (2015/2016) may be compared and understood. It also begins exploring GLME models for understanding if water temperature metrics are related to growth trends.


The most granular level of segregation in to which fish size data may be organized is as follows:

  • Year (2015/2016)
  • River (Beaver Creek, Russian River, Ptarmigan Creek, Kenai River)
  • Fish Sampling Site (Lower, Middle, and Upper for each study drainage [Kenai River and Ptarmigan Creek do not have upper sites])
  • Fish Age (0,1 [All Age 2 Coho and Age 1 Chinook excluded from growth analyses])
  • Species (Coho, Chinook)
  • Season (Interval of time between site visits [Early Summer, Mid-Summer, Late Summer, Fall])


Set custom plotting themes


Import results from “season_temp_stats.Rmd”, googlesheets file “sim_extents_by_site” <>

This sheet contains values for season temporal extent and basic temperature metrics associated with each fish sampling site.

season_site_temps <- gs_title("sim_extents_by_site") %>%
  gs_read(ws = "Sheet1")


When did sampling occur?

Define temporal extents of seasons at chosen spatial/temporal scale.

A “season” is defined as the period of time between visits to a fish sampling site; generally about ~1 month.

# Define temporal extents of proposed simulations

# NOTE 1/22/18: this code also exists as a script at "/Users/bmeyer/Google Drive/Thesis/R Analysis/Thesis Analyses R Project/Overall_scripts_2015_2016/sampling_periods.R".  Attempted to just source("\path") this script in to the markdown document but, 'source' function prevents knitting through markdown for some reason.  

############
#
# 2015
#
###########

# Verify that sheet is present, create object of sheet with all tabs
SCTC2015 <- gs_title("2015 EPSCoR SCTC.xlsx")

#read in specific worksheet containing fishing data
effort2015 <- SCTC2015 %>% 
  gs_read(ws = "B Fishing Data",col_names=TRUE, skip=2) %>%
  #rename needed columns  
  rename(site.id=Site_ID,
         sample.event=Sample_Event) %>%
  
  #select desired columns
  select(site.id,River,sample.event,Reach,sample.event.num,deploy_dt,collect_dt) %>%
  
  #filter out blanks
  filter(!is.na(sample.event)) %>%
  
  # transform columns to desired classes
  transform(site.id=as.factor(site.id),
            sample.event=as.factor(sample.event),
            sample.event.num=as.factor(sample.event.num)) 


###################
#
# 2016
#
###################

# Verify that sheet is present, create object of sheet with all tabs
SCTC2016 <- gs_title("2016_EPSCoR_Aquatic_Ecology_Database")

#read in specific worksheet containing fishing data
effort2016 <- SCTC2016 %>% 
  gs_read(ws = "B Fishing Data",col_names=TRUE, skip=2) %>%
  
  #rename needed columns  
  rename(site.id=Site_ID,
         sample.event=Sample_Event) %>%
  
  #select desired columns
  select(site.id,River,sample.event,sample.event.num,Reach,deploy_dt,collect_dt) %>%
  
  #filter our blanks
  filter(!is.na(sample.event)) %>%
  
  # transform columns to desired classes
  transform(site.id=as.factor(site.id),
            sample.event=as.factor(sample.event),
            sample.event.num=as.factor(sample.event.num)) 


# bind 2015 & 2016 data; format datetimes
effort <- bind_rows(effort2015,effort2016) %>%
  transform(collect_dt = anytime(collect_dt),
            deploy_dt = anytime(deploy_dt)) %>%
  mutate(year = year(collect_dt)) 



# Define Temporal Extents of proposed simulations 

# "Simulation" temporal extent
# Time period from first date of each season to first date of following season

# prepare data to tibble form
sim_extents <- effort %>% 
 # select(-site.id) %>%
  group_by(sample.event,Reach,River, year, sample.event.num) %>% 
  nest() %>% 
  mutate(season.start = map_dbl(data, ~min(.$deploy_dt)),
         season.end = map_dbl(data, ~max(.$deploy_dt))
         ) %>%
  transform(season.start = anytime(season.start),
            season.end = anytime(season.end)) %>%
  arrange(River,Reach,year,sample.event.num)

# prepare data such that end of one season is beginning of next
t <- sim_extents %>%
  mutate(season.end_x = lead(season.end)) %>%
  select(-season.end) %>%
  arrange(River,year,sample.event.num)

# prepare data such that final season of each year has duration of 30 days.
# 2015 had 3 seasons, 2016 had 4 seasons.

t_end15 <- t %>%
  filter(sample.event.num == 3,
         year == 2015) %>%
  mutate(season.end = season.start + days(30)) %>%
  select(-season.end_x)

t_end16 <- t %>%
  filter(sample.event.num == 4,
         year == 2016) %>%
  mutate(season.end = season.start + days(30)) %>%
  select(-season.end_x)

t_end15_1 <- t %>%
  filter(sample.event.num != 3,
         year != 2016) %>%
  rename(season.end = season.end_x)

t_end16_1 <- t %>%
  filter(sample.event.num != 4,
         year != 2015) %>%
  rename(season.end = season.end_x)

# final version of simulation extents
sim_extents <- bind_rows(t_end15_1,t_end15,t_end16_1,t_end16) %>%
  rename(Stream_Name = River,
         sim.start = season.start,
         sim.end = season.end) %>%
  transform(year = as.factor(year)) %>% 
  mutate(start_doy = yday(sim.start),
         end_doy = yday(sim.end)) %>%
  mutate(season = fct_recode(as.factor(sample.event.num),         #code sampling events as seasons
                             "Early Summer"="1", 
                             "Mid-Summer" = "2",
                             "Late Summer" ="3",
                             "Fall" = "4")) %>%
  mutate(river_reach = paste(Reach,Stream_Name),
         days = end_doy - start_doy) %>%
  filter(sample.event != c("UPC-2015-1","MPC-2016-3"))    # we want to exclude both of these non-events

# remove temporary objects
rm(t,t_end15,t_end15_1,t_end16,t_end16_1,effort2015,effort2016)


Fish sampling periods


#  make plot that visualizes extent of each simulation period faceted by year and river

# order facets
sim_extents$river_reach <- factor(sim_extents$river_reach, levels = 
                                    c("Lower Beaver Creek",
                                      "Middle Beaver Creek",
                                      "Upper Beaver Creek",
                                      "Lower Russian River",
                                      "Middle Russian River",
                                      "Upper Russian River",
                                      "Lower Ptarmigan Creek",
                                      "Middle Ptarmigan Creek",
                                      "Lower Kenai River",
                                      "Middle Kenai River"))

# color plot
 p <- ggplot(sim_extents) +
  geom_segment(aes(x = start_doy, xend= end_doy,
                   y = year, yend = year, color = season),
               size = 3) + 
  facet_grid(river_reach~.) +
  scale_x_continuous(breaks = c(152,182,213,244),
                     labels = c("June","July","August","Sept")) +
  theme_bw() +
  ggtitle("Fish Sampling Periods") +
  xlab("") +
  ylab("") +
  theme(strip.text.y = element_text(angle = 0, size = 14)) +
  theme(plot.title = element_text(size= 30, face = "bold", hjust = 0.3)) +
  theme(legend.title=element_text(size=24, face = "bold")) +
  theme(axis.text.x = element_text(size = 14),
        axis.text.y = element_text(size = 14),
        legend.text = element_text(size = 14))
 
# print color plot
p


# print black and white plot
#p + scale_color_grey()


What is the range of lengths for seasons?

range(sim_extents$days)
## [1] 21 48

The range of days is within a season is 21 to 48.





Calculate catch summary data

# Fish ages used here were assigned by visual inspection of thresholds in length-frequency distribution charts as described in the script "fish_age_structure.R".  As time permits, these age assignment results will be compared with bayesian methods described in Sethi et al. 

# Prior to this approach, age-length key methods as described in Ch. 5 of Ogle 2016 ("Introductory fisheries analyses with R") were used. This appraoch was deemed to be unsuccessful in Spring 2018.


# Verify that sheets are present, create object of sheet with all tabs
SCTC2015 <- gs_title("2015 EPSCoR SCTC.xlsx")
SCTC2016 <- gs_title("2016_EPSCoR_Aquatic_Ecology_Database")

# read in 2015 length and weight data
fish_ages15 <- SCTC2015 %>% 
  gs_read(ws = "C 2015 Diet & LW Data",col_names=TRUE) %>%
  select(river,sample.id,sample.event,spp,Weight_g,Length_mm,Age_manual2,sample.event.num,Date,`Site ID`,reach_name, DS, Count,DOY,`Gear ID`) %>%
  mutate(season = fct_recode(as.factor(sample.event.num),         #code sampling events as seasons
                             "Early Summer"="1", 
                             "Mid-Summer" = "2",
                             "Late Summer" ="3")) %>%
  rename(wt_g = Weight_g,
         len_mm = Length_mm,
         age = Age_manual2,
         Hydrologic.Segment = reach_name,
         diet_sample = DS,
         Gear_ID = `Gear ID`) %>%
  transform(Date = anydate(Date)) %>%
  filter(spp %in% c("Coho", "Chinook")) %>% # filter out all species except Chinook and Coho
  filter(Gear_ID != "Aq. Net")  # filter out fish caught opportunistically with aquarium net for size evaluations

# read in length and weight worksheet 2016
fish_ages16 <- SCTC2016 %>% 
  gs_read(ws = "C 2016 Diet & LW Data",col_names=TRUE) %>%
  select(river,sample.id,sample.event,spp,Weight_g,Length_mm,Age_manual2,sample.event.num,Date,`Site ID`,reach_name,DS,Count,DOY) %>%
  mutate(season = fct_recode(as.factor(sample.event.num),         #code sampling events as seasons
                             "Early Summer"="1", 
                             "Mid-Summer" = "2",
                             "Late Summer" ="3",
                             "Fall" ="4")) %>%
  rename(wt_g = Weight_g,
         len_mm = Length_mm,
         age = Age_manual2,
         Hydrologic.Segment = reach_name,
         diet_sample = DS) %>%
  transform(Date = anydate(Date),
            wt_g = as.numeric(wt_g)) %>%
  filter(spp %in% c("Coho", "Chinook")) # filter out all species except Chinook and Coho


#join 2015 and 2016 data
fish_ages <- bind_rows(fish_ages15,fish_ages16)

#prep dataframe containing age data
dat <- fish_ages %>% 
  separate(sample.event, c("reach","year","sample.event.num"), sep = "-", remove = F) %>%
  unite(river_spp, river, spp, remove = F, sep = " ") %>%
  # manually order appearance of seasons and rivers for plots
  transform(year = as.factor(year),
            river = factor(river, levels=c("Beaver Creek",
                                              'Russian River',
                                              'Ptarmigan Creek',
                                              'Kenai River')),
            season = factor(season, levels = c("Early Summer",
                                     "Mid-Summer",
                                     "Late Summer",
                                     "Fall")), 
            river_spp = as.factor(river_spp),
            age = as.factor(age)) %>%
  mutate(river_spp = paste(river,"\n",spp)) %>%
  mutate(river_reach = paste(Hydrologic.Segment,river)) %>%
  mutate(river_spp_yr = paste(year,river,"\n",spp)) %>%
  mutate(doy = yday(Date)) %>%
  # filter out blanks where fish age/length/weight data is missing
  filter(!is.na(age),
         !is.na(wt_g),
         !is.na(len_mm)) %>%
 # age 2 fish: this cohort is sparse and likely departing for the ocean.
  filter(age != 2) %>%
    select(river_spp,river,river_reach,sample.id, sample.event,reach,year,sample.event.num,spp,wt_g,len_mm,
           age,season,Date,Site.ID,Hydrologic.Segment,Count,diet_sample,river_spp_yr,doy)

#housekeeping: remove Age 1 Chinook; not considered for growth models
age1chnk <- dat %>% filter(spp == "Chinook",
                           age == 1)
dat <- anti_join(dat,age1chnk)
rm(age1chnk)

# catch summary table; with age 1 chnk removed and all age 2 coho removed
catch_summary <- dat %>%
  group_by(year,river,season,river_reach,spp,age,diet_sample) %>%
  summarise(ct = sum(Count)) %>%
  spread(diet_sample,ct) %>%
  replace(is.na(.), 0) %>%
  mutate(total_fish = sum(N,Y)) %>%
  rename(diet_sample = Y,
         len_wt_only = N)

rm(fish_ages15,fish_ages16)


Fish catch data table


datatable(catch_summary, caption = "Juvenile Chinook and Coho Catch Data", 
          filter = 'top', options = list(
  pageLength = 5, autoWidth = TRUE)) #%>%
  #formatSignif(c("log.len_slope_annual","log.len_intercept_annual"), digits = 2) 


# 2015 & 2016 catch data

# Create bar graph(s) showing proportions (by spp, year, river, diet sample (y/n))  
# 1.) By overall drainage and year
#   a.) 2015
#   b.) 2016
#   c.) both years juxtaposed

# read in desired worksheet
catch15 <- SCTC2015 %>% 
  gs_read(ws = "C 2015 Diet & LW Data",col_names=TRUE) %>%
  select(Year,river,spp,Count,sample.event.num,DS) %>%
  mutate(season = fct_recode(as.factor(sample.event.num),         #code sampling events as seasons
                             "Early Summer"="1", 
                             "Mid-Summer" = "2",
                             "Late Summer" ="3")) %>%
  rename(diet_sample = DS) %>%
  filter(spp %in% c("Coho", "Chinook")) # filter out all species except Chinook and Coho

# create summary of catch by drainage, species, and lavaged vs non-lavaged
catch_summary15 <- catch15 %>%
  group_by(Year,river,spp,diet_sample) %>%
  summarise(ct = sum(Count)) %>%
  mutate(count_label = paste("( n=", as.character(ct),")")) #create count labels

# specify order of drainages
catch_summary15$river <- factor(catch_summary15$river, levels = c("Beaver Creek", 
                                                              "Russian River", 
                                                              "Ptarmigan Creek", 
                                                              "Kenai River"))

# create theme to plot 2015 catch data
c15 <- ggplot(catch_summary15, aes(x = spp, y = ct, fill = diet_sample)) +
  geom_bar(stat = "identity", color = "white") +
  facet_wrap(~river,nrow=1) +
  theme_bw() +
  scale_fill_discrete(name="Diet Sample\nCollected") +
  theme(legend.key.size = unit(1.7, 'lines')) +
  theme(plot.title = element_text(size = 26, face = "bold"),
        axis.text.x = element_text(size=14, face="bold"),
        axis.text.y = element_text(size=16, face="bold"),
        axis.title.x = element_text(size=22, face="bold"),
        axis.title.y = element_text(size = 22, face = "bold")) +
  theme(strip.text.x = element_text(size=16, face="bold")) +
  theme(legend.title = element_text(size=20, face="bold")) + 
  theme(legend.text = element_text(size = 16, face = "bold")) +
  xlab("Species") +
  ylab("Count") 


# plot and title for 2015 data
# c15 + ggtitle("2015 Juvenile Chinook and Coho\n Catch Data") # can un-hashtag to show 2015 plot


# 2016 

# read in desired worksheet
catch16 <- SCTC2016 %>% 
  gs_read(ws = "C 2016 Diet & LW Data",col_names=TRUE) %>%
  select(Year,river,spp,Count,sample.event.num,DS) %>%
  mutate(season = fct_recode(as.factor(sample.event.num),         #code sampling events as seasons
                             "Early Summer"="1", 
                             "Mid-Summer" = "2",
                             "Late Summer" ="3",
                             "Fall" = "4")) %>%
  rename(diet_sample = DS) %>%
  filter(spp %in% c("Coho", "Chinook")) # filter out all species except Chinook and Coho

# create summary of catch by drainage, species, and lavaged vs non-lavaged
catch_summary16 <- catch16 %>%
  group_by(Year,river,spp,diet_sample) %>%
  summarise(ct = sum(Count)) %>%
  mutate(count_label = paste("( n=", as.character(ct),")")) #create count labels

# specify order of drainages
catch_summary16$river <- factor(catch_summary16$river, levels = c("Beaver Creek", 
                                                              "Russian River", 
                                                              "Ptarmigan Creek", 
                                                              "Kenai River"))

# create theme to plot catch data
c16 <- ggplot(catch_summary16, aes(x = spp, y = ct, fill = diet_sample)) +
  geom_bar(stat = "identity", color = "white") +
  facet_wrap(~river,nrow=1) +
  theme_bw() +
  scale_fill_discrete(name="Diet Sample\nCollected") +
  theme(legend.key.size = unit(1.7, 'lines')) +
  theme(plot.title = element_text(size = 26, face = "bold"),
        axis.text.x = element_text(size=14, face="bold"),
        axis.text.y = element_text(size=16, face="bold"),
        axis.title.x = element_text(size=22, face="bold"),
        axis.title.y = element_text(size = 22, face = "bold")) +
  theme(strip.text.x = element_text(size=16, face="bold")) +
  theme(legend.title = element_text(size=20, face="bold")) + 
  theme(legend.text = element_text(size = 16, face = "bold")) +
  xlab("Species") +
  ylab("Count") 


# plot and title for 2016 data
# c16 + ggtitle("2016 Juvenile Chinook and Coho\n Catch Data") # can un-hashtag to show 2016 plot

# combined plot for 2015 and 2016 data

# create combined table
all <- bind_rows(catch_summary15,catch_summary16)

allplot <- ggplot(all, aes(x = spp, y = ct, fill = diet_sample, color = spp)) +
  geom_bar(stat = "identity", color = "white") +
  facet_wrap(Year~river,nrow=2) +
  theme_bw() +
  scale_fill_discrete(name="Diet Sample\nCollected") +
  theme(legend.key.size = unit(1.7, 'lines')) +
  theme(plot.title = element_text(size = 26, face = "bold"),
        axis.text.x = element_text(size=14, face="bold"),
        axis.text.y = element_text(size=16, face="bold"),
        axis.title.x = element_text(size=22, face="bold"),
        axis.title.y = element_text(size = 22, face = "bold")) +
  theme(strip.text.x = element_text(size=14, face="bold")) +
  theme(legend.title = element_text(size=20, face="bold")) + 
  theme(legend.text = element_text(size = 16, face = "bold")) +
  xlab("Species") +
  ylab("Count") 

allplot





Linear mixed models

To explore spatial/temporal scale of growth


GLME exploration of size data (following example on https://ourcodingclub.github.io/2017/03/15/mixed-models.html)


At what temporal/spatial scale may we consider and compare growth rates?

  • Our hypothesis involves an assumption that growth rates differ in different places and times. Is this true? We are looking to control for the effects of year, age, spp. (We are antcipating that different rivers/sites have different growth rates.)

  • Is there an association between fork length and day (for each of our species/age/year cohorts) after controlling for variation in rivers and sites within rivers?


Age 0 plot diagnotsics
layout(matrix(c(1,2,3,4),2,2)) # optional 4 graphs/page 
plot(basic.lm.0)


Age 1 plot diagnostics
layout(matrix(c(1,2,3,4),2,2)) # optional 4 graphs/page 
plot(basic.lm.1)


Discussion: there is clearly a relationship between doy (Day of Year) and fork length in both years, but the basic model of doy + age doesn’t explain much of the variation.


What drives the variation in the relationship between doy and size? Let’s consider growth slopes at smaller temporal and spatial scales.


Calculate annual growth trends at temporal/spatial scale of year ~ spp ~ age ~ river ~ river_reach


# calc slope of each size trend for len and wt...??
all_slopes <- dat %>%
  transform(river_spp_yr = as.factor(river_spp_yr),
            year = as.factor(year),
            spp = as.factor(spp),
            log.len_mm = as.numeric(log.len_mm)) %>%
  mutate(river_reach = paste(Hydrologic.Segment,river)) %>%
  select(river_spp_yr,Hydrologic.Segment,river_reach,river,year,spp,age,log.len_mm,doy) 

# length model formulation for growth trend
len_growth_model <- function(df) {
  lm(log.len_mm ~ doy, data = df)
}


# create table of length and weight trend slopes for each annual sequence 
size_age_spp_year <- all_slopes %>% 
  group_by(river,river_reach, spp, age, year) %>% 
  nest() %>% 
  # calculate slopes & intercept for len_mm ~ day of year
  mutate(
    age_len = map(data, len_growth_model),
    tidy_age_len = map(age_len, broom::tidy),
    log.len_slope_annual = map_dbl(tidy_age_len, ~.$estimate[2]),
    log.len_intercept_annual = map_dbl(tidy_age_len, ~.$estimate[[1]])) %>%
  select(year,river,river_reach, spp, age, log.len_slope_annual, log.len_intercept_annual)


Show annual linear growth rate summary table


datatable(size_age_spp_year, caption = "Juvenile Chinook and Coho Catch and Size Trend Summary Data", 
          filter = 'top', options = list(
  pageLength = 5, autoWidth = TRUE)) %>%
  formatSignif(c("log.len_slope_annual","log.len_intercept_annual"), digits = 2) 


Minimum standards for including size data from a site visit


We want to apply some minimum standards woth regards to site vists for including data as part of growth rate analyses/comparisons. We will exclude fish size data if:

  • a.) There were n < 3 fish obersations in a site visit (segregated by species/age)
  • b.) The overall annual growth relationship at the site is < 0 (likely indicative of data paucity rather than fish “shrinking” in size).


Apply minimum data standards

# filter out site visits with < 3 fish per analysis category
catch_summary_mod <- catch_summary %>%
  filter(total_fish >= 3)

# filter out negative/absent annual slopes
size_age_spp_year <- size_age_spp_year %>%
  filter(log.len_slope_annual > 0)

# join reduced datasets
size_age_spp_year <- inner_join(catch_summary_mod,size_age_spp_year, by = c("year","river_reach","river","spp","age"))


# apply filter to overall dataset
dat <- inner_join(dat, size_age_spp_year, by = c("year","river_reach","season","river","spp","age")) %>%
  select(-diet_sample.x) %>%
  rename(diet_sample = diet_sample.y)

# recreate catch summary with filter applied
catch_summary_filt <- dat %>%
  group_by(year,river,season,river_reach,spp,age,diet_sample) %>%
  summarise(ct = sum(Count)) %>%
  mutate(diff = ct-diet_sample) %>%
  rename(total_fish = ct,
         len_wt_only = diff) %>%
  select(year,river,season,river_reach,spp,age,len_wt_only,diet_sample,total_fish)


When data is segregated by year/site/season/species/age, 72% (108/150) of potential groups by site visit remain after applying these standards. 36 lost because fish count by site visit was n < 3 per spp/age; 6 lost becuase overall annual growth slope was m < 0.




datatable(catch_summary_filt, caption = "Selected Subset of Juvenile Chinook and Coho Catch Data", 
          filter = 'top', options = list(
  pageLength = 5, autoWidth = TRUE)) 


With minimum data inclusion standards applied, what does our empirical fork length distribution look like?

# Overall Coho lengths
dat %>%
  filter(spp == "Coho") %>% 
ggplot(aes(season,len_mm, fill = as.factor(age))) +
  geom_boxplot(outlier.size = 0) +
  facet_grid(.~year) +
  ggtitle("Overall Coho Lengths") +
  bm_theme 


# Overall Chinook lengths
dat %>%
  filter(spp == "Chinook") %>% 
ggplot(aes(season,len_mm, fill = as.factor(age))) +
  geom_boxplot(outlier.size = 0) +
  facet_grid(.~year) +
  ggtitle("Overall Chinook Lengths") +
  bm_theme 

For more detailed plots and segregation of fish size distributions, see http://rpubs.com/bmeyer/352129.


Visualize slopes of annual linear raw size trends from selected subset.

# Note: color switches between species in Age 0 vs Age 1 charts.  How to fix colors?

# Age 0 fish lengths
ggplot(data = subset(dat, age == 0),aes(doy,len_mm, color = spp, shape = Hydrologic.Segment)) + 
  facet_grid(year~river) +
  geom_jitter(width = 0.06) +
  geom_smooth(method = "lm") +
  ggtitle("Age 0 Fish Fork Lengths") + 
  theme_bw()


# Age 1 fish lengths
ggplot(data = subset(dat, age == 1),aes(doy,len_mm, color = spp, shape = Hydrologic.Segment)) + 
  facet_grid(year~river) +
  geom_jitter(width = 0.06) +
  geom_smooth(method = "lm") +
  ggtitle("Age 1 Fish Fork Lengths (Coho only)") + 
  theme_bw()


# Age 0 fish weights
#ggplot(data = subset(dat, age == 0),aes(doy,wt_g, color = spp, shape = Hydrologic.Segment)) + 
#  facet_grid(year~river) +
#  geom_jitter(width = 0.06) +
#  geom_smooth(method = "lm") +
#  ggtitle("Age 0 Fish Weights") + 
#  theme_bw()

# Age 1 fish weights
#ggplot(data = subset(dat, age == 1),aes(doy,wt_g, color = spp, shape = Hydrologic.Segment)) + 
#  facet_grid(year~river) +
#  geom_jitter(width = 0.06) +
#  geom_smooth(method = "lm") +
#  ggtitle("Age 1 Fish Weights (Coho only)") + 
 # theme_bw()

Looks like the Chinook almost always grow larger than the coho. Is this an artifact of small sample size or is there a possible mechanism?



Question: is there a way to check for linearity assumptions on all of these relationships, then just include the ones that “pass” ?



GLME Models


Other than the passage of time within a growing season, what else has noteworthy effects in predicting fish size? Use general linear mixed-models (GLME) to explore further.


Define fixed vs. random effects for GLME models.


Fixed: factors we anticipate will have an effect on the dependent/response variable (fish size). Random: factors we want to control for; we are not specifically interested in their impact on fish size.


NOTE: random effects generally must have > 2 levels.


FIXED EFFECTS * year (fixed, 2 levels) * species (fixed, 2 levels) * age (fixed, 2 levels) * river (fixed, 4 levels)


RANDOM EFFECTS * site, nested within river (random, 10 levels)


Notes:


Consider linear mixed effect models for log.length ~ doy + year + river + spp + age + (1|river_reach))

# load lme4
require(lme4)

# scale response variables
dat_scl <- dat %>%
  mutate(log.len_mm_scl = scale(log.len_mm))

### --> Do I also need to log-transform these response variables???

# fork length glme
fit.len <- lmer(log.len_mm_scl ~ doy + year + river + spp + age + (1|river_reach), data = dat_scl)
summary(fit.len)
## Linear mixed model fit by REML ['lmerMod']
## Formula: 
## log.len_mm_scl ~ doy + year + river + spp + age + (1 | river_reach)
##    Data: dat_scl
## 
## REML criterion at convergence: 5874.1
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.4133 -0.6831 -0.0457  0.6690  3.6117 
## 
## Random effects:
##  Groups      Name        Variance Std.Dev.
##  river_reach (Intercept) 0.03518  0.1876  
##  Residual                0.24532  0.4953  
## Number of obs: 4049, groups:  river_reach, 10
## 
## Fixed effects:
##                        Estimate Std. Error t value
## (Intercept)          -2.3990313  0.1303917 -18.399
## doy                   0.0102780  0.0003314  31.012
## year2016              0.2662377  0.0182980  14.550
## riverRussian River   -0.0646612  0.1552262  -0.417
## riverPtarmigan Creek  0.0362479  0.1740294   0.208
## riverKenai River     -0.4676067  0.1736228  -2.693
## sppCoho              -0.5280689  0.0268281 -19.683
## age1                  1.8223222  0.0271785  67.050
## 
## Correlation of Fixed Effects:
##             (Intr) doy    yr2016 rvrRsR rvrPtC rvrKnR sppCoh
## doy         -0.494                                          
## year2016    -0.119 -0.071                                   
## rivrRssnRvr -0.615  0.034  0.020                            
## rvrPtrmgnCr -0.509 -0.040  0.037  0.441                     
## riverKenRvr -0.547 -0.007  0.032  0.449  0.395              
## sppCoho      0.048 -0.381  0.242 -0.040  0.014  0.059       
## age1        -0.313  0.474 -0.030  0.094 -0.018  0.065 -0.430
plot(fit.len)

Discussion: The nested random effect of river reach (site) accounts for 12.54% [Random Effects: Variances; (0.03518 / (0.03518 + 0.24532)) x 100] of the variation in fork length.


After controlling for reach within river; the magnitude of t-values for the fixed effects of age, year, and species are all notably larger than the others; indicating that degree to which these factors are predictive of size.

We can visualize this with a figure of effect size from the “jtools”" package:


Effect size results for model log.len_mm_scl ~ doy + year + river + spp + age + (1|river_reach)


# load jtools package
require(jtools)

# fork length, glme model summary
summ(fit.len)
## MODEL INFO:
## Observations: 4049
## Dependent Variable: log.len_mm_scl
## Type: Mixed effects linear regression 
## 
## MODEL FIT:
## AIC = 5894.09, BIC = 5957.15
## Pseudo-R² (fixed effects) = 0.72
## Pseudo-R² (total) = 0.75 
## 
## FIXED EFFECTS:
##                       Est. S.E. t val.    p    
## (Intercept)          -2.40 0.13 -18.40 0.00 ***
## doy                   0.01 0.00  31.01 0.00 ***
## year2016              0.27 0.02  14.55 0.00 ***
## riverRussian River   -0.06 0.16  -0.42 0.68    
## riverPtarmigan Creek  0.04 0.17   0.21 0.84    
## riverKenai River     -0.47 0.17  -2.69 0.01   *
## sppCoho              -0.53 0.03 -19.68 0.00 ***
## age1                  1.82 0.03  67.05 0.00 ***
## 
## p values calculated using Kenward-Roger d.f. = 20.79 
## 
## RANDOM EFFECTS:
##        Group   Parameter Std. Dev.
##  river_reach (Intercept)      0.19
##     Residual                  0.50
## 
## Grouping variables:
##        Group # groups  ICC
##  river_reach       10 0.13
plot_summs(fit.len)

Notes: age has the greatest magnitude of effect size on fish fork length. Species and year have smaller effects. River has the greatest level of uncertainty.


Decision: we will consider fish growth at the scale of segregated year ~ spp ~ age ~ reach ~ river. A priori hypothesis is still that growth differs among environments (rivers and/or river reaches) and is driven by diet & temperatrure.


AICc exploration of seasonal water temperature metrics and growth


Purpose: explore GLME models to understand the major influences on growth rates. Do any temperature metrics have significant influences?


Explore two different model set-ups:

1.) Season and mean seasonal temperature as additive effects on fork length; also potential interactions with season for year/spp/river/age.

2.) Season and seasonal maximum temperature as an additive effects on fork length; potential interactions with season for year/spp/river/age.

Others: –> non-linear models? (nlme)


Take Note:

From Burnham and Anderson 2002: " We suggest presenting a table to show the value of the maximized log-likelihood [log(L)], number of estimable parameters (K), the appropriate information criterion used (e.g., QAICc), the differences (Ai), and the Akaike weights (wi) for each model in the Results section."


1.) Is mean water temperature supported as a predictive factor in a GLME model?


# response variable:: growth rate (weight or length).  what factors are predictive of it?  are any temperature metrics predictive?

require(AICcmodavg)

# scale response variables
growth_temp_tbl_scl <- growth_temp_tbl %>%
  mutate(ls_growth_scl = scale(ls_growth))

### --> Do I also need to log-transform these response variables???

### see methods in https://rdrr.io/cran/AICcmodavg/man/AICcmodavg-package.html

require(lme4)

##set up  fork length glmm candidate models     
len.mods <- list()

# Develop list of candidate models based on a priori hypotheses

# 1.) Full model
len.mods[[1]] <- lmer(ls_growth_scl ~ season  # scaled predictor variable
                           + season_avg_temp
                           + season*season_avg_temp
                           + season*river   
                           + season*spp
                           + season*age
                           + (1|river_reach), data = growth_temp_tbl_scl)


# 2.) Reduced model 1; mean seasonal temp not a predictor variable
len.mods[[2]] <- lmer(ls_growth_scl ~ season  # scaled predictor variable
                           #+ season_avg_temp
                           + season*season_avg_temp
                           + season*river   
                           + season*spp
                           + season*age
                           + (1|river_reach), data = growth_temp_tbl_scl)

# 3.) Reduced model 2; test
len.mods[[3]] <- lmer(ls_growth_scl ~ season  # scaled predictor variable
                           #+ season_avg_temp
                           #+ season*season_avg_temp
                           + season*river   
                           + season*spp
                           + season*age
                           + (1|river_reach), data = growth_temp_tbl_scl)

# 4.) Reduced model 3; test
len.mods[[4]] <- lmer(ls_growth_scl ~ season  # scaled predictor variable
                          # + season_avg_temp
                          # + season*season_avg_temp
                          # + season*river   
                           + season*spp
                           + season*age
                           + (1|river_reach), data = growth_temp_tbl_scl)

# 6.) Reduced model 4; test
len.mods[[5]] <- lmer(ls_growth_scl ~ season  # scaled predictor variable
                          # + season_avg_temp
                          # + season*season_avg_temp
                          # + season*river   
                          # + season*spp
                           + season*age
                           + (1|river_reach), data = growth_temp_tbl_scl)

# 7.) Reduced model 5; test
len.mods[[6]] <- lmer(ls_growth_scl ~ season  # scaled predictor variable
                          # + season_avg_temp
                          # + season*season_avg_temp
                          # + season*river   
                          # + season*spp
                          # + season*age
                           + (1|river_reach), data = growth_temp_tbl_scl)

##assign names to each model
Modnames <- c("full","reduced1","reduced2","reduced3","reduced4","reduced5") 

##model selection table based on AICc
aictab(cand.set = len.mods, modnames = Modnames)

# question: what is "Res.LL (residual log-liklihood)"?

##compute evidence ratio
evidence(aictab(cand.set = len.mods, modnames = Modnames))
## 
## Evidence ratio between models 'reduced5' and 'reduced4':
## 11.21


2.) Is seasonal max water temperature supported as a predictive factor in a GLME model?

##set up  fork length glmm candidate models     
len.mods <- list()

# Develop list of candidate models based on a priori hypotheses

# 1.) Full model
len.mods[[1]] <- lmer(ls_growth_scl ~ season  # scaled predictor variable
                           + max_temp
                           + season*max_temp
                           + season*river   
                           + season*spp
                           + season*age
                           + (1|river_reach), data = growth_temp_tbl_scl)


# 2.) Reduced model 1; mean seasonal temp not a predictor variable
len.mods[[2]] <- lmer(ls_growth_scl ~ season  # scaled predictor variable
                           #+ max_temp
                           + season*max_temp
                           + season*river   
                           + season*spp
                           + season*age
                           + (1|river_reach), data = growth_temp_tbl_scl)

# 3.) Reduced model 2; test
len.mods[[3]] <- lmer(ls_growth_scl ~ season  # scaled predictor variable
                           #+ max_temp
                           #+ season*max_temp
                           + season*river   
                           + season*spp
                           + season*age
                           + (1|river_reach), data = growth_temp_tbl_scl)

# 4.) Reduced model 3; test
len.mods[[4]] <- lmer(ls_growth_scl ~ season  # scaled predictor variable
                          # + max_temp
                          # + season*max_temp
                          # + season*river   
                           + season*spp
                           + season*age
                           + (1|river_reach), data = growth_temp_tbl_scl)

# 6.) Reduced model 4; test
len.mods[[5]] <- lmer(ls_growth_scl ~ season  # scaled predictor variable
                          # + max_temp
                          # + season*max_temp
                          # + season*river   
                          # + season*spp
                           + season*age
                           + (1|river_reach), data = growth_temp_tbl_scl)

# 7.) Reduced model 5; test
len.mods[[6]] <- lmer(ls_growth_scl ~ season  # scaled predictor variable
                          # + max_temp
                          # + season*max_temp
                          # + season*river   
                          # + season*spp
                          # + season*age
                           + (1|river_reach), data = growth_temp_tbl_scl)


##assign names to each model
Modnames <- c("full","reduced1","reduced2","reduced3","reduced4","reduced5") 

##model selection table based on AICc
aictab(cand.set = len.mods, modnames = Modnames)

# question: what is "Res.LL (residual log-liklihood)"?

##compute evidence ratio
evidence(aictab(cand.set = len.mods, modnames = Modnames))
## 
## Evidence ratio between models 'reduced5' and 'reduced4':
## 11.21


The models including either season mean temp or season max temp are not the highest ranked using AICc selection.


What about food? Do fish with more food grow faster? What if we include a “fullness” predictor; total calories per fish divided by mean fish length.


Create column that has a value of calories/fish weight for each fish with a diet sample

# import diet calorie info from "fish_diet_content.Rmd" results
diet_cal <- gs_read(gs_title("fish_diet_calories_by_event"), ws = "diet") %>%
  rename(Hydrologic.Segment = section) %>%
  transform(year = as.character(year),
            age = as.character(age))

growth_temp_tbl_scl <- inner_join(growth_temp_tbl_scl,diet_cal, 
                                  by = c("year","river","age","spp","Hydrologic.Segment","season","sample.event")) %>%
   # calculate "fullness"; average calories divided by average fish size
   mutate(fullness_len = avg_len / mean_j,
          fullness_wt = avg_wt / mean_j)


What does the overall relationship between growth rate and “fullness” look like?

ggplot(growth_temp_tbl_scl, aes(fullness_len,ls_growth)) +
  geom_jitter(width = 1) +
  xlim(0,100)


What does the overall relationship between seasonal temperature and growth look like?

# order facets
growth_temp_tbl_scl$season <- factor(growth_temp_tbl_scl$season, levels = 
                                    c("Early Summer",
                                      "Mid-Summer",
                                      "Late Summer"))

ggplot(growth_temp_tbl_scl, aes(season_avg_temp,ls_growth)) +
  facet_wrap(~season) +
  geom_boxplot() +
  geom_point(aes(color = river)) +
  xlim(10,17) +
  ggtitle("Season MEAN Water Temp vs. Size-Specific Growth Rate") +
  ylab("Length-Specific Growth\n (mm/mm * day")


ggplot(growth_temp_tbl_scl, aes(max_temp,ls_growth)) +
  facet_wrap(~season) +
  geom_boxplot() +
  geom_point(aes(color = river)) +
  xlim(10,17) +
  ggtitle("Season MAX Water Temp vs. Size-Specific Growth Rate") +
  ylab("Length-Specific Growth\n (mm/mm * day")


Try AICc model selection when the new “fullness” value is included as a predictor


##set up  fork length glmm candidate models     
len.mods <- list()

# Develop list of candidate models based on a priori hypotheses

# 1.) Full model
len.mods[[1]] <- lmer(ls_growth_scl ~ season  # scaled predictor variable
                           + season_avg_temp
                           + fullness_len
                           + season*season_avg_temp
                           + season*river   
                           + season*spp
                           + season*age
                           + (1|river_reach), data = growth_temp_tbl_scl)


# 2.) Reduced model 1; mean seasonal temp not a predictor variable
len.mods[[2]] <- lmer(ls_growth_scl ~ season  # scaled predictor variable
                           #+ season_avg_temp
                           + fullness_len
                           + season*season_avg_temp
                           + season*river   
                           + season*spp
                           + season*age
                           + (1|river_reach), data = growth_temp_tbl_scl)

# 3.) Reduced model 2; test
len.mods[[3]] <- lmer(ls_growth_scl ~ season  # scaled predictor variable
                           #+ season_avg_temp
                           #+ fullness_len
                           + season*season_avg_temp
                           + season*river   
                           + season*spp
                           + season*age
                           + (1|river_reach), data = growth_temp_tbl_scl)

# 4.) Reduced model 3; test
len.mods[[4]] <- lmer(ls_growth_scl ~ season  # scaled predictor variable
                           #+ season_avg_temp
                           #+ fullness_len
                           #+ season*season_avg_temp
                           + season*river   
                           + season*spp
                           + season*age
                           + (1|river_reach), data = growth_temp_tbl_scl)

# 6.) Reduced model 4; test
len.mods[[5]] <- lmer(ls_growth_scl ~ season  # scaled predictor variable
                           #+ season_avg_temp
                           #+ fullness_len
                           #+ season*season_avg_temp
                           #+ season*river   
                           + season*spp
                           + season*age
                           + (1|river_reach), data = growth_temp_tbl_scl)

# 7.) Reduced model 5; test
len.mods[[6]] <- lmer(ls_growth_scl ~ season  # scaled predictor variable
                           #+ season_avg_temp
                           #+ fullness_len
                           #+ season*season_avg_temp
                           #+ season*river   
                           #+ season*spp
                           + season*age
                           + (1|river_reach), data = growth_temp_tbl_scl)

# 7.) Reduced model 6; test
len.mods[[7]] <- lmer(ls_growth_scl ~ season  # scaled predictor variable
                           #+ season_avg_temp
                           #+ fullness_len
                           #+ season*season_avg_temp
                           #+ season*river   
                           #+ season*spp
                           #+ season*age
                           + (1|river_reach), data = growth_temp_tbl_scl)

##assign names to each model
Modnames <- c("full","reduced1","reduced2","reduced3","reduced4","reduced5","reduced6") 

##model selection table based on AICc
aictab(cand.set = len.mods, modnames = Modnames)

# question: what is "Res.LL (residual log-liklihood)"?

##compute evidence ratio
evidence(aictab(cand.set = len.mods, modnames = Modnames))
## 
## Evidence ratio between models 'reduced6' and 'reduced5':
## NaN



7/21/2018::: Something isn’t quite working right here yet !

7/23:: this document is in the process of being transferred from “bioenerg_data_input_selection_v2” and can successfully knit up to this point.