Country Island Tern Chick Diets 1998 - 2019

Author

Rikki Clark

What are Terns Eating at Country Island, NS?

The end goal of working with Tern diet data collected by Canadian Wildlife Service (CWS) on Country Island, NS yearly from 1998 to 2019 is to establish if there is any correlation between diet shifts of Common Terns (COTE) and Arctic Terns (ARTE) and local climate/weather events. Roseate Terns (ROST) are included in the data set but will not be included in the final analysis as they are a special case being studied in detail by another graduate student.

Code

library(tidyverse)
library(readr)
library(rmarkdown)
library(dplyr)
library(tinytex)
library(kableExtra)
library(ggmap)
library(ggspatial)
library(sf)
library(leaflet)
library(mapview)
library(htmlwidgets)
library(ggplot2) #loading all libraires I need
library(magick)
library(png)
library(grid)
library(ggeffects)
library(modelr)
library(mgcv)
library(broom)
library(jtools)
library(ggstance)
library(MASS)
library(pscl)
ternpair <- image_read('ternpair.jpg')#reads in image I have saved into my directory
image_write(ternpair, path = 'ternpair.png', format = "png") #saves the image as a png file, which is better to work with
ternnest<- image_read("ternnest.jpg")
image_write(ternnest, path= "ternnest.png", format = "png")#these lines are same as above

ternpairpng <- image_read("ternpair.png")|>#puts image in an object to work with
image_rotate(90) |>#rotates image 90 degrees so it's properly oriented
  image_scale("100")#resized the image proportionally so width is 100 pixels
image_write(ternpairpng, path = "matedARTE.png")#saves the image 

ternnestpng <- image_read("ternnest.png")|>
  image_scale("300")
image_write(ternnestpng, path="COTEnest.png") #all same as above for mated ARTE

The Data

This data has been collected, stored, and provided by CWS and the Country Island seabird breeding monitoring program.

Code

myterns <- read.csv("tern_feeding_rates.csv") #reading in my data set and placing it in an object called "myterns"

Data Wrangling

Reformatting the Data Set to a Long Format

The data set initially had a separate variable column for each prey species, which can make data wrangling more difficult or tedious, particularly if I want to know the sum, mean, median, etc of different prey species taken. In order to more easily manipulate data for analysis, I have rearranged the data set.¹

Code

#This is thanks in large part to Paige, who showed me how
myternslong <- myterns %>% #creating a new object that will hold the pivoted data set
  # pivot data into long format
  pivot_longer(cols = starts_with("num_"), # all species columns start the same
               names_to = "prey_species", # name the new column
               names_prefix = "num_", # remove that prefix from all values
               values_to = "num_observed", # name the column ind. count
               values_drop_na = TRUE) %>% # remove NAs
  filter(num_observed != 0) %>% # remove values of zeros
    # for a bar chart you need 1 row for each observation
  map_df(., rep, .$num_observed) %>% # repeats each row the n times in observed column
  mutate(num_observed = 1, # changes the values all to 1 
         prey_species = str_replace_all(prey_species, pattern = "_", replacement = " ")) %>% 
  # HOW TO CHANGE SPECIES INTO 1 CATEGORY
  # you need to have the last line with the TRUE statement, but if you want to make
  # any other changes, structure it exactly like the first prey_species %in% line
  mutate(prey_species = case_when(prey_species %in% c("ter inv", "unknown", "unknown fish", "unknown inv", "unknown sp") ~ "unknown",
                                  prey_species == "atl saury" ~ "atlantic saury",
                                  prey_species == "amphipod sp" ~ "amphipods",
                                  prey_species == "mar inv" ~ "marine invertebrates",
                                  TRUE ~ prey_species),
         prey_species = str_to_title(prey_species))

Saving the Reformatted Data Set as a .csv File

It is good practice to save your rearranged and cleaned data as a new .csv file that can be shared with others or used for further analysis.²

Code

write.csv(x= myternslong, file = "tern_feeding_longer.csv") #saving the pivoted data set as a .csv file so it can be used by others as well as for my own further analysis
longtern <- read.csv("tern_feeding_longer.csv") #reading the pivoted data set in so it can be used for summary tables

Summary Tables

Total Prey Capture by Each Tern Species

While the feeding ecology between COTE and ARTE is similar, there are some differences that may result in greater foraging success by one species over the other at Country Island, NS.

Code

terntable1 <- longtern %>% #make a new object for data wrangling to maintain integrity of original data set
  filter(tern_species != "ROST", year != "2023") %>% #filter out unwanted observations - anything related to ROST and year 2023
  group_by(tern_species, year, prey_species) %>% #group the data first by the species of tern, then by the year, then by the prey species
  summarise(number_depredated = sum(num_observed, na.rm = T)) %>% #make a summary to turn into a table that will show the the total number of each prey species observed being eaten by each tern species each year
  ungroup() #this is important so that R doesn't think you're still working with the above groupings for future wrangling

terntable1 %>% 
   kbl(caption = "Total Prey of Arctic and Common Terns by Year") %>% #kbl makes the table, caption gives the title of the table
  kable_classic(full_width = F, html_font = "Cambria") #gives the aesthetic of the table and specifies the font

Total Prey of Arctic and Common Terns by Year
tern_species	year	prey_species	number_depredated
ARTE	1998	Butterfish	1
ARTE	1998	Hake	358
ARTE	1998	Lumpfish	1
ARTE	1998	Marine Invert	3
ARTE	1998	Mottledsculpin	18
ARTE	1998	Mummichug	2
ARTE	1998	Needlefish	2
ARTE	1998	Sandlance	24
ARTE	1998	Stickleback	6
ARTE	1998	Unknown	468
ARTE	1999	Hake	331
ARTE	1999	Herring	22
ARTE	1999	Marine Invert	2
ARTE	1999	Sandlance	490
ARTE	1999	Stickleback	3
ARTE	1999	Unknown	143
ARTE	2000	Hake	300
ARTE	2000	Herring	27
ARTE	2000	Marine Invert	44
ARTE	2000	Pipefish	2
ARTE	2000	Pollock	7
ARTE	2000	Sandlance	77
ARTE	2000	Stickleback	7
ARTE	2000	Unknown	221
ARTE	2001	Hake	84
ARTE	2001	Herring	44
ARTE	2001	Sandlance	79
ARTE	2001	Unknown	85
ARTE	2002	Hake	95
ARTE	2002	Herring	17
ARTE	2002	Marine Invert	6
ARTE	2002	Sandlance	58
ARTE	2002	Snipefish	1
ARTE	2002	Terrestrial Invert	1
ARTE	2002	Unknown	114
ARTE	2003	Hake	194
ARTE	2003	Herring	28
ARTE	2003	Pollock	2
ARTE	2003	Sandlance	19
ARTE	2003	Stickleback	1
ARTE	2003	Unknown	46
ARTE	2004	Butterfish	1
ARTE	2004	Cod	3
ARTE	2004	Hake	141
ARTE	2004	Herring	12
ARTE	2004	Lumpfish	1
ARTE	2004	Marine Invert	81
ARTE	2004	Pollock	8
ARTE	2004	Redfish	2
ARTE	2004	Sandlance	83
ARTE	2004	Stickleback	3
ARTE	2004	Unknown	55
ARTE	2005	Hake	208
ARTE	2005	Herring	2
ARTE	2005	Marine Invert	21
ARTE	2005	Sandlance	18
ARTE	2005	Stickleback	8
ARTE	2005	Terrestrial Invert	1
ARTE	2005	Unknown	115
ARTE	2006	Butterfish	1
ARTE	2006	Cod	4
ARTE	2006	Hake	63
ARTE	2006	Herring	1
ARTE	2006	Sandlance	36
ARTE	2006	Unknown	71
ARTE	2007	Butterfish	1
ARTE	2007	Cod	10
ARTE	2007	Hake	277
ARTE	2007	Herring	21
ARTE	2007	Lumpfish	5
ARTE	2007	Marine Invert	3
ARTE	2007	Pollock	7
ARTE	2007	Sandlance	212
ARTE	2007	Stickleback	6
ARTE	2007	Unknown	134
ARTE	2008	Cod	14
ARTE	2008	Hake	354
ARTE	2008	Herring	102
ARTE	2008	Lumpfish	2
ARTE	2008	Marine Invert	51
ARTE	2008	Sandlance	247
ARTE	2008	Stickleback	5
ARTE	2008	Terrestrial Invert	64
ARTE	2008	Unknown	189
ARTE	2009	Butterfish	16
ARTE	2009	Cod	2
ARTE	2009	Hake	236
ARTE	2009	Herring	44
ARTE	2009	Lumpfish	1
ARTE	2009	Marine Invert	30
ARTE	2009	Needlefish	1
ARTE	2009	Pollock	7
ARTE	2009	Sandlance	95
ARTE	2009	Stickleback	1
ARTE	2009	Terrestrial Invert	3
ARTE	2009	Unknown	102
ARTE	2010	Butterfish	4
ARTE	2010	Cod	1
ARTE	2010	Hake	100
ARTE	2010	Herring	96
ARTE	2010	Lumpfish	1
ARTE	2010	Needlefish	5
ARTE	2010	Pollock	2
ARTE	2010	Sandlance	44
ARTE	2010	Snipefish	7
ARTE	2010	Stickleback	3
ARTE	2010	Terrestrial Invert	5
ARTE	2010	Unknown	181
ARTE	2011	Butterfish	4
ARTE	2011	Capelin	15
ARTE	2011	Hake	194
ARTE	2011	Herring	15
ARTE	2011	Mackerel	1
ARTE	2011	Marine Invert	23
ARTE	2011	Needlefish	4
ARTE	2011	Sandlance	133
ARTE	2011	Silverside	21
ARTE	2011	Snipefish	4
ARTE	2011	Stickleback	14
ARTE	2011	Terrestrial Invert	39
ARTE	2011	Unknown	397
ARTE	2011	Unknown Invert	85
ARTE	2011	Unknown Species	41
ARTE	2012	Butterfish	14
ARTE	2012	Hake	397
ARTE	2012	Herring	14
ARTE	2012	Marine Invert	82
ARTE	2012	Pollock	1
ARTE	2012	Sandlance	4
ARTE	2012	Snipefish	16
ARTE	2012	Stickleback	6
ARTE	2012	Terrestrial Invert	4
ARTE	2012	Unknown	141
ARTE	2012	Unknown Species	100
ARTE	2013	Amphipod	4
ARTE	2013	Atlanticsaury	5
ARTE	2013	Butterfish	1
ARTE	2013	Capelin	1
ARTE	2013	Euphausiid	1
ARTE	2013	Hake	250
ARTE	2013	Herring	14
ARTE	2013	Marine Invert	2
ARTE	2013	Pollock	6
ARTE	2013	Sandlance	16
ARTE	2013	Sculpin	1
ARTE	2013	Silverside	4
ARTE	2013	Stickleback	3
ARTE	2013	Terrestrial Invert	2
ARTE	2013	Unknown	141
ARTE	2014	Atlanticsaury	2
ARTE	2014	Euphausiid	3
ARTE	2014	Hake	232
ARTE	2014	Herring	5
ARTE	2014	Mackerel	3
ARTE	2014	Marine Invert	2
ARTE	2014	Mummichug	2
ARTE	2014	Pollock	1
ARTE	2014	Redfish	4
ARTE	2014	Sandlance	20
ARTE	2014	Silverside	10
ARTE	2014	Snipefish	1
ARTE	2014	Unknown	71
ARTE	2015	Atlanticsaury	1
ARTE	2015	Butterfish	1
ARTE	2015	Euphausiid	10
ARTE	2015	Hake	162
ARTE	2015	Herring	22
ARTE	2015	Lumpfish	1
ARTE	2015	Sandlance	16
ARTE	2015	Snipefish	2
ARTE	2015	Stickleback	2
ARTE	2015	Unknown	114
ARTE	2015	Unknown Invert	4
ARTE	2016	Amphipod	2
ARTE	2016	Atlanticsaury	5
ARTE	2016	Euphausiid	1
ARTE	2016	Hake	466
ARTE	2016	Herring	6
ARTE	2016	Lumpfish	1
ARTE	2016	Marine Invert	240
ARTE	2016	Sandlance	39
ARTE	2016	Snipefish	15
ARTE	2016	Squid	1
ARTE	2016	Stickleback	1
ARTE	2016	Terrestrial Invert	1
ARTE	2016	Unknown	54
ARTE	2016	Unknown Invert	28
ARTE	2017	Amphipod	11
ARTE	2017	Butterfish	2
ARTE	2017	Euphausiid	18
ARTE	2017	Hake	376
ARTE	2017	Herring	7
ARTE	2017	Mackerel	1
ARTE	2017	Marine Invert	50
ARTE	2017	Sandlance	63
ARTE	2017	Stickleback	2
ARTE	2017	Terrestrial Invert	5
ARTE	2017	Unknown	138
ARTE	2017	Unknown Invert	17
ARTE	2018	Amphipod	1
ARTE	2018	Atlanticsaury	1
ARTE	2018	Cunner	1
ARTE	2018	Euphausiid	1
ARTE	2018	Hake	382
ARTE	2018	Herring	28
ARTE	2018	Marine Invert	62
ARTE	2018	Pollock	3
ARTE	2018	Sandlance	36
ARTE	2018	Squid	1
ARTE	2018	Stickleback	10
ARTE	2018	Terrestrial Invert	124
ARTE	2018	Unknown	147
ARTE	2018	Unknown Invert	81
ARTE	2019	Atlanticsaury	2
ARTE	2019	Euphausiid	21
ARTE	2019	Hake	212
ARTE	2019	Herring	9
ARTE	2019	Lumpfish	4
ARTE	2019	Mackerel	1
ARTE	2019	Marine Invert	13
ARTE	2019	Mummichug	1
ARTE	2019	Pollock	3
ARTE	2019	Redfish	2
ARTE	2019	Sandlance	98
ARTE	2019	Sculpin	12
ARTE	2019	Snipefish	3
ARTE	2019	Squid	7
ARTE	2019	Stickleback	30
ARTE	2019	Terrestrial Invert	214
ARTE	2019	Unknown	171
ARTE	2019	Unknown Invert	33
ARTE	2019	Unknown Species	3
COTE	1998	Hake	234
COTE	1998	Lumpfish	2
COTE	1998	Marine Invert	1
COTE	1998	Mottledsculpin	21
COTE	1998	Mummichug	6
COTE	1998	Needlefish	2
COTE	1998	Sandlance	40
COTE	1998	Stickleback	6
COTE	1998	Unknown	347
COTE	1999	Hake	61
COTE	1999	Herring	4
COTE	1999	Sandlance	190
COTE	1999	Unknown	58
COTE	2000	Butterfish	2
COTE	2000	Hake	175
COTE	2000	Herring	19
COTE	2000	Marine Invert	3
COTE	2000	Pollock	6
COTE	2000	Sandlance	45
COTE	2000	Stickleback	7
COTE	2000	Unknown	112
COTE	2001	Hake	34
COTE	2001	Herring	12
COTE	2001	Pollock	1
COTE	2001	Sandlance	55
COTE	2001	Stickleback	1
COTE	2001	Unknown	31
COTE	2002	Butterfish	3
COTE	2002	Hake	88
COTE	2002	Herring	14
COTE	2002	Marine Invert	1
COTE	2002	Pollock	12
COTE	2002	Sandlance	65
COTE	2002	Unknown	99
COTE	2003	Hake	314
COTE	2003	Herring	42
COTE	2003	Marine Invert	1
COTE	2003	Pollock	40
COTE	2003	Sandlance	65
COTE	2003	Stickleback	8
COTE	2003	Unknown	95
COTE	2004	Butterfish	4
COTE	2004	Cod	1
COTE	2004	Hake	98
COTE	2004	Herring	31
COTE	2004	Killifish	5
COTE	2004	Marine Invert	27
COTE	2004	Pollock	21
COTE	2004	Sandlance	160
COTE	2004	Stickleback	3
COTE	2004	Unknown	99
COTE	2005	Butterfish	1
COTE	2005	Hake	264
COTE	2005	Herring	12
COTE	2005	Lumpfish	1
COTE	2005	Marine Invert	9
COTE	2005	Pollock	7
COTE	2005	Sandlance	44
COTE	2005	Sculpin	2
COTE	2005	Stickleback	11
COTE	2005	Terrestrial Invert	1
COTE	2005	Unknown	200
COTE	2006	Cod	4
COTE	2006	Hake	129
COTE	2006	Herring	2
COTE	2006	Killifish	1
COTE	2006	Pollock	1
COTE	2006	Sandlance	90
COTE	2006	Stickleback	3
COTE	2006	Terrestrial Invert	1
COTE	2006	Unknown	108
COTE	2007	Cod	7
COTE	2007	Hake	215
COTE	2007	Herring	25
COTE	2007	Lumpfish	2
COTE	2007	Marine Invert	1
COTE	2007	Pollock	25
COTE	2007	Sandlance	275
COTE	2007	Stickleback	7
COTE	2007	Terrestrial Invert	1
COTE	2007	Unknown	102
COTE	2008	Cod	11
COTE	2008	Hake	226
COTE	2008	Herring	65
COTE	2008	Lumpfish	1
COTE	2008	Marine Invert	5
COTE	2008	Pollock	9
COTE	2008	Sandlance	98
COTE	2008	Stickleback	10
COTE	2008	Terrestrial Invert	91
COTE	2008	Unknown	122
COTE	2009	Butterfish	24
COTE	2009	Cod	9
COTE	2009	Hake	137
COTE	2009	Herring	12
COTE	2009	Lumpfish	4
COTE	2009	Marine Invert	20
COTE	2009	Needlefish	1
COTE	2009	Pollock	27
COTE	2009	Redfish	1
COTE	2009	Sandlance	76
COTE	2009	Stickleback	3
COTE	2009	Terrestrial Invert	1
COTE	2009	Unknown	67
COTE	2010	Butterfish	8
COTE	2010	Cod	4
COTE	2010	Hake	217
COTE	2010	Herring	123
COTE	2010	Lumpfish	3
COTE	2010	Needlefish	24
COTE	2010	Pollock	20
COTE	2010	Sandlance	98
COTE	2010	Sculpin	1
COTE	2010	Stickleback	11
COTE	2010	Terrestrial Invert	12
COTE	2010	Unknown	288
COTE	2011	Capelin	20
COTE	2011	Cunner	1
COTE	2011	Hake	113
COTE	2011	Herring	23
COTE	2011	Mackerel	1
COTE	2011	Marine Invert	2
COTE	2011	Mummichug	2
COTE	2011	Needlefish	3
COTE	2011	Pollock	4
COTE	2011	Redfish	2
COTE	2011	Sandlance	51
COTE	2011	Sculpin	1
COTE	2011	Silverside	5
COTE	2011	Snipefish	2
COTE	2011	Stickleback	11
COTE	2011	Terrestrial Invert	12
COTE	2011	Unknown	224
COTE	2011	Unknown Invert	14
COTE	2011	Unknown Species	6
COTE	2012	Butterfish	26
COTE	2012	Cunner	2
COTE	2012	Hake	454
COTE	2012	Herring	26
COTE	2012	Marine Invert	41
COTE	2012	Mummichug	2
COTE	2012	Pollock	4
COTE	2012	Redfish	1
COTE	2012	Sandlance	10
COTE	2012	Sculpin	1
COTE	2012	Silverside	1
COTE	2012	Snipefish	24
COTE	2012	Stickleback	4
COTE	2012	Terrestrial Invert	9
COTE	2012	Unknown	184
COTE	2012	Unknown Species	105
COTE	2013	Amphipod	6
COTE	2013	Atlanticsaury	7
COTE	2013	Butterfish	1
COTE	2013	Euphausiid	1
COTE	2013	Hake	320
COTE	2013	Herring	34
COTE	2013	Marine Invert	11
COTE	2013	Pollock	6
COTE	2013	Redfish	1
COTE	2013	Sandlance	28
COTE	2013	Silverside	1
COTE	2013	Snipefish	2
COTE	2013	Stickleback	4
COTE	2013	Terrestrial Invert	6
COTE	2013	Unknown	131
COTE	2013	Unknown Invert	1
COTE	2014	Euphausiid	4
COTE	2014	Hake	242
COTE	2014	Herring	7
COTE	2014	Lumpfish	5
COTE	2014	Mackerel	4
COTE	2014	Marine Invert	3
COTE	2014	Pollock	2
COTE	2014	Redfish	3
COTE	2014	Sandlance	11
COTE	2014	Silverside	13
COTE	2014	Snipefish	1
COTE	2014	Terrestrial Invert	1
COTE	2014	Unknown	133
COTE	2015	Atlanticsaury	1
COTE	2015	Butterfish	21
COTE	2015	Hake	128
COTE	2015	Herring	38
COTE	2015	Lumpfish	1
COTE	2015	Marine Invert	1
COTE	2015	Pollock	4
COTE	2015	Sandlance	27
COTE	2015	Snipefish	2
COTE	2015	Unknown	164
COTE	2015	Unknown Invert	1
COTE	2016	Amphipod	1
COTE	2016	Atlanticsaury	3
COTE	2016	Cunner	8
COTE	2016	Euphausiid	1
COTE	2016	Hake	252
COTE	2016	Herring	2
COTE	2016	Mackerel	1
COTE	2016	Marine Invert	9
COTE	2016	Pollock	2
COTE	2016	Redfish	1
COTE	2016	Sandlance	52
COTE	2016	Snipefish	5
COTE	2016	Terrestrial Invert	4
COTE	2016	Unknown	54
COTE	2016	Unknown Invert	7
COTE	2017	Butterfish	6
COTE	2017	Cunner	4
COTE	2017	Euphausiid	6
COTE	2017	Hake	407
COTE	2017	Herring	14
COTE	2017	Lumpfish	1
COTE	2017	Marine Invert	2
COTE	2017	Pollock	3
COTE	2017	Redfish	1
COTE	2017	Sandlance	34
COTE	2017	Stickleback	3
COTE	2017	Terrestrial Invert	14
COTE	2017	Unknown	132
COTE	2017	Unknown Invert	37
COTE	2018	Atlanticsaury	1
COTE	2018	Cunner	40
COTE	2018	Euphausiid	1
COTE	2018	Hake	226
COTE	2018	Herring	38
COTE	2018	Lumpfish	2
COTE	2018	Mackerel	1
COTE	2018	Marine Invert	12
COTE	2018	Pollock	13
COTE	2018	Sandlance	35
COTE	2018	Stickleback	11
COTE	2018	Terrestrial Invert	49
COTE	2018	Unknown	96
COTE	2018	Unknown Invert	25
COTE	2019	Atlanticsaury	1
COTE	2019	Cunner	4
COTE	2019	Euphausiid	18
COTE	2019	Hake	202
COTE	2019	Herring	2
COTE	2019	Lumpfish	4
COTE	2019	Marine Invert	6
COTE	2019	Mummichug	3
COTE	2019	Pollock	3
COTE	2019	Redfish	3
COTE	2019	Sandlance	56
COTE	2019	Sculpin	5
COTE	2019	Snipefish	8
COTE	2019	Squid	5
COTE	2019	Stickleback	42
COTE	2019	Terrestrial Invert	21
COTE	2019	Unknown	170
COTE	2019	Unknown Invert	23

Hake is a preferred prey item for both ARTE and COTE; however, ARTE show a clear and consistent preference for hake every year while COTE demonstrate a willingness to shift focus to similar species such as herring or sand lance, depending on the year. Reasons for such dietary shifts are currently unclear, but further understanding should be achieved when these dietary patterns are modeled with regional climate data from relevant years.

Visualization is always helpful.

Code

preyspplot<- ggplot(data = terntable1, mapping = aes(x=year, y = number_depredated,group= prey_species, col=prey_species))+ #says which data to use, what x and y axis will be, and to color-code data according to prey species
  geom_point()+
  geom_line()+
  labs(x="Year", y="Prey per year")+
  facet_wrap(~tern_species)+
  theme_classic()#all same as previous plots
preyspplot

Observation Effort for Each Tern Species

While climate and other environmental factors undoubtedly influence any differences in prey capture between years, it is crucial to account for human factors that influence data collection so as to draw accurate conclusions about diet shifts across time. The most important human factor to be accounted for in this study is observation effort. Effort must be quantified to establish if higher numbers of captured prey are truly reflective of what took place in the tern breeding colony or if they are reflective of greater observation effort.

One simple way to easily establish if differences in total prey capture across years is reflective of environmental factors rather than human factors is to compare the number of total prey observations per hour for each year. This can be calculated for each year using the following formula:³

$$O/hr=\frac{x_{1}+x_{2}+\cdots+x_{n}}{y_{1}+y_{2}+\cdots+y_{n}}$$

Code

terntable2<- longtern %>% 
  filter(tern_species != "ROST", year != "2023") %>% 
  group_by(tern_species, year) %>% #group by tern species and then by year
  summarise(prey_per_hour = sum(total_prey, na.rm = T)/sum(time_hr, na.rm = T)) %>% #take the sum of total prey -all prey observed regarless of species- and divide it by the sum of all hours of observation. na.rm=T removes and "NAs"
  ungroup()

terntable2 %>% 
   kbl(caption = "Total Prey Capture by ARTE and COTE per Hour For Each Year") %>% #kbl makes the table, caption gives the title of the table
  kable_classic(full_width = F, html_font = "Cambria")

Total Prey Capture by ARTE and COTE per Hour For Each Year
tern_species	year	prey_per_hour
ARTE	1998	3.535083
ARTE	1999	3.627144
ARTE	2000	3.035268
ARTE	2001	2.934931
ARTE	2002	2.715753
ARTE	2003	1.938567
ARTE	2004	2.430052
ARTE	2005	1.745283
ARTE	2006	1.810888
ARTE	2007	1.552172
ARTE	2008	2.656043
ARTE	2009	2.351301
ARTE	2010	2.394350
ARTE	2011	3.472732
ARTE	2012	2.806804
ARTE	2013	2.626386
ARTE	2014	1.830509
ARTE	2015	1.800600
ARTE	2016	6.196292
ARTE	2017	3.423790
ARTE	2018	4.957818
ARTE	2019	5.197019
COTE	1998	2.953718
COTE	1999	3.544728
COTE	2000	2.071816
COTE	2001	4.092024
COTE	2002	2.007092
COTE	2003	1.780576
COTE	2004	1.399775
COTE	2005	1.855072
COTE	2006	2.521418
COTE	2007	1.384294
COTE	2008	3.100313
COTE	2009	1.808901
COTE	2010	1.971840
COTE	2011	1.593694
COTE	2012	2.623602
COTE	2013	2.202691
COTE	2014	1.961539
COTE	2015	2.126779
COTE	2016	1.632183
COTE	2017	2.559564
COTE	2018	1.970596
COTE	2019	2.832028

When examining the number of successful prey captures observed per hour, differences in feeding rates across years and between ARTE and COTE are much easier to note. For example, ARTE had distinctly higher feeding rates in 2016, 6.20 prey items per hour, than did COTE in the same year, 1.97 prey items per hour. The wide range of prey items per hour across years is also notable given that some years show much higher success rates than others for both species, though years with higher catch rates do not always line up between the two species. Further examination not only of climate factors, but also of the feeding ecologies of ARTE and COTE will be required to gain insight into the driving factors of these shifts in foraging success.

It may be helpful to visualize this:

Code

prettytern<-image_read(path="prettyternpng.png")
plottern<- rasterGrob(prettytern, interpolate=T)
obsefplot<- ggplot(data = terntable2, mapping = aes(x=year, y = prey_per_hour,group= tern_species, col=tern_species))+ #says which data to use, what x and y axis will be, and to color-code data according to tern species
  geom_point()+
  geom_line()+
  labs(x="Year", y="Prey per Hour")+
  annotation_custom(plottern, xmin=2005, xmax=2010, ymin=4, ymax=6)+
  theme_classic()#all same as previous plots
obsefplot

Preliminary GAM

To look at the affects of climate on tern diet and foraging success, I can start by looking at the feeding rates (prey/hour/year) calculated in the Summary Tables section and compare that to North Atlantic Oscillation (NAO) data⁴.

Testing the code

I first need to merge my tern data with yearly NAO data to make a single data set on which I can run my General Additive Model, which I have chosen because I do not expect my data to be normally distributed.

Code

library(mgcv) #load library for GAM
annualnao<-read.csv(file="yearly_nao_pcbased.csv")#read in NAO data
ternclimate<-merge(x = terntable2, y = annualnao, by = "year", all = TRUE)#merge the tern data and the NAO data into a single data set in the object ternclimate - both data sets share a year column and annualnao will merge left into terntable2 aligned by year for all data points that share year
str(ternclimate)#check to make sure the merge worked properly

'data.frame':   48 obs. of  4 variables:
 $ year         : int  1998 1998 1999 1999 2000 2000 2001 2001 2002 2002 ...
 $ tern_species : chr  "ARTE" "COTE" "COTE" "ARTE" ...
 $ prey_per_hour: num  3.54 2.95 3.54 3.63 2.07 ...
 $ NAO          : num  -0.28 -0.28 0.57 0.57 0.65 0.65 -0.57 -0.57 0.34 0.34 ...

Code

gamtester<- gam(data=ternclimate, prey_per_hour~s(year)+s(NAO), method="REML")#tells the gam argument to use ternclimate, that I want to know the non-linear effects of year and NAO on number of prey caugh per hour - REML is Restricted Maximum Likelihood Method, and is recommended because it tends to be more reliable and stable for newbies like me.
summary(gamtester) #gives a summary of gam info


Family: gaussian 
Link function: identity 

Formula:
prey_per_hour ~ s(year) + s(NAO)

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   2.6144     0.1356   19.27   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
          edf Ref.df     F p-value   
s(year) 2.790  3.458 4.362 0.00781 **
s(NAO)  1.104  1.197 0.449 0.48499   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =  0.238   Deviance explained = 30.7%
-REML = 61.253  Scale est. = 0.80957   n = 44

Code

plot(gamtester, shade=T, shade.col = "lightgreen", se=T, shift = coef(gamtester)[1], pages = 1) #plots the gam, shade=T shades in the standard error, shade.col= allows me to pick the color, se=T says I want to see the standard error, shift=coef(gamtester)[1] lets me shift the y-axis to include the intercept so the y-axis is no longer relative - aka the number of prey captured per hour of observation around 2007 was actually close to 2

This was a good first test to make sure I can get the code to work, but I also need information about the prey types and how that has changed through the years. To keep my data wrangling intact and make a singular data set to work with, I will merge “ternclimate” with “terntable1.”

Additionally, I want a quick visualization of NAO through the years, as I suspect that NAO has a much greater effect than year does on prey captured, and it would be nice to see the actual data points plotted to help me understand if my models make sense.

Code

naoyearplot<- ggplot(data = ternclimate, mapping = aes(x=year, y = NAO,))+
  geom_point()+
  geom_line()+
  labs(x="Year", y="NAO")+
  theme_classic()#all same as previous plots
naoyearplot

The NAO in 2010 is much lower than in other years, and there is a large gap between that data point and the rest of the data. This will likely leave a large gap between between residuals/fitted values when NAO is ~-3 and the rest of the data because there are no years with NAO values between -3 and -0.6. However, I will leave this and associated data in the model for now because it can still provide useful information on the relationship between NAO and prey capture by terns at Country Island. I am hopeful that it will not skew the model.

Now I’ll merge terntable1 with ternclimate and check the structure of the new data set (ternshunt) to make sure the merge worked.

Code

ternshunt<-merge(x = ternclimate, y = terntable1, by = "year", all = TRUE)
str(ternshunt) #checking to make sure the merge worked - it did

'data.frame':   972 obs. of  7 variables:
 $ year             : int  1998 1998 1998 1998 1998 1998 1998 1998 1998 1998 ...
 $ tern_species.x   : chr  "ARTE" "ARTE" "ARTE" "ARTE" ...
 $ prey_per_hour    : num  3.54 3.54 3.54 3.54 3.54 ...
 $ NAO              : num  -0.28 -0.28 -0.28 -0.28 -0.28 -0.28 -0.28 -0.28 -0.28 -0.28 ...
 $ tern_species.y   : chr  "ARTE" "ARTE" "ARTE" "ARTE" ...
 $ prey_species     : chr  "Butterfish" "Hake" "Lumpfish" "Marine Invert" ...
 $ number_depredated: int  1 358 1 3 18 2 2 24 6 468 ...

I now have two tern species variables, one from each data set, but this won’t be a problem as they are the same so it won’t affect my modeling.

GAM with Default (Gaussian) Distribution

Now I can run some GAMS to see how NAO may affect the number of each prey species caught each year. I will split this up by tern species.

Code

COTEgam1<- ternshunt %>% 
  filter(tern_species.x!="ARTE", tern_species.y!="ARTE") #making an object with data for only COTE 
  
COTEgam1a<- gam(data=COTEgam1, number_depredated~s(NAO) + s(year)+ prey_species, method="REML")
summary(COTEgam1a) #note that the summary will show the "intercept" as the first factor in alphabetical order - in this case, Amphipods in prey species are top in alphabet so they are listed as "intercept"


Family: gaussian 
Link function: identity 

Formula:
number_depredated ~ s(NAO) + s(year) + prey_species

Parametric coefficients:
                                Estimate Std. Error t value Pr(>|t|)    
(Intercept)                      1.36207   32.53720   0.042   0.9666    
prey_speciesAtlanticsaury       -0.03648   38.42282  -0.001   0.9992    
prey_speciesButterfish           7.86081   35.70159   0.220   0.8259    
prey_speciesCapelin             25.02225   56.32612   0.444   0.6573    
prey_speciesCod                  2.75732   37.72824   0.073   0.9418    
prey_speciesCunner               6.42819   37.48099   0.172   0.8640    
prey_speciesEuphausiid           1.05906   37.48695   0.028   0.9775    
prey_speciesHake               206.08956   34.05373   6.052 6.03e-09 ***
prey_speciesHerring             25.60682   34.09578   0.751   0.4534    
prey_speciesKillifish            4.81243   46.18643   0.104   0.9171    
prey_speciesLumpfish            -0.41291   35.34284  -0.012   0.9907    
prey_speciesMackerel             1.66358   39.78509   0.042   0.9667    
prey_speciesMarine Invert        9.16032   34.32040   0.267   0.7898    
prey_speciesMottledsculpin      26.22700   56.92402   0.461   0.6454    
prey_speciesMummichug            1.58505   39.82607   0.040   0.9683    
prey_speciesNeedlefish           3.02961   40.04460   0.076   0.9398    
prey_speciesPollock              9.65150   34.14713   0.283   0.7777    
prey_speciesRedfish             -1.90137   36.28764  -0.052   0.9583    
prey_speciesSandlance           72.86229   34.05373   2.140   0.0335 *  
prey_speciesSculpin             -5.43982   38.53803  -0.141   0.8879    
prey_speciesSilverside           3.65289   39.76595   0.092   0.9269    
prey_speciesSnipefish            4.52180   36.80831   0.123   0.9023    
prey_speciesSquid               -6.29996   56.31643  -0.112   0.9110    
prey_speciesStickleback          7.24732   34.48375   0.210   0.8337    
prey_speciesTerrestrial Invert  12.32842   34.73703   0.355   0.7230    
prey_speciesUnknown            136.99865   34.05373   4.023 7.88e-05 ***
prey_speciesUnknown Invert      13.72587   36.82336   0.373   0.7097    
prey_speciesUnknown Species     55.20658   45.93123   1.202   0.2307    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
          edf Ref.df     F p-value  
s(NAO)  1.001  1.001 4.441  0.0362 *
s(year) 1.000  1.000 1.545  0.2152  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =  0.645   Deviance explained = 68.6%
-REML = 1195.1  Scale est. = 2106.6    n = 252

Code

plot(COTEgam1a, shade=T, shade.col = "lightgreen", se=T, residuals=TRUE, pch=1, cex=0.7, shift = coef(COTEgam1a)[1]) #plots the gam, shade=T shades in the standard error, shade.col= allows me to pick the color, se=T says I want to see the standard error, shift=coef(gamtester)[1] lets me shift the y-axis to include the intercept so the y-axis is no longer relative but shows the actual predicted number of prey captured at specific NOA values, my residuals are shown as circles (set by pch=1 with their size set by cex=0.7)

We can see from the printed output as well as the plot that year does not have a significant effect on number of prey caught each year in the summer months, but NOA does (p<0.05). However, our residual sum of squares is HUGE (Scale est. = 2106.6), and we are predicting values below zero (it is not possible to catch less than 0 fish), so our Gaussian distribution is incorrect. Instead, we can try a distribution method that does not allow for negative values, like a poisson distribution. It’s common in the literature to use poisson distributions to handle count data, especially for these sorts of studies that involve observation watches to gather information on prey type and number of prey caught.

GAM with a Poisson Distribution

Code

COTEgam2<- gam(data=COTEgam1, number_depredated~s(NAO) + prey_species, family=poisson, method="REML")
summary(COTEgam2) #note that the summary will show the "intercept" as the first factor in alphabetical order - in this case, Amphipods in prey species are top in alphabet so they are listed as "intercept"


Family: poisson 
Link function: log 

Formula:
number_depredated ~ s(NAO) + prey_species

Parametric coefficients:
                               Estimate Std. Error z value Pr(>|z|)    
(Intercept)                     1.21170    0.37814   3.204  0.00135 ** 
prey_speciesAtlanticsaury      -0.20731    0.46900  -0.442  0.65847    
prey_speciesButterfish          1.00567    0.39171   2.567  0.01025 *  
prey_speciesCapelin             1.96731    0.44136   4.457 8.30e-06 ***
prey_speciesCod                 0.46046    0.41328   1.114  0.26521    
prey_speciesCunner              1.11432    0.40006   2.785  0.00535 ** 
prey_speciesEuphausiid          0.44194    0.41859   1.056  0.29107    
prey_speciesHake                4.10260    0.37840  10.842  < 2e-16 ***
prey_speciesHerring             2.03384    0.38055   5.345 9.07e-08 ***
prey_speciesKillifish          -0.19626    0.55645  -0.353  0.72431    
prey_speciesLumpfish           -0.37427    0.42604  -0.878  0.37967    
prey_speciesMackerel           -0.57345    0.53486  -1.072  0.28365    
prey_speciesMarine Invert       0.96265    0.38655   2.490  0.01276 *  
prey_speciesMottledsculpin      1.74303    0.43671   3.991 6.57e-05 ***
prey_speciesMummichug          -0.01886    0.46915  -0.040  0.96793    
prey_speciesNeedlefish          0.68840    0.42021   1.638  0.10137    
prey_speciesPollock             1.12587    0.38437   2.929  0.00340 ** 
prey_speciesRedfish            -0.70287    0.46897  -1.499  0.13393    
prey_speciesSandlance           3.06368    0.37893   8.085 6.21e-16 ***
prey_speciesSculpin            -0.59935    0.49328  -1.215  0.22435    
prey_speciesSilverside          0.41450    0.43936   0.943  0.34547    
prey_speciesSnipefish           0.67484    0.40714   1.658  0.09741 .  
prey_speciesSquid               0.43102    0.58611   0.735  0.46211    
prey_speciesStickleback         0.89458    0.38714   2.311  0.02085 *  
prey_speciesTerrestrial Invert  1.52561    0.38403   3.973 7.11e-05 ***
prey_speciesUnknown             3.69448    0.37855   9.760  < 2e-16 ***
prey_speciesUnknown Invert      1.58405    0.39025   4.059 4.93e-05 ***
prey_speciesUnknown Species     2.86525    0.39055   7.336 2.19e-13 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
         edf Ref.df Chi.sq p-value    
s(NAO) 6.188  7.151  259.4  <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =  0.645   Deviance explained = 80.1%
-REML = 2837.2  Scale est. = 1         n = 252

Code

plot(COTEgam2, shade=T, shade.col = "lightgreen", se=T, residuals=TRUE, pch=1, cex=0.7, shift = coef(COTEgam2)[1])#you will still see negative residuals here because you haven't taken the antilog, but when we plot predicted values, there should be none below zero

We can see from the summary that this model explains roughly 80% of the effect that NAO and prey species has on number of prey caught per year. We can also see that the residual sum of squares in this model is 1 (Scale est. =1), which is much better than the model using the Gaussian distribution where the residual sum of squares was 2106.6. It should be noted that all the predicted values are positive for this model, so it does make biological sense.

Code

predict_response(COTEgam2) %>% plot(show_data=T, jitter=T, show_ci=TRUE, ci_style=c("dash"), colors="metro")#predicts response variables according to the model and then I can plot them with ggeffects plot() function, choosing to show the data, jitter the data so it's easier to see, show confidence intervals as dashed lines, and use a colour scheme

$NAO


$prey_species

I still need to run a check of the model to make sure it is the best one. Thankfully, R has gam.check() for that.

Code

gam.check(COTEgam2)##runs a the whole series of model validaton checks for me in just one command. I get a summary of stats tests to check for model fit as well as a QQ plot, a residuals vs linear predictions plot, a response vs fitted values plot, and a histogram of residuals.


Method: REML   Optimizer: outer newton
full convergence after 4 iterations.
Gradient range [1.692766e-05,1.692766e-05]
(score 2837.177 & scale 1).
Hessian positive definite, eigenvalue range [0.7326335,0.7326335].
Model rank =  37 / 37 

Basis dimension (k) checking results. Low p-value (k-index<1) may
indicate that k is too low, especially if edf is close to k'.

         k'  edf k-index p-value
s(NAO) 9.00 6.19    0.98     0.4

We can see from looking at the outputs that there are a few problems with the current model. Ideally, our deviance of residuals will be a roughly straight line following the red regression line along the theoretical quantiles. Our histogram of residuals, while not terrible, does show a light left skew. In our residuals vs linear predictors plot, our residuals do not look randomly distributed left of the predictor value of 3, and our response vs fitted values plot is more widely distributed than ideal given we want our points to cluster around a 1-1 line to indicate a well-fitted model.

Our summary output shows that our k-index (a measure of the basis functions used to fit the model) is less than one, which indicates that the default number of basis functions in the model may be too low to properly capture the relationships. We can fix this by specifying the number of basis functions in the gam formula. Our current output indicates that k, our basis dimension, is 9. We will try increasing that to 20 to see if this fixes the issues with over-smoothing in our model (forcing the data into being more linear than is true).

Code

#Run the same gam as before with addition of specifying the basis dimensions
COTEgam3<- gam(data=COTEgam1, number_depredated~s(NAO,k=20) + prey_species, family=poisson, method="REML") #k=20 allows me to specify basis dimensions to fix over-smoothing in the model
summary(COTEgam3)


Family: poisson 
Link function: log 

Formula:
number_depredated ~ s(NAO, k = 20) + prey_species

Parametric coefficients:
                               Estimate Std. Error z value Pr(>|z|)    
(Intercept)                      1.3230     0.3791   3.489 0.000484 ***
prey_speciesAtlanticsaury       -0.3301     0.4695  -0.703 0.482032    
prey_speciesButterfish           0.8523     0.3928   2.170 0.030020 *  
prey_speciesCapelin              1.8549     0.4428   4.189 2.80e-05 ***
prey_speciesCod                  0.3516     0.4147   0.848 0.396577    
prey_speciesCunner               0.8942     0.4011   2.229 0.025786 *  
prey_speciesEuphausiid           0.3328     0.4193   0.794 0.427365    
prey_speciesHake                 3.9886     0.3795  10.509  < 2e-16 ***
prey_speciesHerring              1.9371     0.3816   5.076 3.86e-07 ***
prey_speciesKillifish           -0.1420     0.5580  -0.254 0.799180    
prey_speciesLumpfish            -0.5777     0.4272  -1.352 0.176295    
prey_speciesMackerel            -0.6499     0.5356  -1.214 0.224928    
prey_speciesMarine Invert        0.7757     0.3876   2.001 0.045392 *  
prey_speciesMottledsculpin       1.3423     0.4395   3.055 0.002254 ** 
prey_speciesMummichug           -0.3654     0.4704  -0.777 0.437299    
prey_speciesNeedlefish           0.5304     0.4215   1.258 0.208331    
prey_speciesPollock              1.0186     0.3855   2.643 0.008230 ** 
prey_speciesRedfish             -0.8440     0.4696  -1.797 0.072328 .  
prey_speciesSandlance            2.9497     0.3801   7.761 8.41e-15 ***
prey_speciesSculpin             -0.8606     0.4942  -1.742 0.081590 .  
prey_speciesSilverside           0.1978     0.4403   0.449 0.653218    
prey_speciesSnipefish            0.5079     0.4079   1.245 0.213060    
prey_speciesSquid                0.2007     0.5879   0.341 0.732777    
prey_speciesStickleback          0.7505     0.3883   1.933 0.053275 .  
prey_speciesTerrestrial Invert   1.3694     0.3850   3.556 0.000376 ***
prey_speciesUnknown              3.5805     0.3797   9.430  < 2e-16 ***
prey_speciesUnknown Invert       1.4604     0.3910   3.735 0.000187 ***
prey_speciesUnknown Species      2.4950     0.3921   6.364 1.97e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
        edf Ref.df Chi.sq p-value    
s(NAO) 18.8  18.98  939.9  <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =  0.722   Deviance explained = 83.7%
-REML = 2509.5  Scale est. = 1         n = 252

Code

plot(COTEgam3, shade=T, shade.col = "lightgreen", se=T, residuals=TRUE, pch=1, cex=0.7, shift = coef(COTEgam3)[1])

Unfortunately, it is now becoming more obvious that we may have to remove the 2010 data because it is causing problems in the model expected from an outlier. We will still run it through the gam.check.

Code

gam.check(COTEgam3)#runs a the whole series of model validaton checks for me in just one command. I get a summary of stats tests to check for model fit as well as a QQ plot, a residuals vs linear predictions plot, a response vs fitted values plot, and a histogram of residuals.


Method: REML   Optimizer: outer newton
full convergence after 7 iterations.
Gradient range [8.640124e-05,8.640124e-05]
(score 2509.469 & scale 1).
Hessian positive definite, eigenvalue range [6.626972,6.626972].
Model rank =  47 / 47 

Basis dimension (k) checking results. Low p-value (k-index<1) may
indicate that k is too low, especially if edf is close to k'.

         k'  edf k-index p-value
s(NAO) 19.0 18.8    1.11    0.98

Similar issues are cropping up again, so we will try removing the outlier data. We will also try k=15, because the plot for k=20 showed more wiggliness (this is genuinely the term) than desired regardless of the outlier.

Code

COTEnoouts<- ternshunt %>% 
  filter(tern_species.x!="ARTE", tern_species.y!="ARTE", year !="2010")#filtering out the 2010 outlier
COTEgam4<- gam(data=COTEnoouts, number_depredated~s(NAO,k=15) + prey_species, family=poisson, method="REML") 
summary(COTEgam4)


Family: poisson 
Link function: log 

Formula:
number_depredated ~ s(NAO, k = 15) + prey_species

Parametric coefficients:
                               Estimate Std. Error z value Pr(>|z|)    
(Intercept)                      1.2962     0.3790   3.420 0.000626 ***
prey_speciesAtlanticsaury       -0.3337     0.4695  -0.711 0.477128    
prey_speciesButterfish           0.9389     0.3938   2.384 0.017115 *  
prey_speciesCapelin              1.8518     0.4427   4.183 2.88e-05 ***
prey_speciesCod                  0.5767     0.4188   1.377 0.168550    
prey_speciesCunner               0.8987     0.4009   2.242 0.024987 *  
prey_speciesEuphausiid           0.3124     0.4191   0.745 0.456055    
prey_speciesHake                 4.0189     0.3794  10.592  < 2e-16 ***
prey_speciesHerring              1.7659     0.3822   4.620 3.84e-06 ***
prey_speciesKillifish           -0.1406     0.5579  -0.252 0.801062    
prey_speciesLumpfish            -0.5299     0.4329  -1.224 0.220929    
prey_speciesMackerel            -0.6946     0.5354  -1.297 0.194577    
prey_speciesMarine Invert        0.7922     0.3875   2.044 0.040916 *  
prey_speciesMottledsculpin       1.3488     0.4394   3.070 0.002143 ** 
prey_speciesMummichug           -0.3652     0.4703  -0.777 0.437403    
prey_speciesNeedlefish          -0.6644     0.5578  -1.191 0.233620    
prey_speciesPollock              1.0194     0.3860   2.641 0.008267 ** 
prey_speciesRedfish             -0.8641     0.4695  -1.841 0.065695 .  
prey_speciesSandlance            2.9660     0.3800   7.805 5.95e-15 ***
prey_speciesSculpin             -0.6467     0.5053  -1.280 0.200637    
prey_speciesSilverside           0.1684     0.4401   0.383 0.702066    
prey_speciesSnipefish            0.4859     0.4078   1.192 0.233390    
prey_speciesSquid                0.1835     0.5877   0.312 0.754818    
prey_speciesStickleback          0.7978     0.3889   2.051 0.040233 *  
prey_speciesTerrestrial Invert   1.4505     0.3852   3.766 0.000166 ***
prey_speciesUnknown              3.5594     0.3796   9.376  < 2e-16 ***
prey_speciesUnknown Invert       1.4646     0.3908   3.747 0.000179 ***
prey_speciesUnknown Species      2.4993     0.3920   6.376 1.81e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
         edf Ref.df Chi.sq p-value    
s(NAO) 13.91     14  651.8  <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =   0.71   Deviance explained =   83%
-REML = 2398.6  Scale est. = 1         n = 240

Code

plot(COTEgam4, shade=T, shade.col = "lightgreen", se=T, residuals=TRUE, pch=1, cex=0.7, shift = coef(COTEgam4)[1])

Code

gam.check(COTEgam4)


Method: REML   Optimizer: outer newton
full convergence after 8 iterations.
Gradient range [0.0006942665,0.0006942665]
(score 2398.59 & scale 1).
Hessian positive definite, eigenvalue range [6.105787,6.105787].
Model rank =  42 / 42 

Basis dimension (k) checking results. Low p-value (k-index<1) may
indicate that k is too low, especially if edf is close to k'.

         k'  edf k-index p-value
s(NAO) 14.0 13.9    1.04    0.80

We are still seeing the same issues with the gam.check (after trying k values ranging from 5 to 20 with 20 being the maximum this data set will allow). We can see that the residuals are quite widely dispersed when plotted against the fitted values, indicating that the variance of the residuals is likely much higher than the mean. A poisson distribution assumes the mean and the variance will be roughly equal, but a negative binomial distribution accounts for a variance larger than the mean.

Code

COTEres1<- resid(COTEgam4)
COTEfit1<- fitted(COTEgam4)
ggplot()+
  geom_point(aes(x=COTEfit1,y=COTEres1))+
  geom_hline(yintercept=0, linetype='dashed', col='blue')+
  theme_bw(22)+ylab("Residuals")+xlab("Fitted values")

With one last visualization plotting residuals against fitted values, we can see there is more variation as fitted values rise above ~40, which falls outside the assumptions of a poisson distribution.

NOTE: I am uncertain if I might be able to say the residuals are fairly evenly dispersed around 0 with some outliers - I think they are, but I also guessed incorrectly when Trevor asked in class if the demo residuals were evenly distributed around 0 (I said no, but they were). We can check this with the nifty “create a dispersion function” trick.

Code

dispersion<-function(model,modeltype='gaussian')
{
  A<-sum(resid(model,type="pearson")^2)
  if(modeltype %in% c("g","p","qp","gaussian","poisson","quasipoisson"))
  {
    B<-length(resid(model))-length(coef(model))
  }
  if(modeltype %in% c("nb","negativebinomial"))
  {
    B<-length(resid(model))-(length(coef(model))+1)
  }
  DISP<<-A/B
  return(DISP)
}
dispersion(COTEgam4, modeltype="poisson")

[1] 20.2757

With a dispersion output of 20.28, we can now confidently say this model is overdispersed and the poisson is not the correct distribution.

GAM with Negative Binomial Distribution

We will first run the GAM without specifying the number of basis functions to see how it looks.

Code

COTEgam5<- gam(data=COTEnoouts, number_depredated~s(NAO) + prey_species, family= nb()) #family = nb() gives me a negative binomial distribution - r will automatically calculate the theta value for me in order to manage the overdispersion and display that value next to the family type in the summary
summary(COTEgam5)


Family: Negative Binomial(1.392) 
Link function: log 

Formula:
number_depredated ~ s(NAO) + prey_species

Parametric coefficients:
                               Estimate Std. Error z value Pr(>|z|)    
(Intercept)                     1.22528    0.71065   1.724   0.0847 .  
prey_speciesAtlanticsaury      -0.25760    0.85345  -0.302   0.7628    
prey_speciesButterfish          1.05845    0.77211   1.371   0.1704    
prey_speciesCapelin             1.89837    1.13225   1.677   0.0936 .  
prey_speciesCod                 0.59924    0.82479   0.727   0.4675    
prey_speciesCunner              1.10611    0.80079   1.381   0.1672    
prey_speciesEuphausiid          0.37742    0.81150   0.465   0.6419    
prey_speciesHake                4.09677    0.73450   5.578 2.44e-08 ***
prey_speciesHerring             1.85011    0.73710   2.510   0.0121 *  
prey_speciesKillifish          -0.14822    1.01462  -0.146   0.8839    
prey_speciesLumpfish           -0.40287    0.78797  -0.511   0.6092    
prey_speciesMackerel           -0.59997    0.91163  -0.658   0.5105    
prey_speciesMarine Invert       0.89369    0.74280   1.203   0.2289    
prey_speciesMottledsculpin      1.75811    1.12804   1.559   0.1191    
prey_speciesMummichug          -0.08985    0.87336  -0.103   0.9181    
prey_speciesNeedlefish         -0.52512    0.95249  -0.551   0.5814    
prey_speciesPollock             1.07874    0.74036   1.457   0.1451    
prey_speciesRedfish            -0.75387    0.81947  -0.920   0.3576    
prey_speciesSandlance           3.05664    0.73479   4.160 3.18e-05 ***
prey_speciesSculpin            -0.48533    0.89399  -0.543   0.5872    
prey_speciesSilverside          0.43186    0.85616   0.504   0.6140    
prey_speciesSnipefish           0.57847    0.79525   0.727   0.4670    
prey_speciesSquid               0.27075    1.19522   0.227   0.8208    
prey_speciesStickleback         0.87055    0.74669   1.166   0.2437    
prey_speciesTerrestrial Invert  1.56775    0.75163   2.086   0.0370 *  
prey_speciesUnknown             3.64531    0.73459   4.962 6.96e-07 ***
prey_speciesUnknown Invert      1.53785    0.78646   1.955   0.0505 .  
prey_speciesUnknown Species     2.71490    0.93519   2.903   0.0037 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
         edf Ref.df Chi.sq p-value
s(NAO) 1.001  1.002  2.228   0.136

R-sq.(adj) =  0.639   Deviance explained = 73.8%
-REML = 906.71  Scale est. = 1         n = 240

Code

plot(COTEgam5, shade=T, shade.col = "lightgreen", se=T, residuals=TRUE, pch=1, cex=0.7, shift = coef(COTEgam5)[1])

Code

gam.check(COTEgam5)


Method: REML   Optimizer: outer newton
full convergence after 7 iterations.
Gradient range [-0.0002732086,0.0001715235]
(score 906.7095 & scale 1).
Hessian positive definite, eigenvalue range [0.0002731124,93.50103].
Model rank =  37 / 37 

Basis dimension (k) checking results. Low p-value (k-index<1) may
indicate that k is too low, especially if edf is close to k'.

       k' edf k-index p-value
s(NAO)  9   1     0.9    0.22

We can see from our GAM summary output that our dispersion is still greater than one (1.392 - this value is given in the family output with a negative binomial distribution), but it is a distinct improvement over a dispersion of 20.28! Additionally, our gam.check outputs look much better. The QQ plot shows a much straighter distribution of residuals that actually follow the theoretical quartiles line, and the histogram of residuals is much more normally shaped. The residuals vs linear predictions is more randomly distributed, though there is still some clustering closer to 0. The response vs fitted values still do not cluster around a 1-1 line; however, I have not been able to fix that with any additional attempts at smoothing or setting other parameters.

This appears to be the best model (at least thus far). It only explains 73.8% of deviance in the data set, but it is more important to be sure the model actually fits than incorrectly explain more of the deviance. Additionally, this model shows the effect of NAO on prey captured is not significant, but the effect of prey species is significant for prey such as hake (p<0.001), herring (p<0.05), and sand lance (p<0.001).

We can look at the predicted values from this model.

Code

predict_response(COTEgam5) %>% plot(show_data=T, jitter=T, show_ci=TRUE, ci_style=c("dash"), color="metro")

$NAO


$prey_species

Code

#this calulates the predicted values and plots them, showing the data. It creates one plot of predicted values for each of the parameters in my model (NAO and prey species). show_ci lets me show the confidence interval around my predicted mean line, and ci_style lets me choose how I want to visualize that (dashed line in this case). The colors argument has highlighted the mean of the predicted values (regression line for NAO plot and point for prey species plot).I am still trying to work out how to rotate the x-axis labels so they're legible... Tried to do this in ggplot2 and it didn't work well so not sure how to rotate the labels here.

While we can see there is a slight negative correlation between NAO and number of prey captured, it is not a significant correlation. The number of prey captured is significantly higher for hake, sand lance, herring, and, as always, prey items that could not be positively identified by observers before being eaten. The significance of these species is unsurprising because they are well recognized as preferred food sources for COTE.

GAM with Negative Binomial Distribution (ARTE)

Now to do the same model for ARTE using the same parameters.

Code

ARTEnoouts<- ternshunt %>% 
  filter(tern_species.x!="COTE", tern_species.y!="COTE", year !="2010")
ARTEgam1<- gam(data=ARTEnoouts, number_depredated~s(NAO) + prey_species, family= nb()) #family = nb() gives me a negative binomial distribution - r will automatically calculate the theta value for me in order to manage the overdispersion and display that value next to the family type in the summary
summary(ARTEgam1)


Family: Negative Binomial(1.182) 
Link function: log 

Formula:
number_depredated ~ s(NAO) + prey_species

Parametric coefficients:
                               Estimate Std. Error z value Pr(>|z|)    
(Intercept)                     1.52585    0.51771   2.947 0.003205 ** 
prey_speciesAtlanticsaury      -0.49867    0.68819  -0.725 0.468694    
prey_speciesButterfish         -0.10833    0.61649  -0.176 0.860514    
prey_speciesCapelin             0.45044    0.87489   0.515 0.606652    
prey_speciesCod                 0.38480    0.68411   0.562 0.573786    
prey_speciesCunner             -1.64569    1.45642  -1.130 0.258493    
prey_speciesEuphausiid          0.60762    0.63819   0.952 0.341041    
prey_speciesHake                4.00730    0.55525   7.217 5.31e-13 ***
prey_speciesHerring             1.57935    0.55888   2.826 0.004715 ** 
prey_speciesLumpfish           -0.80743    0.66353  -1.217 0.223654    
prey_speciesMackerel           -1.17940    0.80447  -1.466 0.142633    
prey_speciesMarine Invert       2.19906    0.56440   3.896 9.77e-05 ***
prey_speciesMottledsculpin      1.43994    1.08619   1.326 0.184945    
prey_speciesMummichug          -1.02619    0.86672  -1.184 0.236417    
prey_speciesNeedlefish         -0.74297    0.83778  -0.887 0.375170    
prey_speciesPipefish           -0.92159    1.27182  -0.725 0.468684    
prey_speciesPollock            -0.04925    0.61205  -0.080 0.935870    
prey_speciesRedfish            -0.57006    0.82237  -0.693 0.488190    
prey_speciesSandlance           2.93442    0.55557   5.282 1.28e-07 ***
prey_speciesSculpin             0.28818    0.88433   0.326 0.744521    
prey_speciesSilverside          0.85609    0.76238   1.123 0.261471    
prey_speciesSnipefish           0.27576    0.64488   0.428 0.668935    
prey_speciesSquid              -0.50250    0.81436  -0.617 0.537201    
prey_speciesStickleback         0.28536    0.57226   0.499 0.618026    
prey_speciesTerrestrial Invert  2.13232    0.59023   3.613 0.000303 ***
prey_speciesUnknown             3.47611    0.55537   6.259 3.87e-10 ***
prey_speciesUnknown Invert      2.11748    0.64601   3.278 0.001046 ** 
prey_speciesUnknown Species     2.29558    0.75276   3.050 0.002292 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
         edf Ref.df Chi.sq p-value
s(NAO) 3.014  3.689  4.876   0.221

R-sq.(adj) =  0.555   Deviance explained = 72.1%
-REML = 889.69  Scale est. = 1         n = 220

Code

plot(ARTEgam1, shade=T, shade.col = "lightgreen", se=T, residuals=TRUE, pch=1, cex=0.7, shift = coef(ARTEgam1)[1])

Code

gam.check(ARTEgam1)


Method: REML   Optimizer: outer newton
full convergence after 3 iterations.
Gradient range [-1.073624e-09,3.051843e-09]
(score 889.6921 & scale 1).
Hessian positive definite, eigenvalue range [0.3383374,95.24936].
Model rank =  37 / 37 

Basis dimension (k) checking results. Low p-value (k-index<1) may
indicate that k is too low, especially if edf is close to k'.

         k'  edf k-index p-value
s(NAO) 9.00 3.01    0.88    0.26

Our gam.check gives very similar outputs to our COTE data, indicating that our negative binomial GAM is still our best fitting model for ARTE data (as it should be). Hake, herring, sand lance, and invertebrates are all indicated as statistically significant prey species for ARTE. Number of prey captured by ARTE appears stable until NAO reaches ~1, at which point there is a slight negative correlation; however, this relationship is not statistically significant.

Code

predict_response(ARTEgam1) %>% plot(show_data=T, jitter=T, show_ci=TRUE, ci_style=c("dash"), color="reefs")

$NAO


$prey_species

With our predicted values, we can see that the number of ARTE hunts predicted to be successful is greatest when NAO is around 1; we know that this relationship is not significant, but we can now see that the lack of significance is likely due to the large degree of variability, as indicated by the broad 95% confidence intervals.

My apologies for the x-axis labels - I’m still fighting with code on this.

Footnotes

I used pivot_longer to make “prey species” a single variable associated with “number observed,” which is the number of that species observed being eaten by terns in a given observation period.↩︎
I wrote the “long” data set out as a csv file and then read it back in so I may work with it to create summary tables.↩︎
O/hr =Observations per hour, $x$ = prey observed, and $y$ = length of observation period in hours↩︎
NAO Index Data provided by the Climate Analysis Section, NCAR, Boulder, USA, Hurrell (2003). Updated regularly. Accessed 15 March 2025.

Hurrell, James &, Phillips, Adam & National Center for Atmospheric Research Staff (Eds). Last modified 2024-04-18 “The Climate Data Guide: Hurrell North Atlantic Oscillation (NAO) Index (PC-based).” Retrieved from https://climatedataguide.ucar.edu/climate-data/hurrell-north-atlantic-oscillation-nao-index-pc-based on 2025-03-10.↩︎