Country Island Tern Chick Diets 1998 - 2019

Author

Rikki Clark

What are Terns Eating at Country Island, NS?

The end goal of working with Tern diet data collected by Canadian Wildlife Service (CWS) on Country Island, NS yearly from 1998 to 2019 is to establish if there is any correlation between diet shifts of Common Terns (COTE) and Arctic Terns (ARTE) and local climate/weather events. Roseate Terns (ROST) are included in the data set but will not be included in the final analysis as they are a special case being studied in detail by another graduate student.

Code
library(tidyverse)
library(readr)
library(rmarkdown)
library(dplyr)
library(tinytex)
library(kableExtra)
library(ggmap)
library(ggspatial)
library(sf)
library(leaflet)
library(mapview)
library(htmlwidgets)
library(ggplot2) #loading all libraires I need
library(magick)
library(png)
library(grid)
library(ggeffects)
library(modelr)
library(mgcv)
library(broom)
library(jtools)
library(ggstance)
library(MASS)
library(pscl)
ternpair <- image_read('ternpair.jpg')#reads in image I have saved into my directory
image_write(ternpair, path = 'ternpair.png', format = "png") #saves the image as a png file, which is better to work with
ternnest<- image_read("ternnest.jpg")
image_write(ternnest, path= "ternnest.png", format = "png")#these lines are same as above

ternpairpng <- image_read("ternpair.png")|>#puts image in an object to work with
image_rotate(90) |>#rotates image 90 degrees so it's properly oriented
  image_scale("100")#resized the image proportionally so width is 100 pixels
image_write(ternpairpng, path = "matedARTE.png")#saves the image 

ternnestpng <- image_read("ternnest.png")|>
  image_scale("300")
image_write(ternnestpng, path="COTEnest.png") #all same as above for mated ARTE
(a) Mated Arctic Terns
(b) Common Tern with Chick

Arctic and Common Terns at Country Island Breeding Colony

Figure 1

The Data

This data has been collected, stored, and provided by CWS and the Country Island seabird breeding monitoring program.

Code
myterns <- read.csv("tern_feeding_rates.csv") #reading in my data set and placing it in an object called "myterns"

Data Wrangling

Reformatting the Data Set to a Long Format

The data set initially had a separate variable column for each prey species, which can make data wrangling more difficult or tedious, particularly if I want to know the sum, mean, median, etc of different prey species taken. In order to more easily manipulate data for analysis, I have rearranged the data set.1

Code
#This is thanks in large part to Paige, who showed me how
myternslong <- myterns %>% #creating a new object that will hold the pivoted data set
  # pivot data into long format
  pivot_longer(cols = starts_with("num_"), # all species columns start the same
               names_to = "prey_species", # name the new column
               names_prefix = "num_", # remove that prefix from all values
               values_to = "num_observed", # name the column ind. count
               values_drop_na = TRUE) %>% # remove NAs
  filter(num_observed != 0) %>% # remove values of zeros
    # for a bar chart you need 1 row for each observation
  map_df(., rep, .$num_observed) %>% # repeats each row the n times in observed column
  mutate(num_observed = 1, # changes the values all to 1 
         prey_species = str_replace_all(prey_species, pattern = "_", replacement = " ")) %>% 
  # HOW TO CHANGE SPECIES INTO 1 CATEGORY
  # you need to have the last line with the TRUE statement, but if you want to make
  # any other changes, structure it exactly like the first prey_species %in% line
  mutate(prey_species = case_when(prey_species %in% c("ter inv", "unknown", "unknown fish", "unknown inv", "unknown sp") ~ "unknown",
                                  prey_species == "atl saury" ~ "atlantic saury",
                                  prey_species == "amphipod sp" ~ "amphipods",
                                  prey_species == "mar inv" ~ "marine invertebrates",
                                  TRUE ~ prey_species),
         prey_species = str_to_title(prey_species))

Saving the Reformatted Data Set as a .csv File

It is good practice to save your rearranged and cleaned data as a new .csv file that can be shared with others or used for further analysis.2

Code
write.csv(x= myternslong, file = "tern_feeding_longer.csv") #saving the pivoted data set as a .csv file so it can be used by others as well as for my own further analysis
longtern <- read.csv("tern_feeding_longer.csv") #reading the pivoted data set in so it can be used for summary tables

Summary Tables

Total Prey Capture by Each Tern Species

While the feeding ecology between COTE and ARTE is similar, there are some differences that may result in greater foraging success by one species over the other at Country Island, NS.

Code
terntable1 <- longtern %>% #make a new object for data wrangling to maintain integrity of original data set
  filter(tern_species != "ROST", year != "2023") %>% #filter out unwanted observations - anything related to ROST and year 2023
  group_by(tern_species, year, prey_species) %>% #group the data first by the species of tern, then by the year, then by the prey species
  summarise(number_depredated = sum(num_observed, na.rm = T)) %>% #make a summary to turn into a table that will show the the total number of each prey species observed being eaten by each tern species each year
  ungroup() #this is important so that R doesn't think you're still working with the above groupings for future wrangling

terntable1 %>% 
   kbl(caption = "Total Prey of Arctic and Common Terns by Year") %>% #kbl makes the table, caption gives the title of the table
  kable_classic(full_width = F, html_font = "Cambria") #gives the aesthetic of the table and specifies the font
Total Prey of Arctic and Common Terns by Year
tern_species year prey_species number_depredated
ARTE 1998 Butterfish 1
ARTE 1998 Hake 358
ARTE 1998 Lumpfish 1
ARTE 1998 Marine Invert 3
ARTE 1998 Mottledsculpin 18
ARTE 1998 Mummichug 2
ARTE 1998 Needlefish 2
ARTE 1998 Sandlance 24
ARTE 1998 Stickleback 6
ARTE 1998 Unknown 468
ARTE 1999 Hake 331
ARTE 1999 Herring 22
ARTE 1999 Marine Invert 2
ARTE 1999 Sandlance 490
ARTE 1999 Stickleback 3
ARTE 1999 Unknown 143
ARTE 2000 Hake 300
ARTE 2000 Herring 27
ARTE 2000 Marine Invert 44
ARTE 2000 Pipefish 2
ARTE 2000 Pollock 7
ARTE 2000 Sandlance 77
ARTE 2000 Stickleback 7
ARTE 2000 Unknown 221
ARTE 2001 Hake 84
ARTE 2001 Herring 44
ARTE 2001 Sandlance 79
ARTE 2001 Unknown 85
ARTE 2002 Hake 95
ARTE 2002 Herring 17
ARTE 2002 Marine Invert 6
ARTE 2002 Sandlance 58
ARTE 2002 Snipefish 1
ARTE 2002 Terrestrial Invert 1
ARTE 2002 Unknown 114
ARTE 2003 Hake 194
ARTE 2003 Herring 28
ARTE 2003 Pollock 2
ARTE 2003 Sandlance 19
ARTE 2003 Stickleback 1
ARTE 2003 Unknown 46
ARTE 2004 Butterfish 1
ARTE 2004 Cod 3
ARTE 2004 Hake 141
ARTE 2004 Herring 12
ARTE 2004 Lumpfish 1
ARTE 2004 Marine Invert 81
ARTE 2004 Pollock 8
ARTE 2004 Redfish 2
ARTE 2004 Sandlance 83
ARTE 2004 Stickleback 3
ARTE 2004 Unknown 55
ARTE 2005 Hake 208
ARTE 2005 Herring 2
ARTE 2005 Marine Invert 21
ARTE 2005 Sandlance 18
ARTE 2005 Stickleback 8
ARTE 2005 Terrestrial Invert 1
ARTE 2005 Unknown 115
ARTE 2006 Butterfish 1
ARTE 2006 Cod 4
ARTE 2006 Hake 63
ARTE 2006 Herring 1
ARTE 2006 Sandlance 36
ARTE 2006 Unknown 71
ARTE 2007 Butterfish 1
ARTE 2007 Cod 10
ARTE 2007 Hake 277
ARTE 2007 Herring 21
ARTE 2007 Lumpfish 5
ARTE 2007 Marine Invert 3
ARTE 2007 Pollock 7
ARTE 2007 Sandlance 212
ARTE 2007 Stickleback 6
ARTE 2007 Unknown 134
ARTE 2008 Cod 14
ARTE 2008 Hake 354
ARTE 2008 Herring 102
ARTE 2008 Lumpfish 2
ARTE 2008 Marine Invert 51
ARTE 2008 Sandlance 247
ARTE 2008 Stickleback 5
ARTE 2008 Terrestrial Invert 64
ARTE 2008 Unknown 189
ARTE 2009 Butterfish 16
ARTE 2009 Cod 2
ARTE 2009 Hake 236
ARTE 2009 Herring 44
ARTE 2009 Lumpfish 1
ARTE 2009 Marine Invert 30
ARTE 2009 Needlefish 1
ARTE 2009 Pollock 7
ARTE 2009 Sandlance 95
ARTE 2009 Stickleback 1
ARTE 2009 Terrestrial Invert 3
ARTE 2009 Unknown 102
ARTE 2010 Butterfish 4
ARTE 2010 Cod 1
ARTE 2010 Hake 100
ARTE 2010 Herring 96
ARTE 2010 Lumpfish 1
ARTE 2010 Needlefish 5
ARTE 2010 Pollock 2
ARTE 2010 Sandlance 44
ARTE 2010 Snipefish 7
ARTE 2010 Stickleback 3
ARTE 2010 Terrestrial Invert 5
ARTE 2010 Unknown 181
ARTE 2011 Butterfish 4
ARTE 2011 Capelin 15
ARTE 2011 Hake 194
ARTE 2011 Herring 15
ARTE 2011 Mackerel 1
ARTE 2011 Marine Invert 23
ARTE 2011 Needlefish 4
ARTE 2011 Sandlance 133
ARTE 2011 Silverside 21
ARTE 2011 Snipefish 4
ARTE 2011 Stickleback 14
ARTE 2011 Terrestrial Invert 39
ARTE 2011 Unknown 397
ARTE 2011 Unknown Invert 85
ARTE 2011 Unknown Species 41
ARTE 2012 Butterfish 14
ARTE 2012 Hake 397
ARTE 2012 Herring 14
ARTE 2012 Marine Invert 82
ARTE 2012 Pollock 1
ARTE 2012 Sandlance 4
ARTE 2012 Snipefish 16
ARTE 2012 Stickleback 6
ARTE 2012 Terrestrial Invert 4
ARTE 2012 Unknown 141
ARTE 2012 Unknown Species 100
ARTE 2013 Amphipod 4
ARTE 2013 Atlanticsaury 5
ARTE 2013 Butterfish 1
ARTE 2013 Capelin 1
ARTE 2013 Euphausiid 1
ARTE 2013 Hake 250
ARTE 2013 Herring 14
ARTE 2013 Marine Invert 2
ARTE 2013 Pollock 6
ARTE 2013 Sandlance 16
ARTE 2013 Sculpin 1
ARTE 2013 Silverside 4
ARTE 2013 Stickleback 3
ARTE 2013 Terrestrial Invert 2
ARTE 2013 Unknown 141
ARTE 2014 Atlanticsaury 2
ARTE 2014 Euphausiid 3
ARTE 2014 Hake 232
ARTE 2014 Herring 5
ARTE 2014 Mackerel 3
ARTE 2014 Marine Invert 2
ARTE 2014 Mummichug 2
ARTE 2014 Pollock 1
ARTE 2014 Redfish 4
ARTE 2014 Sandlance 20
ARTE 2014 Silverside 10
ARTE 2014 Snipefish 1
ARTE 2014 Unknown 71
ARTE 2015 Atlanticsaury 1
ARTE 2015 Butterfish 1
ARTE 2015 Euphausiid 10
ARTE 2015 Hake 162
ARTE 2015 Herring 22
ARTE 2015 Lumpfish 1
ARTE 2015 Sandlance 16
ARTE 2015 Snipefish 2
ARTE 2015 Stickleback 2
ARTE 2015 Unknown 114
ARTE 2015 Unknown Invert 4
ARTE 2016 Amphipod 2
ARTE 2016 Atlanticsaury 5
ARTE 2016 Euphausiid 1
ARTE 2016 Hake 466
ARTE 2016 Herring 6
ARTE 2016 Lumpfish 1
ARTE 2016 Marine Invert 240
ARTE 2016 Sandlance 39
ARTE 2016 Snipefish 15
ARTE 2016 Squid 1
ARTE 2016 Stickleback 1
ARTE 2016 Terrestrial Invert 1
ARTE 2016 Unknown 54
ARTE 2016 Unknown Invert 28
ARTE 2017 Amphipod 11
ARTE 2017 Butterfish 2
ARTE 2017 Euphausiid 18
ARTE 2017 Hake 376
ARTE 2017 Herring 7
ARTE 2017 Mackerel 1
ARTE 2017 Marine Invert 50
ARTE 2017 Sandlance 63
ARTE 2017 Stickleback 2
ARTE 2017 Terrestrial Invert 5
ARTE 2017 Unknown 138
ARTE 2017 Unknown Invert 17
ARTE 2018 Amphipod 1
ARTE 2018 Atlanticsaury 1
ARTE 2018 Cunner 1
ARTE 2018 Euphausiid 1
ARTE 2018 Hake 382
ARTE 2018 Herring 28
ARTE 2018 Marine Invert 62
ARTE 2018 Pollock 3
ARTE 2018 Sandlance 36
ARTE 2018 Squid 1
ARTE 2018 Stickleback 10
ARTE 2018 Terrestrial Invert 124
ARTE 2018 Unknown 147
ARTE 2018 Unknown Invert 81
ARTE 2019 Atlanticsaury 2
ARTE 2019 Euphausiid 21
ARTE 2019 Hake 212
ARTE 2019 Herring 9
ARTE 2019 Lumpfish 4
ARTE 2019 Mackerel 1
ARTE 2019 Marine Invert 13
ARTE 2019 Mummichug 1
ARTE 2019 Pollock 3
ARTE 2019 Redfish 2
ARTE 2019 Sandlance 98
ARTE 2019 Sculpin 12
ARTE 2019 Snipefish 3
ARTE 2019 Squid 7
ARTE 2019 Stickleback 30
ARTE 2019 Terrestrial Invert 214
ARTE 2019 Unknown 171
ARTE 2019 Unknown Invert 33
ARTE 2019 Unknown Species 3
COTE 1998 Hake 234
COTE 1998 Lumpfish 2
COTE 1998 Marine Invert 1
COTE 1998 Mottledsculpin 21
COTE 1998 Mummichug 6
COTE 1998 Needlefish 2
COTE 1998 Sandlance 40
COTE 1998 Stickleback 6
COTE 1998 Unknown 347
COTE 1999 Hake 61
COTE 1999 Herring 4
COTE 1999 Sandlance 190
COTE 1999 Unknown 58
COTE 2000 Butterfish 2
COTE 2000 Hake 175
COTE 2000 Herring 19
COTE 2000 Marine Invert 3
COTE 2000 Pollock 6
COTE 2000 Sandlance 45
COTE 2000 Stickleback 7
COTE 2000 Unknown 112
COTE 2001 Hake 34
COTE 2001 Herring 12
COTE 2001 Pollock 1
COTE 2001 Sandlance 55
COTE 2001 Stickleback 1
COTE 2001 Unknown 31
COTE 2002 Butterfish 3
COTE 2002 Hake 88
COTE 2002 Herring 14
COTE 2002 Marine Invert 1
COTE 2002 Pollock 12
COTE 2002 Sandlance 65
COTE 2002 Unknown 99
COTE 2003 Hake 314
COTE 2003 Herring 42
COTE 2003 Marine Invert 1
COTE 2003 Pollock 40
COTE 2003 Sandlance 65
COTE 2003 Stickleback 8
COTE 2003 Unknown 95
COTE 2004 Butterfish 4
COTE 2004 Cod 1
COTE 2004 Hake 98
COTE 2004 Herring 31
COTE 2004 Killifish 5
COTE 2004 Marine Invert 27
COTE 2004 Pollock 21
COTE 2004 Sandlance 160
COTE 2004 Stickleback 3
COTE 2004 Unknown 99
COTE 2005 Butterfish 1
COTE 2005 Hake 264
COTE 2005 Herring 12
COTE 2005 Lumpfish 1
COTE 2005 Marine Invert 9
COTE 2005 Pollock 7
COTE 2005 Sandlance 44
COTE 2005 Sculpin 2
COTE 2005 Stickleback 11
COTE 2005 Terrestrial Invert 1
COTE 2005 Unknown 200
COTE 2006 Cod 4
COTE 2006 Hake 129
COTE 2006 Herring 2
COTE 2006 Killifish 1
COTE 2006 Pollock 1
COTE 2006 Sandlance 90
COTE 2006 Stickleback 3
COTE 2006 Terrestrial Invert 1
COTE 2006 Unknown 108
COTE 2007 Cod 7
COTE 2007 Hake 215
COTE 2007 Herring 25
COTE 2007 Lumpfish 2
COTE 2007 Marine Invert 1
COTE 2007 Pollock 25
COTE 2007 Sandlance 275
COTE 2007 Stickleback 7
COTE 2007 Terrestrial Invert 1
COTE 2007 Unknown 102
COTE 2008 Cod 11
COTE 2008 Hake 226
COTE 2008 Herring 65
COTE 2008 Lumpfish 1
COTE 2008 Marine Invert 5
COTE 2008 Pollock 9
COTE 2008 Sandlance 98
COTE 2008 Stickleback 10
COTE 2008 Terrestrial Invert 91
COTE 2008 Unknown 122
COTE 2009 Butterfish 24
COTE 2009 Cod 9
COTE 2009 Hake 137
COTE 2009 Herring 12
COTE 2009 Lumpfish 4
COTE 2009 Marine Invert 20
COTE 2009 Needlefish 1
COTE 2009 Pollock 27
COTE 2009 Redfish 1
COTE 2009 Sandlance 76
COTE 2009 Stickleback 3
COTE 2009 Terrestrial Invert 1
COTE 2009 Unknown 67
COTE 2010 Butterfish 8
COTE 2010 Cod 4
COTE 2010 Hake 217
COTE 2010 Herring 123
COTE 2010 Lumpfish 3
COTE 2010 Needlefish 24
COTE 2010 Pollock 20
COTE 2010 Sandlance 98
COTE 2010 Sculpin 1
COTE 2010 Stickleback 11
COTE 2010 Terrestrial Invert 12
COTE 2010 Unknown 288
COTE 2011 Capelin 20
COTE 2011 Cunner 1
COTE 2011 Hake 113
COTE 2011 Herring 23
COTE 2011 Mackerel 1
COTE 2011 Marine Invert 2
COTE 2011 Mummichug 2
COTE 2011 Needlefish 3
COTE 2011 Pollock 4
COTE 2011 Redfish 2
COTE 2011 Sandlance 51
COTE 2011 Sculpin 1
COTE 2011 Silverside 5
COTE 2011 Snipefish 2
COTE 2011 Stickleback 11
COTE 2011 Terrestrial Invert 12
COTE 2011 Unknown 224
COTE 2011 Unknown Invert 14
COTE 2011 Unknown Species 6
COTE 2012 Butterfish 26
COTE 2012 Cunner 2
COTE 2012 Hake 454
COTE 2012 Herring 26
COTE 2012 Marine Invert 41
COTE 2012 Mummichug 2
COTE 2012 Pollock 4
COTE 2012 Redfish 1
COTE 2012 Sandlance 10
COTE 2012 Sculpin 1
COTE 2012 Silverside 1
COTE 2012 Snipefish 24
COTE 2012 Stickleback 4
COTE 2012 Terrestrial Invert 9
COTE 2012 Unknown 184
COTE 2012 Unknown Species 105
COTE 2013 Amphipod 6
COTE 2013 Atlanticsaury 7
COTE 2013 Butterfish 1
COTE 2013 Euphausiid 1
COTE 2013 Hake 320
COTE 2013 Herring 34
COTE 2013 Marine Invert 11
COTE 2013 Pollock 6
COTE 2013 Redfish 1
COTE 2013 Sandlance 28
COTE 2013 Silverside 1
COTE 2013 Snipefish 2
COTE 2013 Stickleback 4
COTE 2013 Terrestrial Invert 6
COTE 2013 Unknown 131
COTE 2013 Unknown Invert 1
COTE 2014 Euphausiid 4
COTE 2014 Hake 242
COTE 2014 Herring 7
COTE 2014 Lumpfish 5
COTE 2014 Mackerel 4
COTE 2014 Marine Invert 3
COTE 2014 Pollock 2
COTE 2014 Redfish 3
COTE 2014 Sandlance 11
COTE 2014 Silverside 13
COTE 2014 Snipefish 1
COTE 2014 Terrestrial Invert 1
COTE 2014 Unknown 133
COTE 2015 Atlanticsaury 1
COTE 2015 Butterfish 21
COTE 2015 Hake 128
COTE 2015 Herring 38
COTE 2015 Lumpfish 1
COTE 2015 Marine Invert 1
COTE 2015 Pollock 4
COTE 2015 Sandlance 27
COTE 2015 Snipefish 2
COTE 2015 Unknown 164
COTE 2015 Unknown Invert 1
COTE 2016 Amphipod 1
COTE 2016 Atlanticsaury 3
COTE 2016 Cunner 8
COTE 2016 Euphausiid 1
COTE 2016 Hake 252
COTE 2016 Herring 2
COTE 2016 Mackerel 1
COTE 2016 Marine Invert 9
COTE 2016 Pollock 2
COTE 2016 Redfish 1
COTE 2016 Sandlance 52
COTE 2016 Snipefish 5
COTE 2016 Terrestrial Invert 4
COTE 2016 Unknown 54
COTE 2016 Unknown Invert 7
COTE 2017 Butterfish 6
COTE 2017 Cunner 4
COTE 2017 Euphausiid 6
COTE 2017 Hake 407
COTE 2017 Herring 14
COTE 2017 Lumpfish 1
COTE 2017 Marine Invert 2
COTE 2017 Pollock 3
COTE 2017 Redfish 1
COTE 2017 Sandlance 34
COTE 2017 Stickleback 3
COTE 2017 Terrestrial Invert 14
COTE 2017 Unknown 132
COTE 2017 Unknown Invert 37
COTE 2018 Atlanticsaury 1
COTE 2018 Cunner 40
COTE 2018 Euphausiid 1
COTE 2018 Hake 226
COTE 2018 Herring 38
COTE 2018 Lumpfish 2
COTE 2018 Mackerel 1
COTE 2018 Marine Invert 12
COTE 2018 Pollock 13
COTE 2018 Sandlance 35
COTE 2018 Stickleback 11
COTE 2018 Terrestrial Invert 49
COTE 2018 Unknown 96
COTE 2018 Unknown Invert 25
COTE 2019 Atlanticsaury 1
COTE 2019 Cunner 4
COTE 2019 Euphausiid 18
COTE 2019 Hake 202
COTE 2019 Herring 2
COTE 2019 Lumpfish 4
COTE 2019 Marine Invert 6
COTE 2019 Mummichug 3
COTE 2019 Pollock 3
COTE 2019 Redfish 3
COTE 2019 Sandlance 56
COTE 2019 Sculpin 5
COTE 2019 Snipefish 8
COTE 2019 Squid 5
COTE 2019 Stickleback 42
COTE 2019 Terrestrial Invert 21
COTE 2019 Unknown 170
COTE 2019 Unknown Invert 23

Hake is a preferred prey item for both ARTE and COTE; however, ARTE show a clear and consistent preference for hake every year while COTE demonstrate a willingness to shift focus to similar species such as herring or sand lance, depending on the year. Reasons for such dietary shifts are currently unclear, but further understanding should be achieved when these dietary patterns are modeled with regional climate data from relevant years.

Visualization is always helpful.

Code
preyspplot<- ggplot(data = terntable1, mapping = aes(x=year, y = number_depredated,group= prey_species, col=prey_species))+ #says which data to use, what x and y axis will be, and to color-code data according to prey species
  geom_point()+
  geom_line()+
  labs(x="Year", y="Prey per year")+
  facet_wrap(~tern_species)+
  theme_classic()#all same as previous plots
preyspplot

Observation Effort for Each Tern Species

While climate and other environmental factors undoubtedly influence any differences in prey capture between years, it is crucial to account for human factors that influence data collection so as to draw accurate conclusions about diet shifts across time. The most important human factor to be accounted for in this study is observation effort. Effort must be quantified to establish if higher numbers of captured prey are truly reflective of what took place in the tern breeding colony or if they are reflective of greater observation effort.

One simple way to easily establish if differences in total prey capture across years is reflective of environmental factors rather than human factors is to compare the number of total prey observations per hour for each year. This can be calculated for each year using the following formula:3

$$O/hr=\frac{x_{1}+x_{2}+\cdots+x_{n}}{y_{1}+y_{2}+\cdots+y_{n}}$$

Code
terntable2<- longtern %>% 
  filter(tern_species != "ROST", year != "2023") %>% 
  group_by(tern_species, year) %>% #group by tern species and then by year
  summarise(prey_per_hour = sum(total_prey, na.rm = T)/sum(time_hr, na.rm = T)) %>% #take the sum of total prey -all prey observed regarless of species- and divide it by the sum of all hours of observation. na.rm=T removes and "NAs"
  ungroup()

terntable2 %>% 
   kbl(caption = "Total Prey Capture by ARTE and COTE per Hour For Each Year") %>% #kbl makes the table, caption gives the title of the table
  kable_classic(full_width = F, html_font = "Cambria")
Total Prey Capture by ARTE and COTE per Hour For Each Year
tern_species year prey_per_hour
ARTE 1998 3.535083
ARTE 1999 3.627144
ARTE 2000 3.035268
ARTE 2001 2.934931
ARTE 2002 2.715753
ARTE 2003 1.938567
ARTE 2004 2.430052
ARTE 2005 1.745283
ARTE 2006 1.810888
ARTE 2007 1.552172
ARTE 2008 2.656043
ARTE 2009 2.351301
ARTE 2010 2.394350
ARTE 2011 3.472732
ARTE 2012 2.806804
ARTE 2013 2.626386
ARTE 2014 1.830509
ARTE 2015 1.800600
ARTE 2016 6.196292
ARTE 2017 3.423790
ARTE 2018 4.957818
ARTE 2019 5.197019
COTE 1998 2.953718
COTE 1999 3.544728
COTE 2000 2.071816
COTE 2001 4.092024
COTE 2002 2.007092
COTE 2003 1.780576
COTE 2004 1.399775
COTE 2005 1.855072
COTE 2006 2.521418
COTE 2007 1.384294
COTE 2008 3.100313
COTE 2009 1.808901
COTE 2010 1.971840
COTE 2011 1.593694
COTE 2012 2.623602
COTE 2013 2.202691
COTE 2014 1.961539
COTE 2015 2.126779
COTE 2016 1.632183
COTE 2017 2.559564
COTE 2018 1.970596
COTE 2019 2.832028

When examining the number of successful prey captures observed per hour, differences in feeding rates across years and between ARTE and COTE are much easier to note. For example, ARTE had distinctly higher feeding rates in 2016, 6.20 prey items per hour, than did COTE in the same year, 1.97 prey items per hour. The wide range of prey items per hour across years is also notable given that some years show much higher success rates than others for both species, though years with higher catch rates do not always line up between the two species. Further examination not only of climate factors, but also of the feeding ecologies of ARTE and COTE will be required to gain insight into the driving factors of these shifts in foraging success.

It may be helpful to visualize this:

Code
prettytern<-image_read(path="prettyternpng.png")
plottern<- rasterGrob(prettytern, interpolate=T)
obsefplot<- ggplot(data = terntable2, mapping = aes(x=year, y = prey_per_hour,group= tern_species, col=tern_species))+ #says which data to use, what x and y axis will be, and to color-code data according to tern species
  geom_point()+
  geom_line()+
  labs(x="Year", y="Prey per Hour")+
  annotation_custom(plottern, xmin=2005, xmax=2010, ymin=4, ymax=6)+
  theme_classic()#all same as previous plots
obsefplot

Preliminary GAM

To look at the affects of climate on tern diet and foraging success, I can start by looking at the feeding rates (prey/hour/year) calculated in the Summary Tables section and compare that to North Atlantic Oscillation (NAO) data4.

Testing the code

I first need to merge my tern data with yearly NAO data to make a single data set on which I can run my General Additive Model, which I have chosen because I do not expect my data to be normally distributed.

Code
library(mgcv) #load library for GAM
annualnao<-read.csv(file="yearly_nao_pcbased.csv")#read in NAO data
ternclimate<-merge(x = terntable2, y = annualnao, by = "year", all = TRUE)#merge the tern data and the NAO data into a single data set in the object ternclimate - both data sets share a year column and annualnao will merge left into terntable2 aligned by year for all data points that share year
str(ternclimate)#check to make sure the merge worked properly
'data.frame':   48 obs. of  4 variables:
 $ year         : int  1998 1998 1999 1999 2000 2000 2001 2001 2002 2002 ...
 $ tern_species : chr  "ARTE" "COTE" "COTE" "ARTE" ...
 $ prey_per_hour: num  3.54 2.95 3.54 3.63 2.07 ...
 $ NAO          : num  -0.28 -0.28 0.57 0.57 0.65 0.65 -0.57 -0.57 0.34 0.34 ...
Code
gamtester<- gam(data=ternclimate, prey_per_hour~s(year)+s(NAO), method="REML")#tells the gam argument to use ternclimate, that I want to know the non-linear effects of year and NAO on number of prey caugh per hour - REML is Restricted Maximum Likelihood Method, and is recommended because it tends to be more reliable and stable for newbies like me.
summary(gamtester) #gives a summary of gam info

Family: gaussian 
Link function: identity 

Formula:
prey_per_hour ~ s(year) + s(NAO)

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   2.6144     0.1356   19.27   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
          edf Ref.df     F p-value   
s(year) 2.790  3.458 4.362 0.00781 **
s(NAO)  1.104  1.197 0.449 0.48499   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =  0.238   Deviance explained = 30.7%
-REML = 61.253  Scale est. = 0.80957   n = 44
Code
plot(gamtester, shade=T, shade.col = "lightgreen", se=T, shift = coef(gamtester)[1], pages = 1) #plots the gam, shade=T shades in the standard error, shade.col= allows me to pick the color, se=T says I want to see the standard error, shift=coef(gamtester)[1] lets me shift the y-axis to include the intercept so the y-axis is no longer relative - aka the number of prey captured per hour of observation around 2007 was actually close to 2

This was a good first test to make sure I can get the code to work, but I also need information about the prey types and how that has changed through the years. To keep my data wrangling intact and make a singular data set to work with, I will merge “ternclimate” with “terntable1.”

Additionally, I want a quick visualization of NAO through the years, as I suspect that NAO has a much greater effect than year does on prey captured, and it would be nice to see the actual data points plotted to help me understand if my models make sense.

Code
naoyearplot<- ggplot(data = ternclimate, mapping = aes(x=year, y = NAO,))+
  geom_point()+
  geom_line()+
  labs(x="Year", y="NAO")+
  theme_classic()#all same as previous plots
naoyearplot

The NAO in 2010 is much lower than in other years, and there is a large gap between that data point and the rest of the data. This will likely leave a large gap between between residuals/fitted values when NAO is ~-3 and the rest of the data because there are no years with NAO values between -3 and -0.6. However, I will leave this and associated data in the model for now because it can still provide useful information on the relationship between NAO and prey capture by terns at Country Island. I am hopeful that it will not skew the model.

Now I’ll merge terntable1 with ternclimate and check the structure of the new data set (ternshunt) to make sure the merge worked.

Code
ternshunt<-merge(x = ternclimate, y = terntable1, by = "year", all = TRUE)
str(ternshunt) #checking to make sure the merge worked - it did
'data.frame':   972 obs. of  7 variables:
 $ year             : int  1998 1998 1998 1998 1998 1998 1998 1998 1998 1998 ...
 $ tern_species.x   : chr  "ARTE" "ARTE" "ARTE" "ARTE" ...
 $ prey_per_hour    : num  3.54 3.54 3.54 3.54 3.54 ...
 $ NAO              : num  -0.28 -0.28 -0.28 -0.28 -0.28 -0.28 -0.28 -0.28 -0.28 -0.28 ...
 $ tern_species.y   : chr  "ARTE" "ARTE" "ARTE" "ARTE" ...
 $ prey_species     : chr  "Butterfish" "Hake" "Lumpfish" "Marine Invert" ...
 $ number_depredated: int  1 358 1 3 18 2 2 24 6 468 ...

I now have two tern species variables, one from each data set, but this won’t be a problem as they are the same so it won’t affect my modeling.

GAM with Default (Gaussian) Distribution

Now I can run some GAMS to see how NAO may affect the number of each prey species caught each year. I will split this up by tern species.

Code
COTEgam1<- ternshunt %>% 
  filter(tern_species.x!="ARTE", tern_species.y!="ARTE") #making an object with data for only COTE 
  
COTEgam1a<- gam(data=COTEgam1, number_depredated~s(NAO) + s(year)+ prey_species, method="REML")
summary(COTEgam1a) #note that the summary will show the "intercept" as the first factor in alphabetical order - in this case, Amphipods in prey species are top in alphabet so they are listed as "intercept"

Family: gaussian 
Link function: identity 

Formula:
number_depredated ~ s(NAO) + s(year) + prey_species

Parametric coefficients:
                                Estimate Std. Error t value Pr(>|t|)    
(Intercept)                      1.36207   32.53720   0.042   0.9666    
prey_speciesAtlanticsaury       -0.03648   38.42282  -0.001   0.9992    
prey_speciesButterfish           7.86081   35.70159   0.220   0.8259    
prey_speciesCapelin             25.02225   56.32612   0.444   0.6573    
prey_speciesCod                  2.75732   37.72824   0.073   0.9418    
prey_speciesCunner               6.42819   37.48099   0.172   0.8640    
prey_speciesEuphausiid           1.05906   37.48695   0.028   0.9775    
prey_speciesHake               206.08956   34.05373   6.052 6.03e-09 ***
prey_speciesHerring             25.60682   34.09578   0.751   0.4534    
prey_speciesKillifish            4.81243   46.18643   0.104   0.9171    
prey_speciesLumpfish            -0.41291   35.34284  -0.012   0.9907    
prey_speciesMackerel             1.66358   39.78509   0.042   0.9667    
prey_speciesMarine Invert        9.16032   34.32040   0.267   0.7898    
prey_speciesMottledsculpin      26.22700   56.92402   0.461   0.6454    
prey_speciesMummichug            1.58505   39.82607   0.040   0.9683    
prey_speciesNeedlefish           3.02961   40.04460   0.076   0.9398    
prey_speciesPollock              9.65150   34.14713   0.283   0.7777    
prey_speciesRedfish             -1.90137   36.28764  -0.052   0.9583    
prey_speciesSandlance           72.86229   34.05373   2.140   0.0335 *  
prey_speciesSculpin             -5.43982   38.53803  -0.141   0.8879    
prey_speciesSilverside           3.65289   39.76595   0.092   0.9269    
prey_speciesSnipefish            4.52180   36.80831   0.123   0.9023    
prey_speciesSquid               -6.29996   56.31643  -0.112   0.9110    
prey_speciesStickleback          7.24732   34.48375   0.210   0.8337    
prey_speciesTerrestrial Invert  12.32842   34.73703   0.355   0.7230    
prey_speciesUnknown            136.99865   34.05373   4.023 7.88e-05 ***
prey_speciesUnknown Invert      13.72587   36.82336   0.373   0.7097    
prey_speciesUnknown Species     55.20658   45.93123   1.202   0.2307    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
          edf Ref.df     F p-value  
s(NAO)  1.001  1.001 4.441  0.0362 *
s(year) 1.000  1.000 1.545  0.2152  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =  0.645   Deviance explained = 68.6%
-REML = 1195.1  Scale est. = 2106.6    n = 252
Code
plot(COTEgam1a, shade=T, shade.col = "lightgreen", se=T, residuals=TRUE, pch=1, cex=0.7, shift = coef(COTEgam1a)[1]) #plots the gam, shade=T shades in the standard error, shade.col= allows me to pick the color, se=T says I want to see the standard error, shift=coef(gamtester)[1] lets me shift the y-axis to include the intercept so the y-axis is no longer relative but shows the actual predicted number of prey captured at specific NOA values, my residuals are shown as circles (set by pch=1 with their size set by cex=0.7) 

We can see from the printed output as well as the plot that year does not have a significant effect on number of prey caught each year in the summer months, but NOA does (p<0.05). However, our residual sum of squares is HUGE (Scale est. = 2106.6), and we are predicting values below zero (it is not possible to catch less than 0 fish), so our Gaussian distribution is incorrect. Instead, we can try a distribution method that does not allow for negative values, like a poisson distribution. It’s common in the literature to use poisson distributions to handle count data, especially for these sorts of studies that involve observation watches to gather information on prey type and number of prey caught.

GAM with a Poisson Distribution

Code
COTEgam2<- gam(data=COTEgam1, number_depredated~s(NAO) + prey_species, family=poisson, method="REML")
summary(COTEgam2) #note that the summary will show the "intercept" as the first factor in alphabetical order - in this case, Amphipods in prey species are top in alphabet so they are listed as "intercept"

Family: poisson 
Link function: log 

Formula:
number_depredated ~ s(NAO) + prey_species

Parametric coefficients:
                               Estimate Std. Error z value Pr(>|z|)    
(Intercept)                     1.21170    0.37814   3.204  0.00135 ** 
prey_speciesAtlanticsaury      -0.20731    0.46900  -0.442  0.65847    
prey_speciesButterfish          1.00567    0.39171   2.567  0.01025 *  
prey_speciesCapelin             1.96731    0.44136   4.457 8.30e-06 ***
prey_speciesCod                 0.46046    0.41328   1.114  0.26521    
prey_speciesCunner              1.11432    0.40006   2.785  0.00535 ** 
prey_speciesEuphausiid          0.44194    0.41859   1.056  0.29107    
prey_speciesHake                4.10260    0.37840  10.842  < 2e-16 ***
prey_speciesHerring             2.03384    0.38055   5.345 9.07e-08 ***
prey_speciesKillifish          -0.19626    0.55645  -0.353  0.72431    
prey_speciesLumpfish           -0.37427    0.42604  -0.878  0.37967    
prey_speciesMackerel           -0.57345    0.53486  -1.072  0.28365    
prey_speciesMarine Invert       0.96265    0.38655   2.490  0.01276 *  
prey_speciesMottledsculpin      1.74303    0.43671   3.991 6.57e-05 ***
prey_speciesMummichug          -0.01886    0.46915  -0.040  0.96793    
prey_speciesNeedlefish          0.68840    0.42021   1.638  0.10137    
prey_speciesPollock             1.12587    0.38437   2.929  0.00340 ** 
prey_speciesRedfish            -0.70287    0.46897  -1.499  0.13393    
prey_speciesSandlance           3.06368    0.37893   8.085 6.21e-16 ***
prey_speciesSculpin            -0.59935    0.49328  -1.215  0.22435    
prey_speciesSilverside          0.41450    0.43936   0.943  0.34547    
prey_speciesSnipefish           0.67484    0.40714   1.658  0.09741 .  
prey_speciesSquid               0.43102    0.58611   0.735  0.46211    
prey_speciesStickleback         0.89458    0.38714   2.311  0.02085 *  
prey_speciesTerrestrial Invert  1.52561    0.38403   3.973 7.11e-05 ***
prey_speciesUnknown             3.69448    0.37855   9.760  < 2e-16 ***
prey_speciesUnknown Invert      1.58405    0.39025   4.059 4.93e-05 ***
prey_speciesUnknown Species     2.86525    0.39055   7.336 2.19e-13 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
         edf Ref.df Chi.sq p-value    
s(NAO) 6.188  7.151  259.4  <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =  0.645   Deviance explained = 80.1%
-REML = 2837.2  Scale est. = 1         n = 252
Code
plot(COTEgam2, shade=T, shade.col = "lightgreen", se=T, residuals=TRUE, pch=1, cex=0.7, shift = coef(COTEgam2)[1])#you will still see negative residuals here because you haven't taken the antilog, but when we plot predicted values, there should be none below zero

We can see from the summary that this model explains roughly 80% of the effect that NAO and prey species has on number of prey caught per year. We can also see that the residual sum of squares in this model is 1 (Scale est. =1), which is much better than the model using the Gaussian distribution where the residual sum of squares was 2106.6. It should be noted that all the predicted values are positive for this model, so it does make biological sense.

Code
predict_response(COTEgam2) %>% plot(show_data=T, jitter=T, show_ci=TRUE, ci_style=c("dash"), colors="metro")#predicts response variables according to the model and then I can plot them with ggeffects plot() function, choosing to show the data, jitter the data so it's easier to see, show confidence intervals as dashed lines, and use a colour scheme
$NAO


$prey_species

I still need to run a check of the model to make sure it is the best one. Thankfully, R has gam.check() for that.

Code
gam.check(COTEgam2)##runs a the whole series of model validaton checks for me in just one command. I get a summary of stats tests to check for model fit as well as a QQ plot, a residuals vs linear predictions plot, a response vs fitted values plot, and a histogram of residuals. 


Method: REML   Optimizer: outer newton
full convergence after 4 iterations.
Gradient range [1.692766e-05,1.692766e-05]
(score 2837.177 & scale 1).
Hessian positive definite, eigenvalue range [0.7326335,0.7326335].
Model rank =  37 / 37 

Basis dimension (k) checking results. Low p-value (k-index<1) may
indicate that k is too low, especially if edf is close to k'.

         k'  edf k-index p-value
s(NAO) 9.00 6.19    0.98     0.4

We can see from looking at the outputs that there are a few problems with the current model. Ideally, our deviance of residuals will be a roughly straight line following the red regression line along the theoretical quantiles. Our histogram of residuals, while not terrible, does show a light left skew. In our residuals vs linear predictors plot, our residuals do not look randomly distributed left of the predictor value of 3, and our response vs fitted values plot is more widely distributed than ideal given we want our points to cluster around a 1-1 line to indicate a well-fitted model.

Our summary output shows that our k-index (a measure of the basis functions used to fit the model) is less than one, which indicates that the default number of basis functions in the model may be too low to properly capture the relationships. We can fix this by specifying the number of basis functions in the gam formula. Our current output indicates that k, our basis dimension, is 9. We will try increasing that to 20 to see if this fixes the issues with over-smoothing in our model (forcing the data into being more linear than is true).

Code
#Run the same gam as before with addition of specifying the basis dimensions
COTEgam3<- gam(data=COTEgam1, number_depredated~s(NAO,k=20) + prey_species, family=poisson, method="REML") #k=20 allows me to specify basis dimensions to fix over-smoothing in the model
summary(COTEgam3)

Family: poisson 
Link function: log 

Formula:
number_depredated ~ s(NAO, k = 20) + prey_species

Parametric coefficients:
                               Estimate Std. Error z value Pr(>|z|)    
(Intercept)                      1.3230     0.3791   3.489 0.000484 ***
prey_speciesAtlanticsaury       -0.3301     0.4695  -0.703 0.482032    
prey_speciesButterfish           0.8523     0.3928   2.170 0.030020 *  
prey_speciesCapelin              1.8549     0.4428   4.189 2.80e-05 ***
prey_speciesCod                  0.3516     0.4147   0.848 0.396577    
prey_speciesCunner               0.8942     0.4011   2.229 0.025786 *  
prey_speciesEuphausiid           0.3328     0.4193   0.794 0.427365    
prey_speciesHake                 3.9886     0.3795  10.509  < 2e-16 ***
prey_speciesHerring              1.9371     0.3816   5.076 3.86e-07 ***
prey_speciesKillifish           -0.1420     0.5580  -0.254 0.799180    
prey_speciesLumpfish            -0.5777     0.4272  -1.352 0.176295    
prey_speciesMackerel            -0.6499     0.5356  -1.214 0.224928    
prey_speciesMarine Invert        0.7757     0.3876   2.001 0.045392 *  
prey_speciesMottledsculpin       1.3423     0.4395   3.055 0.002254 ** 
prey_speciesMummichug           -0.3654     0.4704  -0.777 0.437299    
prey_speciesNeedlefish           0.5304     0.4215   1.258 0.208331    
prey_speciesPollock              1.0186     0.3855   2.643 0.008230 ** 
prey_speciesRedfish             -0.8440     0.4696  -1.797 0.072328 .  
prey_speciesSandlance            2.9497     0.3801   7.761 8.41e-15 ***
prey_speciesSculpin             -0.8606     0.4942  -1.742 0.081590 .  
prey_speciesSilverside           0.1978     0.4403   0.449 0.653218    
prey_speciesSnipefish            0.5079     0.4079   1.245 0.213060    
prey_speciesSquid                0.2007     0.5879   0.341 0.732777    
prey_speciesStickleback          0.7505     0.3883   1.933 0.053275 .  
prey_speciesTerrestrial Invert   1.3694     0.3850   3.556 0.000376 ***
prey_speciesUnknown              3.5805     0.3797   9.430  < 2e-16 ***
prey_speciesUnknown Invert       1.4604     0.3910   3.735 0.000187 ***
prey_speciesUnknown Species      2.4950     0.3921   6.364 1.97e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
        edf Ref.df Chi.sq p-value    
s(NAO) 18.8  18.98  939.9  <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =  0.722   Deviance explained = 83.7%
-REML = 2509.5  Scale est. = 1         n = 252
Code
plot(COTEgam3, shade=T, shade.col = "lightgreen", se=T, residuals=TRUE, pch=1, cex=0.7, shift = coef(COTEgam3)[1])

Unfortunately, it is now becoming more obvious that we may have to remove the 2010 data because it is causing problems in the model expected from an outlier. We will still run it through the gam.check.

Code
gam.check(COTEgam3)#runs a the whole series of model validaton checks for me in just one command. I get a summary of stats tests to check for model fit as well as a QQ plot, a residuals vs linear predictions plot, a response vs fitted values plot, and a histogram of residuals. 


Method: REML   Optimizer: outer newton
full convergence after 7 iterations.
Gradient range [8.640124e-05,8.640124e-05]
(score 2509.469 & scale 1).
Hessian positive definite, eigenvalue range [6.626972,6.626972].
Model rank =  47 / 47 

Basis dimension (k) checking results. Low p-value (k-index<1) may
indicate that k is too low, especially if edf is close to k'.

         k'  edf k-index p-value
s(NAO) 19.0 18.8    1.11    0.98

Similar issues are cropping up again, so we will try removing the outlier data. We will also try k=15, because the plot for k=20 showed more wiggliness (this is genuinely the term) than desired regardless of the outlier.

Code
COTEnoouts<- ternshunt %>% 
  filter(tern_species.x!="ARTE", tern_species.y!="ARTE", year !="2010")#filtering out the 2010 outlier
COTEgam4<- gam(data=COTEnoouts, number_depredated~s(NAO,k=15) + prey_species, family=poisson, method="REML") 
summary(COTEgam4)

Family: poisson 
Link function: log 

Formula:
number_depredated ~ s(NAO, k = 15) + prey_species

Parametric coefficients:
                               Estimate Std. Error z value Pr(>|z|)    
(Intercept)                      1.2962     0.3790   3.420 0.000626 ***
prey_speciesAtlanticsaury       -0.3337     0.4695  -0.711 0.477128    
prey_speciesButterfish           0.9389     0.3938   2.384 0.017115 *  
prey_speciesCapelin              1.8518     0.4427   4.183 2.88e-05 ***
prey_speciesCod                  0.5767     0.4188   1.377 0.168550    
prey_speciesCunner               0.8987     0.4009   2.242 0.024987 *  
prey_speciesEuphausiid           0.3124     0.4191   0.745 0.456055    
prey_speciesHake                 4.0189     0.3794  10.592  < 2e-16 ***
prey_speciesHerring              1.7659     0.3822   4.620 3.84e-06 ***
prey_speciesKillifish           -0.1406     0.5579  -0.252 0.801062    
prey_speciesLumpfish            -0.5299     0.4329  -1.224 0.220929    
prey_speciesMackerel            -0.6946     0.5354  -1.297 0.194577    
prey_speciesMarine Invert        0.7922     0.3875   2.044 0.040916 *  
prey_speciesMottledsculpin       1.3488     0.4394   3.070 0.002143 ** 
prey_speciesMummichug           -0.3652     0.4703  -0.777 0.437403    
prey_speciesNeedlefish          -0.6644     0.5578  -1.191 0.233620    
prey_speciesPollock              1.0194     0.3860   2.641 0.008267 ** 
prey_speciesRedfish             -0.8641     0.4695  -1.841 0.065695 .  
prey_speciesSandlance            2.9660     0.3800   7.805 5.95e-15 ***
prey_speciesSculpin             -0.6467     0.5053  -1.280 0.200637    
prey_speciesSilverside           0.1684     0.4401   0.383 0.702066    
prey_speciesSnipefish            0.4859     0.4078   1.192 0.233390    
prey_speciesSquid                0.1835     0.5877   0.312 0.754818    
prey_speciesStickleback          0.7978     0.3889   2.051 0.040233 *  
prey_speciesTerrestrial Invert   1.4505     0.3852   3.766 0.000166 ***
prey_speciesUnknown              3.5594     0.3796   9.376  < 2e-16 ***
prey_speciesUnknown Invert       1.4646     0.3908   3.747 0.000179 ***
prey_speciesUnknown Species      2.4993     0.3920   6.376 1.81e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
         edf Ref.df Chi.sq p-value    
s(NAO) 13.91     14  651.8  <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =   0.71   Deviance explained =   83%
-REML = 2398.6  Scale est. = 1         n = 240
Code
plot(COTEgam4, shade=T, shade.col = "lightgreen", se=T, residuals=TRUE, pch=1, cex=0.7, shift = coef(COTEgam4)[1])

Code
gam.check(COTEgam4)


Method: REML   Optimizer: outer newton
full convergence after 8 iterations.
Gradient range [0.0006942665,0.0006942665]
(score 2398.59 & scale 1).
Hessian positive definite, eigenvalue range [6.105787,6.105787].
Model rank =  42 / 42 

Basis dimension (k) checking results. Low p-value (k-index<1) may
indicate that k is too low, especially if edf is close to k'.

         k'  edf k-index p-value
s(NAO) 14.0 13.9    1.04    0.80

We are still seeing the same issues with the gam.check (after trying k values ranging from 5 to 20 with 20 being the maximum this data set will allow). We can see that the residuals are quite widely dispersed when plotted against the fitted values, indicating that the variance of the residuals is likely much higher than the mean. A poisson distribution assumes the mean and the variance will be roughly equal, but a negative binomial distribution accounts for a variance larger than the mean.

Code
COTEres1<- resid(COTEgam4)
COTEfit1<- fitted(COTEgam4)
ggplot()+
  geom_point(aes(x=COTEfit1,y=COTEres1))+
  geom_hline(yintercept=0, linetype='dashed', col='blue')+
  theme_bw(22)+ylab("Residuals")+xlab("Fitted values")

With one last visualization plotting residuals against fitted values, we can see there is more variation as fitted values rise above ~40, which falls outside the assumptions of a poisson distribution.

NOTE: I am uncertain if I might be able to say the residuals are fairly evenly dispersed around 0 with some outliers - I think they are, but I also guessed incorrectly when Trevor asked in class if the demo residuals were evenly distributed around 0 (I said no, but they were). We can check this with the nifty “create a dispersion function” trick.

Code
dispersion<-function(model,modeltype='gaussian')
{
  A<-sum(resid(model,type="pearson")^2)
  if(modeltype %in% c("g","p","qp","gaussian","poisson","quasipoisson"))
  {
    B<-length(resid(model))-length(coef(model))
  }
  if(modeltype %in% c("nb","negativebinomial"))
  {
    B<-length(resid(model))-(length(coef(model))+1)
  }
  DISP<<-A/B
  return(DISP)
}
dispersion(COTEgam4, modeltype="poisson")
[1] 20.2757

With a dispersion output of 20.28, we can now confidently say this model is overdispersed and the poisson is not the correct distribution.

GAM with Negative Binomial Distribution

We will first run the GAM without specifying the number of basis functions to see how it looks.

Code
COTEgam5<- gam(data=COTEnoouts, number_depredated~s(NAO) + prey_species, family= nb()) #family = nb() gives me a negative binomial distribution - r will automatically calculate the theta value for me in order to manage the overdispersion and display that value next to the family type in the summary
summary(COTEgam5)

Family: Negative Binomial(1.392) 
Link function: log 

Formula:
number_depredated ~ s(NAO) + prey_species

Parametric coefficients:
                               Estimate Std. Error z value Pr(>|z|)    
(Intercept)                     1.22528    0.71065   1.724   0.0847 .  
prey_speciesAtlanticsaury      -0.25760    0.85345  -0.302   0.7628    
prey_speciesButterfish          1.05845    0.77211   1.371   0.1704    
prey_speciesCapelin             1.89837    1.13225   1.677   0.0936 .  
prey_speciesCod                 0.59924    0.82479   0.727   0.4675    
prey_speciesCunner              1.10611    0.80079   1.381   0.1672    
prey_speciesEuphausiid          0.37742    0.81150   0.465   0.6419    
prey_speciesHake                4.09677    0.73450   5.578 2.44e-08 ***
prey_speciesHerring             1.85011    0.73710   2.510   0.0121 *  
prey_speciesKillifish          -0.14822    1.01462  -0.146   0.8839    
prey_speciesLumpfish           -0.40287    0.78797  -0.511   0.6092    
prey_speciesMackerel           -0.59997    0.91163  -0.658   0.5105    
prey_speciesMarine Invert       0.89369    0.74280   1.203   0.2289    
prey_speciesMottledsculpin      1.75811    1.12804   1.559   0.1191    
prey_speciesMummichug          -0.08985    0.87336  -0.103   0.9181    
prey_speciesNeedlefish         -0.52512    0.95249  -0.551   0.5814    
prey_speciesPollock             1.07874    0.74036   1.457   0.1451    
prey_speciesRedfish            -0.75387    0.81947  -0.920   0.3576    
prey_speciesSandlance           3.05664    0.73479   4.160 3.18e-05 ***
prey_speciesSculpin            -0.48533    0.89399  -0.543   0.5872    
prey_speciesSilverside          0.43186    0.85616   0.504   0.6140    
prey_speciesSnipefish           0.57847    0.79525   0.727   0.4670    
prey_speciesSquid               0.27075    1.19522   0.227   0.8208    
prey_speciesStickleback         0.87055    0.74669   1.166   0.2437    
prey_speciesTerrestrial Invert  1.56775    0.75163   2.086   0.0370 *  
prey_speciesUnknown             3.64531    0.73459   4.962 6.96e-07 ***
prey_speciesUnknown Invert      1.53785    0.78646   1.955   0.0505 .  
prey_speciesUnknown Species     2.71490    0.93519   2.903   0.0037 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
         edf Ref.df Chi.sq p-value
s(NAO) 1.001  1.002  2.228   0.136

R-sq.(adj) =  0.639   Deviance explained = 73.8%
-REML = 906.71  Scale est. = 1         n = 240
Code
plot(COTEgam5, shade=T, shade.col = "lightgreen", se=T, residuals=TRUE, pch=1, cex=0.7, shift = coef(COTEgam5)[1])

Code
gam.check(COTEgam5)


Method: REML   Optimizer: outer newton
full convergence after 7 iterations.
Gradient range [-0.0002732086,0.0001715235]
(score 906.7095 & scale 1).
Hessian positive definite, eigenvalue range [0.0002731124,93.50103].
Model rank =  37 / 37 

Basis dimension (k) checking results. Low p-value (k-index<1) may
indicate that k is too low, especially if edf is close to k'.

       k' edf k-index p-value
s(NAO)  9   1     0.9    0.22

We can see from our GAM summary output that our dispersion is still greater than one (1.392 - this value is given in the family output with a negative binomial distribution), but it is a distinct improvement over a dispersion of 20.28! Additionally, our gam.check outputs look much better. The QQ plot shows a much straighter distribution of residuals that actually follow the theoretical quartiles line, and the histogram of residuals is much more normally shaped. The residuals vs linear predictions is more randomly distributed, though there is still some clustering closer to 0. The response vs fitted values still do not cluster around a 1-1 line; however, I have not been able to fix that with any additional attempts at smoothing or setting other parameters.

This appears to be the best model (at least thus far). It only explains 73.8% of deviance in the data set, but it is more important to be sure the model actually fits than incorrectly explain more of the deviance. Additionally, this model shows the effect of NAO on prey captured is not significant, but the effect of prey species is significant for prey such as hake (p<0.001), herring (p<0.05), and sand lance (p<0.001).

We can look at the predicted values from this model.

Code
predict_response(COTEgam5) %>% plot(show_data=T, jitter=T, show_ci=TRUE, ci_style=c("dash"), color="metro")
$NAO


$prey_species

Code
#this calulates the predicted values and plots them, showing the data. It creates one plot of predicted values for each of the parameters in my model (NAO and prey species). show_ci lets me show the confidence interval around my predicted mean line, and ci_style lets me choose how I want to visualize that (dashed line in this case). The colors argument has highlighted the mean of the predicted values (regression line for NAO plot and point for prey species plot).I am still trying to work out how to rotate the x-axis labels so they're legible... Tried to do this in ggplot2 and it didn't work well so not sure how to rotate the labels here.

While we can see there is a slight negative correlation between NAO and number of prey captured, it is not a significant correlation. The number of prey captured is significantly higher for hake, sand lance, herring, and, as always, prey items that could not be positively identified by observers before being eaten. The significance of these species is unsurprising because they are well recognized as preferred food sources for COTE.

GAM with Negative Binomial Distribution (ARTE)

Now to do the same model for ARTE using the same parameters.

Code
ARTEnoouts<- ternshunt %>% 
  filter(tern_species.x!="COTE", tern_species.y!="COTE", year !="2010")
ARTEgam1<- gam(data=ARTEnoouts, number_depredated~s(NAO) + prey_species, family= nb()) #family = nb() gives me a negative binomial distribution - r will automatically calculate the theta value for me in order to manage the overdispersion and display that value next to the family type in the summary
summary(ARTEgam1)

Family: Negative Binomial(1.182) 
Link function: log 

Formula:
number_depredated ~ s(NAO) + prey_species

Parametric coefficients:
                               Estimate Std. Error z value Pr(>|z|)    
(Intercept)                     1.52585    0.51771   2.947 0.003205 ** 
prey_speciesAtlanticsaury      -0.49867    0.68819  -0.725 0.468694    
prey_speciesButterfish         -0.10833    0.61649  -0.176 0.860514    
prey_speciesCapelin             0.45044    0.87489   0.515 0.606652    
prey_speciesCod                 0.38480    0.68411   0.562 0.573786    
prey_speciesCunner             -1.64569    1.45642  -1.130 0.258493    
prey_speciesEuphausiid          0.60762    0.63819   0.952 0.341041    
prey_speciesHake                4.00730    0.55525   7.217 5.31e-13 ***
prey_speciesHerring             1.57935    0.55888   2.826 0.004715 ** 
prey_speciesLumpfish           -0.80743    0.66353  -1.217 0.223654    
prey_speciesMackerel           -1.17940    0.80447  -1.466 0.142633    
prey_speciesMarine Invert       2.19906    0.56440   3.896 9.77e-05 ***
prey_speciesMottledsculpin      1.43994    1.08619   1.326 0.184945    
prey_speciesMummichug          -1.02619    0.86672  -1.184 0.236417    
prey_speciesNeedlefish         -0.74297    0.83778  -0.887 0.375170    
prey_speciesPipefish           -0.92159    1.27182  -0.725 0.468684    
prey_speciesPollock            -0.04925    0.61205  -0.080 0.935870    
prey_speciesRedfish            -0.57006    0.82237  -0.693 0.488190    
prey_speciesSandlance           2.93442    0.55557   5.282 1.28e-07 ***
prey_speciesSculpin             0.28818    0.88433   0.326 0.744521    
prey_speciesSilverside          0.85609    0.76238   1.123 0.261471    
prey_speciesSnipefish           0.27576    0.64488   0.428 0.668935    
prey_speciesSquid              -0.50250    0.81436  -0.617 0.537201    
prey_speciesStickleback         0.28536    0.57226   0.499 0.618026    
prey_speciesTerrestrial Invert  2.13232    0.59023   3.613 0.000303 ***
prey_speciesUnknown             3.47611    0.55537   6.259 3.87e-10 ***
prey_speciesUnknown Invert      2.11748    0.64601   3.278 0.001046 ** 
prey_speciesUnknown Species     2.29558    0.75276   3.050 0.002292 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
         edf Ref.df Chi.sq p-value
s(NAO) 3.014  3.689  4.876   0.221

R-sq.(adj) =  0.555   Deviance explained = 72.1%
-REML = 889.69  Scale est. = 1         n = 220
Code
plot(ARTEgam1, shade=T, shade.col = "lightgreen", se=T, residuals=TRUE, pch=1, cex=0.7, shift = coef(ARTEgam1)[1])

Code
gam.check(ARTEgam1)


Method: REML   Optimizer: outer newton
full convergence after 3 iterations.
Gradient range [-1.073624e-09,3.051843e-09]
(score 889.6921 & scale 1).
Hessian positive definite, eigenvalue range [0.3383374,95.24936].
Model rank =  37 / 37 

Basis dimension (k) checking results. Low p-value (k-index<1) may
indicate that k is too low, especially if edf is close to k'.

         k'  edf k-index p-value
s(NAO) 9.00 3.01    0.88    0.26

Our gam.check gives very similar outputs to our COTE data, indicating that our negative binomial GAM is still our best fitting model for ARTE data (as it should be). Hake, herring, sand lance, and invertebrates are all indicated as statistically significant prey species for ARTE. Number of prey captured by ARTE appears stable until NAO reaches ~1, at which point there is a slight negative correlation; however, this relationship is not statistically significant.

Code
predict_response(ARTEgam1) %>% plot(show_data=T, jitter=T, show_ci=TRUE, ci_style=c("dash"), color="reefs")
$NAO


$prey_species

With our predicted values, we can see that the number of ARTE hunts predicted to be successful is greatest when NAO is around 1; we know that this relationship is not significant, but we can now see that the lack of significance is likely due to the large degree of variability, as indicated by the broad 95% confidence intervals.

My apologies for the x-axis labels - I’m still fighting with code on this.

Footnotes

  1. I used pivot_longer to make “prey species” a single variable associated with “number observed,” which is the number of that species observed being eaten by terns in a given observation period.↩︎

  2. I wrote the “long” data set out as a csv file and then read it back in so I may work with it to create summary tables.↩︎

  3. O/hr =Observations per hour, \(x\) = prey observed, and \(y\) = length of observation period in hours↩︎

  4. NAO Index Data provided by the Climate Analysis Section, NCAR, Boulder, USA, Hurrell (2003). Updated regularly. Accessed 15 March 2025.

    Hurrell, James &, Phillips, Adam & National Center for Atmospheric Research Staff (Eds). Last modified 2024-04-18 “The Climate Data Guide: Hurrell North Atlantic Oscillation (NAO) Index (PC-based).” Retrieved from https://climatedataguide.ucar.edu/climate-data/hurrell-north-atlantic-oscillation-nao-index-pc-based on 2025-03-10.↩︎