The end goal of working with Tern diet data collected by Canadian Wildlife Service (CWS) on Country Island, NS yearly from 1998 to 2019 is to establish if there is any correlation between diet shifts of Common Terns (COTE) and Arctic Terns (ARTE) and local climate/weather events. Roseate Terns (ROST) are included in the data set but will not be included in the final analysis as they are a special case being studied in detail by another graduate student.
Code
library(tidyverse)library(readr)library(rmarkdown)library(dplyr)library(tinytex)library(kableExtra)library(ggmap)library(ggspatial)library(sf)library(leaflet)library(mapview)library(htmlwidgets)library(ggplot2) #loading all libraires I needlibrary(magick)library(png)library(grid)library(ggeffects)library(modelr)library(mgcv)library(broom)library(jtools)library(ggstance)library(MASS)library(pscl)ternpair <-image_read('ternpair.jpg')#reads in image I have saved into my directoryimage_write(ternpair, path ='ternpair.png', format ="png") #saves the image as a png file, which is better to work withternnest<-image_read("ternnest.jpg")image_write(ternnest, path="ternnest.png", format ="png")#these lines are same as aboveternpairpng <-image_read("ternpair.png")|>#puts image in an object to work withimage_rotate(90) |>#rotates image 90 degrees so it's properly orientedimage_scale("100")#resized the image proportionally so width is 100 pixelsimage_write(ternpairpng, path ="matedARTE.png")#saves the image ternnestpng <-image_read("ternnest.png")|>image_scale("300")image_write(ternnestpng, path="COTEnest.png") #all same as above for mated ARTE
(a) Mated Arctic Terns
(b) Common Tern with Chick
Arctic and Common Terns at Country Island Breeding Colony
Figure 1
The Data
This data has been collected, stored, and provided by CWS and the Country Island seabird breeding monitoring program.
Code
myterns <-read.csv("tern_feeding_rates.csv") #reading in my data set and placing it in an object called "myterns"
Data Wrangling
Reformatting the Data Set to a Long Format
The data set initially had a separate variable column for each prey species, which can make data wrangling more difficult or tedious, particularly if I want to know the sum, mean, median, etc of different prey species taken. In order to more easily manipulate data for analysis, I have rearranged the data set.1
Code
#This is thanks in large part to Paige, who showed me howmyternslong <- myterns %>%#creating a new object that will hold the pivoted data set# pivot data into long formatpivot_longer(cols =starts_with("num_"), # all species columns start the samenames_to ="prey_species", # name the new columnnames_prefix ="num_", # remove that prefix from all valuesvalues_to ="num_observed", # name the column ind. countvalues_drop_na =TRUE) %>%# remove NAsfilter(num_observed !=0) %>%# remove values of zeros# for a bar chart you need 1 row for each observationmap_df(., rep, .$num_observed) %>%# repeats each row the n times in observed columnmutate(num_observed =1, # changes the values all to 1 prey_species =str_replace_all(prey_species, pattern ="_", replacement =" ")) %>%# HOW TO CHANGE SPECIES INTO 1 CATEGORY# you need to have the last line with the TRUE statement, but if you want to make# any other changes, structure it exactly like the first prey_species %in% linemutate(prey_species =case_when(prey_species %in%c("ter inv", "unknown", "unknown fish", "unknown inv", "unknown sp") ~"unknown", prey_species =="atl saury"~"atlantic saury", prey_species =="amphipod sp"~"amphipods", prey_species =="mar inv"~"marine invertebrates",TRUE~ prey_species),prey_species =str_to_title(prey_species))
Saving the Reformatted Data Set as a .csv File
It is good practice to save your rearranged and cleaned data as a new .csv file that can be shared with others or used for further analysis.2
Code
write.csv(x= myternslong, file ="tern_feeding_longer.csv") #saving the pivoted data set as a .csv file so it can be used by others as well as for my own further analysislongtern <-read.csv("tern_feeding_longer.csv") #reading the pivoted data set in so it can be used for summary tables
Summary Tables
Total Prey Capture by Each Tern Species
While the feeding ecology between COTE and ARTE is similar, there are some differences that may result in greater foraging success by one species over the other at Country Island, NS.
Code
terntable1 <- longtern %>%#make a new object for data wrangling to maintain integrity of original data setfilter(tern_species !="ROST", year !="2023") %>%#filter out unwanted observations - anything related to ROST and year 2023group_by(tern_species, year, prey_species) %>%#group the data first by the species of tern, then by the year, then by the prey speciessummarise(number_depredated =sum(num_observed, na.rm = T)) %>%#make a summary to turn into a table that will show the the total number of each prey species observed being eaten by each tern species each yearungroup() #this is important so that R doesn't think you're still working with the above groupings for future wranglingterntable1 %>%kbl(caption ="Total Prey of Arctic and Common Terns by Year") %>%#kbl makes the table, caption gives the title of the tablekable_classic(full_width = F, html_font ="Cambria") #gives the aesthetic of the table and specifies the font
Total Prey of Arctic and Common Terns by Year
tern_species
year
prey_species
number_depredated
ARTE
1998
Butterfish
1
ARTE
1998
Hake
358
ARTE
1998
Lumpfish
1
ARTE
1998
Marine Invert
3
ARTE
1998
Mottledsculpin
18
ARTE
1998
Mummichug
2
ARTE
1998
Needlefish
2
ARTE
1998
Sandlance
24
ARTE
1998
Stickleback
6
ARTE
1998
Unknown
468
ARTE
1999
Hake
331
ARTE
1999
Herring
22
ARTE
1999
Marine Invert
2
ARTE
1999
Sandlance
490
ARTE
1999
Stickleback
3
ARTE
1999
Unknown
143
ARTE
2000
Hake
300
ARTE
2000
Herring
27
ARTE
2000
Marine Invert
44
ARTE
2000
Pipefish
2
ARTE
2000
Pollock
7
ARTE
2000
Sandlance
77
ARTE
2000
Stickleback
7
ARTE
2000
Unknown
221
ARTE
2001
Hake
84
ARTE
2001
Herring
44
ARTE
2001
Sandlance
79
ARTE
2001
Unknown
85
ARTE
2002
Hake
95
ARTE
2002
Herring
17
ARTE
2002
Marine Invert
6
ARTE
2002
Sandlance
58
ARTE
2002
Snipefish
1
ARTE
2002
Terrestrial Invert
1
ARTE
2002
Unknown
114
ARTE
2003
Hake
194
ARTE
2003
Herring
28
ARTE
2003
Pollock
2
ARTE
2003
Sandlance
19
ARTE
2003
Stickleback
1
ARTE
2003
Unknown
46
ARTE
2004
Butterfish
1
ARTE
2004
Cod
3
ARTE
2004
Hake
141
ARTE
2004
Herring
12
ARTE
2004
Lumpfish
1
ARTE
2004
Marine Invert
81
ARTE
2004
Pollock
8
ARTE
2004
Redfish
2
ARTE
2004
Sandlance
83
ARTE
2004
Stickleback
3
ARTE
2004
Unknown
55
ARTE
2005
Hake
208
ARTE
2005
Herring
2
ARTE
2005
Marine Invert
21
ARTE
2005
Sandlance
18
ARTE
2005
Stickleback
8
ARTE
2005
Terrestrial Invert
1
ARTE
2005
Unknown
115
ARTE
2006
Butterfish
1
ARTE
2006
Cod
4
ARTE
2006
Hake
63
ARTE
2006
Herring
1
ARTE
2006
Sandlance
36
ARTE
2006
Unknown
71
ARTE
2007
Butterfish
1
ARTE
2007
Cod
10
ARTE
2007
Hake
277
ARTE
2007
Herring
21
ARTE
2007
Lumpfish
5
ARTE
2007
Marine Invert
3
ARTE
2007
Pollock
7
ARTE
2007
Sandlance
212
ARTE
2007
Stickleback
6
ARTE
2007
Unknown
134
ARTE
2008
Cod
14
ARTE
2008
Hake
354
ARTE
2008
Herring
102
ARTE
2008
Lumpfish
2
ARTE
2008
Marine Invert
51
ARTE
2008
Sandlance
247
ARTE
2008
Stickleback
5
ARTE
2008
Terrestrial Invert
64
ARTE
2008
Unknown
189
ARTE
2009
Butterfish
16
ARTE
2009
Cod
2
ARTE
2009
Hake
236
ARTE
2009
Herring
44
ARTE
2009
Lumpfish
1
ARTE
2009
Marine Invert
30
ARTE
2009
Needlefish
1
ARTE
2009
Pollock
7
ARTE
2009
Sandlance
95
ARTE
2009
Stickleback
1
ARTE
2009
Terrestrial Invert
3
ARTE
2009
Unknown
102
ARTE
2010
Butterfish
4
ARTE
2010
Cod
1
ARTE
2010
Hake
100
ARTE
2010
Herring
96
ARTE
2010
Lumpfish
1
ARTE
2010
Needlefish
5
ARTE
2010
Pollock
2
ARTE
2010
Sandlance
44
ARTE
2010
Snipefish
7
ARTE
2010
Stickleback
3
ARTE
2010
Terrestrial Invert
5
ARTE
2010
Unknown
181
ARTE
2011
Butterfish
4
ARTE
2011
Capelin
15
ARTE
2011
Hake
194
ARTE
2011
Herring
15
ARTE
2011
Mackerel
1
ARTE
2011
Marine Invert
23
ARTE
2011
Needlefish
4
ARTE
2011
Sandlance
133
ARTE
2011
Silverside
21
ARTE
2011
Snipefish
4
ARTE
2011
Stickleback
14
ARTE
2011
Terrestrial Invert
39
ARTE
2011
Unknown
397
ARTE
2011
Unknown Invert
85
ARTE
2011
Unknown Species
41
ARTE
2012
Butterfish
14
ARTE
2012
Hake
397
ARTE
2012
Herring
14
ARTE
2012
Marine Invert
82
ARTE
2012
Pollock
1
ARTE
2012
Sandlance
4
ARTE
2012
Snipefish
16
ARTE
2012
Stickleback
6
ARTE
2012
Terrestrial Invert
4
ARTE
2012
Unknown
141
ARTE
2012
Unknown Species
100
ARTE
2013
Amphipod
4
ARTE
2013
Atlanticsaury
5
ARTE
2013
Butterfish
1
ARTE
2013
Capelin
1
ARTE
2013
Euphausiid
1
ARTE
2013
Hake
250
ARTE
2013
Herring
14
ARTE
2013
Marine Invert
2
ARTE
2013
Pollock
6
ARTE
2013
Sandlance
16
ARTE
2013
Sculpin
1
ARTE
2013
Silverside
4
ARTE
2013
Stickleback
3
ARTE
2013
Terrestrial Invert
2
ARTE
2013
Unknown
141
ARTE
2014
Atlanticsaury
2
ARTE
2014
Euphausiid
3
ARTE
2014
Hake
232
ARTE
2014
Herring
5
ARTE
2014
Mackerel
3
ARTE
2014
Marine Invert
2
ARTE
2014
Mummichug
2
ARTE
2014
Pollock
1
ARTE
2014
Redfish
4
ARTE
2014
Sandlance
20
ARTE
2014
Silverside
10
ARTE
2014
Snipefish
1
ARTE
2014
Unknown
71
ARTE
2015
Atlanticsaury
1
ARTE
2015
Butterfish
1
ARTE
2015
Euphausiid
10
ARTE
2015
Hake
162
ARTE
2015
Herring
22
ARTE
2015
Lumpfish
1
ARTE
2015
Sandlance
16
ARTE
2015
Snipefish
2
ARTE
2015
Stickleback
2
ARTE
2015
Unknown
114
ARTE
2015
Unknown Invert
4
ARTE
2016
Amphipod
2
ARTE
2016
Atlanticsaury
5
ARTE
2016
Euphausiid
1
ARTE
2016
Hake
466
ARTE
2016
Herring
6
ARTE
2016
Lumpfish
1
ARTE
2016
Marine Invert
240
ARTE
2016
Sandlance
39
ARTE
2016
Snipefish
15
ARTE
2016
Squid
1
ARTE
2016
Stickleback
1
ARTE
2016
Terrestrial Invert
1
ARTE
2016
Unknown
54
ARTE
2016
Unknown Invert
28
ARTE
2017
Amphipod
11
ARTE
2017
Butterfish
2
ARTE
2017
Euphausiid
18
ARTE
2017
Hake
376
ARTE
2017
Herring
7
ARTE
2017
Mackerel
1
ARTE
2017
Marine Invert
50
ARTE
2017
Sandlance
63
ARTE
2017
Stickleback
2
ARTE
2017
Terrestrial Invert
5
ARTE
2017
Unknown
138
ARTE
2017
Unknown Invert
17
ARTE
2018
Amphipod
1
ARTE
2018
Atlanticsaury
1
ARTE
2018
Cunner
1
ARTE
2018
Euphausiid
1
ARTE
2018
Hake
382
ARTE
2018
Herring
28
ARTE
2018
Marine Invert
62
ARTE
2018
Pollock
3
ARTE
2018
Sandlance
36
ARTE
2018
Squid
1
ARTE
2018
Stickleback
10
ARTE
2018
Terrestrial Invert
124
ARTE
2018
Unknown
147
ARTE
2018
Unknown Invert
81
ARTE
2019
Atlanticsaury
2
ARTE
2019
Euphausiid
21
ARTE
2019
Hake
212
ARTE
2019
Herring
9
ARTE
2019
Lumpfish
4
ARTE
2019
Mackerel
1
ARTE
2019
Marine Invert
13
ARTE
2019
Mummichug
1
ARTE
2019
Pollock
3
ARTE
2019
Redfish
2
ARTE
2019
Sandlance
98
ARTE
2019
Sculpin
12
ARTE
2019
Snipefish
3
ARTE
2019
Squid
7
ARTE
2019
Stickleback
30
ARTE
2019
Terrestrial Invert
214
ARTE
2019
Unknown
171
ARTE
2019
Unknown Invert
33
ARTE
2019
Unknown Species
3
COTE
1998
Hake
234
COTE
1998
Lumpfish
2
COTE
1998
Marine Invert
1
COTE
1998
Mottledsculpin
21
COTE
1998
Mummichug
6
COTE
1998
Needlefish
2
COTE
1998
Sandlance
40
COTE
1998
Stickleback
6
COTE
1998
Unknown
347
COTE
1999
Hake
61
COTE
1999
Herring
4
COTE
1999
Sandlance
190
COTE
1999
Unknown
58
COTE
2000
Butterfish
2
COTE
2000
Hake
175
COTE
2000
Herring
19
COTE
2000
Marine Invert
3
COTE
2000
Pollock
6
COTE
2000
Sandlance
45
COTE
2000
Stickleback
7
COTE
2000
Unknown
112
COTE
2001
Hake
34
COTE
2001
Herring
12
COTE
2001
Pollock
1
COTE
2001
Sandlance
55
COTE
2001
Stickleback
1
COTE
2001
Unknown
31
COTE
2002
Butterfish
3
COTE
2002
Hake
88
COTE
2002
Herring
14
COTE
2002
Marine Invert
1
COTE
2002
Pollock
12
COTE
2002
Sandlance
65
COTE
2002
Unknown
99
COTE
2003
Hake
314
COTE
2003
Herring
42
COTE
2003
Marine Invert
1
COTE
2003
Pollock
40
COTE
2003
Sandlance
65
COTE
2003
Stickleback
8
COTE
2003
Unknown
95
COTE
2004
Butterfish
4
COTE
2004
Cod
1
COTE
2004
Hake
98
COTE
2004
Herring
31
COTE
2004
Killifish
5
COTE
2004
Marine Invert
27
COTE
2004
Pollock
21
COTE
2004
Sandlance
160
COTE
2004
Stickleback
3
COTE
2004
Unknown
99
COTE
2005
Butterfish
1
COTE
2005
Hake
264
COTE
2005
Herring
12
COTE
2005
Lumpfish
1
COTE
2005
Marine Invert
9
COTE
2005
Pollock
7
COTE
2005
Sandlance
44
COTE
2005
Sculpin
2
COTE
2005
Stickleback
11
COTE
2005
Terrestrial Invert
1
COTE
2005
Unknown
200
COTE
2006
Cod
4
COTE
2006
Hake
129
COTE
2006
Herring
2
COTE
2006
Killifish
1
COTE
2006
Pollock
1
COTE
2006
Sandlance
90
COTE
2006
Stickleback
3
COTE
2006
Terrestrial Invert
1
COTE
2006
Unknown
108
COTE
2007
Cod
7
COTE
2007
Hake
215
COTE
2007
Herring
25
COTE
2007
Lumpfish
2
COTE
2007
Marine Invert
1
COTE
2007
Pollock
25
COTE
2007
Sandlance
275
COTE
2007
Stickleback
7
COTE
2007
Terrestrial Invert
1
COTE
2007
Unknown
102
COTE
2008
Cod
11
COTE
2008
Hake
226
COTE
2008
Herring
65
COTE
2008
Lumpfish
1
COTE
2008
Marine Invert
5
COTE
2008
Pollock
9
COTE
2008
Sandlance
98
COTE
2008
Stickleback
10
COTE
2008
Terrestrial Invert
91
COTE
2008
Unknown
122
COTE
2009
Butterfish
24
COTE
2009
Cod
9
COTE
2009
Hake
137
COTE
2009
Herring
12
COTE
2009
Lumpfish
4
COTE
2009
Marine Invert
20
COTE
2009
Needlefish
1
COTE
2009
Pollock
27
COTE
2009
Redfish
1
COTE
2009
Sandlance
76
COTE
2009
Stickleback
3
COTE
2009
Terrestrial Invert
1
COTE
2009
Unknown
67
COTE
2010
Butterfish
8
COTE
2010
Cod
4
COTE
2010
Hake
217
COTE
2010
Herring
123
COTE
2010
Lumpfish
3
COTE
2010
Needlefish
24
COTE
2010
Pollock
20
COTE
2010
Sandlance
98
COTE
2010
Sculpin
1
COTE
2010
Stickleback
11
COTE
2010
Terrestrial Invert
12
COTE
2010
Unknown
288
COTE
2011
Capelin
20
COTE
2011
Cunner
1
COTE
2011
Hake
113
COTE
2011
Herring
23
COTE
2011
Mackerel
1
COTE
2011
Marine Invert
2
COTE
2011
Mummichug
2
COTE
2011
Needlefish
3
COTE
2011
Pollock
4
COTE
2011
Redfish
2
COTE
2011
Sandlance
51
COTE
2011
Sculpin
1
COTE
2011
Silverside
5
COTE
2011
Snipefish
2
COTE
2011
Stickleback
11
COTE
2011
Terrestrial Invert
12
COTE
2011
Unknown
224
COTE
2011
Unknown Invert
14
COTE
2011
Unknown Species
6
COTE
2012
Butterfish
26
COTE
2012
Cunner
2
COTE
2012
Hake
454
COTE
2012
Herring
26
COTE
2012
Marine Invert
41
COTE
2012
Mummichug
2
COTE
2012
Pollock
4
COTE
2012
Redfish
1
COTE
2012
Sandlance
10
COTE
2012
Sculpin
1
COTE
2012
Silverside
1
COTE
2012
Snipefish
24
COTE
2012
Stickleback
4
COTE
2012
Terrestrial Invert
9
COTE
2012
Unknown
184
COTE
2012
Unknown Species
105
COTE
2013
Amphipod
6
COTE
2013
Atlanticsaury
7
COTE
2013
Butterfish
1
COTE
2013
Euphausiid
1
COTE
2013
Hake
320
COTE
2013
Herring
34
COTE
2013
Marine Invert
11
COTE
2013
Pollock
6
COTE
2013
Redfish
1
COTE
2013
Sandlance
28
COTE
2013
Silverside
1
COTE
2013
Snipefish
2
COTE
2013
Stickleback
4
COTE
2013
Terrestrial Invert
6
COTE
2013
Unknown
131
COTE
2013
Unknown Invert
1
COTE
2014
Euphausiid
4
COTE
2014
Hake
242
COTE
2014
Herring
7
COTE
2014
Lumpfish
5
COTE
2014
Mackerel
4
COTE
2014
Marine Invert
3
COTE
2014
Pollock
2
COTE
2014
Redfish
3
COTE
2014
Sandlance
11
COTE
2014
Silverside
13
COTE
2014
Snipefish
1
COTE
2014
Terrestrial Invert
1
COTE
2014
Unknown
133
COTE
2015
Atlanticsaury
1
COTE
2015
Butterfish
21
COTE
2015
Hake
128
COTE
2015
Herring
38
COTE
2015
Lumpfish
1
COTE
2015
Marine Invert
1
COTE
2015
Pollock
4
COTE
2015
Sandlance
27
COTE
2015
Snipefish
2
COTE
2015
Unknown
164
COTE
2015
Unknown Invert
1
COTE
2016
Amphipod
1
COTE
2016
Atlanticsaury
3
COTE
2016
Cunner
8
COTE
2016
Euphausiid
1
COTE
2016
Hake
252
COTE
2016
Herring
2
COTE
2016
Mackerel
1
COTE
2016
Marine Invert
9
COTE
2016
Pollock
2
COTE
2016
Redfish
1
COTE
2016
Sandlance
52
COTE
2016
Snipefish
5
COTE
2016
Terrestrial Invert
4
COTE
2016
Unknown
54
COTE
2016
Unknown Invert
7
COTE
2017
Butterfish
6
COTE
2017
Cunner
4
COTE
2017
Euphausiid
6
COTE
2017
Hake
407
COTE
2017
Herring
14
COTE
2017
Lumpfish
1
COTE
2017
Marine Invert
2
COTE
2017
Pollock
3
COTE
2017
Redfish
1
COTE
2017
Sandlance
34
COTE
2017
Stickleback
3
COTE
2017
Terrestrial Invert
14
COTE
2017
Unknown
132
COTE
2017
Unknown Invert
37
COTE
2018
Atlanticsaury
1
COTE
2018
Cunner
40
COTE
2018
Euphausiid
1
COTE
2018
Hake
226
COTE
2018
Herring
38
COTE
2018
Lumpfish
2
COTE
2018
Mackerel
1
COTE
2018
Marine Invert
12
COTE
2018
Pollock
13
COTE
2018
Sandlance
35
COTE
2018
Stickleback
11
COTE
2018
Terrestrial Invert
49
COTE
2018
Unknown
96
COTE
2018
Unknown Invert
25
COTE
2019
Atlanticsaury
1
COTE
2019
Cunner
4
COTE
2019
Euphausiid
18
COTE
2019
Hake
202
COTE
2019
Herring
2
COTE
2019
Lumpfish
4
COTE
2019
Marine Invert
6
COTE
2019
Mummichug
3
COTE
2019
Pollock
3
COTE
2019
Redfish
3
COTE
2019
Sandlance
56
COTE
2019
Sculpin
5
COTE
2019
Snipefish
8
COTE
2019
Squid
5
COTE
2019
Stickleback
42
COTE
2019
Terrestrial Invert
21
COTE
2019
Unknown
170
COTE
2019
Unknown Invert
23
Hake is a preferred prey item for both ARTE and COTE; however, ARTE show a clear and consistent preference for hake every year while COTE demonstrate a willingness to shift focus to similar species such as herring or sand lance, depending on the year. Reasons for such dietary shifts are currently unclear, but further understanding should be achieved when these dietary patterns are modeled with regional climate data from relevant years.
Visualization is always helpful.
Code
preyspplot<-ggplot(data = terntable1, mapping =aes(x=year, y = number_depredated,group= prey_species, col=prey_species))+#says which data to use, what x and y axis will be, and to color-code data according to prey speciesgeom_point()+geom_line()+labs(x="Year", y="Prey per year")+facet_wrap(~tern_species)+theme_classic()#all same as previous plotspreyspplot
Observation Effort for Each Tern Species
While climate and other environmental factors undoubtedly influence any differences in prey capture between years, it is crucial to account for human factors that influence data collection so as to draw accurate conclusions about diet shifts across time. The most important human factor to be accounted for in this study is observation effort. Effort must be quantified to establish if higher numbers of captured prey are truly reflective of what took place in the tern breeding colony or if they are reflective of greater observation effort.
One simple way to easily establish if differences in total prey capture across years is reflective of environmental factors rather than human factors is to compare the number of total prey observations per hour for each year. This can be calculated for each year using the following formula:3
terntable2<- longtern %>%filter(tern_species !="ROST", year !="2023") %>%group_by(tern_species, year) %>%#group by tern species and then by yearsummarise(prey_per_hour =sum(total_prey, na.rm = T)/sum(time_hr, na.rm = T)) %>%#take the sum of total prey -all prey observed regarless of species- and divide it by the sum of all hours of observation. na.rm=T removes and "NAs"ungroup()terntable2 %>%kbl(caption ="Total Prey Capture by ARTE and COTE per Hour For Each Year") %>%#kbl makes the table, caption gives the title of the tablekable_classic(full_width = F, html_font ="Cambria")
Total Prey Capture by ARTE and COTE per Hour For Each Year
tern_species
year
prey_per_hour
ARTE
1998
3.535083
ARTE
1999
3.627144
ARTE
2000
3.035268
ARTE
2001
2.934931
ARTE
2002
2.715753
ARTE
2003
1.938567
ARTE
2004
2.430052
ARTE
2005
1.745283
ARTE
2006
1.810888
ARTE
2007
1.552172
ARTE
2008
2.656043
ARTE
2009
2.351301
ARTE
2010
2.394350
ARTE
2011
3.472732
ARTE
2012
2.806804
ARTE
2013
2.626386
ARTE
2014
1.830509
ARTE
2015
1.800600
ARTE
2016
6.196292
ARTE
2017
3.423790
ARTE
2018
4.957818
ARTE
2019
5.197019
COTE
1998
2.953718
COTE
1999
3.544728
COTE
2000
2.071816
COTE
2001
4.092024
COTE
2002
2.007092
COTE
2003
1.780576
COTE
2004
1.399775
COTE
2005
1.855072
COTE
2006
2.521418
COTE
2007
1.384294
COTE
2008
3.100313
COTE
2009
1.808901
COTE
2010
1.971840
COTE
2011
1.593694
COTE
2012
2.623602
COTE
2013
2.202691
COTE
2014
1.961539
COTE
2015
2.126779
COTE
2016
1.632183
COTE
2017
2.559564
COTE
2018
1.970596
COTE
2019
2.832028
When examining the number of successful prey captures observed per hour, differences in feeding rates across years and between ARTE and COTE are much easier to note. For example, ARTE had distinctly higher feeding rates in 2016, 6.20 prey items per hour, than did COTE in the same year, 1.97 prey items per hour. The wide range of prey items per hour across years is also notable given that some years show much higher success rates than others for both species, though years with higher catch rates do not always line up between the two species. Further examination not only of climate factors, but also of the feeding ecologies of ARTE and COTE will be required to gain insight into the driving factors of these shifts in foraging success.
It may be helpful to visualize this:
Code
prettytern<-image_read(path="prettyternpng.png")plottern<-rasterGrob(prettytern, interpolate=T)obsefplot<-ggplot(data = terntable2, mapping =aes(x=year, y = prey_per_hour,group= tern_species, col=tern_species))+#says which data to use, what x and y axis will be, and to color-code data according to tern speciesgeom_point()+geom_line()+labs(x="Year", y="Prey per Hour")+annotation_custom(plottern, xmin=2005, xmax=2010, ymin=4, ymax=6)+theme_classic()#all same as previous plotsobsefplot
Preliminary GAM
To look at the affects of climate on tern diet and foraging success, I can start by looking at the feeding rates (prey/hour/year) calculated in the Summary Tables section and compare that to North Atlantic Oscillation (NAO) data4.
Testing the code
I first need to merge my tern data with yearly NAO data to make a single data set on which I can run my General Additive Model, which I have chosen because I do not expect my data to be normally distributed.
Code
library(mgcv) #load library for GAMannualnao<-read.csv(file="yearly_nao_pcbased.csv")#read in NAO dataternclimate<-merge(x = terntable2, y = annualnao, by ="year", all =TRUE)#merge the tern data and the NAO data into a single data set in the object ternclimate - both data sets share a year column and annualnao will merge left into terntable2 aligned by year for all data points that share yearstr(ternclimate)#check to make sure the merge worked properly
gamtester<-gam(data=ternclimate, prey_per_hour~s(year)+s(NAO), method="REML")#tells the gam argument to use ternclimate, that I want to know the non-linear effects of year and NAO on number of prey caugh per hour - REML is Restricted Maximum Likelihood Method, and is recommended because it tends to be more reliable and stable for newbies like me.summary(gamtester) #gives a summary of gam info
plot(gamtester, shade=T, shade.col ="lightgreen", se=T, shift =coef(gamtester)[1], pages =1) #plots the gam, shade=T shades in the standard error, shade.col= allows me to pick the color, se=T says I want to see the standard error, shift=coef(gamtester)[1] lets me shift the y-axis to include the intercept so the y-axis is no longer relative - aka the number of prey captured per hour of observation around 2007 was actually close to 2
This was a good first test to make sure I can get the code to work, but I also need information about the prey types and how that has changed through the years. To keep my data wrangling intact and make a singular data set to work with, I will merge “ternclimate” with “terntable1.”
Additionally, I want a quick visualization of NAO through the years, as I suspect that NAO has a much greater effect than year does on prey captured, and it would be nice to see the actual data points plotted to help me understand if my models make sense.
Code
naoyearplot<-ggplot(data = ternclimate, mapping =aes(x=year, y = NAO,))+geom_point()+geom_line()+labs(x="Year", y="NAO")+theme_classic()#all same as previous plotsnaoyearplot
The NAO in 2010 is much lower than in other years, and there is a large gap between that data point and the rest of the data. This will likely leave a large gap between between residuals/fitted values when NAO is ~-3 and the rest of the data because there are no years with NAO values between -3 and -0.6. However, I will leave this and associated data in the model for now because it can still provide useful information on the relationship between NAO and prey capture by terns at Country Island. I am hopeful that it will not skew the model.
Now I’ll merge terntable1 with ternclimate and check the structure of the new data set (ternshunt) to make sure the merge worked.
Code
ternshunt<-merge(x = ternclimate, y = terntable1, by ="year", all =TRUE)str(ternshunt) #checking to make sure the merge worked - it did
I now have two tern species variables, one from each data set, but this won’t be a problem as they are the same so it won’t affect my modeling.
GAM with Default (Gaussian) Distribution
Now I can run some GAMS to see how NAO may affect the number of each prey species caught each year. I will split this up by tern species.
Code
COTEgam1<- ternshunt %>%filter(tern_species.x!="ARTE", tern_species.y!="ARTE") #making an object with data for only COTE COTEgam1a<-gam(data=COTEgam1, number_depredated~s(NAO) +s(year)+ prey_species, method="REML")summary(COTEgam1a) #note that the summary will show the "intercept" as the first factor in alphabetical order - in this case, Amphipods in prey species are top in alphabet so they are listed as "intercept"
plot(COTEgam1a, shade=T, shade.col ="lightgreen", se=T, residuals=TRUE, pch=1, cex=0.7, shift =coef(COTEgam1a)[1]) #plots the gam, shade=T shades in the standard error, shade.col= allows me to pick the color, se=T says I want to see the standard error, shift=coef(gamtester)[1] lets me shift the y-axis to include the intercept so the y-axis is no longer relative but shows the actual predicted number of prey captured at specific NOA values, my residuals are shown as circles (set by pch=1 with their size set by cex=0.7)
We can see from the printed output as well as the plot that year does not have a significant effect on number of prey caught each year in the summer months, but NOA does (p<0.05). However, our residual sum of squares is HUGE (Scale est. = 2106.6), and we are predicting values below zero (it is not possible to catch less than 0 fish), so our Gaussian distribution is incorrect. Instead, we can try a distribution method that does not allow for negative values, like a poisson distribution. It’s common in the literature to use poisson distributions to handle count data, especially for these sorts of studies that involve observation watches to gather information on prey type and number of prey caught.
GAM with a Poisson Distribution
Code
COTEgam2<-gam(data=COTEgam1, number_depredated~s(NAO) + prey_species, family=poisson, method="REML")summary(COTEgam2) #note that the summary will show the "intercept" as the first factor in alphabetical order - in this case, Amphipods in prey species are top in alphabet so they are listed as "intercept"
plot(COTEgam2, shade=T, shade.col ="lightgreen", se=T, residuals=TRUE, pch=1, cex=0.7, shift =coef(COTEgam2)[1])#you will still see negative residuals here because you haven't taken the antilog, but when we plot predicted values, there should be none below zero
We can see from the summary that this model explains roughly 80% of the effect that NAO and prey species has on number of prey caught per year. We can also see that the residual sum of squares in this model is 1 (Scale est. =1), which is much better than the model using the Gaussian distribution where the residual sum of squares was 2106.6. It should be noted that all the predicted values are positive for this model, so it does make biological sense.
Code
predict_response(COTEgam2) %>%plot(show_data=T, jitter=T, show_ci=TRUE, ci_style=c("dash"), colors="metro")#predicts response variables according to the model and then I can plot them with ggeffects plot() function, choosing to show the data, jitter the data so it's easier to see, show confidence intervals as dashed lines, and use a colour scheme
$NAO
$prey_species
I still need to run a check of the model to make sure it is the best one. Thankfully, R has gam.check() for that.
Code
gam.check(COTEgam2)##runs a the whole series of model validaton checks for me in just one command. I get a summary of stats tests to check for model fit as well as a QQ plot, a residuals vs linear predictions plot, a response vs fitted values plot, and a histogram of residuals.
Method: REML Optimizer: outer newton
full convergence after 4 iterations.
Gradient range [1.692766e-05,1.692766e-05]
(score 2837.177 & scale 1).
Hessian positive definite, eigenvalue range [0.7326335,0.7326335].
Model rank = 37 / 37
Basis dimension (k) checking results. Low p-value (k-index<1) may
indicate that k is too low, especially if edf is close to k'.
k' edf k-index p-value
s(NAO) 9.00 6.19 0.98 0.4
We can see from looking at the outputs that there are a few problems with the current model. Ideally, our deviance of residuals will be a roughly straight line following the red regression line along the theoretical quantiles. Our histogram of residuals, while not terrible, does show a light left skew. In our residuals vs linear predictors plot, our residuals do not look randomly distributed left of the predictor value of 3, and our response vs fitted values plot is more widely distributed than ideal given we want our points to cluster around a 1-1 line to indicate a well-fitted model.
Our summary output shows that our k-index (a measure of the basis functions used to fit the model) is less than one, which indicates that the default number of basis functions in the model may be too low to properly capture the relationships. We can fix this by specifying the number of basis functions in the gam formula. Our current output indicates that k, our basis dimension, is 9. We will try increasing that to 20 to see if this fixes the issues with over-smoothing in our model (forcing the data into being more linear than is true).
Code
#Run the same gam as before with addition of specifying the basis dimensionsCOTEgam3<-gam(data=COTEgam1, number_depredated~s(NAO,k=20) + prey_species, family=poisson, method="REML") #k=20 allows me to specify basis dimensions to fix over-smoothing in the modelsummary(COTEgam3)
Unfortunately, it is now becoming more obvious that we may have to remove the 2010 data because it is causing problems in the model expected from an outlier. We will still run it through the gam.check.
Code
gam.check(COTEgam3)#runs a the whole series of model validaton checks for me in just one command. I get a summary of stats tests to check for model fit as well as a QQ plot, a residuals vs linear predictions plot, a response vs fitted values plot, and a histogram of residuals.
Method: REML Optimizer: outer newton
full convergence after 7 iterations.
Gradient range [8.640124e-05,8.640124e-05]
(score 2509.469 & scale 1).
Hessian positive definite, eigenvalue range [6.626972,6.626972].
Model rank = 47 / 47
Basis dimension (k) checking results. Low p-value (k-index<1) may
indicate that k is too low, especially if edf is close to k'.
k' edf k-index p-value
s(NAO) 19.0 18.8 1.11 0.98
Similar issues are cropping up again, so we will try removing the outlier data. We will also try k=15, because the plot for k=20 showed more wiggliness (this is genuinely the term) than desired regardless of the outlier.
Code
COTEnoouts<- ternshunt %>%filter(tern_species.x!="ARTE", tern_species.y!="ARTE", year !="2010")#filtering out the 2010 outlierCOTEgam4<-gam(data=COTEnoouts, number_depredated~s(NAO,k=15) + prey_species, family=poisson, method="REML") summary(COTEgam4)
Method: REML Optimizer: outer newton
full convergence after 8 iterations.
Gradient range [0.0006942665,0.0006942665]
(score 2398.59 & scale 1).
Hessian positive definite, eigenvalue range [6.105787,6.105787].
Model rank = 42 / 42
Basis dimension (k) checking results. Low p-value (k-index<1) may
indicate that k is too low, especially if edf is close to k'.
k' edf k-index p-value
s(NAO) 14.0 13.9 1.04 0.80
We are still seeing the same issues with the gam.check (after trying k values ranging from 5 to 20 with 20 being the maximum this data set will allow). We can see that the residuals are quite widely dispersed when plotted against the fitted values, indicating that the variance of the residuals is likely much higher than the mean. A poisson distribution assumes the mean and the variance will be roughly equal, but a negative binomial distribution accounts for a variance larger than the mean.
With one last visualization plotting residuals against fitted values, we can see there is more variation as fitted values rise above ~40, which falls outside the assumptions of a poisson distribution.
NOTE: I am uncertain if I might be able to say the residuals are fairly evenly dispersed around 0 with some outliers - I think they are, but I also guessed incorrectly when Trevor asked in class if the demo residuals were evenly distributed around 0 (I said no, but they were). We can check this with the nifty “create a dispersion function” trick.
With a dispersion output of 20.28, we can now confidently say this model is overdispersed and the poisson is not the correct distribution.
GAM with Negative Binomial Distribution
We will first run the GAM without specifying the number of basis functions to see how it looks.
Code
COTEgam5<-gam(data=COTEnoouts, number_depredated~s(NAO) + prey_species, family=nb()) #family = nb() gives me a negative binomial distribution - r will automatically calculate the theta value for me in order to manage the overdispersion and display that value next to the family type in the summarysummary(COTEgam5)
Method: REML Optimizer: outer newton
full convergence after 7 iterations.
Gradient range [-0.0002732086,0.0001715235]
(score 906.7095 & scale 1).
Hessian positive definite, eigenvalue range [0.0002731124,93.50103].
Model rank = 37 / 37
Basis dimension (k) checking results. Low p-value (k-index<1) may
indicate that k is too low, especially if edf is close to k'.
k' edf k-index p-value
s(NAO) 9 1 0.9 0.22
We can see from our GAM summary output that our dispersion is still greater than one (1.392 - this value is given in the family output with a negative binomial distribution), but it is a distinct improvement over a dispersion of 20.28! Additionally, our gam.check outputs look much better. The QQ plot shows a much straighter distribution of residuals that actually follow the theoretical quartiles line, and the histogram of residuals is much more normally shaped. The residuals vs linear predictions is more randomly distributed, though there is still some clustering closer to 0. The response vs fitted values still do not cluster around a 1-1 line; however, I have not been able to fix that with any additional attempts at smoothing or setting other parameters.
This appears to be the best model (at least thus far). It only explains 73.8% of deviance in the data set, but it is more important to be sure the model actually fits than incorrectly explain more of the deviance. Additionally, this model shows the effect of NAO on prey captured is not significant, but the effect of prey species is significant for prey such as hake (p<0.001), herring (p<0.05), and sand lance (p<0.001).
We can look at the predicted values from this model.
#this calulates the predicted values and plots them, showing the data. It creates one plot of predicted values for each of the parameters in my model (NAO and prey species). show_ci lets me show the confidence interval around my predicted mean line, and ci_style lets me choose how I want to visualize that (dashed line in this case). The colors argument has highlighted the mean of the predicted values (regression line for NAO plot and point for prey species plot).I am still trying to work out how to rotate the x-axis labels so they're legible... Tried to do this in ggplot2 and it didn't work well so not sure how to rotate the labels here.
While we can see there is a slight negative correlation between NAO and number of prey captured, it is not a significant correlation. The number of prey captured is significantly higher for hake, sand lance, herring, and, as always, prey items that could not be positively identified by observers before being eaten. The significance of these species is unsurprising because they are well recognized as preferred food sources for COTE.
GAM with Negative Binomial Distribution (ARTE)
Now to do the same model for ARTE using the same parameters.
Code
ARTEnoouts<- ternshunt %>%filter(tern_species.x!="COTE", tern_species.y!="COTE", year !="2010")ARTEgam1<-gam(data=ARTEnoouts, number_depredated~s(NAO) + prey_species, family=nb()) #family = nb() gives me a negative binomial distribution - r will automatically calculate the theta value for me in order to manage the overdispersion and display that value next to the family type in the summarysummary(ARTEgam1)
Method: REML Optimizer: outer newton
full convergence after 3 iterations.
Gradient range [-1.073624e-09,3.051843e-09]
(score 889.6921 & scale 1).
Hessian positive definite, eigenvalue range [0.3383374,95.24936].
Model rank = 37 / 37
Basis dimension (k) checking results. Low p-value (k-index<1) may
indicate that k is too low, especially if edf is close to k'.
k' edf k-index p-value
s(NAO) 9.00 3.01 0.88 0.26
Our gam.check gives very similar outputs to our COTE data, indicating that our negative binomial GAM is still our best fitting model for ARTE data (as it should be). Hake, herring, sand lance, and invertebrates are all indicated as statistically significant prey species for ARTE. Number of prey captured by ARTE appears stable until NAO reaches ~1, at which point there is a slight negative correlation; however, this relationship is not statistically significant.
With our predicted values, we can see that the number of ARTE hunts predicted to be successful is greatest when NAO is around 1; we know that this relationship is not significant, but we can now see that the lack of significance is likely due to the large degree of variability, as indicated by the broad 95% confidence intervals.
My apologies for the x-axis labels - I’m still fighting with code on this.
Footnotes
I used pivot_longer to make “prey species” a single variable associated with “number observed,” which is the number of that species observed being eaten by terns in a given observation period.↩︎
I wrote the “long” data set out as a csv file and then read it back in so I may work with it to create summary tables.↩︎
O/hr =Observations per hour, \(x\) = prey observed, and \(y\) = length of observation period in hours↩︎
NAO Index Data provided by the Climate Analysis Section, NCAR, Boulder, USA, Hurrell (2003). Updated regularly. Accessed 15 March 2025.