In this project, we explore data on the biological makeup of marine life in Alaska. The dataset, collected and maintained by the Nutritional Ecology Laboratory, contains detailed information on a wide range of species, from small organisms such as phytoplankton to large animals such as whales and seals. It was last updated in April 2024 and records the size, age, and developmental stage of the organisms sampled, along with seasonal information and energy content. This makes it a valuable resource for understanding the nutritional profile of Alaska’s diverse marine species and ecosystems.
Our specific focus is on salmon, which are critical to Alaska’s ecosystems, economy, and cultural heritage. By analyzing the weight and length of different salmon species, we aim to uncover trends over time and across regions of Alaska. Splitting the data by region (and subregion) as well as by species will let us detect patterns that could indicate changes in salmon populations, and comparing size across both time and location will show how these variables interact. Ultimately, we hope to reveal insights into how environmental factors may be affecting salmon, a topic of interest for scientists, fishers, and everyone connected to Alaska’s marine environment. Through visualizations and analysis, we plan to tell the story of these changes and what they might mean for the future of salmon in Alaska.
Significant research has been conducted on the nutritional ecology and biochemical characteristics of salmon and other marine species to inform the study of food web dynamics and fisheries management. This work includes studies of changes in salmon size over time, and we will examine whether our data is consistent with that prior research.
The lipid dataset of Alaskan fish, marine mammals, and invertebrates comes from the Nutritional Ecology Laboratory, part of the Auke Bay Laboratories (ABL) at the Alaska Fisheries Science Center (AFSC). The data was collected through systematic sampling of marine species, including several species of salmon, over many years. Samples were obtained from diverse environments within Alaska, including coastal waters, rivers, and lakes. The collection process involves field surveys and laboratory analysis: researchers capture specimens, measure their size and age, and examine their biochemical composition. Because the dataset includes samples collected over multiple seasons and years, it provides a picture of the species’ health over time. The earliest entries are from 1997, and the last update to the dataset was on April 1, 2024, suggesting ongoing data collection efforts. The key variables that we will be examining are species (speciesbio), region, subregion, weight, and length.
Five species of salmon inhabit Alaska: King (Chinook), Sockeye (Red), Coho (Silver), Chum (Dog/Keta), and Pink (Humpy) salmon. All five are found in virtually all Alaskan waters. These fish are born in freshwater rivers and, as they grow, migrate to the oceans surrounding Alaska, where they spend their adult lives. After some years, the salmon return to the very rivers where they were born to spawn and lay their eggs in safer fresh water. The adults then die, and the life cycle begins again. The cycle is predictable: it happens in early to mid summer every year, and because salmon return to their natal rivers, fishermen can predict where large numbers of salmon will gather.
The salmon industry is very important to Alaska’s economy and culture, with a total of $600 million in annual economic output connected to Alaska’s salmon harvest [1] and an estimated 62,200 workers employed [2]. The industry also generates revenue through seasonal workers who come to live and work in Alaska during the summer months, when salmon are plentiful. Alaskans have therefore come to rely heavily on, and profit from, these annual salmon runs [3]. Because overfishing is a very real threat, the state government has instituted numerous precautions and regulations for both sport and commercial fishing so that salmon can continue to be harvested as a renewable resource. Permits, strict fishing time windows, and law enforcement all help keep the industry legal and sustainable [4], which makes the study of salmon and their living patterns highly important to the state. One way to begin addressing our research questions is to investigate where these fish are found geographically.
To do this with our lipid dataset, we begin by isolating all the salmon from the more than 105,000 observations recorded.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggthemes)
library(ggridges)
library(lubridate)
library(sf)
## Linking to GEOS 3.7.2, GDAL 3.0.4, PROJ 6.3.2; sf_use_s2() is TRUE
library(rnaturalearth)
library(rnaturalearthdata)
##
## Attaching package: 'rnaturalearthdata'
##
## The following object is masked from 'package:rnaturalearth':
##
## countries110
library(viridis)
## Loading required package: viridisLite
AK_Animals <- read_csv("AK_Lipid_Data.csv")
## Rows: 105534 Columns: 46
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (26): project, bag_id, speciesbio, group_1, tissuebio, region, location...
## dbl (19): sin, month, year, length, lndsc, weight, wtdsc, lipid, moisture, ...
## dttm (1): date_coll
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#keep only rows whose species name contains "salmon"
salmon_data <- AK_Animals |>
  filter(str_detect(speciesbio, "salmon"))
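Because `str_detect()` is case-sensitive and keeps every species name containing "salmon", it can help to tally what the filter actually kept. The sketch below uses a small made-up tibble (`toy_animals`, a hypothetical stand-in for `AK_Animals`); the species labels in it are illustrative assumptions, not values confirmed from the dataset.

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for AK_Animals; real species labels may differ
toy_animals <- tibble(
  speciesbio = c("Sockeye salmon", "Pink salmon", "Walleye pollock",
                 "Sockeye salmon", "Salmon shark")
)

# Same filter as above: keeps names containing lowercase "salmon"
toy_salmon <- toy_animals |>
  filter(str_detect(speciesbio, "salmon"))

# Tally the species that survived the filter, most common first
species_counts <- toy_salmon |>
  count(speciesbio, sort = TRUE)

species_counts
```

Running the same `count()` on `salmon_data` would show whether exactly the five intended species remain, which matters later because the map assigns five fill colors.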
First, let’s load a map of Alaska into R so that we can use it as the background of our graph.
#load USA and select Alaska
us_states <- ne_states(country = "United States of America", returnclass = "sf")
alaska <- us_states |> filter(name == "Alaska")
#crop out the Aleutian Islands
alaska_cropped <- st_crop(alaska, xmin = -180, xmax = -130, ymin = 50, ymax = 72)
## Warning: attribute variables are assumed to be spatially constant throughout
## all geometries
Now that a map of Alaska has been loaded, we take each region in which salmon were recorded and assign it a distinct pair of coordinates, so that each region can be placed directly on the map. Each region is a large expanse of water, so instead of working with the hundreds of sub-locations and sub-regions in this dataset, we take a broader approach and map each region to one approximate coordinate pair. For example, if a salmon was observed in the Bering Sea, we do not look at which channel or section of the Bering Sea it came from; all salmon from this region are grouped at one approximate point in the middle of the Bering Sea.
#create coordinate data.frame: one approximate point per region
coords <- data.frame(
  region = c("Arctic", "Bering Sea", "Gulf of Alaska", "PWS", "Pacific Northwest", "SEAK", "Yukon River"),
  lat = c(66.634359, 56.9073, 57.91283, 60.645608, 51.259666, 57.199463, 62.78037),
  lon = c(-168.60228, -178.1395, -146.63892, -147.227734, -128.906534, -133.811323, -165.102547)
)
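Since the lookup table is typed by hand, a region label that is spelled differently in the data would silently end up without coordinates after the join. A minimal sketch of one way to check for this with `anti_join()` follows; `toy_regions` and the "Kodiak" label are hypothetical and only there to show a mismatch.

```r
library(dplyr)

# Hypothetical region labels standing in for salmon_data$region
toy_regions <- tibble(region = c("SEAK", "Bering Sea", "PWS", "Kodiak"))

# Subset of the lookup table above (coordinates copied from coords)
toy_coords <- tibble(
  region = c("SEAK", "Bering Sea", "PWS"),
  lat = c(57.199463, 56.9073, 60.645608),
  lon = c(-133.811323, -178.1395, -147.227734)
)

# Regions present in the data but missing from the lookup table
unmatched <- toy_regions |>
  distinct(region) |>
  anti_join(toy_coords, by = "region")

unmatched
```

An empty result on the real data would mean every observed region has a coordinate pair.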
Next, we group all observed salmon by decade. To do this, we extract the year from each observation’s collection date and round it down to the start of its decade.
#assign each salmon to a decade based on its collection date
#(date_coll is already a date-time, so year() can be applied directly)
salmon_data_decade <- salmon_data |>
  mutate(decade = floor(year(date_coll) / 10) * 10)
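The binning arithmetic is easy to check by hand. The sketch below applies it to a few example years (chosen for illustration) and also shows the five-year variant mentioned as an alternative, which only changes the bin width.

```r
# Example collection years (illustrative values)
years <- c(1997, 2003, 2010, 2019, 2024)

# Decade bins, as in the pipeline above
decade <- floor(years / 10) * 10

# Five-year bins use the same idea with a width of 5
half_decade <- floor(years / 5) * 5

data.frame(years, decade, half_decade)
```

Narrower bins give finer time resolution but leave fewer fish in each group.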
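We then count the observations for each combination of region, decade, and species, compute each species’ share of the total for its region and decade, and join the approximate coordinates so that the result can be plotted.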
#Organize the data by region, decade, and species
salmon_summary <- salmon_data_decade |>
  group_by(region, decade, speciesbio) |>
  summarize(count = n(), .groups = 'drop') |>
  group_by(region, decade) |>
  mutate(total = sum(count), proportion = count / total) |>
  ungroup()
#merge coordinates into the organized data with join
salmon_summary <- salmon_summary |>
  left_join(coords, by = "region")
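To make the count, total, and proportion columns concrete, here is a small sketch on made-up rows (`toy_obs` is hypothetical). It uses `count()` as shorthand for the `group_by()` plus `summarize(n())` step above.

```r
library(dplyr)

# Hypothetical observations: one row per fish
toy_obs <- tibble(
  region = c("SEAK", "SEAK", "SEAK", "PWS"),
  decade = c(2010, 2010, 2010, 2010),
  speciesbio = c("Sockeye salmon", "Sockeye salmon", "Pink salmon", "Pink salmon")
)

toy_summary <- toy_obs |>
  # one row per region, decade, and species with its number of fish
  count(region, decade, speciesbio, name = "count") |>
  # total fish per region and decade, and each species' share of it
  group_by(region, decade) |>
  mutate(total = sum(count), proportion = count / total) |>
  ungroup()

toy_summary
```

Note that every species row within a region and decade carries the same `total`, so points sized by `total` and drawn at one shared coordinate will all have the same size.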
#plot the summarized observations on the map
ggplot() +
  geom_sf(data = alaska_cropped, fill = "white", color = "black") + #Alaska map coloring
  geom_point(data = salmon_summary,
             # Points for salmon observations
             aes(x = lon, y = lat, size = total, fill = speciesbio),
             shape = 21, alpha = 0.95) +
  scale_size(range = c(7, 30), trans = "sqrt", breaks = c(1250, 2500, 5000, 10000)) +
  scale_fill_manual(
    values = c("#D55E00", "#CC79A7", "#009E73", "#F0E442", "#0072B2"),
    guide = guide_legend(override.aes = list(size = 6))) + #size of points in the legend
  labs(
    title = "Number of Observed Salmon Species in Alaska by Decade",
    size = "Total Salmon",
    fill = "Species",
    x = "Longitude",
    y = "Latitude") +
  theme_minimal() +
  facet_wrap(~ decade, labeller = labeller(decade = function(x) paste0(x, "s"))) +
  theme(
    strip.text = element_text(size = 16), #facet labels size
    plot.title = element_text(size = 20), #title size
    axis.text = element_text(size = 12), #axis label size
    legend.key.size = unit(2, "lines"), #size of legend keys
    legend.text = element_text(size = 16), #legend text size
    legend.title = element_text(size = 18) #legend title size
  )
Figure 1: Scientists studying these five salmon species have sampled tens of thousands of specimens across many vast marine areas over the past four decades. For each salmon, they recorded the date, species, region, location, and sublocation. This chart groups all salmon observed in the same decade. We then took the region where each salmon was observed (region being the broadest location category in this dataset) and estimated a latitude and longitude for each region. These points are placed on a map of Alaska and scaled by the total number of fish. Please note that this graph does not display the distribution of the different salmon species’ populations; it only depicts where, when, and how many salmon the scientists sampled. The date is the date on which the salmon was observed, which is then assigned to its decade. The region is recorded as one of seven regions: the Gulf of Alaska, Arctic Ocean, Bering Sea, Yukon River (mouth), Southeast Alaska, Prince William Sound, and Pacific Northwest. Overall, we can see that across these four decades, scientists have consistently sampled sockeye salmon in Southeast Alaska. We can therefore compare sockeye in this region over time to look for trends and draw conclusions backed by sufficient data. No statistical tests were used in the visualization of this data.
Our project aims to investigate differences in size over time across different salmon species and regions of Alaska. The figure above shows where and how many salmon were sampled. It does not represent the distribution of salmon species or their populations over time, because the data is not a catalog of all salmon in a population but a record of where these scientists happened to be sampling and measuring these organisms. We therefore cannot draw conclusions about species populations or locations. Instead, the graph shows which salmon we have enough data on to study trends. As explained in the figure caption, scientists have sampled and measured thousands of sockeye salmon in Southeast Alaska across the last four decades, which gives us a viable sample of these fish spanning parts of four decades. Because large amounts of data were collected consistently for the same species in the same region, we can examine length, weight, and other recorded features of these fish over time. This leads to our next figure, which examines sockeye from Southeast Alaska (SEAK).
#SEAK sockeye up to length 400, with a simple weight-to-length ratio
sockeye_seak <- salmon_data_decade |>
  filter(region == "SEAK", speciesbio == "Sockeye salmon", length <= 400) |>
  mutate(weight_length_ratio = weight / length)
#weight against length with a linear trend line, one panel per decade
ggplot(sockeye_seak, aes(x = length, y = weight)) +
  geom_point(alpha = 0.6, color = "#56B4E9", size = 3) +
  geom_smooth(method = "lm", color = "red", se = FALSE) +
  facet_wrap(~ decade, ncol = 2, labeller = labeller(decade = function(x) paste0(x, "s"))) +
  labs(
    title = "Sockeye Salmon Weight vs. Length in SEAK Region by Decade",
    x = "Length",
    y = "Weight"
  ) +
  theme_minimal() +
  theme(
    strip.text = element_text(size = 14),
    plot.title = element_text(size = 16),
    axis.text = element_text(size = 12),
    axis.title = element_text(size = 14))
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).
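The red trend lines can also be summarized numerically, which makes decades easier to compare than reading slopes off the panels. The sketch below fits one least-squares line per decade, mirroring `geom_smooth(method = "lm")`; the `toy_fish` values are made up and in arbitrary units, not real measurements.

```r
library(dplyr)

# Hypothetical lengths and weights in arbitrary units
toy_fish <- tibble(
  decade = rep(c(2000, 2010), each = 4),
  length = c(100, 150, 200, 250, 100, 150, 200, 250),
  weight = c(10, 20, 30, 40, 12, 20, 28, 36)
)

# One linear fit of weight on length per decade
slopes <- toy_fish |>
  group_by(decade) |>
  summarize(
    n = n(),                                        # fish in the decade
    slope = coef(lm(weight ~ length))[["length"]],  # weight gained per unit length
    .groups = "drop"
  )

slopes
```

Applied to `sockeye_seak`, rows with a missing weight or length would need to be dropped first, as the warnings above indicate one such row exists.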
Let’s learn more about our salmon data, showing when and where they came from. First let’s look at when these salmon were collected:
#ridgeline density of collection year for each salmon species
salmon_data |>
  filter(year < 2025) |>
  ggplot() +
  geom_density_ridges(
    aes(x = year, y = speciesbio, fill = speciesbio, color = speciesbio),
    alpha = .2, show.legend = FALSE) +
  labs(
    title = "Species of Salmon Collected by Year",
    subtitle = "Source: https://catalog.data.gov/dataset/afsc-abl-lipid-dataset-of-alaskan-fish-marine-mammals-and-invertebrates1",
    x = "Year Collected",
    y = "Species of Salmon"
  ) +
  theme_few()
## Picking joint bandwidth of 0.582
Figure 2: Most salmon data was collected from 2011 through 2019, and most species show similar, unimodal distributions over the collection years. The data was gathered on many sampling trips in Alaska, during which biological measurements were recorded for each animal. Within our data, each row represents a single fish, along with the year it was recorded and its species.
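A ridgeline plot shows relative density within each species rather than raw counts, so a quick tally by species and year is a useful companion to it. The following is a minimal sketch on made-up rows (`toy_salmon` is hypothetical); the same `count()` call could be run on `salmon_data`.

```r
library(dplyr)

# Hypothetical rows: one per fish, with its species and collection year
toy_salmon <- tibble(
  speciesbio = c("Sockeye salmon", "Sockeye salmon", "Pink salmon",
                 "Pink salmon", "Pink salmon"),
  year = c(1998, 2012, 2012, 2012, 2015)
)

# Number of fish for each species in each year
yearly_counts <- toy_salmon |>
  count(speciesbio, year, name = "n_fish") |>
  arrange(speciesbio, year)

yearly_counts
```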
We will use this data to compare fish across years, which is possible because, as the figure shows, data was collected in many different years. Note that our salmon table alone has over 36,000 rows, so even where the graph suggests little data for the earlier years, those years may still contain hundreds of observations. Next, we return to the SEAK sockeye and look at how their weight-to-length ratio changes by year.
#weight-to-length ratio by collection year with a linear trend and confidence band
ggplot(sockeye_seak, aes(x = year(date_coll), y = weight_length_ratio)) +
  geom_point(alpha = 0.6, color = "#56B4E9", size = 3) +
  geom_smooth(method = "lm", color = "red", se = TRUE) +
  labs(
    title = "Sockeye Salmon Weight-to-Length Ratio in SEAK Region by Year",
    x = "Year",
    y = "Weight-to-Length Ratio"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 16),
    axis.text = element_text(size = 12),
    axis.title = element_text(size = 14)
  )
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 1 row containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).
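The shaded band drawn by `geom_smooth(se = TRUE)` reflects uncertainty in the fitted line. If a number is wanted for the trend, one option is to fit the same linear model directly and read off the slope with its confidence interval. This is only a sketch on made-up values (`toy_ratio` is hypothetical), not an analysis of the real data.

```r
# Hypothetical yearly weight-to-length ratios
toy_ratio <- data.frame(
  year = c(2000, 2005, 2010, 2015, 2020),
  weight_length_ratio = c(0.30, 0.29, 0.27, 0.26, 0.24)
)

# Same model that geom_smooth(method = "lm") fits: ratio as a linear function of year
fit <- lm(weight_length_ratio ~ year, data = toy_ratio)

# Estimated change in ratio per year and its 95% confidence interval
slope <- coef(fit)[["year"]]
slope_ci <- confint(fit, "year", level = 0.95)

slope
slope_ci
```

A confidence interval that excludes zero would suggest a trend in the ratio, although individual fish within a year are not independent samples of the population, so any such result should be read cautiously.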
# tissuebio holds the tissue type sampled from each fish
ggplot(sockeye_seak, aes(x = tissuebio)) +
  geom_bar(fill = "#56B4E9", color = "black", alpha = 0.8) + # Bar plot
  labs(
    title = "Types of Tissue Collected from Sockeye Salmon in SEAK",
    x = "Tissue Type",
    y = "Count"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 16), # Title size
    axis.text = element_text(size = 12), # Axis text size
    axis.title = element_text(size = 14), # Axis title size
    axis.text.x = element_text(angle = 45, hjust = 1) # Rotate x-axis labels
  )
1. McDowell Group. “Economic Impact of Alaska’s Salmon Hatcheries Executive …” Alaska Department of Fish and Game, Oct. 2018, www.adfg.alaska.gov/static/fishing/PDFs/hatcheries/2018_alaskahatchery_executive_summary.pdf.
2. Peters, Tanna. “For Release: 2022 Economic Value of Alaska’s Seafood Industry Report.” Alaska Seafood Marketing Institute, 14 Jan. 2022, www.alaskaseafood.org/news/for-release-2022-economic-value-of-alaskas-seafood-industry-report/.
3. NOAA Fisheries. “Economic Snapshot Shows Alaska Seafood Industry Suffered $1.8 Billion Loss 2022–2023.” NOAA, 15 Oct. 2024, www.fisheries.noaa.gov/feature-story/economic-snapshot-shows-alaska-seafood-industry-suffered-18-billion-loss-2022-2023.
4. “Fishing Regulations.” Sport Fishing Regulations, Alaska Department of Fish and Game, www.adfg.alaska.gov/%3Fadfg%3Dfishregulations.sport. Accessed 22 Nov. 2024.