Main Research Question
Which Month Had the Most Salmon Losses in Norway in 2020, and Is There a Relationship Between Month and Losses?
Aquaculture plays a central role in global food production, and Atlantic salmon farming is one of the most economically significant sectors within this industry. Norway is one of the world’s leading producers of farmed salmon (Junior, 2025), and the Norwegian Veterinary Institute regularly monitors fish health and welfare through national reporting systems. Tracking mortality and production losses is therefore important not only for animal welfare but also for economic sustainability and environmental impact.
The dataset used in this analysis is derived from the Norwegian Veterinary Institute’s Fish Health Report and associated public datasets, which compile monthly records of salmonid mortality and losses across Norway since 2020. These data are collected from official statistics, laboratory diagnostics, and field reports from fish health professionals and regulatory authorities such as the Norwegian Food Safety Authority. Together, these sources provide a comprehensive overview of trends in fish health, disease prevalence, and production outcomes in aquaculture systems.
Mortality and loss events in farmed salmon populations can arise from a variety of biological and environmental factors. Infectious diseases such as pancreas disease and infectious salmon anemia, parasitic infestations such as sea lice, and environmental stressors including temperature fluctuations and hypoxia have all been identified as major contributors to fish loss (Shafi, 2025; Hossain, 2025). Beyond direct mortality, “losses” may also include fish that are culled, escape, or are otherwise removed from production, which makes this metric broader than mortality alone. Monitoring these patterns over time allows researchers and policymakers to identify risk factors, evaluate management strategies, and work toward national goals of reducing fish mortality.
Seasonal variation is often considered an important factor in aquaculture outcomes, because environmental conditions such as water temperature, oxygen levels, and pathogen dynamics fluctuate throughout the year. Examining whether fish losses differ across months may therefore provide insight into potential seasonal risks or management challenges.
In this study, we analyzed monthly salmon loss data to determine whether there is a statistically significant relationship between time of year (month) and the number of fish lost. We hypothesize that monthly losses are related to month, which would suggest that losses are influenced by seasonal factors.
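One way to make this hypothesis testable, matching the simple linear regression used later in the analysis, is to write monthly losses as a linear function of a month index (MonthIndex = 1 for January, ..., 12 for December) and test whether the slope is zero. This formulation is a simplification: it captures a steady linear trend across the year, not every possible seasonal pattern.

$$
\text{Losses} = \beta_0 + \beta_1 \cdot \text{MonthIndex} + \varepsilon, \qquad
H_0\colon \beta_1 = 0 \quad \text{vs.} \quad H_1\colon \beta_1 \neq 0
$$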
First, we load the libraries needed to answer this research question.
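library(tidytuesdayR)  # TidyTuesday helpers (the CSV is read directly below)
library(dplyr)         # filter() and the %>% pipe
library(ggplot2)       # plotting
library(magrittr)      # %>% pipe (also re-exported by dplyr)
library(tidyr)         # separate()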
Next, we load the Salmonid Mortality dataset (monthly_losses_data) from the TidyTuesday repository.
monthly_losses_data <- readr::read_csv(
  'https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-03-17/monthly_losses_data.csv',
  show_col_types = FALSE  # suppress the column-specification message
)
Next, we explore the dataset. It contains many rows (regional areas, several years, and two species) that are not needed for this question, so we filter it before the analysis.
head(monthly_losses_data)
## # A tibble: 6 × 9
## species date geo_group region losses dead discarded escaped other
## <chr> <date> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 salmon 2020-01-01 area 1 31425 28126 3299 0 0
## 2 salmon 2020-01-01 area 2 324116 277888 46113 0 115
## 3 salmon 2020-01-01 area 3 844829 776983 63770 0 4076
## 4 salmon 2020-01-01 area 4 676852 623159 51823 0 1870
## 5 salmon 2020-01-01 area 5 109269 97627 11424 0 218
## 6 salmon 2020-01-01 area 6 548921 531193 15710 0 2018
We focus on national salmon losses in Norway from January to December 2020. To prepare the data, we first pipe it into separate() to split the date into year, month, and day columns; the year, region, and species filters are applied in the following step.
# Split each date (e.g. 2020-01-01) into character columns year, month and day
data_2020 <- monthly_losses_data %>%
  separate(date, into = c("year", "month", "day"), sep = "-")
data_2020
## # A tibble: 2,808 × 11
## species year month day geo_group region losses dead discarded escaped
## <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 salmon 2020 01 01 area 1 31425 28126 3299 0
## 2 salmon 2020 01 01 area 2 324116 277888 46113 0
## 3 salmon 2020 01 01 area 3 844829 776983 63770 0
## 4 salmon 2020 01 01 area 4 676852 623159 51823 0
## 5 salmon 2020 01 01 area 5 109269 97627 11424 0
## 6 salmon 2020 01 01 area 6 548921 531193 15710 0
## 7 salmon 2020 01 01 area 7 231487 228847 2640 0
## 8 salmon 2020 01 01 area 8 442659 431218 11228 0
## 9 salmon 2020 01 01 area 9 311127 308298 2212 0
## 10 salmon 2020 01 01 area 10 288849 281808 3120 0
## # ℹ 2,798 more rows
## # ℹ 1 more variable: other <dbl>
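The pipe passes the data to separate(), which splits the date into separate year, month, and day columns and stores the result in a new data frame, data_2020. Despite its name, data_2020 still contains every year in the dataset (2,808 rows); the restriction to 2020 happens in the next step.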
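As an aside, separate() turns the date into three character columns, which is why the filter in the next step compares year against the string "2020". A possible base R alternative, sketched below on a small hypothetical toy data frame (not the real dataset), keeps the original Date column and extracts year and month as integers with format(), so they can be compared numerically and used directly to index month.abb.

```r
# Hypothetical toy data: a Date column shaped like monthly_losses_data$date
toy <- data.frame(date = as.Date(c("2020-01-01", "2020-10-01", "2021-03-01")))

# Extract year and month as integers; the original Date column is kept
toy$year  <- as.integer(format(toy$date, "%Y"))
toy$month <- as.integer(format(toy$date, "%m"))

print(toy)

# Integer columns allow numeric comparisons and direct indexing
toy[toy$year == 2020, ]
month.abb[toy$month]
```

Whether this is preferable is a matter of taste; separate() is perfectly adequate as long as the later comparisons use strings such as "2020" and "01".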
# Keep 2020 rows for the national total ("Norge") and salmon only
norway_data <- data_2020 %>%
  filter(year == "2020", region == "Norge", species == "salmon")
norway_data
## # A tibble: 12 × 11
## species year month day geo_group region losses dead discarded escaped
## <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 salmon 2020 01 01 country Norge 4937405 4625768 240880 19
## 2 salmon 2020 02 01 country Norge 4857333 4272057 192937 0
## 3 salmon 2020 03 01 country Norge 5155300 4725255 294044 1
## 4 salmon 2020 04 01 country Norge 5132123 4695889 252304 500
## 5 salmon 2020 05 01 country Norge 5219907 4425770 391975 0
## 6 salmon 2020 06 01 country Norge 4848872 3478598 341729 1
## 7 salmon 2020 07 01 country Norge 4543867 2914237 235560 0
## 8 salmon 2020 08 01 country Norge 5428356 4568796 205141 858
## 9 salmon 2020 09 01 country Norge 4718987 4336320 276315 0
## 10 salmon 2020 10 01 country Norge 5709293 5148822 326350 4825
## 11 salmon 2020 11 01 country Norge 4747988 4386612 329608 0
## 12 salmon 2020 12 01 country Norge 5003582 4572799 404380 11
## # ℹ 1 more variable: other <dbl>
Piping again, we keep only the 2020 rows for the national total (region "Norge", geo_group "country") and drop the regional areas. The dataset also covers two species in Norway, salmon and rainbow trout; since we focus only on salmon losses, the trout rows are removed as well. The result is a new data frame, norway_data, with one row per month.
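Because the rest of the analysis assumes exactly one national salmon row per month, a quick sanity check after filtering can help. The sketch below uses a small toy data frame built from values printed above (three national rows and the area 1 row for January 2020); the rainbow trout row is a placeholder with unknown losses (NA), and its species label "rainbowtrout" is an assumption about how the dataset spells it. Base R's subset() stands in for dplyr::filter() so the example has no package dependencies.

```r
# Toy rows: values copied from the printed output above, except the
# hypothetical rainbow trout row, whose losses are unknown here (NA)
toy <- data.frame(
  species = c("salmon", "salmon", "salmon", "salmon", "rainbowtrout"),
  year    = c("2020", "2020", "2020", "2020", "2020"),
  month   = c("01", "02", "03", "01", "01"),
  region  = c("Norge", "Norge", "Norge", "1", "Norge"),
  losses  = c(4937405, 4857333, 5155300, 31425, NA)
)

# Base R equivalent of the dplyr::filter() call above
national_salmon <- subset(toy, year == "2020" & region == "Norge" & species == "salmon")

# Sanity check: each month should appear at most once after filtering
stopifnot(!anyDuplicated(national_salmon$month))
national_salmon$losses
```

On the real data, the same check applied to norway_data (together with nrow(norway_data) == 12) would confirm that the regional and trout rows were all removed.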
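# One row per month: month number and total losses
MonthLosses <- data.frame(
  Month = norway_data$month,
  Losses = norway_data$losses)
# Convert "01", "02", ... to "Jan", "Feb", ...; a factor keeps calendar order in plots
MonthLosses$Month <- factor(month.abb[as.numeric(MonthLosses$Month)], levels = month.abb)
# Add thousands separators for display only, so Losses stays numeric for plotting
print(transform(MonthLosses, Losses = format(Losses, big.mark = ",")), row.names = FALSE)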
## Month Losses
## Jan 4,937,405
## Feb 4,857,333
## Mar 5,155,300
## Apr 5,132,123
## May 5,219,907
## Jun 4,848,872
## Jul 4,543,867
## Aug 5,428,356
## Sep 4,718,987
## Oct 5,709,293
## Nov 4,747,988
## Dec 5,003,582
This table shows only the month and the total salmon losses. For readability, month numbers such as 01, 02, and 03 were converted to abbreviations (Jan, Feb, Mar, ...), and losses were printed with thousands separators.
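These display changes interact with plotting. The sketch below illustrates two pitfalls, using only base R: a character month column sorts alphabetically (which is the order ggplot2 uses for a character axis), and format() returns text, so a Losses column overwritten with formatted strings would no longer be numeric and would be plotted as categories rather than quantities. The two loss values are taken from the table above.

```r
# Month labels as character: sorting puts them in alphabetical order
chr_months <- month.abb
sort(chr_months)

# A factor with explicit levels keeps calendar order
fct_months <- factor(chr_months, levels = month.abb)
levels(fct_months)

# format() returns text, so a formatted Losses column is no longer numeric
formatted <- format(c(4937405, 5709293), big.mark = ",")
formatted
is.numeric(formatted)
```

Keeping Losses numeric and formatting it only when printing avoids the second problem; an ordered factor for Month avoids the first.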
# Histogram of the 12 monthly loss totals
hist(norway_data$losses,
     main = "Distribution of Salmon Losses (Norway, 2020)",
     xlab = "Losses",
     col = "steelblue",
     border = "white")
This histogram shows the distribution of monthly salmon losses in 2020. With only 12 observations the shape is hard to judge, but the losses appear roughly normally distributed with a slight right skew.
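Because a histogram of 12 values is coarse, a numeric summary can back up the visual impression of a slight right skew. The sketch below compares the mean with the median and computes a simple moment-based skewness (mean cubed z-score, using the population standard deviation); the loss values are copied from the table above. This is one of several common skewness definitions, chosen here for simplicity.

```r
# Monthly salmon losses, January to December 2020 (from the table above)
losses <- c(4937405, 4857333, 5155300, 5132123, 5219907, 4848872,
            4543867, 5428356, 4718987, 5709293, 4747988, 5003582)

mean_loss   <- mean(losses)
median_loss <- median(losses)

# Moment-based skewness: positive values indicate a longer right tail
z    <- (losses - mean_loss) / sqrt(mean((losses - mean_loss)^2))
skew <- mean(z^3)

round(c(mean = mean_loss, median = median_loss, skewness = skew), 3)
```

A mean above the median together with a positive skewness is consistent with a mild right skew, largely driven by the October value.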
# Bar chart of total salmon losses per month
ggplot(MonthLosses, aes(x = Month, y = Losses)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  labs(
    title = "Monthly Salmon Losses in Norway",
    x = "Month",
    y = "Salmon Losses"
  ) +
  theme_minimal()
This bar graph shows total salmon losses by month. October had the highest losses (5,709,293) and July the lowest (4,543,867), although all months fall within a fairly narrow range of roughly 4.5 to 5.7 million fish.
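The highest and lowest months can also be identified programmatically rather than read off the chart. The sketch below names the vector of monthly losses (copied from the table above) with month.abb and uses which.max() and which.min(); it also expresses the range as a share of the average month to put the October peak in context.

```r
# Monthly salmon losses, January to December 2020 (from the table above)
losses <- c(4937405, 4857333, 5155300, 5132123, 5219907, 4848872,
            4543867, 5428356, 4718987, 5709293, 4747988, 5003582)
names(losses) <- month.abb

highest <- names(which.max(losses))   # month with the largest total
lowest  <- names(which.min(losses))   # month with the smallest total
spread  <- max(losses) - min(losses)  # range in number of fish

# Range as a share of the average month
relative_spread <- spread / mean(losses)
c(highest = highest, lowest = lowest)
round(c(spread = spread, relative_spread = relative_spread), 3)
```

The range is a little over a fifth of a typical month's losses, which is noticeable but far from July having "very little" loss.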
We use simple linear regression because the data contain a single aggregated observation per month, so there is no within-month replication for a group-comparison method such as ANOVA. Treating month as a numeric index (1 = January, ..., 12 = December), the model tests whether salmon losses changed systematically (linearly) over the year. Identifying the month with the highest losses, by contrast, is a descriptive question answered directly by the table and bar graph.
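The replication argument can be made concrete by counting residual degrees of freedom. In the sketch below (monthly values copied from the table above), a one-way ANOVA model with month as a 12-level factor spends one parameter per month, leaving zero degrees of freedom to estimate the error variance, so no F-test is possible. The trend model uses only an intercept and a slope, leaving 10 residual degrees of freedom, which matches the regression output that follows.

```r
# Monthly salmon losses, January to December 2020 (from the table above)
losses <- c(4937405, 4857333, 5155300, 5132123, 5219907, 4848872,
            4543867, 5428356, 4718987, 5709293, 4747988, 5003582)

# One-way ANOVA model: 12 parameters for 12 observations (a perfect fit)
month_fac <- factor(month.abb, levels = month.abb)
fit_anova <- lm(losses ~ month_fac)
df.residual(fit_anova)   # 0: nothing left to estimate the error variance

# Trend model: intercept + slope, leaving 10 residual degrees of freedom
month_index <- 1:12
fit_trend <- lm(losses ~ month_index)
df.residual(fit_trend)
```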
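# Numeric month index: 1 = January, ..., 12 = December (rows are in calendar order)
MonthLosses$MonthIndex <- 1:12
# Ensure Losses is numeric (removes any thousands separators added for display)
MonthLosses$Losses <- as.numeric(gsub(",", "", MonthLosses$Losses))
# Simple linear regression of monthly losses on month index
model <- lm(Losses ~ MonthIndex, data = MonthLosses)
summary(model)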
##
## Call:
## lm(formula = Losses ~ MonthIndex, data = MonthLosses)
##
## Residuals:
## Min 1Q Median 3Q Max
## -484665 -206521 -54758 165884 661078
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4982604 209651 23.77 3.95e-10 ***
## MonthIndex 6561 28486 0.23 0.822
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 340600 on 10 degrees of freedom
## Multiple R-squared: 0.005277, Adjusted R-squared: -0.0942
## F-statistic: 0.05305 on 1 and 10 DF, p-value: 0.8225
# Monthly losses over time with the fitted linear trend and its confidence band
ggplot(MonthLosses, aes(x = MonthIndex, y = Losses)) +
  geom_point() +
  geom_line() +
  theme_minimal() +
  labs(title = "Salmon Losses Over Time",
       x = "Month",
       y = "Losses") +
  geom_smooth(method = "lm")
## `geom_smooth()` using formula = 'y ~ x'
There is no evidence of a significant linear trend in salmon losses across months (p = 0.822). The month index explains virtually none of the variation in losses (R² ≈ 0.005), indicating that month-to-month fluctuations are more likely due to other factors than to a steady change over time. Descriptively, however, October had the highest losses of the year.
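For a single predictor, the least-squares estimates have a closed form, which makes the reported numbers easy to verify by hand: the slope is cov(x, y) / var(x), the intercept is mean(y) minus slope times mean(x), and R² is the squared Pearson correlation. The sketch below applies these formulas to the monthly values from the table above; it should agree with the lm() output up to rounding.

```r
# Monthly salmon losses, January to December 2020 (from the table above)
losses <- c(4937405, 4857333, 5155300, 5132123, 5219907, 4848872,
            4543867, 5428356, 4718987, 5709293, 4747988, 5003582)
x <- 1:12  # month index

slope     <- cov(x, losses) / var(x)            # least-squares slope
intercept <- mean(losses) - slope * mean(x)     # line passes through the means
r_squared <- cor(x, losses)^2                   # share of variance explained

round(c(intercept = intercept, slope = slope, r_squared = r_squared), 4)
```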
To check the assumptions of the linear regression, we print the fitted coefficients and examine a normal Q-Q plot of the residuals and a plot of residuals versus fitted values.
model  # print the coefficients of the model fitted above
##
## Call:
## lm(formula = Losses ~ MonthIndex, data = MonthLosses)
##
## Coefficients:
## (Intercept) MonthIndex
## 4982604 6561
# Normal Q-Q plot: residuals should fall close to the reference line
qqnorm(model$residuals,
       xlab = "Theoretical Quantiles",
       ylab = "Sample Quantiles",
       main = "Normal Q-Q Plot of Regression Residuals")
qqline(model$residuals, col = "blue")
# Residuals vs fitted values: look for curvature or a funnel shape
plot(model$fitted.values, model$residuals,
     xlab = "Fitted Values",
     ylab = "Residuals",
     main = "Residuals vs Fitted Values")
abline(h = 0, col = "blue")
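The fitted model is $\widehat{\text{Losses}} = 4{,}982{,}604 + 6{,}561 \cdot \text{MonthIndex}$. The intercept (about 4.98 million fish) is the predicted loss at MonthIndex = 0, just outside the observed range, and the slope implies an increase of approximately 6,561 fish per month, or about 72,000 from January to December. This change is small relative to monthly totals of roughly 5 million and is not statistically significant.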
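The slope's practical size and uncertainty can be summarized with predictions and a confidence interval. The sketch below refits the same model on the monthly values from the table above, computes the predicted change from January to December (11 months times the slope), and uses confint() for a 95% interval on the slope. An interval that spans zero is consistent with the non-significant p-value.

```r
# Monthly salmon losses, January to December 2020 (from the table above)
losses <- c(4937405, 4857333, 5155300, 5132123, 5219907, 4848872,
            4543867, 5428356, 4718987, 5709293, 4747988, 5003582)
month_index <- 1:12
fit <- lm(losses ~ month_index)

# Predicted losses for January (1) and December (12)
pred <- predict(fit, newdata = data.frame(month_index = c(1, 12)))
change_jan_to_dec <- unname(pred[2] - pred[1])   # equals 11 * slope

# 95% confidence interval for the slope
ci <- confint(fit, "month_index", level = 0.95)
change_jan_to_dec
ci
```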
The Q–Q plot shows that residuals are approximately normally distributed, with minor deviations at the tails.
The residuals vs. fitted values plot shows no strong pattern, suggesting that the linearity and constant-variance assumptions are reasonable.
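The visual checks can be complemented by formal tests from base R's stats package. The sketch below applies the Shapiro-Wilk test to the residuals (null hypothesis: normality) and, as a rough check for non-constant variance, a Spearman rank correlation between the absolute residuals and the fitted values. The Spearman check is a simple stand-in chosen here, not a standard named heteroscedasticity test. With only 12 residuals both tests have low power, so a non-significant result is weak support rather than proof that the assumptions hold.

```r
# Monthly salmon losses, January to December 2020 (from the table above)
losses <- c(4937405, 4857333, 5155300, 5132123, 5219907, 4848872,
            4543867, 5428356, 4718987, 5709293, 4747988, 5003582)
month_index <- 1:12
fit <- lm(losses ~ month_index)
res <- residuals(fit)

# Shapiro-Wilk test of normality for the residuals
sw <- shapiro.test(res)

# Does the size of the residuals change with the fitted values?
spread_check <- cor.test(abs(res), fitted(fit), method = "spearman")

c(shapiro_p = sw$p.value, spread_p = spread_check$p.value)
```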
This analysis examined whether month predicts salmon losses in Norway during 2020. A linear regression model was used because the dataset contained one observation per month, which made ANOVA inappropriate due to the lack of within-group replication. Treating month as a numeric variable allowed us to test for a linear trend over time.
The regression results showed no significant relationship between month and losses (p = 0.822). Although the model estimated a small positive slope, this effect was negligible and not statistically significant.
Diagnostic plots supported the validity of the model. Residuals were approximately normally distributed, and no clear pattern was observed in the residuals vs. fitted plot, suggesting that model assumptions were reasonably met. The lack of significance is therefore unlikely to be an artifact of violated assumptions, although with only 12 observations these diagnostic checks are themselves limited.
Descriptively, October had the highest losses (5,709,293 fish), but with a single observation per month this difference could not be tested formally, and the variation across months shows no systematic linear trend.
These results suggest that salmon losses are likely driven by other factors, such as disease, environmental conditions, or management practices, rather than by month alone. In addition, the small sample size (n = 12) limits statistical power and may reduce the ability to detect subtle trends, and a linear model cannot detect seasonal patterns that rise and fall within the year, such as a mid-year peak.
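A linear trend only tests for a steady increase or decrease across the year, whereas seasonal effects often rise and fall. One common way to model this, sketched below as a possible extension rather than part of the original analysis, is harmonic regression: a single sine/cosine pair with a 12-month period lets the expected losses follow one smooth cycle per year, with the timing of the peak estimated from the data. The F-test compares this curve against a flat line. The values are the 2020 monthly totals from the table above; since one year contains only one cycle, applying the same model to the multi-year data in monthly_losses_data would likely be more informative.

```r
# Monthly salmon losses, January to December 2020 (from the table above)
losses <- c(4937405, 4857333, 5155300, 5132123, 5219907, 4848872,
            4543867, 5428356, 4718987, 5709293, 4747988, 5003582)
month_index <- 1:12

# One sine/cosine pair with a 12-month period: a single smooth annual cycle
seasonal_fit <- lm(losses ~ sin(2 * pi * month_index / 12) +
                     cos(2 * pi * month_index / 12))

# Intercept-only model: constant expected losses across the year
null_fit <- lm(losses ~ 1)

# F-test: does the seasonal curve explain more variation than a flat line?
anova(null_fit, seasonal_fit)
```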
This study aimed to determine whether there was a relationship between month and salmon losses in Norway during 2020 and to identify which month experienced the highest losses. The results showed no statistically significant linear relationship between month and salmon losses (p = 0.822), indicating that a steady trend over the year does not explain the variation in losses. Although October had the highest observed losses, this difference could not be shown to be statistically meaningful. Overall, salmon losses are likely influenced by more complex environmental and biological factors rather than by a simple trend over time.
Hossain, K. Z. H. Z. (2025, November 3). Infectious threats to salmon aquaculture: Epidemiological insights, pathological impacts, and management measures. MedCrave. https://www.medcrave.com/articles/det/30298/Infectious-threats-to-salmon-aquaculture-epidemiological-insights-pathological-impacts-and-management-measures
Junior. (2025, December 3). Norway farms half the world’s salmon while safeguarding wild populations. SEVENSEAS Media. https://sevenseasmedia.org/norway-salmon-farming/
Shafi, M. E. (2025). A comprehensive review of summer mortality syndrome in fish: Causes, climate impact, and mitigation strategies. Journal of Advanced Veterinary and Animal Research, 12(3), 890–907. https://doi.org/10.5455/javar.2025.l951