Introduction

Aquaculture plays a central role in global food production, and Atlantic salmon farming is one of the most economically significant sectors within the industry. Because Norway is one of the world’s leading producers of farmed salmon (Junior, 2025), the Norwegian Veterinary Institute regularly monitors fish health and welfare through national reporting systems. Tracking mortality and production losses is therefore important not only for animal welfare but also for economic sustainability and environmental impact.

The dataset used in this analysis is derived from the Norwegian Veterinary Institute’s Fish Health Report and associated public datasets, which compile monthly records of salmonid mortality and losses across Norway since 2020. These data are collected from official statistics, laboratory diagnostics, and field reports from fish health professionals and regulatory authorities such as the Norwegian Food Safety Authority. Together, these sources provide a comprehensive overview of trends in fish health, disease prevalence, and production outcomes in aquaculture systems.

Mortality and loss events in farmed salmon populations can arise from a variety of biological and environmental factors. Infectious diseases such as pancreas disease and infectious salmon anemia, parasitic infestations like sea lice, and environmental stressors including temperature fluctuations and hypoxia have all been identified as major contributors to fish loss (Shafi, 2025; Hossain, 2025). In addition to direct mortality, “losses” may also include fish that are culled, escape, or are otherwise removed from production, making this metric broader than mortality alone. Monitoring these patterns over time allows researchers and policymakers to identify risk factors, evaluate management strategies, and work toward national goals of reducing fish mortality.

Seasonal variation is often considered an important factor in aquaculture outcomes, as environmental conditions such as water temperature, oxygen levels, and pathogen dynamics can fluctuate throughout the year. Therefore, examining whether fish losses differ significantly across months may provide insight into potential seasonal risks or management challenges.

In this study, we analyzed monthly salmon loss data to determine whether there is a statistically significant relationship between time of year (month) and the number of losses observed. We hypothesize that monthly losses are related to the month, which would suggest that losses are influenced by seasonal factors.

Main Research Question

Which Month Lost the Most Salmon in Norway during the year 2020 and Is There a Relationship between Time and Losses?

Load Libraries & Data

First, it is necessary to load the libraries needed to work on answering this research question.

library(tidytuesdayR)
library(dplyr)
library(ggplot2)
library(magrittr)
library(tidyr)

Load Dataset

Next, the data set is loaded from the TidyTuesday repository, where it is listed under Salmonid Mortality.

monthly_losses_data <- readr::read_csv(
  'https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-03-17/monthly_losses_data.csv',
  show_col_types = FALSE  # suppress the column-type message
)

Explore Data

The next step is to explore the data set. It contains much more data than the research question requires, so it must be filtered before analysis.

head(monthly_losses_data)
## # A tibble: 6 × 9
##   species date       geo_group region losses   dead discarded escaped other
##   <chr>   <date>     <chr>     <chr>   <dbl>  <dbl>     <dbl>   <dbl> <dbl>
## 1 salmon  2020-01-01 area      1       31425  28126      3299       0     0
## 2 salmon  2020-01-01 area      2      324116 277888     46113       0   115
## 3 salmon  2020-01-01 area      3      844829 776983     63770       0  4076
## 4 salmon  2020-01-01 area      4      676852 623159     51823       0  1870
## 5 salmon  2020-01-01 area      5      109269  97627     11424       0   218
## 6 salmon  2020-01-01 area      6      548921 531193     15710       0  2018
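
As a quick sanity check on what the losses column represents, the sketch below re-enters the six rows printed above by hand and tests whether losses equals dead + discarded + escaped + other. The data frame name preview_rows is an illustrative assumption, and the check covers only these six rows, not the full data set.

```r
# Six rows copied by hand from the head() output above
# (preview_rows is a hypothetical name used only for this sketch)
preview_rows <- data.frame(
  region    = c("1", "2", "3", "4", "5", "6"),
  losses    = c(31425, 324116, 844829, 676852, 109269, 548921),
  dead      = c(28126, 277888, 776983, 623159, 97627, 531193),
  discarded = c(3299, 46113, 63770, 51823, 11424, 15710),
  escaped   = c(0, 0, 0, 0, 0, 0),
  other     = c(0, 115, 4076, 1870, 218, 2018)
)

# Add the four component columns row by row
component_sum <- with(preview_rows, dead + discarded + escaped + other)

# TRUE when every previewed row satisfies losses = sum of components
all_match <- all(component_sum == preview_rows$losses)
all_match
```

For these six rows the components add up to the reported losses, which is consistent with losses being a broader measure than mortality alone.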

The analysis focuses on salmon losses in Norway from January 2020 to December 2020. Before filtering by year and region, the date column is split so that year, month, and day can each be handled separately.

data_2020<-monthly_losses_data %>%
  separate(date, into=c("year", "month", "day"), sep="-")
data_2020 
## # A tibble: 2,808 × 11
##    species year  month day   geo_group region losses   dead discarded escaped
##    <chr>   <chr> <chr> <chr> <chr>     <chr>   <dbl>  <dbl>     <dbl>   <dbl>
##  1 salmon  2020  01    01    area      1       31425  28126      3299       0
##  2 salmon  2020  01    01    area      2      324116 277888     46113       0
##  3 salmon  2020  01    01    area      3      844829 776983     63770       0
##  4 salmon  2020  01    01    area      4      676852 623159     51823       0
##  5 salmon  2020  01    01    area      5      109269  97627     11424       0
##  6 salmon  2020  01    01    area      6      548921 531193     15710       0
##  7 salmon  2020  01    01    area      7      231487 228847      2640       0
##  8 salmon  2020  01    01    area      8      442659 431218     11228       0
##  9 salmon  2020  01    01    area      9      311127 308298      2212       0
## 10 salmon  2020  01    01    area      10     288849 281808      3120       0
## # ℹ 2,798 more rows
## # ℹ 1 more variable: other <dbl>

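The pipe passes the data to separate(), which splits the date into separate year, month, and day columns and stores the result in a new data frame. Despite its name, data_2020 still contains every year at this point (2,808 rows); the filtering to 2020 happens in the next step.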
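
The same split can be sketched in base R, which has the side benefit of returning integers instead of character strings. This is only an alternative illustration, not the approach used in this report; example_dates is a made-up stand-in for the date column.

```r
# Hypothetical stand-in for the date column (first-of-month dates, as in the data)
example_dates <- as.Date(c("2020-01-01", "2020-02-01", "2020-10-01"))

# format() extracts one component of a Date as text; as.integer() makes it numeric
date_parts <- data.frame(
  year  = as.integer(format(example_dates, "%Y")),
  month = as.integer(format(example_dates, "%m")),
  day   = as.integer(format(example_dates, "%d"))
)
date_parts
```

With integer columns, a later filter would compare against 2020 rather than the string "2020".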

norway_data <- data_2020 %>%
  filter(year == "2020", region == "Norge", species == "salmon")
norway_data
## # A tibble: 12 × 11
##    species year  month day   geo_group region  losses    dead discarded escaped
##    <chr>   <chr> <chr> <chr> <chr>     <chr>    <dbl>   <dbl>     <dbl>   <dbl>
##  1 salmon  2020  01    01    country   Norge  4937405 4625768    240880      19
##  2 salmon  2020  02    01    country   Norge  4857333 4272057    192937       0
##  3 salmon  2020  03    01    country   Norge  5155300 4725255    294044       1
##  4 salmon  2020  04    01    country   Norge  5132123 4695889    252304     500
##  5 salmon  2020  05    01    country   Norge  5219907 4425770    391975       0
##  6 salmon  2020  06    01    country   Norge  4848872 3478598    341729       1
##  7 salmon  2020  07    01    country   Norge  4543867 2914237    235560       0
##  8 salmon  2020  08    01    country   Norge  5428356 4568796    205141     858
##  9 salmon  2020  09    01    country   Norge  4718987 4336320    276315       0
## 10 salmon  2020  10    01    country   Norge  5709293 5148822    326350    4825
## 11 salmon  2020  11    01    country   Norge  4747988 4386612    329608       0
## 12 salmon  2020  12    01    country   Norge  5003582 4572799    404380      11
## # ℹ 1 more variable: other <dbl>

Piping the data into filter() removes the other regions and years, leaving only the national totals for Norway in 2020. The data include two species, rainbow trout and salmon; only salmon losses are kept here. The result is stored in another new data frame.

MonthLosses <- data.frame(
  Month = norway_data$month,
  Losses = norway_data$losses)

MonthLosses$Month<-month.abb[as.numeric(MonthLosses$Month)]
MonthLosses$Losses<-format(MonthLosses$Losses, big.mark = ",")
colnames(MonthLosses)<-c("Month", "Losses")
print(MonthLosses, row.names=FALSE)
##  Month    Losses
##    Jan 4,937,405
##    Feb 4,857,333
##    Mar 5,155,300
##    Apr 5,132,123
##    May 5,219,907
##    Jun 4,848,872
##    Jul 4,543,867
##    Aug 5,428,356
##    Sep 4,718,987
##    Oct 5,709,293
##    Nov 4,747,988
##    Dec 5,003,582

This table shows only the months and the salmon losses. Some stylistic changes were also made: the months were converted from numbers such as 01, 02, 03 to abbreviations such as Jan, Feb, Mar, and the losses were formatted with thousands separators.
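
The month with the highest losses can also be picked out programmatically instead of by eye. The sketch below re-enters the twelve values from the table above; the names losses_2020, highest_month, and lowest_month are illustrative and not part of the original analysis.

```r
# Monthly salmon losses for Norway in 2020, copied from the table above
losses_2020 <- c(4937405, 4857333, 5155300, 5132123, 5219907, 4848872,
                 4543867, 5428356, 4718987, 5709293, 4747988, 5003582)
names(losses_2020) <- month.abb  # built-in vector "Jan", "Feb", ..., "Dec"

# which.max()/which.min() return the position of the extreme value,
# and names() recovers the month label at that position
highest_month <- names(which.max(losses_2020))
lowest_month  <- names(which.min(losses_2020))

highest_month  # "Oct"
lowest_month   # "Jul"
```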

hist(norway_data$losses,
     main = "Distribution of Salmon Losses (Norway, 2020)",
     xlab = "Losses",
     col = "steelblue",
     border = "white")

This histogram shows the distribution of monthly salmon losses in 2020. With only 12 observations the shape is hard to judge, but the distribution appears approximately normal with a slight skew to the right.
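
The visual impression from the histogram can be backed up with two simple numerical checks. The sketch below compares the mean with the median (a mean above the median is a rough hint of right skew) and runs a Shapiro-Wilk test with shapiro.test() from base R. These checks are an addition for illustration and were not part of the original analysis; with 12 observations, any normality test has little power.

```r
# Monthly salmon losses for Norway in 2020, copied from the table above
losses_2020 <- c(4937405, 4857333, 5155300, 5132123, 5219907, 4848872,
                 4543867, 5428356, 4718987, 5709293, 4747988, 5003582)

mean_losses   <- mean(losses_2020)
median_losses <- median(losses_2020)

# TRUE here: the mean sits above the median, consistent with a slight right skew
mean_losses > median_losses

# Shapiro-Wilk test: the null hypothesis is that the data are normally distributed,
# so a large p-value means no evidence against normality
sw <- shapiro.test(losses_2020)
sw$p.value
```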

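# Order the months chronologically and convert the formatted losses back to numbers,
# so that bar heights reflect the actual values
ggplot(MonthLosses, aes(x = factor(Month, levels = month.abb),
                        y = as.numeric(gsub(",", "", Losses)))) +
  geom_bar(stat = "identity", fill = "steelblue") +
  labs(
    title = "Monthly Salmon Losses in Norway",
    x = "Month",
    y = "Salmon Losses"
  ) +
  theme_minimal()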

This bar graph shows salmon losses for each month. October has the highest losses, while July has the lowest.

Statistical Tests

A linear regression is used because the data consist of a single aggregated observation per month, which makes a trend-based analysis more appropriate than group-comparison methods such as ANOVA. The model assesses whether salmon losses changed systematically over the year. The month with the highest losses is read directly from the table and plots rather than from the regression.
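
The reasoning about ANOVA can be illustrated with residual degrees of freedom. In the sketch below, treating month as a 12-level factor uses one parameter per observation and leaves nothing to estimate error from, whereas a linear trend leaves 10 degrees of freedom. The object names are illustrative assumptions; the values are re-entered from the table above.

```r
# Monthly salmon losses for Norway in 2020, copied from the table above
losses_2020 <- c(4937405, 4857333, 5155300, 5132123, 5219907, 4848872,
                 4543867, 5428356, 4718987, 5709293, 4747988, 5003582)
month_index  <- 1:12
month_factor <- factor(month.abb, levels = month.abb)

# One-way ANOVA layout: one mean per month, one observation per month
anova_fit <- lm(losses_2020 ~ month_factor)
df.residual(anova_fit)  # 0: no within-month replication to estimate error

# Linear trend: intercept and slope only
trend_fit <- lm(losses_2020 ~ month_index)
df.residual(trend_fit)  # 10
```

With zero residual degrees of freedom, no F-test across months can be computed, which is why the trend model is the workable option here.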

MonthLosses$MonthIndex <- 1:12
MonthLosses$Losses <- as.numeric(gsub(",", "", MonthLosses$Losses))
model <- lm(Losses ~ MonthIndex, data = MonthLosses)
summary(model)
## 
## Call:
## lm(formula = Losses ~ MonthIndex, data = MonthLosses)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -484665 -206521  -54758  165884  661078 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4982604     209651   23.77 3.95e-10 ***
## MonthIndex      6561      28486    0.23    0.822    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 340600 on 10 degrees of freedom
## Multiple R-squared:  0.005277,   Adjusted R-squared:  -0.0942 
## F-statistic: 0.05305 on 1 and 10 DF,  p-value: 0.8225

ggplot(MonthLosses, aes(x = MonthIndex, y = Losses)) +
  geom_point() +
  geom_line() +
  theme_minimal() +
  labs(title = "Salmon Losses Over Time",
       x = "Month",
       y = "Losses")+
  geom_smooth(method="lm")
## `geom_smooth()` using formula = 'y ~ x'

There is no evidence of a significant linear trend in salmon losses across months (p = 0.822). The month index explains virtually none of the variation in losses (R² ≈ 0.005), so the month-to-month fluctuations do not follow a steady increase or decrease over the year. Descriptively, however, October clearly had the highest losses.
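
To make the regression output less of a black box, the slope, intercept, and R² can be reproduced from the closed-form formulas for simple linear regression. This is a sketch using the twelve values from the table; the variable names are illustrative.

```r
# Monthly salmon losses for Norway in 2020, copied from the table above
losses_2020 <- c(4937405, 4857333, 5155300, 5132123, 5219907, 4848872,
                 4543867, 5428356, 4718987, 5709293, 4747988, 5003582)
month_index <- 1:12

# Least-squares slope: covariance of x and y divided by the variance of x
slope <- cov(month_index, losses_2020) / var(month_index)

# The fitted line passes through the point (mean of x, mean of y)
intercept <- mean(losses_2020) - slope * mean(month_index)

# In simple linear regression, R-squared is the squared correlation
r_squared <- cor(month_index, losses_2020)^2

c(intercept = intercept, slope = slope, r_squared = r_squared)
```

The results should agree with the lm() summary above up to rounding (intercept about 4,982,604, slope about 6,561, R² about 0.0053).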

To check the linear regression assumptions and confirm the model's validity, the model is refit with the lm function and examined with a Q-Q plot of the residuals and a plot of residuals against fitted values.

lm(Losses ~ MonthIndex, data=MonthLosses)
## 
## Call:
## lm(formula = Losses ~ MonthIndex, data = MonthLosses)
## 
## Coefficients:
## (Intercept)   MonthIndex  
##     4982604         6561

qqnorm(model$residuals,
       xlab = "Theoretical Quantiles",
       ylab = "Sample Quantiles",
       main = "Normal Q-Q Plot of Regression Residuals")

qqline(model$residuals, col = "blue")

plot(model$fitted.values, model$residuals,
     xlab = "Fitted Values",
     ylab = "Residuals",
     main = "Residuals vs Fitted Values")

abline(h = 0, col = "blue")

The fitted model is Losses = 4,982,604 + 6,561 × MonthIndex. The intercept of about 4.98 million salmon is the predicted loss at MonthIndex = 0, one step before January, and the slope corresponds to an estimated increase of approximately 6,561 losses per month.
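
As a worked example of the fitted equation, the sketch below plugs in October (MonthIndex = 10) and July (MonthIndex = 7) and compares the predictions with the observed losses. It uses the rounded coefficients printed by lm(), so the residuals differ from the summary output by about one fish; the variable names are illustrative.

```r
# Rounded coefficients from the lm() output above
intercept <- 4982604
slope     <- 6561

# October: MonthIndex = 10, observed losses 5,709,293
fitted_oct   <- intercept + slope * 10   # 5048214
residual_oct <- 5709293 - fitted_oct     # 661079

# July: MonthIndex = 7, observed losses 4,543,867
fitted_jul   <- intercept + slope * 7    # 5028531
residual_jul <- 4543867 - fitted_jul     # -484664

c(residual_oct = residual_oct, residual_jul = residual_jul)
```

These two values correspond to the largest and smallest residuals reported in the summary (661078 and -484665), which shows that October and July are the months the straight line fits worst.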

The Q–Q plot shows that residuals are approximately normally distributed, with minor deviations at the tails.

The residuals vs fitted values plot shows no strong pattern, suggesting that the assumptions of linearity and constant variance are reasonably met.

Discussion

This analysis examined whether month predicts salmon losses in Norway during 2020. A linear regression model was used because the dataset contained one observation per month, which made the use of ANOVA inappropriate due to the lack of within-group replication. Treating month as a continuous variable allowed us to test for a trend over time.

The regression results showed no significant relationship between month and losses (p = 0.822). Although the model estimated a small positive slope, this effect was negligible and not statistically significant.

Diagnostic plots supported the validity of the model. Residuals were approximately normally distributed, and no clear pattern was observed in the residuals vs. fitted plot, suggesting that model assumptions were reasonably met. Therefore, the lack of significance reflects the data rather than a violation of assumptions.

Descriptively, October had the highest losses, but with only one observation per month this difference could not be tested formally. The variation across months shows no systematic linear pattern.

These results suggest that salmon losses are likely driven by other factors, such as disease, environmental conditions, or management practices, rather than by month alone. Additionally, the small sample size (n = 12) limits statistical power and may reduce the ability to detect subtle trends.
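
The effect of the small sample on precision can be made concrete with a 95% confidence interval for the slope. The sketch below builds the interval from the estimate, standard error, and residual degrees of freedom printed in the regression summary; it is an illustrative addition, and the variable names are assumptions.

```r
# Values taken from the regression summary above
slope_estimate <- 6561
slope_se       <- 28486
residual_df    <- 10

# Critical t value for a two-sided 95% interval
t_crit <- qt(0.975, df = residual_df)

# Estimate plus or minus the critical value times the standard error
slope_ci <- slope_estimate + c(-1, 1) * t_crit * slope_se
names(slope_ci) <- c("lower", "upper")
slope_ci
```

The interval runs from roughly -57,000 to +70,000 losses per month, so the data are compatible with a moderate decrease, no change, or a moderate increase over the year.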

Conclusion

This study aimed to determine whether there was a relationship between month and salmon losses in Norway during 2020 and to identify which month experienced the highest losses. The results showed no statistically significant relationship between month and salmon losses (p = 0.822), indicating that time of year alone does not explain variation in losses. Although October had the highest observed losses, these differences were not statistically meaningful. Overall, salmon losses are likely influenced by more complex environmental and biological factors rather than seasonal trends alone.

References

Hossain, K. Z. H. Z. (2025, November 3). Infectious threats to salmon aquaculture: epidemiological insights, pathological impacts, and management measures. Med Crave. https://www.medcrave.com/articles/det/30298/Infectious-threats-to-salmon-aquaculture-epidemiological-insights-pathological-impacts-and-management-measures

Junior. (2025, December 3). Norway farms half the world’s salmon while safeguarding wild populations. SEVENSEAS Media. https://sevenseasmedia.org/norway-salmon-farming/

Shafi, M. E. (2025). A comprehensive review of summer mortality syndrome in fish: Causes, climate impact, and mitigation strategies. Journal of Advanced Veterinary and Animal Research, 12(3), 890–907. https://doi.org/10.5455/javar.2025.l951