Part 1 - Introduction

Eurasian Cranes (Grus grus) are native to large regions extending from Northern and Western Europe, specifically across Eurasia, going further to Northern China. They are a medium sized bird species with a striking wing span of 7 ft (210 cm). In our data set, the cranes population over the course of 30 years is being analyzed.These cranes stop at Lake Hornborgasjön, Sweden during their yearly migration. Our response variable (year), in which the cranes are spotted, ranges between Spring and Autumn from 1994 to 2024. Our explanatory variable (crane observations), fluctuates over this period. Tracking bird population and their migration routes is essential for preserving critical habitats and conservation.

Image_of_Eurasian_Cranes (birdfact, 2022.)
Image_of_Eurasian_Cranes (birdfact, 2022.)
cranes <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-09-30/cranes.csv')
## Rows: 1548 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (1): comment
## dbl  (1): observations
## lgl  (1): weather_disruption
## date (1): date
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#knitr::include_graphics('what-is-a-group-of-cranes-called.png')

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.6
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.1     ✔ tibble    3.3.1
## ✔ lubridate 1.9.5     ✔ tidyr     1.3.2
## ✔ purrr     1.2.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggpubr)
library(dplyr)
library(ggplot2)
library(car)
## Loading required package: carData
## 
## Attaching package: 'car'
## 
## The following object is masked from 'package:dplyr':
## 
##     recode
## 
## The following object is masked from 'package:purrr':
## 
##     some
#Install crane data
cranes <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-09-30/cranes.csv')
## Rows: 1548 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (1): comment
## dbl  (1): observations
## lgl  (1): weather_disruption
## date (1): date
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
is.factor(cranes$weather_disruption)
## [1] FALSE
cranes$weather_disruption <-as.factor(cranes$weather_disruption)
#summarize & mutate the data into a new dataframe based on seasons and year and average cranes counted
crane_summary <- cranes %>%
  filter(!is.na(observations)) %>%
  mutate(
    year   = lubridate::year(date), #extract year and month from dates
    month  = lubridate::month(date),
    season = if_else(month <= 6, "Spring", "Autumn") #assign months into spring migration or fall migration
  ) %>%
  group_by(year, season) %>%       # group by both sseasons and year
  summarise(
    mean_count    = mean(observations), #calculate means
    se            = sd(observations) / sqrt(n()),   # caulcate standard error
    n_counts      = n(), #number of days that the cranes were counted in each season by year combo
    pct_disrupted = sum(weather_disruption == TRUE), # number of weather events
    .groups = "drop"
  )
crane_summary #view your new summarized dataframe
## # A tibble: 54 × 6
##     year season mean_count    se n_counts pct_disrupted
##    <dbl> <chr>       <dbl> <dbl>    <int>         <int>
##  1  1994 Spring      2750.  284.       28             0
##  2  1995 Spring      2274.  286.       37             0
##  3  1996 Spring      2314.  330.       27             0
##  4  1997 Spring      3341.  354.       34             0
##  5  1998 Spring      4603.  427.       36             0
##  6  1999 Spring      3842.  429.       31             0
##  7  2000 Spring      5658.  476.       33             0
##  8  2001 Spring      2946   404.       35             0
##  9  2002 Autumn      3143.  469.       15             0
## 10  2002 Spring      4030.  493.       38             0
## # ℹ 44 more rows
crane_plot <- ggplot(crane_summary, aes(x = year, y = mean_count, color = season)) + geom_line(size = 1.2) + geom_point(size = 3) + geom_errorbar(aes (ymin = mean_count - se, ymax = mean_count + se), width = 1) + labs(title = "Change of Crane Population over Time by Season", x = "Year", y = "Mean", color = "Season") + theme_pubr() + scale_color_manual(values = c("#8B4513","#B4EEB4")) 
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
crane_plot

crane_plot + facet_wrap(~season)

Part 2 - Main Research Question

Our research question is “Has the crane population at Lake Hornborgasjön changed over the past 30 years?

Part 3 - Exploring the Data (Descriptive Statistics)

To analyze the Spring and Autumn Crane sightings, we observed the mean, and created visuals (scatterplots and histograms) to discuss the patterns shown. For the Spring Observation scatterplot, we can see that there is more fluctuation between the crane sightings. The trend is still linear, but there are shifts in observations. However, when viewing the Autumn Observation scatterplot, we can see a more consistent linear trend. For a better visual of the fluctuation, we created histograms to analyze in depth. Despite the autumn scatterplot having a more consistent trend of crane sightings, the histogram shows a clear outlier. It is not drastic enough to skew data, but it is present. For Spring, the scatterplots show more variability in sightings, but a more normal distribution.

# mean observations
mean(cranes$observations, na.rm = TRUE)
## [1] 6063.494
## pipe data 
springdata <- crane_summary %>%
  filter(season == "Spring")
autumndata <- crane_summary %>%
  filter(season == "Autumn")

springdata
## # A tibble: 31 × 6
##     year season mean_count    se n_counts pct_disrupted
##    <dbl> <chr>       <dbl> <dbl>    <int>         <int>
##  1  1994 Spring      2750.  284.       28             0
##  2  1995 Spring      2274.  286.       37             0
##  3  1996 Spring      2314.  330.       27             0
##  4  1997 Spring      3341.  354.       34             0
##  5  1998 Spring      4603.  427.       36             0
##  6  1999 Spring      3842.  429.       31             0
##  7  2000 Spring      5658.  476.       33             0
##  8  2001 Spring      2946   404.       35             0
##  9  2002 Spring      4030.  493.       38             0
## 10  2003 Spring      5250.  666.       30             0
## # ℹ 21 more rows
autumndata
## # A tibble: 23 × 6
##     year season mean_count    se n_counts pct_disrupted
##    <dbl> <chr>       <dbl> <dbl>    <int>         <int>
##  1  2002 Autumn      3143.  469.       15             0
##  2  2003 Autumn      3334   521.       13             0
##  3  2004 Autumn      2581.  348.       13             0
##  4  2005 Autumn      4348.  722.       17             0
##  5  2006 Autumn      3354.  632.       21             0
##  6  2007 Autumn      3574.  631.       15             0
##  7  2008 Autumn      2568.  355.       17             0
##  8  2009 Autumn      4396.  606.       12             0
##  9  2010 Autumn      3209.  527.       13             0
## 10  2011 Autumn      3810   713.       11             0
## # ℹ 13 more rows
#scatterplot
ggplot(autumndata, aes(x = year, y = mean_count)) + 
  geom_point() + labs(title = "Autumn Crane Observation", x = "Years", y = "Total Cranes Seen") + theme_classic()

ggplot(springdata, aes(x = year, y = mean_count)) + 
  geom_point() + labs(title = "Spring Crane Observation", x = "Years", y = "Total Cranes Seen") + theme_classic()

# histogram

ggplot(autumndata, aes(x = mean_count)) + geom_histogram(bins = 8, fill = "#8B4513", color = "black") + theme_classic()

ggplot(springdata, aes(x = mean_count)) + geom_histogram(bins = 8, fill = "#B4EEB4", color = "black") + theme_classic() 

Part 4 - Statistical Tests (Inferential Statistics)

Before running statistical tests on a data set, it is critical to understanding the data you have to determine which test appropriately fits. To analyze the mean of Spring and Autumn’s data, we created a linear regression model for visualization. We also ran qqplots for each season to assess normality. Lastly, to trust the conclusion from the qqplots, we created histograms of the residuals from those plots. We chose these tests based on our evaluation of the Spring and Autumn histograms. These graphs display a relatively normal distribution, despite some outliers; these outliers are not large enough to skew the data but they do create a tail in both data sets. After the R squared values were obtained, we compared them between Autumn and Spring. For Autumn, we saw that its R-squared value is large (0.7382), with a small p-value (1.526e-07), suggesting statistical significance. However with Spring, we saw a lower R-squared value (0.642), with a lower p-value also (6.134e-08).

# spring linear regression
springregress <- lm(mean_count ~ year, data = springdata)
springregress
## 
## Call:
## lm(formula = mean_count ~ year, data = springdata)
## 
## Coefficients:
## (Intercept)         year  
##   -446502.6        225.3
summary(springregress)
## 
## Call:
## lm(formula = mean_count ~ year, data = springdata)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3679.1  -728.6   -69.5   761.0  4063.3 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -446502.57   62769.56  -7.113 7.93e-08 ***
## year            225.29      31.24   7.211 6.13e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1556 on 29 degrees of freedom
## Multiple R-squared:  0.642,  Adjusted R-squared:  0.6296 
## F-statistic:    52 on 1 and 29 DF,  p-value: 6.134e-08
# visual plot for spring data
ggplot(springdata, aes(x = year, y = mean_count)) + geom_point() + labs(title = "Spring's Crane Count", x = "Year", y = "Crane Observations") + geom_smooth(method = lm) + stat_cor(method = "pearson")
## `geom_smooth()` using formula = 'y ~ x'

# check for normality 
 ggqqplot(springdata, x = "mean_count")

# analyze residuals 
 hist(resid(springregress))

# autumn linear regression 
autumnregress <- lm(mean_count ~ year, data = autumndata)
autumnregress 
## 
## Call:
## lm(formula = mean_count ~ year, data = autumndata)
## 
## Coefficients:
## (Intercept)         year  
##   -643733.9        322.6
summary(autumnregress)
## 
## Call:
## lm(formula = mean_count ~ year, data = autumndata)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1767.0  -942.6  -196.3   902.7  3647.2 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -643733.93   84394.00  -7.628 1.75e-07 ***
## year            322.62      41.92   7.695 1.53e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1334 on 21 degrees of freedom
## Multiple R-squared:  0.7382, Adjusted R-squared:  0.7258 
## F-statistic: 59.22 on 1 and 21 DF,  p-value: 1.526e-07
# visual for autumn data
ggplot(autumndata, aes(x = year, y = mean_count)) + geom_point() + labs(title = "Autumn's Crane Count", x = "Year", y = "Crane Observations") + geom_smooth(method = lm) + stat_cor(method = "pearson")
## `geom_smooth()` using formula = 'y ~ x'

library(ggpubr)

# check for normality
ggqqplot(autumndata, x = "mean_count")

#analyze residuals
hist(resid(autumnregress))

Part 4 - Discussion

Based on the descriptive and inferential statistics, we saw a change in Crane sightings over the course of 30 years. An upward trend of change was present in both our Spring and Autumn sets; however in Spring, there is more shift in sighting numbers, as show in the Spring Observation scatterplot. This could potentially be due to wind patterns being better for crane travel (github.com), or spring coming earlier or later at Lake Hornborgasjön.

Part 5 - Conclusion

Over the past 30 years, we can see a increase of Crane sightings at Lake. Between Autumn and Spring, the observations gradually increase over time, with some months showing lower sightings (Autumn -> p-value: 1.526e-07, Spring -> p-value: 6.134e-08).

Part 6 - References

Eurasian crane. International Crane Foundation. (2025, June 25). https://savingcranes.org/species/eurasian-crane/

How do we track wildlife migrations-and why?. The Nature Conservancy. (n.d.). https://www.nature.org/en-us/what-we-do/our-insights/animal-migrations-why-we-track-them/

We would like to credit Perplexity AI with helping us write a sufficient code to further pipe our crane_summary data into the “autumndata” and “springdata” subsets. This set the foundation for creating our linear models.