Eurasian Cranes (Grus grus) are native to large regions extending from Northern and Western Europe, specifically across Eurasia, going further to Northern China. They are a medium sized bird species with a striking wing span of 7 ft (210 cm). In our data set, the cranes population over the course of 30 years is being analyzed.These cranes stop at Lake Hornborgasjön, Sweden during their yearly migration. Our response variable (year), in which the cranes are spotted, ranges between Spring and Autumn from 1994 to 2024. Our explanatory variable (crane observations), fluctuates over this period. Tracking bird population and their migration routes is essential for preserving critical habitats and conservation.
cranes <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-09-30/cranes.csv')
## Rows: 1548 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): comment
## dbl (1): observations
## lgl (1): weather_disruption
## date (1): date
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#knitr::include_graphics('what-is-a-group-of-cranes-called.png')
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.6
## ✔ forcats 1.0.1 ✔ stringr 1.6.0
## ✔ ggplot2 4.0.1 ✔ tibble 3.3.1
## ✔ lubridate 1.9.5 ✔ tidyr 1.3.2
## ✔ purrr 1.2.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggpubr)
library(dplyr)
library(ggplot2)
library(car)
## Loading required package: carData
##
## Attaching package: 'car'
##
## The following object is masked from 'package:dplyr':
##
## recode
##
## The following object is masked from 'package:purrr':
##
## some
#Install crane data
cranes <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-09-30/cranes.csv')
## Rows: 1548 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): comment
## dbl (1): observations
## lgl (1): weather_disruption
## date (1): date
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
is.factor(cranes$weather_disruption)
## [1] FALSE
cranes$weather_disruption <-as.factor(cranes$weather_disruption)
#summarize & mutate the data into a new dataframe based on seasons and year and average cranes counted
crane_summary <- cranes %>%
filter(!is.na(observations)) %>%
mutate(
year = lubridate::year(date), #extract year and month from dates
month = lubridate::month(date),
season = if_else(month <= 6, "Spring", "Autumn") #assign months into spring migration or fall migration
) %>%
group_by(year, season) %>% # group by both sseasons and year
summarise(
mean_count = mean(observations), #calculate means
se = sd(observations) / sqrt(n()), # caulcate standard error
n_counts = n(), #number of days that the cranes were counted in each season by year combo
pct_disrupted = sum(weather_disruption == TRUE), # number of weather events
.groups = "drop"
)
crane_summary #view your new summarized dataframe
## # A tibble: 54 × 6
## year season mean_count se n_counts pct_disrupted
## <dbl> <chr> <dbl> <dbl> <int> <int>
## 1 1994 Spring 2750. 284. 28 0
## 2 1995 Spring 2274. 286. 37 0
## 3 1996 Spring 2314. 330. 27 0
## 4 1997 Spring 3341. 354. 34 0
## 5 1998 Spring 4603. 427. 36 0
## 6 1999 Spring 3842. 429. 31 0
## 7 2000 Spring 5658. 476. 33 0
## 8 2001 Spring 2946 404. 35 0
## 9 2002 Autumn 3143. 469. 15 0
## 10 2002 Spring 4030. 493. 38 0
## # ℹ 44 more rows
crane_plot <- ggplot(crane_summary, aes(x = year, y = mean_count, color = season)) + geom_line(size = 1.2) + geom_point(size = 3) + geom_errorbar(aes (ymin = mean_count - se, ymax = mean_count + se), width = 1) + labs(title = "Change of Crane Population over Time by Season", x = "Year", y = "Mean", color = "Season") + theme_pubr() + scale_color_manual(values = c("#8B4513","#B4EEB4"))
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
crane_plot
crane_plot + facet_wrap(~season)
Our research question is “Has the crane population at Lake Hornborgasjön changed over the past 30 years?
To analyze the Spring and Autumn Crane sightings, we observed the mean, and created visuals (scatterplots and histograms) to discuss the patterns shown. For the Spring Observation scatterplot, we can see that there is more fluctuation between the crane sightings. The trend is still linear, but there are shifts in observations. However, when viewing the Autumn Observation scatterplot, we can see a more consistent linear trend. For a better visual of the fluctuation, we created histograms to analyze in depth. Despite the autumn scatterplot having a more consistent trend of crane sightings, the histogram shows a clear outlier. It is not drastic enough to skew data, but it is present. For Spring, the scatterplots show more variability in sightings, but a more normal distribution.
# mean observations
mean(cranes$observations, na.rm = TRUE)
## [1] 6063.494
## pipe data
springdata <- crane_summary %>%
filter(season == "Spring")
autumndata <- crane_summary %>%
filter(season == "Autumn")
springdata
## # A tibble: 31 × 6
## year season mean_count se n_counts pct_disrupted
## <dbl> <chr> <dbl> <dbl> <int> <int>
## 1 1994 Spring 2750. 284. 28 0
## 2 1995 Spring 2274. 286. 37 0
## 3 1996 Spring 2314. 330. 27 0
## 4 1997 Spring 3341. 354. 34 0
## 5 1998 Spring 4603. 427. 36 0
## 6 1999 Spring 3842. 429. 31 0
## 7 2000 Spring 5658. 476. 33 0
## 8 2001 Spring 2946 404. 35 0
## 9 2002 Spring 4030. 493. 38 0
## 10 2003 Spring 5250. 666. 30 0
## # ℹ 21 more rows
autumndata
## # A tibble: 23 × 6
## year season mean_count se n_counts pct_disrupted
## <dbl> <chr> <dbl> <dbl> <int> <int>
## 1 2002 Autumn 3143. 469. 15 0
## 2 2003 Autumn 3334 521. 13 0
## 3 2004 Autumn 2581. 348. 13 0
## 4 2005 Autumn 4348. 722. 17 0
## 5 2006 Autumn 3354. 632. 21 0
## 6 2007 Autumn 3574. 631. 15 0
## 7 2008 Autumn 2568. 355. 17 0
## 8 2009 Autumn 4396. 606. 12 0
## 9 2010 Autumn 3209. 527. 13 0
## 10 2011 Autumn 3810 713. 11 0
## # ℹ 13 more rows
#scatterplot
ggplot(autumndata, aes(x = year, y = mean_count)) +
geom_point() + labs(title = "Autumn Crane Observation", x = "Years", y = "Total Cranes Seen") + theme_classic()
ggplot(springdata, aes(x = year, y = mean_count)) +
geom_point() + labs(title = "Spring Crane Observation", x = "Years", y = "Total Cranes Seen") + theme_classic()
# histogram
ggplot(autumndata, aes(x = mean_count)) + geom_histogram(bins = 8, fill = "#8B4513", color = "black") + theme_classic()
ggplot(springdata, aes(x = mean_count)) + geom_histogram(bins = 8, fill = "#B4EEB4", color = "black") + theme_classic()
Before running statistical tests on a data set, it is critical to understanding the data you have to determine which test appropriately fits. To analyze the mean of Spring and Autumn’s data, we created a linear regression model for visualization. We also ran qqplots for each season to assess normality. Lastly, to trust the conclusion from the qqplots, we created histograms of the residuals from those plots. We chose these tests based on our evaluation of the Spring and Autumn histograms. These graphs display a relatively normal distribution, despite some outliers; these outliers are not large enough to skew the data but they do create a tail in both data sets. After the R squared values were obtained, we compared them between Autumn and Spring. For Autumn, we saw that its R-squared value is large (0.7382), with a small p-value (1.526e-07), suggesting statistical significance. However with Spring, we saw a lower R-squared value (0.642), with a lower p-value also (6.134e-08).
# spring linear regression
springregress <- lm(mean_count ~ year, data = springdata)
springregress
##
## Call:
## lm(formula = mean_count ~ year, data = springdata)
##
## Coefficients:
## (Intercept) year
## -446502.6 225.3
summary(springregress)
##
## Call:
## lm(formula = mean_count ~ year, data = springdata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3679.1 -728.6 -69.5 761.0 4063.3
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -446502.57 62769.56 -7.113 7.93e-08 ***
## year 225.29 31.24 7.211 6.13e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1556 on 29 degrees of freedom
## Multiple R-squared: 0.642, Adjusted R-squared: 0.6296
## F-statistic: 52 on 1 and 29 DF, p-value: 6.134e-08
# visual plot for spring data
ggplot(springdata, aes(x = year, y = mean_count)) + geom_point() + labs(title = "Spring's Crane Count", x = "Year", y = "Crane Observations") + geom_smooth(method = lm) + stat_cor(method = "pearson")
## `geom_smooth()` using formula = 'y ~ x'
# check for normality
ggqqplot(springdata, x = "mean_count")
# analyze residuals
hist(resid(springregress))
# autumn linear regression
autumnregress <- lm(mean_count ~ year, data = autumndata)
autumnregress
##
## Call:
## lm(formula = mean_count ~ year, data = autumndata)
##
## Coefficients:
## (Intercept) year
## -643733.9 322.6
summary(autumnregress)
##
## Call:
## lm(formula = mean_count ~ year, data = autumndata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1767.0 -942.6 -196.3 902.7 3647.2
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -643733.93 84394.00 -7.628 1.75e-07 ***
## year 322.62 41.92 7.695 1.53e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1334 on 21 degrees of freedom
## Multiple R-squared: 0.7382, Adjusted R-squared: 0.7258
## F-statistic: 59.22 on 1 and 21 DF, p-value: 1.526e-07
# visual for autumn data
ggplot(autumndata, aes(x = year, y = mean_count)) + geom_point() + labs(title = "Autumn's Crane Count", x = "Year", y = "Crane Observations") + geom_smooth(method = lm) + stat_cor(method = "pearson")
## `geom_smooth()` using formula = 'y ~ x'
library(ggpubr)
# check for normality
ggqqplot(autumndata, x = "mean_count")
#analyze residuals
hist(resid(autumnregress))
Based on the descriptive and inferential statistics, we saw a change in Crane sightings over the course of 30 years. An upward trend of change was present in both our Spring and Autumn sets; however in Spring, there is more shift in sighting numbers, as show in the Spring Observation scatterplot. This could potentially be due to wind patterns being better for crane travel (github.com), or spring coming earlier or later at Lake Hornborgasjön.
Over the past 30 years, we can see a increase of Crane sightings at Lake. Between Autumn and Spring, the observations gradually increase over time, with some months showing lower sightings (Autumn -> p-value: 1.526e-07, Spring -> p-value: 6.134e-08).
Eurasian crane. International Crane Foundation. (2025, June 25). https://savingcranes.org/species/eurasian-crane/
How do we track wildlife migrations-and why?. The Nature Conservancy. (n.d.). https://www.nature.org/en-us/what-we-do/our-insights/animal-migrations-why-we-track-them/
We would like to credit Perplexity AI with helping us write a sufficient code to further pipe our crane_summary data into the “autumndata” and “springdata” subsets. This set the foundation for creating our linear models.