Introduction

Snowfall is important for mountain ecosystems, regional water supply, and seasonal tourism in the US. Changes in snowfall patterns can indicate broader climate changes, with long-term environmental and economic implications. This project investigates the following research question: how has annual snowfall at Paradise, Mt. Rainier National Park changed over time? By analyzing historical snowfall data, it presents trends and variability in snowfall over roughly a century, from the 1920-1921 season to the 2019-2020 season.
The study uses snowfall records for Paradise, a high-elevation site in Mt. Rainier National Park. To capture full winter seasons, snowfall is measured from 1 July of one year to 30 June of the next. The dataset has 100 observations and three variables: year_start, the year in which the measurement season begins; year_end, the year in which it ends; and total_snow (renamed snowfall during cleaning), the total measured snowfall in inches for the season. Some seasons have missing data because of road closures and unavailable records, especially during World War II and the early 1950s. The dataset was downloaded from the National Park Service website, a government source. These variables are well suited to examining long-term snowfall patterns, and they provide the information needed to assess changes in annual snowfall over time.
Data Analysis

To answer the question “How has annual snowfall at Paradise, Mt. Rainier National Park changed over time?”, I will (1) import and inspect the dataset, (2) clean and prepare the variables needed for analysis, (3) compute descriptive statistics (overall and by decade) to understand typical snowfall and variability, (4) create a trend visualization of annual snowfall over time, and (5) fit a simple linear regression model to quantify whether snowfall shows an increasing or decreasing pattern across years. This meets the project requirement to include EDA, data-wrangling functions, and at least one table/visualization.
install.packages("tidyverse")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.5'
## (as 'lib' is unspecified)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.6
## ✔ forcats 1.0.1 ✔ stringr 1.6.0
## ✔ ggplot2 4.0.1 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.2.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
snow <- read_csv("snowfall.csv")
## Rows: 100 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (3): year_start, year_end, total_snow
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
glimpse(snow)
## Rows: 100
## Columns: 3
## $ year_start <dbl> 1920, 1921, 1922, 1923, 1924, 1925, 1926, 1927, 1928, 1929,…
## $ year_end <dbl> 1921, 1922, 1923, 1924, 1925, 1926, 1927, 1928, 1929, 1930,…
## $ total_snow <dbl> 671, 723, 565, 551, 674, 373, 588, 405, 554, 390, 444, 751,…
summary(snow)
## year_start year_end total_snow
## Min. :1920 Min. :1921 Min. : 266.0
## 1st Qu.:1945 1st Qu.:1946 1st Qu.: 543.0
## Median :1970 Median :1970 Median : 624.0
## Mean :1970 Mean :1970 Mean : 638.1
## 3rd Qu.:1994 3rd Qu.:1995 3rd Qu.: 724.0
## Max. :2019 Max. :2020 Max. :1122.0
## NA's :9
# Check missing values (helpful for road-closure gaps mentioned in dataset description)
colSums(is.na(snow))
## year_start year_end total_snow
## 0 0 9
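Before the rows with missing snowfall are dropped, it can help to list which seasons they are and confirm that they fall where the dataset description says. The following is a minimal sketch only: `toy` is a small made-up data frame standing in for `snow`, and its values are illustrative, not taken from the real file.

```r
library(dplyr)

# Hypothetical stand-in for the imported `snow` data; values are made up
toy <- data.frame(
  year_start = 1940:1945,
  year_end   = 1941:1946,
  total_snow = c(500, 480, NA, NA, 610, 530)
)

# Keep only the seasons with no recorded snowfall and label them
missing_seasons <- toy %>%
  filter(is.na(total_snow)) %>%
  mutate(season = paste0(year_start, "-", year_end)) %>%
  pull(season)

missing_seasons
```

Running the same pipeline on `snow` instead of `toy` would list the nine seasons counted above.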
snow_clean <- snow %>%
  rename(snowfall = total_snow) %>%   # dataset column is named total_snow
  filter(!is.na(snowfall)) %>%        # remove any missing snowfall rows
  mutate(
    season = paste0(year_start, "-", year_end),  # label winter season
    decade = floor(year_end / 10) * 10           # group years into decades
  ) %>%
  select(year_start, year_end, season, decade, snowfall)
glimpse(snow_clean)
## Rows: 91
## Columns: 5
## $ year_start <dbl> 1920, 1921, 1922, 1923, 1924, 1925, 1926, 1927, 1928, 1929,…
## $ year_end <dbl> 1921, 1922, 1923, 1924, 1925, 1926, 1927, 1928, 1929, 1930,…
## $ season <chr> "1920-1921", "1921-1922", "1922-1923", "1923-1924", "1924-1…
## $ decade <dbl> 1920, 1920, 1920, 1920, 1920, 1920, 1920, 1920, 1920, 1930,…
## $ snowfall <dbl> 671, 723, 565, 551, 674, 373, 588, 405, 554, 390, 444, 751,…
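Because the decade is derived from the season's ending year, a season such as 1929-1930 is assigned to the 1930s rather than the 1920s. This is why the 1920s contain nine seasons rather than ten. The short sketch below applies the same `floor(year_end / 10) * 10` rule to a few hand-picked ending years (chosen for illustration) to show where the boundaries fall.

```r
# Ending years chosen to sit on either side of a decade boundary
year_end <- c(1921, 1929, 1930, 1939, 1940)

# Same rule as in the cleaning step: drop the last digit of the year
decade <- floor(year_end / 10) * 10

decade
```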
# Overall descriptive stats
snow_overall <- snow_clean %>%
  summarise(
    n = n(),
    mean_snow = mean(snowfall),
    median_snow = median(snowfall),
    sd_snow = sd(snowfall),
    min_snow = min(snowfall),
    max_snow = max(snowfall)
  )
snow_overall
## # A tibble: 1 × 6
## n mean_snow median_snow sd_snow min_snow max_snow
## <int> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 91 638. 624 164. 266 1122
# By decade
snow_by_decade <- snow_clean %>%
  group_by(decade) %>%
  summarise(
    n = n(),
    mean_snow = mean(snowfall),
    median_snow = median(snowfall),
    sd_snow = sd(snowfall),
    min_snow = min(snowfall),
    max_snow = max(snowfall),
    .groups = "drop"
  ) %>%
  arrange(decade)
snow_by_decade
## # A tibble: 11 × 7
## decade n mean_snow median_snow sd_snow min_snow max_snow
## <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1920 9 567. 565 118. 373 723
## 2 1930 10 545 558 134. 316 751
## 3 1940 6 501. 529 115. 313 661
## 4 1950 5 715 646 164. 602 1000
## 5 1960 10 582. 554. 122. 429 829
## 6 1970 10 758. 734. 231. 414 1122
## 7 1980 10 655. 663 96.8 460 779
## 8 1990 10 712. 698. 165. 499 1032.
## 9 2000 10 684. 704 160. 409 947
## 10 2010 10 649. 698 168. 266 907
## 11 2020 1 530 530 NA 530 530
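Not every decade in the table rests on the same number of seasons. The 1940s and 1950s have gaps, and the 2020s contain a single season, which is why their standard deviation is `NA` (`sd()` of one value is undefined). One possible way to keep this in view is to flag decades that fall below a minimum count before comparing their means. The sketch below is an illustration only: `toy_by_decade` is a made-up table with the same column names as `snow_by_decade`, and the cutoff `min_n` is an assumed value, not one prescribed by the project.

```r
library(dplyr)

# Hypothetical decade summary; the numbers are made up for illustration
toy_by_decade <- data.frame(
  decade    = c(1990, 2000, 2010, 2020),
  n         = c(10, 10, 6, 1),
  mean_snow = c(700, 680, 650, 530)
)

min_n <- 8  # assumed cutoff for a "complete enough" decade

# Flag each decade by whether it has enough seasons
toy_flagged <- toy_by_decade %>%
  mutate(complete = n >= min_n)

# Decades that pass the cutoff
complete_decades <- toy_flagged %>%
  filter(complete) %>%
  pull(decade)

complete_decades

# sd() of a single observation is NA, as in the 2020 row of the table
single_sd <- sd(530)
```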
# Plot annual snowfall over time with a linear trend line
ggplot(snow_clean, aes(x = year_end, y = snowfall)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  labs(
    title = "Annual Snowfall at Paradise (Mt. Rainier NP) Over Time",
    x = "Year (season ending year)",
    y = "Annual snowfall (inches)"
  )
## `geom_smooth()` using formula = 'y ~ x'
# Linear regression model to quantify trend
model <- lm(snowfall ~ year_end, data = snow_clean)
summary(model)
##
## Call:
## lm(formula = snowfall ~ year_end, data = snow_clean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -439.41 -87.51 -9.00 66.08 484.84
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2492.7819 1109.1376 -2.247 0.02708 *
## year_end 1.5872 0.5622 2.823 0.00587 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 157.8 on 89 degrees of freedom
## Multiple R-squared: 0.08219, Adjusted R-squared: 0.07188
## F-statistic: 7.97 on 1 and 89 DF, p-value: 0.005867
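The slope is easier to interpret when expressed per decade and accompanied by a confidence interval. The sketch below uses only the estimate, standard error, and residual degrees of freedom printed in the output above, and it assumes the usual linear-model conditions hold. The variable names are illustrative.

```r
# Values copied from the summary(model) output above
slope <- 1.5872   # inches per year
se    <- 0.5622   # standard error of the slope
df    <- 89       # residual degrees of freedom

# Change implied per decade
slope_per_decade <- slope * 10

# 95% confidence interval for the slope: estimate +/- t * SE
t_crit <- qt(0.975, df)
ci_95  <- c(lower = slope - t_crit * se, upper = slope + t_crit * se)

slope_per_decade
ci_95
```

With the fitted object available, `confint(model, "year_end")` should return the same interval directly.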
This analysis examined long-term snowfall at Paradise, Mt. Rainier National Park, using the National Park Service’s historical data. It aimed to determine whether snowfall at this high-elevation, snowy site has changed over the decades by looking at total snowfall for each season. The descriptive statistics and the plot show large year-to-year variation, which is typical of mountain conditions. The linear regression estimates an increase of about 1.6 inches per year (p = 0.006), which is evidence of long-run change rather than a constant level. However, the model explains only about 8% of the variation in snowfall (R-squared = 0.082), so year-to-year variability dominates the record.
Although the results show considerable variability and some evidence of a trend, this analysis covers only one site and uses only total seasonal snowfall. Future research could expand the study to include additional climate variables, such as average winter temperature, precipitation type, and snowpack depth, to help identify the drivers of these snowfall changes. Comparing snowfall across national parks, or between higher and lower elevations on Mt. Rainier, could also reveal regional weather patterns. A broader analysis using more sophisticated statistical models or more recent data would strengthen conclusions about future climate-induced changes in snowfall patterns.