Emeline Ngassiki

Introduction

Snowfall matters for mountain ecosystems, regional water supply, and seasonal tourism in the United States. Changes in snowfall patterns can signal broader climate change, with long-term environmental and economic implications. This project investigates the following research question: how has annual snowfall at Paradise, Mt. Rainier National Park changed over time? The analysis uses historical snowfall data to describe trends and variability in snowfall over roughly a century.

The study uses snowfall records for Paradise, a high-elevation site in Mt. Rainier National Park. To capture full winter seasons, snowfall is measured from 1 July of one year to 30 June of the next. The dataset has 100 observations and three variables: year_start, the year in which the measurement season begins; year_end, the year in which it finishes; and total_snow, the total measured snowfall in inches for the season (renamed snowfall during cleaning). Nine seasons have missing values because of road closures and unavailable records, mainly during World War II and the early 1950s. The dataset was downloaded from the National Park Service, a government website. These variables are well suited to examining long-term snowfall patterns, and they provide the information needed to assess changes in annual snowfall over time.
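The July-to-June season rule can be made concrete with a small sketch. The helper below is hypothetical (it is not part of the dataset or the analysis) and only illustrates how a calendar date would map to the year_start of its snow season under the rule described above.

```r
# Hypothetical helper: return the starting year of the snow season
# (1 July to 30 June) that a given calendar date falls in.
season_start_year <- function(date) {
  date <- as.Date(date)
  yr <- as.integer(format(date, "%Y"))
  mo <- as.integer(format(date, "%m"))
  # July-December belong to the season starting that year;
  # January-June belong to the season that started the previous year.
  ifelse(mo >= 7, yr, yr - 1L)
}

example_starts <- season_start_year(c("1998-12-25", "1999-02-10", "1999-07-01"))
example_starts
```

Under this rule, a December 1998 storm and a February 1999 storm are counted in the same season (year_start = 1998, year_end = 1999), which is why a season-based total is more meaningful here than a calendar-year total.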

Data Analysis

To answer the question “How has annual snowfall at Paradise, Mt. Rainier National Park changed over time?”, I will (1) import and inspect the dataset, (2) clean and prepare the variables needed for analysis, (3) compute descriptive statistics (overall and by decade) to understand typical snowfall and its variability, and (4) create a trend visualization of annual snowfall over time. Finally, I will fit a simple linear regression model to quantify whether snowfall shows an increasing or decreasing pattern across years. This meets the project requirement to include EDA, data-wrangling functions, and at least one table or visualization.

install.packages("tidyverse")

## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.5'
## (as 'lib' is unspecified)

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.6
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.1     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.2.0

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

snow <- read_csv("snowfall.csv")

## Rows: 100 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (3): year_start, year_end, total_snow
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

glimpse(snow)

## Rows: 100
## Columns: 3
## $ year_start <dbl> 1920, 1921, 1922, 1923, 1924, 1925, 1926, 1927, 1928, 1929,…
## $ year_end   <dbl> 1921, 1922, 1923, 1924, 1925, 1926, 1927, 1928, 1929, 1930,…
## $ total_snow <dbl> 671, 723, 565, 551, 674, 373, 588, 405, 554, 390, 444, 751,…

summary(snow)

##    year_start      year_end      total_snow    
##  Min.   :1920   Min.   :1921   Min.   : 266.0  
##  1st Qu.:1945   1st Qu.:1946   1st Qu.: 543.0  
##  Median :1970   Median :1970   Median : 624.0  
##  Mean   :1970   Mean   :1970   Mean   : 638.1  
##  3rd Qu.:1994   3rd Qu.:1995   3rd Qu.: 724.0  
##  Max.   :2019   Max.   :2020   Max.   :1122.0  
##                                NA's   :9

# Check missing values (helpful for road-closure gaps mentioned in dataset description)
colSums(is.na(snow))

## year_start   year_end total_snow 
##          0          0          9
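The count above says how many seasons are missing but not which ones. A minimal sketch of how the missing seasons could be listed is shown below. It uses base R and a small stand-in data frame: the snowfall values are the first six from the glimpse() output, and the two NA positions are hypothetical, chosen only to illustrate the idea.

```r
# Stand-in for `snow`: first six seasons from the glimpse() output,
# with two values blanked out at hypothetical positions.
snow_demo <- data.frame(
  year_start = 1920:1925,
  year_end   = 1921:1926,
  total_snow = c(671, NA, 565, 551, NA, 373)
)

# Seasons with no recorded snowfall total
missing_seasons <- snow_demo[is.na(snow_demo$total_snow),
                             c("year_start", "year_end")]
missing_seasons

# Share of seasons that do have a recorded total
share_recorded <- mean(!is.na(snow_demo$total_snow))
share_recorded
```

Applied to the real data, the same indexing on `snow` would show which seasons account for the nine missing values.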

snow_clean <- snow %>%
  rename(snowfall = total_snow) %>%              # dataset column is named total_snow
  filter(!is.na(snowfall)) %>%                   # remove any missing snowfall rows
  mutate(
    season = paste0(year_start, "-", year_end),  # label winter season
    decade = floor(year_end / 10) * 10           # group years into decades
  ) %>%
  select(year_start, year_end, season, decade, snowfall)

glimpse(snow_clean)

## Rows: 91
## Columns: 5
## $ year_start <dbl> 1920, 1921, 1922, 1923, 1924, 1925, 1926, 1927, 1928, 1929,…
## $ year_end   <dbl> 1921, 1922, 1923, 1924, 1925, 1926, 1927, 1928, 1929, 1930,…
## $ season     <chr> "1920-1921", "1921-1922", "1922-1923", "1923-1924", "1924-1…
## $ decade     <dbl> 1920, 1920, 1920, 1920, 1920, 1920, 1920, 1920, 1920, 1930,…
## $ snowfall   <dbl> 671, 723, 565, 551, 674, 373, 588, 405, 554, 390, 444, 751,…

# Overall descriptive stats
snow_overall <- snow_clean %>%
  summarise(
    n = n(),
    mean_snow = mean(snowfall),
    median_snow = median(snowfall),
    sd_snow = sd(snowfall),
    min_snow = min(snowfall),
    max_snow = max(snowfall)
  )

snow_overall

## # A tibble: 1 × 6
##       n mean_snow median_snow sd_snow min_snow max_snow
##   <int>     <dbl>       <dbl>   <dbl>    <dbl>    <dbl>
## 1    91      638.         624    164.      266     1122

# By decade
snow_by_decade <- snow_clean %>%
  group_by(decade) %>%
  summarise(
    n = n(),
    mean_snow = mean(snowfall),
    median_snow = median(snowfall),
    sd_snow = sd(snowfall),
    min_snow = min(snowfall),
    max_snow = max(snowfall),
    .groups = "drop"
  ) %>%
  arrange(decade)

snow_by_decade

## # A tibble: 11 × 7
##    decade     n mean_snow median_snow sd_snow min_snow max_snow
##     <dbl> <int>     <dbl>       <dbl>   <dbl>    <dbl>    <dbl>
##  1   1920     9      567.        565    118.       373     723 
##  2   1930    10      545         558    134.       316     751 
##  3   1940     6      501.        529    115.       313     661 
##  4   1950     5      715         646    164.       602    1000 
##  5   1960    10      582.        554.   122.       429     829 
##  6   1970    10      758.        734.   231.       414    1122 
##  7   1980    10      655.        663     96.8      460     779 
##  8   1990    10      712.        698.   165.       499    1032.
##  9   2000    10      684.        704    160.       409     947 
## 10   2010    10      649.        698    168.       266     907 
## 11   2020     1      530         530     NA        530     530
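The uneven counts in the n column follow partly from how decade is defined. The sketch below repeats the rounding rule from snow_clean on a few season-ending years and counts how many seasons each decade could contain at most, assuming season-ending years run from 1921 to 2020 as in the summary output.

```r
# Season-ending years near decade boundaries
year_end <- c(1921, 1929, 1930, 1939, 2020)

# Same rule as in snow_clean: round down to the start of the decade
decade <- floor(year_end / 10) * 10
data.frame(year_end, decade)

# Maximum possible number of seasons per decade for year_end = 1921..2020
possible <- table(floor(1921:2020 / 10) * 10)
possible
```

The 1920 group can hold at most nine seasons and the 2020 group only one, so their smaller n values are not caused by missing data. The 1940 and 1950 groups, with 6 and 5 seasons out of a possible 10, are where the nine missing seasons fall. The single 2020 season is also why sd_snow is NA in that row.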

# Plot annual snowfall over time with a linear trend line
ggplot(snow_clean, aes(x = year_end, y = snowfall)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  labs(
    title = "Annual Snowfall at Paradise (Mt. Rainier NP) Over Time",
    x = "Year (season ending year)",
    y = "Annual snowfall (inches)"
  )

## `geom_smooth()` using formula = 'y ~ x'

# Linear regression model to quantify trend
model <- lm(snowfall ~ year_end, data = snow_clean)
summary(model)

## 
## Call:
## lm(formula = snowfall ~ year_end, data = snow_clean)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -439.41  -87.51   -9.00   66.08  484.84 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)   
## (Intercept) -2492.7819  1109.1376  -2.247  0.02708 * 
## year_end        1.5872     0.5622   2.823  0.00587 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 157.8 on 89 degrees of freedom
## Multiple R-squared:  0.08219,    Adjusted R-squared:  0.07188 
## F-statistic:  7.97 on 1 and 89 DF,  p-value: 0.005867
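To make the coefficients easier to interpret, the sketch below converts them into a change per decade, fitted values at both ends of the record, and an approximate 95% confidence interval for the slope. The numbers are copied from the summary(model) output above, so the results inherit that rounding; with the fitted object available, confint(model, "year_end") should give the same interval directly.

```r
# Values copied from the summary(model) output (rounded as printed)
intercept <- -2492.7819
slope     <- 1.5872      # inches per year
se_slope  <- 0.5622
df_resid  <- 89

# Change in snowfall per decade implied by the slope
per_decade <- slope * 10
per_decade

# Fitted snowfall for the first and last season-ending years
fitted_1921 <- intercept + slope * 1921
fitted_2020 <- intercept + slope * 2020
c(fitted_1921, fitted_2020)

# Approximate 95% confidence interval for the slope (t distribution)
ci <- slope + c(-1, 1) * qt(0.975, df_resid) * se_slope
ci
```

The fitted line rises from about 556 inches in 1921 to about 713 inches in 2020, roughly 16 inches per decade. Note that the intercept (the fitted value at year 0) has no physical meaning here; it is only needed to position the line.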

This analysis examined long-term snowfall at Paradise, Mt. Rainier National Park, using the National Park Service’s historical data. The aim was to determine whether snowfall at this high-elevation site has changed over the decades by looking at total snowfall per season. The descriptive statistics and the plot show large year-to-year variation, which is typical of mountain conditions: seasonal totals range from 266 to 1,122 inches, with a standard deviation of about 164 inches. The linear regression estimates an increase of about 1.6 inches per year (p = 0.006), so there is evidence of a long-run upward trend rather than constant snowfall. However, the model explains only about 8% of the variance (R-squared = 0.082), so most of the variation between seasons is not accounted for by a linear trend.

Although the results show substantial variability and a possible trend, this analysis covers only one site and uses only total seasonal snowfall. Future research could include additional climate variables, such as average winter temperature, precipitation type, and snowpack depth, to help identify the potential drivers of these snowfall changes. Comparing snowfall across national parks, or between higher and lower elevations of Mt. Rainier, could also reveal regional weather patterns. A broader analysis using more sophisticated statistical models or more recent data would strengthen conclusions about climate-induced changes in snowfall patterns.
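One possible next step, offered only as a sketch and not as part of this analysis, is a rank-based trend check that is less sensitive to extreme seasons than a linear regression. The example below uses Kendall's rank correlation from base R on the first twelve seasons shown in the glimpse() output. That subset is for illustration only and says nothing about the trend in the full record.

```r
# First twelve seasons from the glimpse() output (illustrative subset only)
year_end <- 1921:1932
snowfall <- c(671, 723, 565, 551, 674, 373, 588, 405, 554, 390, 444, 751)

# Kendall's tau: a rank-based measure of monotonic association over time
kendall <- cor.test(year_end, snowfall, method = "kendall")
kendall$estimate   # tau, between -1 and 1
kendall$p.value
```

On the full data the call would be cor.test(snow_clean$year_end, snow_clean$snowfall, method = "kendall"). If some seasons share the same total, R may warn that it cannot compute an exact p-value with ties and use an approximation instead.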

2025-12-14