This data visualization project, conducted for the 2024 DSAN Scholarship, explores the relationship between climate, vegetation, and ecosystem health across Utah’s National Parks. By cleaning and aggregating the provided datasets, the project analyzes climate effects over a span of 30 years. Using data visualizations, we show how variations in temperature and precipitation affect vegetation density and drought vulnerability in different regions of Utah’s national parks.

library(Hmisc)     # rcorr()
library(knitr)     # kable()
library(tidyverse) # read_csv(), dplyr verbs, tidyr pivots, ggplot2
hist <- read_csv("NABR_historic.csv")
recent <- read_csv("nearterm_data_2020-2024.csv")
combined <- rbind(hist, recent)
For the data cleaning step, we merged the historical data from NABR_historic.csv with the recent data from nearterm_data_2020-2024.csv. Instead of addressing missing values globally at the outset, we handled them separately for each visualization. This approach avoided removing observations that may be crucial for other analyses and visualizations, so we preserved as much data as possible while keeping each plot accurate. Additionally, we renamed variables to enhance clarity and accessibility.
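The difference between global and per-visualization handling of missing values can be sketched on a toy data frame. This is an illustration in base R under assumed inputs: the values are hypothetical and only the column names follow the report.

```r
# Toy data with hypothetical values; column names follow the report.
toy <- data.frame(
  year       = c(1980, 1981, 1982, 1983),
  T_Annual   = c(11.2, NA, 11.9, 12.1),
  PPT_Annual = c(8.1, 7.9, NA, 7.5),
  Bare       = c(90, 88, 91, NA)
)

# Global approach: drop every row that has an NA in any column.
global_clean <- toy[complete.cases(toy), ]

# Per-visualization approach: drop NAs only in the columns a plot uses.
climate_plot_data <- toy[complete.cases(toy[, c("T_Annual", "PPT_Annual")]), ]
bare_plot_data    <- toy[complete.cases(toy[, c("year", "Bare")]), ]

nrow(global_clean)       # rows left after global removal
nrow(climate_plot_data)  # rows available to a temperature/precipitation plot
nrow(bare_plot_data)     # rows available to a bare-ground plot
```

In this toy case the global approach leaves fewer rows for every plot than the per-plot approach does, which is the effect the cleaning strategy aims to avoid.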
The objective of this analysis is to gain an overview of each park’s location in Utah and explore their climatic conditions over the years, focusing on precipitation and temperature. To prepare the data for visualization, we performed several aggregations. We used four variables from our dataset: T_Annual, PPT_Annual, longitude (long), and latitude (lat). We aggregated the temperature and precipitation data by calculating the average annual temperature and precipitation for the years 1980 - 2024. This approach allowed us to represent each unique park by a single temperature and precipitation value based on its coordinates.
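As a rough sketch of the idea of one value per park, the snippet below averages temperature and precipitation over all rows that share a coordinate pair. It uses base R's `aggregate()` on hypothetical numbers and is not the project's own aggregation code.

```r
# Hypothetical observations for two locations; column names follow the report.
toy <- data.frame(
  long       = c(-110.00, -110.00, -109.97, -109.97),
  lat        = c(37.61, 37.61, 37.62, 37.62),
  year       = c(1980, 2024, 1980, 2024),
  T_Annual   = c(11.0, 13.0, 10.0, 12.0),
  PPT_Annual = c(8.0, 7.0, 9.0, 8.5)
)

# One row per (long, lat) pair, holding the mean of each climate variable.
park_means <- aggregate(
  cbind(T_Annual, PPT_Annual) ~ long + lat,
  data = toy,
  FUN  = mean
)

park_means
```

Grouping on the coordinate pair treats every distinct (long, lat) combination as one park, which is the same identification the report relies on.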
The purpose of the visualization below is to:
Identify the geographical locations of each park in Utah to understand how the parks are distributed.
Examine the added layers of precipitation and temperature in our visualization to better understand the climate within each park.
Investigate any geographical correlations between climate and location, aiming to clearly delineate the relationships not only between them but also among the climate variables themselves.
# Data aggregation for the plot
aggregated <- combined %>%
  # Keep the first and last years of the record
  filter(year %in% c(1980, 2024)) %>%
  select(long, lat, year, T_Annual, PPT_Annual) %>%
  drop_na() %>%
  # Label each unique coordinate pair as "Park <n>"
  mutate(Park_name = paste("Park", as.numeric(as.factor(paste(long, lat))))) %>%
  select(Park_name, T_Annual, PPT_Annual, year) %>%
  group_by(year, Park_name) %>%
  summarise(temp = round(mean(T_Annual, na.rm = TRUE), 2),
            rain = round(mean(PPT_Annual, na.rm = TRUE), 2)) %>%
  # Keep only parks that have data for both years
  group_by(Park_name) %>%
  filter(n() >= 2) %>%
  pivot_longer(cols = !c(Park_name, year), names_to = "variable", values_to = "value") %>%
  pivot_wider(names_from = year, values_from = value)
write_csv(aggregated, "rain_temp.csv")

In this section, we go beyond the relationship between temperature and precipitation analyzed previously and explore how each climate variable correlates with ecosystem health. Ecosystem health is measured by bare ground, herbaceous cover, litter presence, shrub density, tree canopy, and volumetric water content (VWC).
# Data aggregation for the correlation plot
correlation_data <- combined %>%
  group_by(long, lat) %>%
  summarise(
    T_Annual = round(mean(as.numeric(T_Annual), na.rm = TRUE), 2),
    PPT_Annual = round(mean(as.numeric(PPT_Annual), na.rm = TRUE), 2),
    VWC_Summer_whole = round(mean(as.numeric(VWC_Summer_whole), na.rm = TRUE), 3),
    treecanopy = mean(as.numeric(treecanopy), na.rm = TRUE),
    Herb = mean(as.numeric(Herb), na.rm = TRUE),
    Shrub = mean(as.numeric(Shrub), na.rm = TRUE),
    Litter = mean(as.numeric(Litter), na.rm = TRUE),
    Bare = mean(as.numeric(Bare), na.rm = TRUE)
  ) %>%
  ungroup()
# Selecting the relevant columns (the coordinates are dropped)
relevant_columns <- correlation_data %>%
  select(T_Annual:Bare)
# Dropping NA values
relevant_columns <- drop_na(relevant_columns)
# Correlation matrix and p-values
corr_results <- rcorr(as.matrix(relevant_columns), type = "pearson")
cor_matrix <- corr_results$r
p_values <- corr_results$P
cor_df <- expand.grid(Variable1 = colnames(cor_matrix), Variable2 = colnames(cor_matrix), stringsAsFactors = FALSE)
cor_df$Correlation = as.vector(cor_matrix)
cor_df$P_Value = as.vector(p_values)
significant_correlations <- list()
# Only showing correlations with these variables of interest
variables_of_interest <- c('T_Annual', 'PPT_Annual')
for (var in variables_of_interest) {
  temp_df <- cor_df %>%
    filter((Variable1 == var | Variable2 == var), Variable1 != Variable2) %>%
    # Keep each unordered pair of variables only once
    mutate(PairID = pmin(Variable1, Variable2), PairID2 = pmax(Variable1, Variable2)) %>%
    distinct(PairID, PairID2, .keep_all = TRUE) %>%
    select(-PairID, -PairID2) %>%
    arrange(desc(abs(Correlation)))
  significant_correlations[[var]] <- temp_df
}
# Put the temperature-precipitation pair in the same orientation as the other rows
significant_correlations[['PPT_Annual']] <- significant_correlations[['PPT_Annual']] %>%
  mutate(Variable1 = ifelse(Variable1 == "PPT_Annual", "T_Annual", Variable1),
         Variable2 = ifelse(Variable2 == "T_Annual", "PPT_Annual", Variable2))
correlation_df <- rbind(significant_correlations[['PPT_Annual']], significant_correlations[['T_Annual']])
# Correlation plot
cor_plot <- ggplot(correlation_df, aes(x = Variable1, y = Variable2, fill = Correlation, label = round(Correlation, 2))) +
  geom_tile(color = "white") +
  geom_text(color = "black", size = 3) +
  scale_fill_gradient2(low = "blue", high = "red", mid = "white", midpoint = 0, limits = c(-1, 1), space = "Lab", name = "Correlation") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1),
        plot.title = element_text(face = "bold")) + # Make title bold
  labs(title = "Correlation Analysis of Climate, Vegetation, and Soil Variables in \nUtah National Parks",
       x = "Soil and Vegetation Variables",
       y = "Annual Climate Variables")
print(cor_plot)

The correlation plot above shows how annual temperature (T_Annual) and annual precipitation (PPT_Annual) are associated with ecosystem health indicators across Utah National Parks. Our analysis reveals that:
Annual temperature (T_Annual) correlates most strongly with precipitation (-0.75), which reinforces the strong inverse relationship depicted in Plot 4 of the previous section. It also has a strong negative correlation with summer volumetric water content (-0.72), indicating lower soil water content at higher temperatures. Temperature is more weakly correlated with the cover variables: negatively with litter (-0.33) and shrub (-0.19), and positively with bare ground (0.26), suggesting less vegetation and plant litter as temperatures increase.
Annual precipitation (PPT_Annual), on the other hand, shows a strong positive correlation with summer volumetric water content (0.67), indicating that higher precipitation likely increases soil moisture. There is also a moderate positive correlation with litter (0.32) and a negative correlation with bare ground (-0.25), suggesting more vegetation and litter with more rainfall.
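For readers unfamiliar with the coefficient reported above, the sketch below computes Pearson's r by hand and compares it with base R's `cor()`. The five value pairs are hypothetical, and `cor.test()` is used here only as a base R stand-in for obtaining a p-value.

```r
# Hypothetical values for five locations (not taken from the park data).
temperature   <- c(10.5, 11.0, 11.6, 12.1, 12.4)
precipitation <- c(9.0, 8.7, 8.1, 7.6, 7.4)

# Pearson's r: covariance divided by the product of the standard deviations.
r_manual <- cov(temperature, precipitation) /
  (sd(temperature) * sd(precipitation))

# Same coefficient from the built-in function.
r_builtin <- cor(temperature, precipitation, method = "pearson")

# Test of the null hypothesis that the true correlation is zero.
test <- cor.test(temperature, precipitation, method = "pearson")

r_manual
r_builtin
test$p.value
```

A value near -1 means the two variables move in opposite directions almost linearly, which is how the reported -0.75 between temperature and precipitation should be read.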
As we delve into the connections among climate, vegetation, and soil variables, our analysis shifts to examining individual parks to determine which ones are susceptible to drought and which ones are flourishing. Once we identify these parks, our next objective is to comprehend the disparities in their climate, vegetation, and soil indicators.
To achieve this, we selected the top three and bottom three parks in terms of their vegetation profile. To select these observations, we derived a field called vegetation density, calculated by summing the percentage cover values of tree canopy, herbaceous plants, and shrubs for each park. This composite metric integrates multiple aspects of vegetation into a single measure of the overall green cover within each park.
After computing the vegetation density, we organized the data to identify parks with the highest and lowest values. Specifically, we sorted the dataset in ascending order to pinpoint the three parks with the sparsest vegetation, termed as having the “worst” vegetation density. Conversely, by sorting in descending order, we identified the three parks with the densest vegetation, labeled as having the “best” vegetation density.
# new calculated field called vegetation density
correlation_data$vegetation_density <- correlation_data$treecanopy + correlation_data$Herb + correlation_data$Shrub
# finding the worst vegetation
worst_vegetation <- correlation_data[order(correlation_data$vegetation_density), ]
worst_vegetation <- worst_vegetation[1:3, ]
worst_vegetation <- worst_vegetation %>%
  mutate(vegetation_status = "worst")
# finding the best vegetation
best_vegetation <- correlation_data[order(-correlation_data$vegetation_density), ]
best_vegetation <- best_vegetation[1:3, ]
best_vegetation <- best_vegetation %>%
  mutate(vegetation_status = "best")
worst_best<- rbind(worst_vegetation,best_vegetation)
write_csv(worst_best, "worst_best.csv")
worst_vegetation_table <- kable(worst_vegetation, "html", caption = "Bottom 3 Worst Vegetation Parks")
worst_vegetation_table

| long | lat | T_Annual | PPT_Annual | VWC_Summer_whole | treecanopy | Herb | Shrub | Litter | Bare | vegetation_density | vegetation_status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| -110.0022 | 37.60915 | 12.19 | 7.67 | 0.074 | 0 | 1 | 0 | 3 | 91 | 1 | worst |
| -109.9669 | 37.62446 | 11.74 | 8.43 | 0.110 | 0 | 1 | 0 | 1 | 93 | 1 | worst |
| -109.9940 | 37.62499 | 11.37 | 7.53 | 0.072 | 0 | 2 | 0 | 3 | 92 | 2 | worst |
best_vegetation_table <- kable(best_vegetation, "html", caption = "Top 3 Best Vegetation Parks")
best_vegetation_table

| long | lat | T_Annual | PPT_Annual | VWC_Summer_whole | treecanopy | Herb | Shrub | Litter | Bare | vegetation_density | vegetation_status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| -109.9731 | 37.63080 | 11.13 | 8.96 | 0.087 | 29 | 3 | 42 | 26 | 11 | 74 | best |
| -109.9882 | 37.62736 | 10.66 | 8.76 | 0.089 | 27 | 12 | 32 | 18 | 26 | 71 | best |
| -110.0033 | 37.61707 | 12.36 | 7.23 | 0.070 | 25 | 12 | 30 | 21 | 27 | 67 | best |
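
To compare the climate of the two groups at a glance, the table values can be averaged by vegetation status. The sketch below uses base R and only the temperature and precipitation figures copied from the two tables above. The `parks` data frame is built by hand for illustration and is not an object from the project.

```r
# Values copied from the worst and best vegetation tables above.
parks <- data.frame(
  vegetation_status = rep(c("worst", "best"), each = 3),
  T_Annual          = c(12.19, 11.74, 11.37, 11.13, 10.66, 12.36),
  PPT_Annual        = c(7.67, 8.43, 7.53, 8.96, 8.76, 7.23)
)

# Mean annual temperature and precipitation for each group.
group_means <- aggregate(
  cbind(T_Annual, PPT_Annual) ~ vegetation_status,
  data = parks,
  FUN  = mean
)

group_means
```

With only three parks per group the averages are indicative rather than conclusive. The third park in the best group is the warmest and driest of all six, so the gap between the groups is small.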
After examining the top three and bottom three parks, we go one step further and take the park with the highest drought vulnerability to analyze how it performed historically. We measured drought using the PPT_Annual variable, which gives the average annual precipitation for each park. We focused on the park with the lowest precipitation in 2024, identified as the one at latitude 37.60440 and longitude -110.0376. Our goal was to capture any historical patterns that might have led to it having the lowest annual precipitation in 2024. To do so, we computed the average annual precipitation for this park for each year from 1980 to 2024 (PPT_sample_park) and compared it against the average annual precipitation across all parks (PPT_avg_park). This comparison shows how often the park's precipitation fell below the regional average, which illustrates its susceptibility to drought conditions.
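One way to quantify "how often" is the share of years in which the park's precipitation is below the all-park average. The sketch below shows that calculation in base R. The precipitation values and the `low_avg_toy` data frame are hypothetical, and only the column names follow the report.

```r
# Hypothetical precipitation values; column names follow the report.
low_avg_toy <- data.frame(
  year            = 2020:2024,
  PPT_sample_park = c(7.1, 6.8, 8.4, 6.5, 6.2),
  PPT_avg_park    = c(8.0, 7.9, 8.1, 7.7, 7.5)
)

# Yearly gap between the park and the all-park average (negative = drier).
low_avg_toy$gap <- low_avg_toy$PPT_sample_park - low_avg_toy$PPT_avg_park

# Share of years in which the park was drier than the average.
share_below <- mean(low_avg_toy$gap < 0)

share_below
```

Taking the mean of a logical vector gives the proportion of TRUE values, so no explicit counting is needed.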
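# Average annual precipitation per park and year
data <- combined %>%
  group_by(year, lat, long) %>%
  summarise(PPT_annual_avg = round(mean(as.numeric(PPT_Annual), na.rm = TRUE), 2)) %>%
  drop_na()
# Park with the lowest precipitation in 2024.
# near() with a small tolerance is used because the printed coordinates may be rounded.
lowest <- data %>%
  filter(near(lat, 37.60440, tol = 1e-4), near(long, -110.0376, tol = 1e-4)) %>%
  rename(PPT_sample_park = PPT_annual_avg)
# Average across all parks for each year
avg <- data %>%
  group_by(year) %>%
  summarise(PPT_annual_avg = round(mean(as.numeric(PPT_annual_avg), na.rm = TRUE), 2)) %>%
  rename(PPT_avg_park = PPT_annual_avg)
low_avg_data <- left_join(lowest, avg, by = "year")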
write_csv(low_avg_data, "low_avg_data.csv")

The analysis reveals significant environmental trends and disparities within Utah’s National Parks. Parks with the densest vegetation typically show higher levels of precipitation and lower temperatures, contributing to robust ecosystem health. In contrast, parks with the lowest vegetation density experience higher temperatures and reduced moisture levels, which indicates their susceptibility to drought conditions. In particular, the park located at latitude 37.60440 and longitude -110.0376 was identified as historically vulnerable to drought because of consistently low precipitation compared with the regional average. This project not only enhances our understanding of ecological patterns across Utah’s National Parks but also emphasizes the critical interplay of climate factors in shaping natural landscapes. We hope these findings contribute valuable insights to ongoing environmental assessments.