Part 1 - Introduction

The data for this project come from National Park Service (NPS) species records, compiled and shared through the TidyTuesday project. We compare species richness between Glacier National Park and Grand Canyon National Park to explore how temperature relates to the number of species. The study focuses on differences in the richness of birds and vascular plants in each park. This matters because temperature influences metabolic rates and habitat conditions, which affect where different species can live, so temperature variation can significantly affect species distributions. The response variable is species richness, measured as the number of species recorded at each park. The explanatory variable is the average temperature at each park. # ChatGPT told me to add the word "richness," and I also added a little more context about where the data came from.

Part 2 - Main Research Question

“Is there a significant difference in the mean species richness of vascular plants and birds between Glacier National Park and Grand Canyon National Park?” # Helped me by making our question more focused on our specific test.

Part 3 - Exploring the Data (Descriptive Statistics)

library(dplyr)
library(ggplot2)

#park data

parkspeciesdata <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2024/2024-10-08/most_visited_nps_species_data.csv')
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
##   dat <- vroom(...)
##   problems(dat)
## Rows: 61119 Columns: 28
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (21): ParkCode, ParkName, CategoryName, Order, Family, TaxonRecordStatus...
## dbl  (3): References, Observations, Vouchers
## lgl  (4): Synonyms, ParkAccepted, Sensitive, ExternalLinks
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
counts <- table(parkspeciesdata$CategoryName)
counts
## 
##             Amphibian              Bacteria                  Bird 
##                   235                   376                  4624 
##             Chromista   Crab/Lobster/Shrimp                  Fish 
##                  1040                   264                   633 
##                 Fungi                Insect                Mammal 
##                  5997                 16235                  1107 
##    Non-vascular Plant Other Non-vertebrates              Protozoa 
##                  1898                  1189                   296 
##               Reptile            Slug/Snail       Spider/Scorpion 
##                   384                   425                  4994 
##        Vascular Plant 
##                 21422
table2 <- table(parkspeciesdata$ParkName,parkspeciesdata$CategoryName)
table2
##                                      
##                                       Amphibian Bacteria  Bird Chromista
##   Acadia National Park                       15        0   364         0
##   Bryce Canyon National Park                  4        0   218         0
##   Cuyahoga Valley National Park              24        0   246         0
##   Glacier National Park                       6        0   277         2
##   Grand Canyon National Park                 15        0   456         0
##   Grand Teton National Park                   6        0   266         1
##   Great Smoky Mountains National Park        58      294   267       654
##   Hot Springs National Park                  27       10   387        24
##   Indiana Dunes National Park                24        0   353         0
##   Joshua Tree National Park                   5       23   301         5
##   Olympic National Park                      16        0   310         0
##   Rocky Mountain National Park                5       45   278       150
##   Yellowstone National Park                   9        4   330       204
##   Yosemite National Park                     14        0   270         0
##   Zion National Park                          7        0   301         0
##                                      
##                                       Crab/Lobster/Shrimp  Fish Fungi Insect
##   Acadia National Park                                  0    38     0      0
##   Bryce Canyon National Park                            0     1     0      0
##   Cuyahoga Valley National Park                         8    85     0    227
##   Glacier National Park                                 6    27   276    197
##   Grand Canyon National Park                            2    29     0    125
##   Grand Teton National Park                            11    23    28    155
##   Great Smoky Mountains National Park                 125   110  5243  12398
##   Hot Springs National Park                             9    90     0     15
##   Indiana Dunes National Park                           0    76    69    249
##   Joshua Tree National Park                             0     1    38    342
##   Olympic National Park                                 0    97     0     87
##   Rocky Mountain National Park                         39    12   306    676
##   Yellowstone National Park                            64    19    37   1764
##   Yosemite National Park                                0    10     0      0
##   Zion National Park                                    0    15     0      0
##                                      
##                                       Mammal Non-vascular Plant
##   Acadia National Park                    55                  0
##   Bryce Canyon National Park              76                  0
##   Cuyahoga Valley National Park           47                  0
##   Glacier National Park                   69                404
##   Grand Canyon National Park             107                  0
##   Grand Teton National Park               74                  0
##   Great Smoky Mountains National Park    101               1039
##   Hot Springs National Park               52                 18
##   Indiana Dunes National Park             60                  0
##   Joshua Tree National Park               67                  6
##   Olympic National Park                   79                  0
##   Rocky Mountain National Park            75                416
##   Yellowstone National Park               78                 15
##   Yosemite National Park                  87                  0
##   Zion National Park                      80                  0
##                                      
##                                       Other Non-vertebrates Protozoa Reptile
##   Acadia National Park                                    0        0      11
##   Bryce Canyon National Park                              0        0      13
##   Cuyahoga Valley National Park                          25        0      24
##   Glacier National Park                                   2        0       4
##   Grand Canyon National Park                              1        0      76
##   Grand Teton National Park                               8        0       5
##   Great Smoky Mountains National Park                   993      257      47
##   Hot Springs National Park                              22       12      52
##   Indiana Dunes National Park                             0        0      30
##   Joshua Tree National Park                              10        0      52
##   Olympic National Park                                   0        0       6
##   Rocky Mountain National Park                           48        9       3
##   Yellowstone National Park                              80       18       9
##   Yosemite National Park                                  0        0      22
##   Zion National Park                                      0        0      30
##                                      
##                                       Slug/Snail Spider/Scorpion Vascular Plant
##   Acadia National Park                         0               0           1226
##   Bryce Canyon National Park                   0               0            975
##   Cuyahoga Valley National Park               15               2           1239
##   Glacier National Park                       20               0           1269
##   Grand Canyon National Park                   2             142           1753
##   Grand Teton National Park                   24               1           1645
##   Great Smoky Mountains National Park        291            4630           2163
##   Hot Springs National Park                    2               0           1252
##   Indiana Dunes National Park                  0               1           1622
##   Joshua Tree National Park                    0             153           1314
##   Olympic National Park                        0               0           1352
##   Rocky Mountain National Park                10              22           1119
##   Yellowstone National Park                   61              43           1444
##   Yosemite National Park                       0               0           1683
##   Zion National Park                           0               0           1366
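The counts for the two parks and two groups compared in this study can be read off the table above or pulled out in code. The following is a minimal sketch, assuming a small stand-in data frame (`toy_species`, hypothetical rows) with the same `ParkName` and `CategoryName` columns as `parkspeciesdata`; the names `focus_parks`, `focus_groups`, and `focus_counts` are illustrative.

```r
# Stand-in data with the same two columns used above
# (hypothetical rows, not the real NPS records).
toy_species <- data.frame(
  ParkName = c(rep("Glacier National Park", 5),
               rep("Grand Canyon National Park", 6),
               rep("Zion National Park", 2)),
  CategoryName = c("Bird", "Bird", "Vascular Plant", "Vascular Plant", "Mammal",
                   "Bird", "Bird", "Bird", "Vascular Plant", "Reptile",
                   "Vascular Plant",
                   "Bird", "Fish"),
  stringsAsFactors = FALSE
)

focus_parks  <- c("Glacier National Park", "Grand Canyon National Park")
focus_groups <- c("Bird", "Vascular Plant")

# Keep only the rows for the parks and categories of interest, then tabulate.
focus_rows <- subset(
  toy_species,
  ParkName %in% focus_parks & CategoryName %in% focus_groups
)
focus_counts <- table(focus_rows$ParkName, focus_rows$CategoryName)
focus_counts
```

Swapping `toy_species` for `parkspeciesdata` should return the Glacier and Grand Canyon entries already visible in the full table (277 and 456 birds; 1269 and 1753 vascular plants).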
# Data: Temperature

glacier_temp <- data.frame(
  day = 1:31,
  avg_temp = c(32.7, 33.6, 38, 42.8, 39.7, 30.7, 43.6, 49, 34.1, 28, 33.3, 34, 
               29.1, 26.4, 21.6, 29.3, 38.6, 43.1, 47.8, 54, 47.7, 42, 39.7, 41.9, 
               46, 36.9, 34.6, 40, 39.2, 36.2, 33.7)
)

grandcanyon_temp <- data.frame(
  day = 1:31,
  avg_temp = c(48.3, 45.9, 38.4, 43.2, 35.6, 31.8, 35, 38.6, 43.4, 43.3, 43.1, 
               42, 44.7, 47.6, 46.3, 43.8, 48.6, 50.8, 54.3, 55, 54, 50.9, 52.5, 
               54.8, 59.4, 51, 57.5, 52.7, 51.5, 53, 52.1)
)

# Summary Statistics

combined_summary <- bind_rows(
  glacier_temp %>%
    summarise(
      Park = "Glacier",
      n = n(),
      mean_temp = mean(avg_temp, na.rm = TRUE),
      median_temp = median(avg_temp, na.rm = TRUE),
      sd_temp = sd(avg_temp, na.rm = TRUE),
      se_temp = sd_temp / sqrt(n)
    ),
  grandcanyon_temp %>%
    summarise(
      Park = "Grand Canyon",
      n = n(),
      mean_temp = mean(avg_temp, na.rm = TRUE),
      median_temp = median(avg_temp, na.rm = TRUE),
      sd_temp = sd(avg_temp, na.rm = TRUE),
      se_temp = sd_temp / sqrt(n)
    )
)

combined_summary
##           Park  n mean_temp median_temp  sd_temp  se_temp
## 1      Glacier 31  37.65484        38.0 7.297481 1.310666
## 2 Grand Canyon 31  47.39032        48.3 6.903446 1.239896
## Perplexity simplified the code by building the combined summary directly, instead of making a summary for each park and then combining them.

# Combined dataset for plots

temp_data <- data.frame(
  temp = c(glacier_temp$avg_temp, grandcanyon_temp$avg_temp),
  park = rep(c("Glacier", "Grand Canyon"), each = 31)
)

# Boxplot

ggplot(temp_data, aes(x = park, y = temp, fill = park)) +
  geom_boxplot(width = 0.6, alpha = 0.75, outlier.shape = 21, outlier.fill = "white") +
  geom_jitter(width = 0.12, alpha = 0.55, size = 1.8, color = "black") +
  scale_fill_manual(values = c("Glacier" = "#56B4E9", "Grand Canyon" = "#E69F00")) +
  theme_minimal(base_size = 13) +
  labs(
    title = "Distribution of Average Daily Temperatures",
    subtitle = "Daily temperatures across 31 days in Glacier and Grand Canyon",
    x = "Park",
    y = "Temperature (°F)"
  ) +
  theme(
    legend.position = "none",
    plot.title = element_text(face = "bold")
  )

## Perplexity changed the titles and axis labels and added the points to clean up the graph.

# Histogram

ggplot(temp_data, aes(x = temp)) +
  geom_histogram(bins = 10, fill = "steelblue", color = "black") +
  facet_wrap(~park) +
  theme_minimal() +
  labs(
    title = "Temperature Distribution by Park",
    x = "Temperature (°F)",
    y = "Frequency"
  )

# Histogram (AI-suggested version)

ggplot(temp_data, aes(x = temp)) +
  geom_histogram(binwidth = 2, fill = "steelblue", color = "white") +
  facet_wrap(~park, ncol = 1) +
  theme_minimal(base_size = 13) +
  labs(
    title = "Temperature Distribution by Park",
    subtitle = "Daily average temperatures across 31 days",
    x = "Temperature (°F)",
    y = "Number of days"
  )

## We did not agree with this change because our histograms show a bell curve and the AI change removed that.

# Biodiversity Data

biodiversity_data <- data.frame(
  park = c("Glacier", "Glacier", "Grand Canyon", "Grand Canyon"),
  group = c("Birds", "Plants", "Birds", "Plants"),
  count = c(277, 1269, 456, 1753)
)
# Bar Plot

ggplot(biodiversity_data, aes(x = park, y = count, fill = group)) +
  geom_col(position = position_dodge(width = 0.8), width = 0.7, color = "black") +
  theme_minimal(base_size = 13) +
  scale_fill_manual(values = c("Birds" = "#56B4E9", "Plants" = "#E69F00")) +
  labs(
    title = "Comparison of Birds and Vascular Plants",
    subtitle = "Species counts in Glacier and Grand Canyon",
    x = "Park",
    y = "Number of species",
    fill = "Group"
  )

## AI gave us different colors and changed the titles. 
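The comparison shown in the bar plot can also be summarized numerically. This is a minimal sketch using the same four counts as `biodiversity_data`; the matrix name `richness` and the derived difference and ratio are illustrative additions, not part of the original analysis.

```r
# Species counts from the table above, arranged as group x park.
# matrix() fills column by column, so each pair below is one park.
richness <- matrix(
  c(277, 1269,    # Glacier: birds, vascular plants
    456, 1753),   # Grand Canyon: birds, vascular plants
  nrow = 2,
  dimnames = list(group = c("Birds", "Plants"),
                  park  = c("Glacier", "Grand Canyon"))
)

# Absolute and relative differences (Grand Canyon compared with Glacier).
richness_diff  <- richness[, "Grand Canyon"] - richness[, "Glacier"]
richness_ratio <- richness[, "Grand Canyon"] / richness[, "Glacier"]

richness_diff
round(richness_ratio, 2)
```

With only one count per park and group, these are descriptive comparisons; there is no within-park replication to test.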

Part 4 - Statistical Tests (Inferential Statistics)

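Null (H0): There is no difference in mean daily temperature between Glacier and Grand Canyon, so temperature cannot account for a difference in species richness between the parks.
Alternative (HA): There is a difference in mean daily temperature between Glacier and Grand Canyon, which could contribute to a difference in species richness between the parks.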

t_test_result <- t.test(glacier_temp$avg_temp,
                        grandcanyon_temp$avg_temp)
t_test_result
## 
##  Welch Two Sample t-test
## 
## data:  glacier_temp$avg_temp and grandcanyon_temp$avg_temp
## t = -5.396, df = 59.816, p-value = 1.228e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -13.344677  -6.126291
## sample estimates:
## mean of x mean of y 
##  37.65484  47.39032
shapiro.test(glacier_temp$avg_temp)
## 
##  Shapiro-Wilk normality test
## 
## data:  glacier_temp$avg_temp
## W = 0.99292, p-value = 0.9988
shapiro.test(grandcanyon_temp$avg_temp)
## 
##  Shapiro-Wilk normality test
## 
## data:  grandcanyon_temp$avg_temp
## W = 0.96593, p-value = 0.4145

Welch’s two-sample t-test was used to compare temperatures between the two parks because it does not assume equal variances between groups, making it appropriate for environmental data where variability may differ between locations. The Shapiro-Wilk test was used to assess the assumption of normality for each dataset. Both parks showed non-significant results (Glacier: W = 0.993, p = 0.999; Grand Canyon: W = 0.966, p = 0.415), so there is no evidence against normality and the assumptions of the t-test are reasonably met. The results showed a statistically significant difference in mean temperature between the two parks (t = -5.396, df = 59.816, p < 0.001). Glacier had the lower mean (37.65°F vs. 47.39°F), and the 95% confidence interval for the difference (Glacier minus Grand Canyon) runs from -13.34°F to -6.13°F; because it does not include 0, the difference is significant.
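To show where the reported t statistic, degrees of freedom, and confidence interval come from, the Welch calculation can be reproduced by hand. This is a minimal sketch that starts from the rounded summary statistics printed in `combined_summary`, so results should agree with `t.test()` only up to rounding; the variable names are illustrative.

```r
# Summary statistics as reported in combined_summary above.
n1 <- 31; m1 <- 37.65484; s1 <- 7.297481   # Glacier
n2 <- 31; m2 <- 47.39032; s2 <- 6.903446   # Grand Canyon

# Squared standard error of each mean.
v1 <- s1^2 / n1
v2 <- s2^2 / n2

# Welch t statistic: difference in means divided by its standard error.
se_diff <- sqrt(v1 + v2)
t_stat  <- (m1 - m2) / se_diff

# Welch-Satterthwaite approximation for the degrees of freedom.
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# Two-sided p-value and 95% confidence interval for the difference.
p_value <- 2 * pt(-abs(t_stat), df = df_welch)
ci <- (m1 - m2) + c(-1, 1) * qt(0.975, df = df_welch) * se_diff

round(c(t = t_stat, df = df_welch), 3)
signif(p_value, 4)
round(ci, 2)
```

Because the degrees of freedom depend on both sample variances, they need not be a whole number, which is why the output above reports df = 59.816 rather than 60.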

Part 5 - Discussion

The p-value from the Welch two-sample t-test is 1.228e-06, so we reject the null hypothesis of equal mean temperatures and conclude that there is a statistically significant difference in mean daily temperature between Grand Canyon and Glacier National Park, with Grand Canyon being warmer. The species counts show that Grand Canyon also has more bird species (456 vs. 277) and more vascular plant species (1753 vs. 1269) than Glacier National Park. Because each park contributes a single count per group, this richness comparison is descriptive rather than a formal test. One possible ecological explanation for the pattern is the difference in average temperature between the two parks. Warmer temperatures may reduce the energy birds need for thermoregulation, potentially allowing greater species persistence. Additionally, plants can have higher metabolic and photosynthetic rates at higher temperatures, which may increase plant diversity. # Told me to add a sentence at the end and explain better why the temperature affects.

Part 6 - Conclusion

The null hypothesis was rejected: mean temperature differs significantly between Glacier National Park and Grand Canyon National Park. Grand Canyon, the warmer park, also had a higher number of both bird and vascular plant species, which suggests an association between temperature differences and species diversity, with warmer conditions potentially supporting greater biodiversity. # Added a sentence about the effect temperature has.

Part 7 - References

Burns, C. E., et al. “Global Climate Change and Mammalian Species Diversity in U.S. National Parks.” Proceedings of the National Academy of Sciences, vol. 100, no. 20, 19 Sept. 2003, pp. 11474–11477, www.pnas.org/content/100/20/11474/, https://doi.org/10.1073/pnas.1635115100.

Fly Aviary. (2024). Swift fliers top predators [Image]. https://www.flyaviary.com/wp-content/uploads/2024/05/swift_fliers_top_predators.jpg

Grand Canyon, AZ weather conditions. Weather Underground. (n.d.). https://www.wunderground.com/weather/us/az/grand-canyon

East Glacier Park, MT weather conditions. Weather Underground. (n.d.-a). https://www.wunderground.com/weather/us/mt/east-glacier-park

We used Claude and ChatGPT to help debug our code. They were also used to help us edit our sections and align them with the rubric.