INFO 201 Final Project

Guiding Question: How do rising temperatures due to climate change affect rates of natural disasters in the United States?

Author

Millie Castoldi; Alec Schlenker; Samyak Shrestha

Published

June 1, 2025

Data Set 1: annual US temperatures by state

The first dataset contains the average annual temperature for each US state from 1895 to the present. It was sourced from The Washington Post and is available to download on GitHub.

library(dplyr)  # Provides slice(), group_by(), and summarize()

annual_temperatures_state <- read.csv("climatechangeannual.csv")
annual_temperatures_state |>
  slice(1:5)
  fips year     temp    tempc
1    1 1895 61.64167 16.46759
2    1 1896 64.26667 17.92593
3    1 1897 64.19167 17.88426
4    1 1898 62.98333 17.21296
5    1 1899 63.10000 17.27778

The first important change to this dataset was summarizing the average temperature across the whole US for each year. We achieved this by grouping the dataset by year, then taking the mean of all available state temperatures for that year (ignoring NA values). This is an unweighted mean of the state averages, so every state counts equally regardless of its area.

avg_us_temps <- annual_temperatures_state |>
  group_by(year) |>
  summarize(avg_temp = mean(temp, na.rm = TRUE))
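To make the effect of na.rm = TRUE concrete, here is a minimal sketch on a made-up table. The toy_temps data frame and its values are hypothetical and are not taken from the Washington Post data; the n_states column is an extra added here only to show how many readings went into each mean.

```r
library(dplyr)

# Hypothetical toy data: three states, two years, one missing reading.
toy_temps <- data.frame(
  fips = c(1, 2, 3, 1, 2, 3),
  year = c(1895, 1895, 1895, 1896, 1896, 1896),
  temp = c(60, 50, NA, 62, 52, 42)
)

toy_avg <- toy_temps |>
  group_by(year) |>
  summarize(
    avg_temp = mean(temp, na.rm = TRUE),  # NA readings are skipped
    n_states = sum(!is.na(temp))          # how many states contributed
  )

print(toy_avg)
```

Without na.rm = TRUE, the mean for any year containing a missing reading would itself be NA.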

Visualization 1: change in average US temperature over time

The first visualization shows how the temperature in the US has changed over time and makes the general trend in the data visible. We found that the average US temperature has increased by roughly 2 degrees Fahrenheit since 1895.

#| fig.alt: "Line graph of the average US temperature for each year from 1895 to the present, showing a slight increase in average US temperature."
library("ggplot2")
library("gganimate")

avg_us_temps |>
  ggplot(aes(x = year, y = avg_temp)) +
  geom_line() +
  labs(title = "The Average Annual US Temperature Has Increased",
       x = "Year",
       y = "Average Temperature (°F)",
       caption = "Data available at: https://github.com/washingtonpost/data-2C-beyond-the-limit-usa/",
       subtitle = "Data for annual average temperature for each US state from 1895 - present.") +
  coord_cartesian(ylim = c(45, 60)) +
  theme_minimal() +
  transition_reveal(year)
`geom_line()`: Each group consists of only one observation.
ℹ Do you need to adjust the group aesthetic?
`geom_line()`: Each group consists of only one observation.
ℹ Do you need to adjust the group aesthetic?
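One way to put a number on the size of the increase is to fit a straight line to the yearly averages and multiply the slope by the length of the period. The sketch below uses a synthetic series with a known built-in trend, not the project's data; toy_years and toy_avg_temp are hypothetical names.

```r
# Synthetic series for illustration only: a built-in trend of 0.015 °F per year.
toy_years <- 1895:2024
toy_avg_temp <- 51 + 0.015 * (toy_years - 1895)

# Fit a linear trend: temperature as a function of year.
fit <- lm(toy_avg_temp ~ toy_years)
slope_per_year <- unname(coef(fit)["toy_years"])

# Total change implied by the fitted line over the whole period.
total_change <- slope_per_year * (max(toy_years) - min(toy_years))
print(total_change)
```

Applied to the real data, the same pattern would be lm(avg_temp ~ year, data = avg_us_temps), assuming avg_us_temps is built as shown earlier.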

Data Set 2: Known US natural disaster occurrences

This dataset includes all known US natural disasters from 1900 to the present, including the disaster type, the year the disaster began, and the US state and region in which it occurred. It was sourced from the International Disaster Database (EM-DAT) and can be found at https://public.emdat.be/data.

natural_disasters_annual <- read.csv("public_emdat_custom_request_2025-02-18_4a60a03f-8071-45fb-8af8-54229b55f39e.csv")

natural_disasters_annual |>
  slice(1)
         DisNo. Historic Classification.Key Disaster.Group Disaster.Subgroup
1 1900-0003-USA      Yes    nat-met-sto-tro        Natural    Meteorological
  Disaster.Type Disaster.Subtype External.IDs Event.Name ISO
1         Storm Tropical cyclone                         USA
                   Country        Subregion   Region          Location Origin
1 United States of America Northern America Americas Galveston (Texas)       
          Associated.Types OFDA.BHA.Response Appeal Declaration
1 Avalanche (Snow, Debris)                No     No          No
  AID.Contribution...000.US.. Magnitude Magnitude.Scale Latitude Longitude
1                          NA       220             Kph       NA        NA
  River.Basin Start.Year Start.Month Start.Day End.Year End.Month End.Day
1                   1900           9         8     1900         9       8
  Total.Deaths No..Injured No..Affected No..Homeless Total.Affected
1         6000          NA           NA           NA             NA
  Reconstruction.Costs...000.US.. Reconstruction.Costs..Adjusted...000.US..
1                              NA                                        NA
  Insured.Damage...000.US.. Insured.Damage..Adjusted...000.US..
1                        NA                                  NA
  Total.Damage...000.US.. Total.Damage..Adjusted...000.US..      CPI
1                   30000                           1098720 2.730451
  Admin.Units Entry.Date Last.Update
1               10/18/04    10/17/23

The first important change to this dataset was renaming the column for the year the disaster began from “Start.Year” to “year.” This was necessary so that we could later join this dataset with the annual temperatures dataset on the shared key “year.”

natural_disasters_annual <- natural_disasters_annual |>
  rename(year = Start.Year)

The next important step was to create a summary dataset with the count of each disaster type for each year. This let us visualize how the number of occurrences of each disaster type changed from 1900 to the present and analyze the correlation between changes in disaster rates and changes in temperature. We first grouped the dataset by year and disaster type, then counted the observations in each group using the n() function, which we learned how to use via Statology. We used a summary data frame rather than mutating natural_disasters_annual because only a few of its features were necessary for our project, and it was easier to work with a frame containing only the three features we used (year, disaster type, and count).

disaster_counts <- natural_disasters_annual |>
  group_by(year, Disaster.Type) |>
  summarize(Count = n())
`summarise()` has grouped output by 'year'. You can override using the
`.groups` argument.
disaster_counts |>
  slice(1)
# A tibble: 110 × 3
# Groups:   year [110]
    year Disaster.Type       Count
   <int> <chr>               <int>
 1  1900 Storm                   1
 2  1903 Flood                   2
 3  1905 Mass movement (dry)     1
 4  1906 Earthquake              1
 5  1908 Mass movement (dry)     1
 6  1909 Flood                   1
 7  1910 Mass movement (wet)     1
 8  1911 Wildfire                1
 9  1912 Storm                   1
10  1913 Storm                   1
# ℹ 100 more rows
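As the message above notes, summarize() drops only the last grouping level, so disaster_counts is still grouped by year. That is why slice(1) returns the first row of every year (110 rows) rather than a single row. The sketch below shows the ungrouped alternative on hypothetical toy data; toy_disasters and its values are made up for illustration.

```r
library(dplyr)

# Hypothetical toy data standing in for natural_disasters_annual.
toy_disasters <- data.frame(
  year = c(1900, 1903, 1903, 1903, 1905),
  Disaster.Type = c("Storm", "Flood", "Flood", "Storm", "Wildfire")
)

toy_counts <- toy_disasters |>
  group_by(year, Disaster.Type) |>
  summarize(Count = n(), .groups = "drop")  # return an ungrouped result

# count() is shorthand for the group_by() + summarize(n()) pattern.
toy_counts_short <- toy_disasters |>
  count(year, Disaster.Type, name = "Count")

# On ungrouped data, slice(1) returns a single row.
toy_counts |>
  slice(1)
```

Setting .groups = "drop" also silences the grouping message, which can make rendered output easier to read.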

Visualization 2: number of occurrences of disasters in the US each year from 1900 - present

The second visualization shows how the number and types of disasters that occur each year have changed. We found that the number of recorded disasters per year has increased from approximately 1 to approximately 10. The largest increases were visible in wildfires and floods. With this dataset, it is especially important to acknowledge that missing data could bias the result. The data for the early 1900s is certainly not as comprehensive as modern data and likely omits many disasters that were never recorded. However, the trend is pronounced enough that it is likely still meaningful, even taking into account potential implicit missing data.

#| fig.alt: "Column graph of the number of occurrences of specific disasters for each year from 1900 to the present, showing an increase in the annual number of disasters."
library("ggplot2")

disaster_counts |>
  filter(Disaster.Type != "Storm",
         Disaster.Type != "Epidemic",
         Disaster.Type != "Mass movement (dry)",
         Disaster.Type != "Mass movement (wet)") |>
  ggplot(aes(x = year, y = Count, fill = Disaster.Type)) +
  geom_col() +
  labs(title = "Rates of Natural Disaster Occurrences have increased in the US",
       x = "Year",
       y = "Number of Disasters",
       fill = "Disaster Type",
       subtitle = "Data for all known natural disasters in the US from 1900 - Present",
       caption = "Data available at: https://public.emdat.be/data") +
  theme_minimal()

Map of the US: Interactive map of where natural disasters have occurred in the US

We created this interactive map to show where natural disasters have occurred in the US. Each disaster is drawn as a red bubble, and clicking a bubble shows the type of disaster, the year it occurred, and how many people were affected. Only events with recorded coordinates can be drawn; the warning printed below the code reports how many rows lack valid latitude or longitude values and are ignored. The map highlights that wildfires are more common in hot, dry states like California and Arizona, while floods occur more often in the Midwest and Southeast, where heavy rain and hurricanes are common. This ties back to Visualization 2, which showed an increase in wildfires and floods over the past 120 years, suggesting that rising temperatures may contribute to the increase in the frequency of disasters.

#| fig.alt: "Interactive bubble map of the US displaying red bubble markers that highlight disaster type, year, and total affected, indicating regional patterns such as wildfires in the Southwest and floods in the Midwest and Southeast."
library(leaflet)  # For interactive maps
library(dplyr)    # For data filtering and manipulation
library(readr)    # For read_csv()

# Load the dataset
df <- read_csv("public_emdat_custom_request_2025-02-18_4a60a03f-8071-45fb-8af8-54229b55f39e.csv")
Rows: 1345 Columns: 46
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (24): DisNo., Historic, Classification Key, Disaster Group, Disaster Sub...
dbl (21): Magnitude, Latitude, Longitude, Start Year, Start Month, Start Day...
lgl  (1): AID Contribution ('000 US$)

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Filter for US disasters with valid locations
df_us <- df |>  
  filter(ISO == "USA") |>  # Keep only US disasters
  mutate(
    Total_Affected = as.numeric(`Total Affected`),  # Convert to numeric
    Bubble_Size = sqrt(Total_Affected) / 500  # Scale bubble size
  )

# Create the interactive map
df_us |> 
  leaflet() |>  # Start the map
  setView(lng = -98.5795, lat = 39.8283, zoom = 4) |>  # Center on the US
  addTiles() |>  # Add background map
  addCircleMarkers(  # Add disaster markers
    lng = ~Longitude, lat = ~Latitude,  
    radius = ~ifelse(is.na(Bubble_Size), 5, Bubble_Size),  # Set bubble size
    color = "red", fill = TRUE, fillOpacity = 0.5,  # Style the bubbles
    popup = ~paste(  # Show disaster details on click
      "<b>Disaster:</b>", `Disaster Type`, "<br>",  
      "<b>Year:</b>", `Start Year`, "<br>",  
      "<b>Total Affected:</b>", `Total Affected`
    )
  )
Warning in validateCoords(lng, lat, funcName): Data contains 1087 rows with
either missing or invalid lat/lon values and will be ignored
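The warning above means the map draws only the events that have coordinates. A minimal sketch of making that step explicit, so the number of omitted events is visible before mapping, could look like the following. The toy_events table and its values are hypothetical; only the column names mirror the readr import above.

```r
library(dplyr)

# Hypothetical toy data with the same column names as the readr import above.
toy_events <- tibble(
  `Disaster Type` = c("Flood", "Wildfire", "Storm"),
  Latitude  = c(29.3, NA, 40.7),
  Longitude = c(-94.8, -120.5, NA)
)

# Keep only rows where both coordinates are present.
toy_mappable <- toy_events |>
  filter(!is.na(Latitude), !is.na(Longitude))

# Number of events the map would omit.
n_omitted <- nrow(toy_events) - nrow(toy_mappable)
print(n_omitted)
```

Filtering up front does not change what is drawn, but it makes it easier to report how much of the dataset the map actually represents.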

Joining the two data frames

Joining the two data frames was not a requirement for this project, but being able to compare natural disaster occurrences and temperatures was central to our guiding question. It was fairly easily accomplished with a simple full join on the common key feature “year.”

joint_dfs <- avg_us_temps |>
  full_join(disaster_counts, join_by(year))
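Because disaster_counts has one row per year and disaster type, the joined data frame does too, and Count is a per-type count rather than a yearly total. If the goal is to compare total disasters per year with temperature, one option is to sum the counts before joining. The sketch below uses hypothetical toy tables shaped like avg_us_temps and disaster_counts, and it assumes that a year missing from the disaster table had zero recorded events, which may not hold for the real data.

```r
library(dplyr)

# Hypothetical toy data shaped like avg_us_temps and disaster_counts.
toy_temps <- tibble(year = c(1900, 1901, 1902), avg_temp = c(51.0, 51.5, 52.0))
toy_counts <- tibble(
  year = c(1900, 1900, 1902),
  Disaster.Type = c("Flood", "Storm", "Flood"),
  Count = c(1L, 2L, 3L)
)

# Collapse to one row per year.
toy_totals <- toy_counts |>
  group_by(year) |>
  summarize(total_disasters = sum(Count))

toy_joined <- toy_temps |>
  left_join(toy_totals, join_by(year)) |>
  # Assumption: a year absent from the disaster table had zero recorded events.
  mutate(total_disasters = coalesce(total_disasters, 0L))

print(toy_joined)
```

With one row per year on both sides, each point in a scatterplot would represent a single year.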

Visualization 3: Assessing correlation between Average Temperature and Number of Disasters in the US

#| fig.alt: "Scatterplot with trendline of average US temperature and number of disasters in the US, showing a slight positive linear correlation between the two variables."
library("ggplot2")

joint_dfs |>
  ggplot(aes(x = avg_temp, y = Count)) +
  geom_point() +
  geom_smooth(method = "lm", color = "red") +
  coord_cartesian(xlim = c(50, 55)) +
  labs(title = "There is a Positive Linear Correlation Between Temperature and Disaster Rates in the US",
       x = "Average Temperature (US, °F)",
       y = "Number of Disasters (US)",
       caption = "Data available at: https://public.emdat.be/data, https://github.com/washingtonpost/data-2C-beyond-the-limit-usa/",
       subtitle = "Data for all known natural disasters in the US from 1900 - Present; Data for annual average temperature for each US state from 1895 - present."
  )
`geom_smooth()` using formula = 'y ~ x'
Warning: Removed 47 rows containing non-finite outside the scale range
(`stat_smooth()`).
Warning: Removed 47 rows containing missing values or values outside the scale range
(`geom_point()`).

Determining the correlation coefficient for Visualization 3

The correlation coefficient measures the strength and direction of the linear relationship between two variables. It is a number between -1 and 1: values below 0 indicate a negative correlation, and values above 0 indicate a positive correlation. The closer the coefficient is to 1 or -1, the stronger the relationship. Our correlation coefficient was 0.25, indicating a weak positive linear relationship. Steps on how to calculate the correlation coefficient for the variables were found here.

correlation_coefficient <- cor(joint_dfs$avg_temp, joint_dfs$Count, use = "pairwise.complete.obs")
print(correlation_coefficient)
[1] 0.2527882
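The value from cor() describes the strength of the relationship only. Whether a correlation can be distinguished from zero is a separate question, which base R's cor.test() addresses. The sketch below runs it on synthetic data generated for illustration; toy_temp and toy_count are hypothetical and the results say nothing about the project's data.

```r
# Synthetic data for illustration only; not the project's data.
set.seed(201)
toy_temp <- rnorm(100, mean = 52, sd = 1)
toy_count <- 2 + 0.5 * toy_temp + rnorm(100, sd = 2)

# Pearson correlation test of the null hypothesis that the true correlation is 0.
result <- cor.test(toy_temp, toy_count, method = "pearson")

result$estimate  # Pearson's r, the same value cor() would return
result$p.value   # p-value for the null hypothesis of zero correlation
result$conf.int  # 95% confidence interval for r
```

For yearly data, consecutive observations are usually not independent, so a p-value from this test should be read with caution.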

Conclusions

Analysis of all the datasets allowed us to come to conclusions about the general trends and implications of the data. The first dataset we analyzed was The Washington Post’s annual average temperature for each state. The first visualization we created was an animated line graph that plotted the average temperature for each year. The trend we found was that from 1895 until now there has been a slight increase in the average annual temperature in the US (of about 2 degrees Fahrenheit). This was expected and is consistent with our background knowledge of climate change and rising temperatures.

The second dataset we analyzed was the International Disaster Database’s set of known natural disasters from 1900 to the present, including the disaster type, the year the disaster began, and the US state and region in which it occurred. The visualization we created with this dataset was a column graph showing the number of natural disasters for each year from 1900 to the present, filled by disaster type. The trend we found was that the number of disasters in the US each year has increased substantially since 1900 (from approximately 1 each year to 10). The types of disasters with the biggest increases were wildfires and floods. This trend was interesting to find, because we weren’t initially sure whether there had been an increase in natural disasters. We do, however, recognize that much of this trend can likely be attributed to improvements in the reporting of disasters from 1900 to the present.

The final analysis came after we joined both datasets by year. The visualization we made was a scatterplot of the number of disasters against average temperature. This was the most important visualization for our guiding question, as it allowed us to assess whether there was actually a correlation between these two variables. Our visualization showed a slight positive correlation, indicating that as temperatures increase, the number of disasters tends to increase. Along with the visualization, we calculated the correlation coefficient of the two variables and found that it was roughly 0.25. This indicates that a positive relationship exists, but that it is weak. Overall, we found increasing trends in both datasets, but the correlation between the two was weak; because we did not run a significance test, we cannot say from this analysis whether it is statistically significant.

INFO 201 Group Project Narrative

This report analyzes U.S. temperature data (1895–present) from The Washington Post and natural disaster data (1900–present) from the International Disaster Database. Key variables include disaster type, location, year, temperature, and state codes (FIPS). Data processing included removing non-U.S. records, standardizing temperatures to Celsius, and aligning state-level data. Limitations include inconsistencies in historical temperature records and potential biases in disaster reporting, which may be higher in densely populated areas. Ethical concerns arise from disparities in disaster preparedness and recovery efforts, which may skew analysis.

Our key questions and findings are the following. First, do states with higher temperature changes experience more natural disasters? The scatterplot suggests a positive correlation between warming trends and disaster frequency. Second, is there a link between rising temperatures and disaster rates? The time-based data shows an increase in wildfires and hurricanes alongside temperature rises. Third, have certain disasters increased more than others with climate change? In this data, wildfires and hurricanes show the sharpest rise, while earthquakes remain unaffected.

For our visualizations, we made a line graph showing that the average annual temperature in the US has increased slightly from 1895 to the present, with some fluctuations over time. Next, we created a bar chart revealing a significant rise in the frequency of natural disasters in the US since 1900, especially after 1975, with various disaster types represented. We also included a scatterplot demonstrating a positive correlation between rising average temperatures and an increase in natural disaster occurrences in the US. In addition, a bubble map highlights disaster-prone regions that have seen very significant changes in temperature.

Other data sources look at similar issues. NOAA Climate Data Records, https://www.ncdc.noaa.gov/, tracks billion-dollar disasters and climate trends. The NASA Earth Observatory, https://earthobservatory.nasa.gov/, has information on satellite climate monitoring. USGS Earthquake Data, https://earthquake.usgs.gov/, publishes research on climate trends and disaster impacts.