Intro

Question: Do U.S. counties in 2019 have a different average coastal flooding exposure than inland flooding exposure?

The data set used is “Climate History Extensions”, published by the U.S. Environmental Protection Agency on Data.gov. The full data set contains 64,400 observations and 21 variables, of which this analysis uses four: county name, year, coastal flooding exposure, and inland flooding exposure. After removing invalid values and unneeded variables, the cleaned data set contained 3,220 observations. Each observation represents one U.S. county in 2019 and records two quantitative measures for that county: its coastal flooding exposure and its inland flooding exposure.

The data set is available at https://catalog.data.gov/dataset/climate-history-extensions.

Data Analysis

I started by loading the packages needed for data manipulation (the cleaning code uses dplyr functions) and setting up the data set for analysis. After exploring the data and noticing many variables that were not relevant, I narrowed my focus to the variables that would be useful for this question. Next, I cleaned the data: I used select() to keep county name, year, coastal flooding exposure, and inland flooding exposure, and rename() to give these columns shorter names. I then removed invalid recordings by dropping any county whose coastal or inland flooding value was -999. Finally, I compared the two exposure measures with a box plot and their means, and then ran the formal hypothesis test, a paired t-test. The box plot and means showed the general pattern and made the test results easier to interpret.

Cleaning

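# select(), rename(), filter() and the %>% pipe come from dplyr;
# `data` is the raw Climate History Extensions table loaded earlier.
library(dplyr)

flood <- data %>%
  # keep only the four variables used in this analysis
  select(NAME, YEAR, `COASTAL FLOODING`, `INLAND FLOODING`) %>%
  # give the columns shorter, syntactic names
  rename(
    county = NAME,
    year = YEAR,
    coastal_flooding = `COASTAL FLOODING`,
    inland_flooding = `INLAND FLOODING`
  ) %>%
  # drop counties where either exposure is recorded as -999 (invalid)
  filter(inland_flooding != -999, coastal_flooding != -999)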

summary(flood)
##     county               year      coastal_flooding  inland_flooding 
##  Length:3220        Min.   :2019   Min.   :  0.000   Min.   :  0.00  
##  Class :character   1st Qu.:2019   1st Qu.:  0.000   1st Qu.: 18.36  
##  Mode  :character   Median :2019   Median :  0.000   Median : 99.96  
##                     Mean   :2019   Mean   :  2.752   Mean   : 72.95  
##                     3rd Qu.:2019   3rd Qu.:  0.000   3rd Qu.: 99.99  
##                     Max.   :2019   Max.   :100.000   Max.   :100.00
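A quick sanity check after cleaning can confirm that no -999 codes survived and that the exposures stay within the 0 to 100 range shown by the minimum and maximum in the summary above. The sketch below uses a small hypothetical data frame (the county labels and values are invented for illustration) and base R instead of dplyr, so it runs on its own; the same checks could be applied to `flood`.

```r
# Hypothetical toy data standing in for the raw table; values are invented,
# only the -999 "invalid" code comes from the analysis above.
raw <- data.frame(
  county           = c("A", "B", "C", "D"),
  coastal_flooding = c(0, -999, 12.5, 0),
  inland_flooding  = c(99.9, 40, -999, 18.4)
)

# Keep a row only if neither exposure column holds the -999 code.
# which() also drops rows where the comparison is NA, as dplyr::filter() does.
keep  <- raw$coastal_flooding != -999 & raw$inland_flooding != -999
clean <- raw[which(keep), ]

# Sanity checks: no -999 codes remain and values lie in the 0-100 range
# suggested by the min/max of the summary output.
stopifnot(
  !any(clean$coastal_flooding == -999),
  !any(clean$inland_flooding == -999),
  all(clean$coastal_flooding >= 0 & clean$coastal_flooding <= 100),
  all(clean$inland_flooding >= 0 & clean$inland_flooding <= 100)
)
cat("Rows before:", nrow(raw), " rows after:", nrow(clean), "\n")
```

Here counties B and C are dropped because one of their two values is -999, which mirrors the rule that a county is removed if either exposure is invalid.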

Boxplot

# side-by-side box plots of the two exposure measures across all counties
boxplot(flood$coastal_flooding, flood$inland_flooding,
        names = c("Coastal Flooding", "Inland Flooding"),
        main = "Coastal vs Inland Flooding Exposure",
        ylab = "Exposure")
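Side-by-side box plots show each measure separately, but the paired t-test works on the difference within each county. One possible complementary view, sketched here with five hypothetical counties whose values are invented for illustration, is a single box plot of the per-county differences (coastal minus inland) with a reference line at zero.

```r
# Hypothetical paired exposures for five counties (invented for illustration).
coastal <- c(0, 0, 35, 0, 100)
inland  <- c(99.9, 18.4, 60, 100, 0)

# Per-county difference, coastal minus inland: the quantity the paired test analyzes.
d <- coastal - inland

# One box for the differences; values below the dashed zero line are counties
# where inland exposure exceeds coastal exposure.
boxplot(d,
        main = "Coastal minus Inland Flooding Exposure",
        ylab = "Difference in exposure")
abline(h = 0, lty = 2)  # reference line for "no difference"
```

On the real data, `d` would be `flood$coastal_flooding - flood$inland_flooding`.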

Testing

mean(flood$coastal_flooding)
## [1] 2.751562
mean(flood$inland_flooding)
## [1] 72.94889
t.test(flood$coastal_flooding, flood$inland_flooding, paired = TRUE)
## 
##  Paired t-test
## 
## data:  flood$coastal_flooding and flood$inland_flooding
## t = -91.573, df = 3219, p-value < 2.2e-16
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -71.70035 -68.69431
## sample estimates:
## mean difference 
##       -70.19733

Statistical Analysis

$H_0: \mu_{\text{coastal}} = \mu_{\text{inland}}$

$H_a: \mu_{\text{coastal}} \neq \mu_{\text{inland}}$

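The null hypothesis was that the mean coastal flooding exposure equals the mean inland flooding exposure; the alternative was that the two means differ. Because the test is paired, this is equivalent to testing whether the mean within-county difference (coastal minus inland) is zero, which is how R states the alternative hypothesis in the t-test output.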
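For reference, the standard paired t statistic is computed from the within-county differences. Writing $d_i$ for county $i$'s coastal exposure minus its inland exposure, the statistic and a worked check against the R output are shown below. The standard error is not printed by R; it is backed out here from the reported mean difference and t value, and the critical value $t_{0.975,\,3219} \approx 1.961$ is rounded.

```latex
d_i = x_{i,\text{coastal}} - x_{i,\text{inland}}, \qquad
\bar d = \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad
s_d = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (d_i - \bar d)^2}

H_0: \mu_d = 0 \iff \mu_{\text{coastal}} = \mu_{\text{inland}}

t = \frac{\bar d}{s_d / \sqrt{n}}, \qquad \text{df} = n - 1 = 3219

\text{SE} = \frac{s_d}{\sqrt{n}} = \frac{|\bar d|}{|t|} = \frac{70.197}{91.573} \approx 0.767

\text{95\% CI} = \bar d \pm t_{0.975,\,3219}\,\text{SE} \approx -70.197 \pm 1.961 \times 0.767 \approx [-71.70,\ -68.69]
```

The reconstructed interval matches the one reported by t.test(), which confirms that the output is consistent with a one-sample t-test on the differences.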

To test this, I used a paired t-test. Pairing is appropriate because both flooding measures come from the same county, so each coastal value has a matching inland value from the same place. The mean coastal flooding exposure was 2.75 and the mean inland flooding exposure was 72.95, a mean difference of -70.20 (95% confidence interval -71.70 to -68.69). Using a significance level of 0.05, the test gave t = -91.57 with 3,219 degrees of freedom and a p-value below 2.2e-16; R reports any p-value smaller than this threshold as "< 2.2e-16". Because the p-value is far below 0.05, I rejected the null hypothesis: there is a statistically significant difference between mean coastal and mean inland flooding exposure in U.S. counties in 2019.
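To make the mechanics of the paired test concrete, the sketch below computes it by hand as a one-sample t-test on the per-county differences and checks the result against the built-in t.test(). The eight counties and their values are hypothetical and invented for illustration.

```r
# Hypothetical paired exposures for eight counties (invented for illustration).
coastal <- c(0, 0, 5.2, 0, 100, 0, 12.0, 0)
inland  <- c(99.9, 18.4, 99.99, 60.1, 0, 100, 45.3, 99.96)

# Paired t-test by hand: a one-sample t-test on the differences.
d      <- coastal - inland
n      <- length(d)
d_bar  <- mean(d)
se     <- sd(d) / sqrt(n)               # standard error of the mean difference
t_stat <- d_bar / se
df     <- n - 1
p_val  <- 2 * pt(-abs(t_stat), df)      # two-sided p-value
ci     <- d_bar + c(-1, 1) * qt(0.975, df) * se  # 95% confidence interval

# Built-in paired test for comparison (it also uses coastal minus inland).
tt <- t.test(coastal, inland, paired = TRUE)

# The hand computation should agree with t.test() up to floating-point error.
stopifnot(
  isTRUE(all.equal(t_stat, unname(tt$statistic))),
  isTRUE(all.equal(df, unname(tt$parameter))),
  isTRUE(all.equal(p_val, tt$p.value)),
  isTRUE(all.equal(ci, as.numeric(tt$conf.int)))
)
```

With `coastal <- flood$coastal_flooding` and `inland <- flood$inland_flooding`, the same lines should reproduce the t value, degrees of freedom, and confidence interval from the t-test output above.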

Conclusion

This project asked whether U.S. counties in 2019 had a different average coastal flooding exposure than inland flooding exposure. The results show that average inland flooding exposure was much higher than average coastal flooding exposure. This is worth noting because flood risk is often associated mainly with coastal areas, yet in this data set inland exposure was far more widespread: the median county had an inland exposure of 99.96 and a coastal exposure of 0, and at least three quarters of counties had no coastal exposure at all. This suggests that inland flooding deserves more attention as a hazard, since rivers, heavy rain, runoff, and drainage problems away from the coast are also major sources of flooding. A natural next step would be to compare flooding exposure with the actual damage caused by floods, to learn which type of flooding is more hazardous.

References

U.S. Environmental Protection Agency, Office of Research and Development. Climate History Extensions. Data.gov, 25 Feb. 2023, https://catalog.data.gov/dataset/climate-history-extensions