Question: Do U.S. counties in 2019 have a different average coastal flooding exposure than inland flooding exposure?
The data set used is “Climate History Extensions”, published by the U.S. Environmental Protection Agency and distributed through Data.gov. The full data set contains 64,400 observations and 21 variables, of which this analysis uses only four. After removing invalid values and unneeded variables, the cleaned data set contained 3,220 observations. Each observation represents one U.S. county in 2019 and records two quantitative measures: its coastal flooding exposure and its inland flooding exposure.
The data set is available through Data.gov at https://catalog.data.gov/dataset/climate-history-extensions.
I started by loading the data and the required packages in R and setting up the data set for analysis. After exploring the data and noticing many variables unrelated to the question, I began to home in on the variables that would be useful for my work. Next, I cleaned the data, using the select function to keep four variables: county name, year, coastal flooding exposure, and inland flooding exposure. I then removed invalid recordings by dropping every row in which either exposure was equal to -999. Finally, I compared the mean flooding exposure values with a box plot and a statistical test. The summary statistics and box plot show the general patterns and make the results easier to follow before the formal hypothesis test.
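As a rough illustration of the cleaning steps described above, the sketch below applies the same select, rename, and filter logic to a small made-up data frame using base R only. The county names and values are hypothetical and are not taken from the EPA file; only the column names and the -999 placeholder follow the report.

```r
# Toy data: hypothetical values, NOT taken from the EPA data set.
toy <- data.frame(
  NAME = c("County A", "County B", "County C", "County D"),
  YEAR = c(2019, 2019, 2019, 2019),
  `COASTAL FLOODING` = c(0, -999, 12.5, 0),
  `INLAND FLOODING` = c(99.9, 40.2, -999, 18.4),
  check.names = FALSE  # keep the column names with spaces as-is
)

# Keep only the columns of interest and give them syntactic names.
flood_toy <- toy[, c("NAME", "YEAR", "COASTAL FLOODING", "INLAND FLOODING")]
names(flood_toy) <- c("county", "year", "coastal_flooding", "inland_flooding")

# Drop any row where either exposure holds the -999 placeholder.
keep <- flood_toy$coastal_flooding != -999 & flood_toy$inland_flooding != -999
flood_toy <- flood_toy[keep, ]

print(flood_toy)
```

A row is removed if either exposure is invalid, so every remaining county has both measurements. This is what later makes a paired comparison possible.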
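# select(), rename(), filter(), and %>% come from the dplyr package
library(dplyr)
flood <- data %>%
  # keep only the four variables needed for the analysis
  select(NAME, YEAR, `COASTAL FLOODING`, `INLAND FLOODING`) %>%
  rename(
    county = NAME,
    year = YEAR,
    coastal_flooding = `COASTAL FLOODING`,
    inland_flooding = `INLAND FLOODING`
  ) %>%
  # drop rows where either exposure holds the invalid value -999
  filter(inland_flooding != -999, coastal_flooding != -999)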
summary(flood)
##     county               year      coastal_flooding  inland_flooding
##  Length:3220        Min.   :2019   Min.   :  0.000   Min.   :  0.00
##  Class :character   1st Qu.:2019   1st Qu.:  0.000   1st Qu.: 18.36
##  Mode  :character   Median :2019   Median :  0.000   Median : 99.96
##                     Mean   :2019   Mean   :  2.752   Mean   : 72.95
##                     3rd Qu.:2019   3rd Qu.:  0.000   3rd Qu.: 99.99
##                     Max.   :2019   Max.   :100.000   Max.   :100.00
boxplot(flood$coastal_flooding, flood$inland_flooding,
        names = c("Coastal Flooding", "Inland Flooding"),
        main = "Coastal vs Inland Flooding Exposure",
        ylab = "Exposure")
## Testing
mean(flood$coastal_flooding)
## [1] 2.751562
mean(flood$inland_flooding)
## [1] 72.94889
t.test(flood$coastal_flooding, flood$inland_flooding, paired = TRUE)
##
## Paired t-test
##
## data: flood$coastal_flooding and flood$inland_flooding
## t = -91.573, df = 3219, p-value < 2.2e-16
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## -71.70035 -68.69431
## sample estimates:
## mean difference
## -70.19733
$H_0: \mu_{\text{coastal}} = \mu_{\text{inland}}$
$H_a: \mu_{\text{coastal}} \neq \mu_{\text{inland}}$
The null hypothesis for this project was that the mean coastal flooding exposure was the same as the mean inland flooding exposure. The alternative hypothesis was the opposite, that the means were different.
To test this, a paired t-test was used. The paired option was the most suitable because both flooding measures come from the same county observation, so each coastal value has a matching inland value for the same place. The mean coastal flooding exposure was 2.75, while the mean inland exposure was 72.95, a mean difference of -70.20 (95% confidence interval: -71.70 to -68.69). At a significance level of .05, the test gave t = -91.573 with 3219 degrees of freedom and a p-value below 2.2e-16. Because the p-value was far smaller than .05, I rejected the null hypothesis: there is a statistically significant difference between mean coastal flooding exposure and mean inland flooding exposure in U.S. counties during 2019.
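To make the paired reasoning concrete, here is a minimal sketch showing that a paired t-test is the same as a one-sample t-test on the within-pair differences. The six pairs of values are hypothetical and are not drawn from the EPA data; the variable names `t_manual` and `p_manual` are introduced only for this illustration.

```r
# Toy paired measurements (hypothetical, NOT from the EPA data).
coastal <- c(0, 0, 5.2, 0, 31.0, 0)
inland  <- c(99.9, 18.4, 60.1, 99.9, 45.3, 72.0)

# A paired t-test works on the difference within each pair.
d <- coastal - inland
n <- length(d)

# t statistic: mean difference divided by its standard error.
t_manual <- mean(d) / (sd(d) / sqrt(n))

# Two-sided p-value from the t distribution with n - 1 degrees of freedom.
p_manual <- 2 * pt(-abs(t_manual), df = n - 1)

# Built-in paired test for comparison.
res <- t.test(coastal, inland, paired = TRUE)

print(c(manual = t_manual, builtin = unname(res$statistic)))
print(c(manual = p_manual, builtin = res$p.value))
```

Applied to the report's own output, the same relationship means that a mean difference of -70.19733 and t = -91.573 correspond to a standard error of about 0.77 for the mean difference.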
The project sought to determine whether U.S. counties in 2019 had different average coastal and inland flooding exposure. The results show that inland flooding exposure was much higher on average than coastal flooding exposure (72.95 versus 2.75). This is important to be aware of because flooding risk is typically associated with coastal areas, while in this data set inland exposure is far more widespread: at least three quarters of counties had a coastal exposure of 0, whereas the median inland exposure was 99.96. Inland flooding therefore deserves more consideration as a hazard; rivers, heavy rain, runoff, and non-coastal drainage problems are also prominent issues. To continue this study, future work could compare the actual amount of flood damage with flooding exposure to gain more insight into which type of flooding is more hazardous.
U.S. Environmental Protection Agency, Office of Research and Development. Climate History Extensions. Data.gov, 25 Feb. 2023, https://catalog.data.gov/dataset/climate-history-extensions.