This report describes the relationship in the United States between severe weather events and their impact on population health and the economy. The data below is from the NOAA Storm Database, which contains data for 48 official weather events (as described here) and measures the number of fatalities, the amount of property damage, and the amount of crop damage inflicted by each event. This analysis found that certain events such as tornadoes and floods have a relatively low number of fatalities and property or crop damage per event, but the number of times they happen far exceeds others events. Meanwhile there are certain events, such as hurricanes especially, which happen a relatively low number of times but inflict a large number of fatalities and a large amount of property and crop damage per event.
if (!file.exists("stormdata.csv.bz2")) {
datalocation <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(datalocation, "stormdata.csv.bz2")
}
library(readr)
dat <- read_csv("stormdata.csv.bz2", col_types = paste(rep("c", 37), collapse = ""))
Here, we clean up the BGN_DATE to only show the date in yyyy-mm-dd format.
suppressMessages(library(lubridate))
dat$BGN_DATE <- mdy(gsub(" .*","",dat$BGN_DATE))
The PROPDMGEXP column represents exponent values for the PROPDMG column, but it is riddled with inconsistencies. The following removes rows with certain values, converts certain values to their numeric values, and creates a new PROPDMGLONG column that multiplies PROPDMG by the new PROPDMGEXP column. Credit to figuring out these values goes to the work done here. Does the same for CROPDMG and CROPDMGEXP, and creates a CROPDMGLONG column for the product of those columns once they’ve been tidied.
dat <- dat[!grepl("\\?|-", dat$PROPDMGEXP),]
newpropdmg <- numeric(length = nrow(dat))
newpropdmg[grep("[1-8]", dat$PROPDMGEXP)] <- 10
newpropdmg[grep("\\+", dat$PROPDMGEXP)] <- 1
newpropdmg[grep("[Bb]", dat$PROPDMGEXP)] <- 1000000000
newpropdmg[grep("[Mm]", dat$PROPDMGEXP)] <- 1000000
newpropdmg[grep("[Kk]", dat$PROPDMGEXP)] <- 1000
newpropdmg[grep("[Hh]", dat$PROPDMGEXP)] <- 100
newpropdmg[is.na(dat$PROPDMGEXP)] <- 0
dat$PROPDMGLONG <- as.numeric(dat$PROPDMG) * newpropdmg
dat <- dat[!grepl("\\?|-", dat$CROPDMGEXP),]
newcropdmg <- numeric(length = nrow(dat))
newcropdmg[grep("[1-8]", dat$CROPDMGEXP)] <- 10
newcropdmg[grep("\\+", dat$CROPDMGEXP)] <- 1
newcropdmg[grep("[Bb]", dat$CROPDMGEXP)] <- 1000000000
newcropdmg[grep("[Mm]", dat$CROPDMGEXP)] <- 1000000
newcropdmg[grep("[Kk]", dat$CROPDMGEXP)] <- 1000
newcropdmg[grep("[Hh]", dat$CROPDMGEXP)] <- 100
newcropdmg[is.na(dat$CROPDMGEXP)] <- 0
dat$CROPDMGLONG <- as.numeric(dat$CROPDMG) * newcropdmg
dat$ALLDMGLONG <- dat$PROPDMGLONG + dat$CROPDMGLONG
Now, to clean up the EVTYPE column. We’ll just look for anything that matches the official 48 event values. Keeping only these 48 values preserves 79.49% of all property damage, 75.85% of all crop damage, and gets rid of 29.59% of the rows.
evtype_vals <- tolower(paste("^Astronomical Low Tide$", "^Avalanche$",
"^Blizzard$", "^Coastal Flood$",
"^Cold/Wind Chill$", "^Debris Flow$",
"^Dense Fog$", "^Dense Smoke$", "^Drought$",
"^Dust Devil$", "^Dust Storm$",
"^Excessive Heat$", "^Extreme Cold/Wind Chill$",
"^Flash Flood$", "^Flood$", "^Freezing Fog$",
"^Frost/Freeze$", "^Funnel Cloud$", "^Hail$",
"^Heat$", "^Heavy Rain$", "^Heavy Snow$",
"^High Surf$", "^High Wind$",
"^Hurricane/Typhoon$", "^Ice Storm$",
"^Lakeshore Flood$", "^Lake-Effect Snow$",
"^Lightning$", "^Marine Hail$",
"^Marine High Wind$", "^Marine Strong Wind$",
"^Marine Thunderstorm Wind$", "^Rip Current$",
"^Seiche$", "^Sleet$", "^Storm Tide$",
"^Strong Wind$", "^Thunderstorm Wind$",
"^Tornado$", "^Tropical Depression$",
"^Tropical Storm$", "^Tsunami$",
"^Volcanic Ash$", "^Waterspout$",
"^Wildfire$", "^Winter Storm$",
"^Winter Weather$",
sep = "|"))
dat <- dat[grep(evtype_vals, tolower(dat$EVTYPE)),]
dat$EVTYPE <- tolower(dat$EVTYPE)
Finally, we use the dplyr package to create a new dat_grouped table that summarizes key metrics, such as the number of events and the number of fatalities, for each event type. One of these columns, alldmg_total, represents the sum of crop damage and property damage. Another column, alldmg_avg, represents the average amount of damage and property damage. The kableExtra package is used to print the entire dat_grouped table.
suppressMessages(library(dplyr))
suppressMessages(library(kableExtra))
dat$FATALITIES <- as.numeric(dat$FATALITIES)
dat$INJURIES <- as.numeric(dat$INJURIES)
dat_grouped <- dat %>%
group_by(EVTYPE) %>%
summarise(count = n(),
fatalities_avg = mean(FATALITIES),
alldmg_avg = mean(ALLDMGLONG))
kable(dat_grouped, format.args = list(big.mark = ","),
booktabs = T) %>%
kable_styling(latex_options = "striped", full_width = F,
position = "left")
| EVTYPE | count | fatalities_avg | alldmg_avg |
|---|---|---|---|
| astronomical low tide | 174 | 0.0000000 | 1.839080e+03 |
| avalanche | 386 | 0.5803109 | 9.641969e+03 |
| blizzard | 2,719 | 0.0371460 | 2.836609e+05 |
| coastal flood | 657 | 0.0045662 | 3.950846e+05 |
| cold/wind chill | 539 | 0.1762523 | 4.805195e+03 |
| dense fog | 1,293 | 0.0139211 | 7.481825e+03 |
| dense smoke | 10 | 0.0000000 | 1.000000e+04 |
| drought | 2,488 | 0.0000000 | 6.036444e+06 |
| dust devil | 149 | 0.0134228 | 4.823020e+03 |
| dust storm | 427 | 0.0515222 | 2.025527e+04 |
| excessive heat | 1,678 | 1.1340882 | 2.980666e+05 |
| extreme cold/wind chill | 1,002 | 0.1247505 | 8.680639e+03 |
| flash flood | 54,276 | 0.0180190 | 3.235717e+05 |
| flood | 25,327 | 0.0185573 | 5.935155e+06 |
| freezing fog | 46 | 0.0000000 | 4.743478e+04 |
| frost/freeze | 1,343 | 0.0000000 | 8.225361e+05 |
| funnel cloud | 6,844 | 0.0000000 | 2.843366e+01 |
| hail | 288,658 | 0.0000520 | 6.498424e+04 |
| heat | 767 | 1.2216428 | 5.257608e+05 |
| heavy rain | 11,742 | 0.0083461 | 1.215847e+05 |
| heavy snow | 15,708 | 0.0080851 | 6.794259e+04 |
| high surf | 734 | 0.1416894 | 1.225545e+05 |
| high wind | 20,213 | 0.0121704 | 2.923177e+05 |
| hurricane/typhoon | 88 | 0.7272727 | 8.172013e+08 |
| ice storm | 2,006 | 0.0443669 | 4.470110e+06 |
| lake-effect snow | 636 | 0.0000000 | 6.307390e+04 |
| lakeshore flood | 23 | 0.0000000 | 3.278261e+05 |
| lightning | 15,755 | 0.0517931 | 5.971130e+04 |
| marine hail | 442 | 0.0000000 | 9.049774e+00 |
| marine high wind | 135 | 0.0074074 | 9.607481e+03 |
| marine strong wind | 48 | 0.2916667 | 8.715208e+03 |
| marine thunderstorm wind | 5,812 | 0.0017206 | 8.368892e+01 |
| rip current | 470 | 0.7829787 | 2.127660e+00 |
| seiche | 21 | 0.0000000 | 4.666667e+04 |
| sleet | 59 | 0.0338983 | 0.000000e+00 |
| strong wind | 3,569 | 0.0288596 | 6.730539e+04 |
| thunderstorm wind | 82,563 | 0.0016109 | 4.721200e+04 |
| tornado | 60,652 | 0.0928741 | 9.455931e+05 |
| tropical depression | 60 | 0.0000000 | 2.895000e+04 |
| tropical storm | 690 | 0.0840580 | 1.214817e+07 |
| tsunami | 20 | 1.6500000 | 7.204100e+06 |
| volcanic ash | 23 | 0.0000000 | 2.173913e+04 |
| waterspout | 3,797 | 0.0007901 | 2.463445e+03 |
| wildfire | 2,761 | 0.0271641 | 1.832882e+06 |
| winter storm | 11,433 | 0.0180180 | 5.873735e+05 |
| winter weather | 7,045 | 0.0046842 | 5.090987e+03 |
First, we look how much property and crop damage is inflicted by each event type. Instead of plotting all 48 event types, we use the quantile function to filter down to only the event types in the 75th percentile of average property/damage inflicted.
suppressMessages(library(ggplot2))
suppressMessages(library(ggrepel))
top_dmg_val <- quantile(dat_grouped$alldmg_avg, c(.75))
top_dmg <- dat_grouped[dat_grouped$alldmg_avg > top_dmg_val,] %>%
select(-fatalities_avg)
kable(top_dmg, format.args = list(big.mark = ","),
booktabs = T) %>%
kable_styling(latex_options = "striped", full_width = F,
position = "left")
| EVTYPE | count | alldmg_avg |
|---|---|---|
| coastal flood | 657 | 395,084.6 |
| drought | 2,488 | 6,036,443.7 |
| flood | 25,327 | 5,935,155.3 |
| frost/freeze | 1,343 | 822,536.1 |
| heat | 767 | 525,760.8 |
| hurricane/typhoon | 88 | 817,201,281.8 |
| ice storm | 2,006 | 4,470,110.3 |
| tornado | 60,652 | 945,593.1 |
| tropical storm | 690 | 12,148,168.9 |
| tsunami | 20 | 7,204,100.0 |
| wildfire | 2,761 | 1,832,881.9 |
| winter storm | 11,433 | 587,373.5 |
We use the ggplot2 and ggrepel package plot the average amount of property/crop damage per event type (y-axis) against the number of times each event type happened (x-axis). We transform each axis to be on a logarithmic scale, to better see the differences between the different event types.
plot1 <- ggplot(top_dmg, aes(count, alldmg_avg)) +
geom_point() +
scale_x_log10() + scale_y_log10() +
labs(y = "Avg property/crop damage (log10)",
x = "Number of events since 1950 (log10") +
geom_text_repel(aes(label = EVTYPE))
plot1
We can see that tornadoes and floods don’t inflict on average a massive amount of property/crop damage, but they happen much more frequently. Meanwhile, hurricanes/typhoons don’t happen very frequently but on average they inflict the highest amount of property/crop damage.
Now, we look how many fatalities are inflicted by each event type. Again, instead of plotting all 48 event types, we use the quantile function to filter down to only the event types in the 75th percentile of average fatalities.
top_ftl_val <- quantile(dat_grouped$fatalities_avg, c(.75))
top_ftl <- dat_grouped[dat_grouped$fatalities_avg > top_ftl_val,] %>%
select(-alldmg_avg)
kable(top_ftl, format.args = list(big.mark = ","),
booktabs = T) %>%
kable_styling(latex_options = "striped", full_width = F,
position = "left")
| EVTYPE | count | fatalities_avg |
|---|---|---|
| avalanche | 386 | 0.5803109 |
| cold/wind chill | 539 | 0.1762523 |
| excessive heat | 1,678 | 1.1340882 |
| extreme cold/wind chill | 1,002 | 0.1247505 |
| heat | 767 | 1.2216428 |
| high surf | 734 | 0.1416894 |
| hurricane/typhoon | 88 | 0.7272727 |
| marine strong wind | 48 | 0.2916667 |
| rip current | 470 | 0.7829787 |
| tornado | 60,652 | 0.0928741 |
| tropical storm | 690 | 0.0840580 |
| tsunami | 20 | 1.6500000 |
We again use the ggplot2 and ggrepel package plot the average number of fatalities per event type (y-axis) against the number of times each event type happened (x-axis). We again transform each axis to be on a logarithmic scale, to better see the differences between the different event types.
plot2 <- ggplot(top_ftl, aes(count, fatalities_avg)) +
geom_point() +
scale_x_log10() + scale_y_log10() +
labs(y = "Avg fatalities (log10)",
x = "Number of events since 1950 (log10)") +
geom_text_repel(aes(label = EVTYPE))
plot2
We can see that tornadoe don’t inflict on average a large number of fatalities, but they happen much more frequently. Meanwhile, events like heat and excessive heat don’t happen a large number of times but they are among the event types with the highest fatalities per event.