Assessing the impact of storms in the United States

Synopsis

Each year, severe weather has a big impact on the United States. In this analysis, we examine the impact of storms in the US in terms of injuries, fatalities, and economic consequences. We find that tornadoes are particularly dangerous to human life and also cause considerable property damage. Heat is another event type that causes many deaths. Floods, on the other hand, have the greatest economic consequences.

Data Processing

We import the libraries used in the analysis.

library(dplyr)
library(ggplot2)
library(xtable)

For the purpose of this project, we are going to use the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database.

We start by downloading the CSV file and loading it into a data frame.

if (!file.exists("storm_data.csv.bz2")) {
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                destfile = "storm_data.csv.bz2")
}
storm_data <- read.csv("storm_data.csv.bz2")

names(storm_data)

##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

Results

Across the United States, which types of events are most harmful with respect to population health?

We group all rows by event type and compute the total number of fatalities and injuries for each type.

storm_grouped_harm <- storm_data %>%
  group_by(EVTYPE) %>%
  summarise(TOTAL_INJURIES = sum(INJURIES), TOTAL_FATALITIES = sum(FATALITIES))
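The grouping step can be checked on a small example. The sketch below applies the same group_by/summarise pattern to a toy data frame; `toy_storms`, `toy_harm` and all of their values are made up for illustration and do not come from the NOAA database.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical toy records; the values are invented for illustration only
toy_storms <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "HEAT", "FLOOD"),
  INJURIES   = c(3, 1, 2, 0, 4),
  FATALITIES = c(1, 2, 0, 1, 0)
)

# Same pattern as in the analysis: one row per event type with summed counts
toy_harm <- toy_storms %>%
  group_by(EVTYPE) %>%
  summarise(TOTAL_INJURIES = sum(INJURIES), TOTAL_FATALITIES = sum(FATALITIES))

print(toy_harm)
```

Each event type ends up in exactly one row, so the two TORNADO records collapse into a single total.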

We sort the events in descending order by fatalities and by injuries and save the results into two separate data frames. We keep only the top 5 of each to make our visualization less cluttered.

# head() returns 6 rows by default, so we request 5 explicitly
top5_fatalities <- head(storm_grouped_harm %>% arrange(-TOTAL_FATALITIES), n = 5)
top5_injuries <- head(storm_grouped_harm %>% arrange(-TOTAL_INJURIES), n = 5)

print(xtable(top5_fatalities), type="html")

	EVTYPE	TOTAL_INJURIES	TOTAL_FATALITIES
1	TORNADO	91346.00	5633.00
2	EXCESSIVE HEAT	6525.00	1903.00
3	FLASH FLOOD	1777.00	978.00
4	HEAT	2100.00	937.00
5	LIGHTNING	5230.00	816.00
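When selecting a top 5, it helps to remember that head() keeps six rows unless told otherwise. The base R sketch below shows the difference on made-up totals; `toy_totals`, `toy_sorted` and `top5` are hypothetical names used only here.

```r
# Hypothetical totals for seven event types (made-up numbers)
toy_totals <- data.frame(
  EVTYPE = paste0("TYPE_", LETTERS[1:7]),
  TOTAL  = c(40, 10, 70, 20, 60, 30, 50)
)

# Sort in descending order of TOTAL
toy_sorted <- toy_totals[order(-toy_totals$TOTAL), ]

default_rows <- nrow(head(toy_sorted))   # head() keeps 6 rows by default
top5 <- head(toy_sorted, n = 5)          # request exactly 5 rows

print(default_rows)
print(top5)
```

Passing n explicitly keeps the table and the "Top 5" plot titles in agreement.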

For both data frames, we convert EVTYPE to a factor whose levels follow the sorted order, so that this order is preserved when plotting.

top5_fatalities$EVTYPE <- factor(top5_fatalities$EVTYPE, levels = top5_fatalities$EVTYPE)
top5_injuries$EVTYPE <- factor(top5_injuries$EVTYPE, levels = top5_injuries$EVTYPE)
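To see why the explicit levels matter, here is a minimal base R sketch. Without a levels argument, factor() sorts the levels alphabetically, and a plot would follow that order rather than the ranking. The vector `ev` is a hypothetical example, not the actual data frame column.

```r
# Hypothetical event types, assumed to be already sorted by some total
ev <- c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD")

# Default behaviour: levels are sorted alphabetically
default_levels <- levels(factor(ev))

# Explicit levels: the order of appearance is kept
kept_levels <- levels(factor(ev, levels = ev))

print(default_levels)
print(kept_levels)
```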

We plot the top 5 event types in terms of fatalities.

ggplot(top5_fatalities, aes(x = EVTYPE, y = TOTAL_FATALITIES)) +
  geom_bar(stat="identity") +
  ggtitle("Top 5 most fatal event types") +
  xlab("Event type") +
  ylab("Number of fatalities")

We do the same for injuries.

ggplot(top5_injuries, aes(x = EVTYPE, y = TOTAL_INJURIES)) +
  geom_bar(stat="identity") +
  ggtitle("Top 5 event types in terms of injuries") +
  xlab("Event type") +
  ylab("Number of injuries")

Conclusion

Tornadoes are by far the most harmful events, with 5,633 fatalities and 91,346 injuries in total. Other harmful events are those associated with excessive heat and thunderstorm wind.

Across the United States, which types of events have the greatest economic consequences?

This time we will find the overall cost of each event type, measured by property damage.

We first need to multiply the property damage column by its multiplier. Let’s check which distinct multipliers appear in the data.

unique(storm_data$PROPDMGEXP)

##  [1] "K" "M" ""  "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"

We do not scale rows with empty multipliers, special characters (“+”, “-”, “?”) or digits 0-8, because these codes do not map to a clear magnitude. The code below leaves their PROPDMG value unchanged (multiplier 1); as written, lowercase “h” also falls into this group. We will interpret the rest in the following way:

“H” is 100x
“K” is 1,000x
“M” and “m” are 1,000,000x
“B” is 1,000,000,000x

We multiply the property damage column according to the multipliers.

storm_data <- storm_data %>%
  mutate(PROPDMG = PROPDMG * case_when(
    PROPDMGEXP == "H" ~ 1e2,
    PROPDMGEXP == "K" ~ 1e3,
    PROPDMGEXP %in% c("M", "m") ~ 1e6,
    PROPDMGEXP == "B" ~ 1e9,
    TRUE ~ 1  # every other code leaves the value unchanged
  ))

head(storm_data$PROPDMG, 10)

##  [1] 25000  2500 25000  2500  2500  2500  2500  2500 25000 25000
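The multiplier mapping can also be expressed as a lookup table, which keeps the codes and their factors in one place. This is a minimal base R sketch of that idea, not the code used in the analysis; `exp_multiplier` and `scale_damage` are hypothetical names, and the input values are made up.

```r
# Multipliers as listed in the text; any other code falls back to 1
exp_multiplier <- c(H = 1e2, K = 1e3, M = 1e6, m = 1e6, B = 1e9)

# Hypothetical helper, not part of the original analysis
scale_damage <- function(value, exp_code) {
  mult <- unname(exp_multiplier[exp_code])  # NA for codes not in the table
  mult[is.na(mult)] <- 1                    # unknown or empty codes: no scaling
  value * mult
}

scaled <- scale_damage(c(25, 2.5, 1, 3, 7), c("K", "K", "M", "", "?"))
print(scaled)
```

Adding or changing a code then only requires editing the named vector.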

Now we are ready to sum the property damage for every event type.

storm_grouped_cost <- storm_data %>%
  group_by(EVTYPE) %>%
  summarise(TOTAL_COST = sum(PROPDMG))

print(xtable(head(storm_grouped_cost)), type="html")

	EVTYPE	TOTAL_COST
1	HIGH SURF ADVISORY	200000.00
2	COASTAL FLOOD	0.00
3	FLASH FLOOD	50000.00
4	LIGHTNING	0.00
5	TSTM WIND	8100000.00
6	TSTM WIND (G45)	8000.00

We sort the data frame in decreasing order of total cost and choose the top 5.

storm_grouped_cost <- storm_grouped_cost %>%
  arrange(-TOTAL_COST)

# head() returns 6 rows by default, so we request 5 explicitly
top5_cost <- head(storm_grouped_cost, n = 5)
print(xtable(top5_cost), type="html")

	EVTYPE	TOTAL_COST
1	FLOOD	144657709807.00
2	HURRICANE/TYPHOON	69305840000.00
3	TORNADO	56937160778.70
4	STORM SURGE	43323536000.00
5	FLASH FLOOD	16140812067.10

We convert the event type column to a factor to preserve the order on the plot.

top5_cost$EVTYPE <- factor(top5_cost$EVTYPE, levels = top5_cost$EVTYPE)

We plot the top 5 to get better insight. We express the y-axis in billions of dollars.

ggplot(top5_cost, aes(x = EVTYPE, y = TOTAL_COST/1e9)) +
  geom_bar(stat="identity") +
  ggtitle("Top 5 event types in terms of total cost") +
  xlab("Event type") +
  ylab("Billions of dollars")

Conclusion

Events related to rain and wind have the greatest economic consequences. The main event type is flood (about 145 billion dollars in property damage), followed by hurricanes and tornadoes.

Bartosz Dzionek

24/07/2021
