Synopsis

Each year, severe weather events have a large impact on the United States. In this analysis, we examine the impact of storms and other severe weather events in the US in terms of the number of injuries and fatalities, as well as their economic consequences, measured here as property damage. We found that tornadoes are particularly dangerous to people’s lives: they cause by far the most fatalities and injuries, and they also cause substantial property damage. Heat is another type of event that causes many deaths. Floods, on the other hand, have the greatest economic consequences.

Data Processing

We load the libraries used in the analysis.

library(dplyr)    # data manipulation: group_by, summarise, arrange, mutate
library(ggplot2)  # bar charts
library(xtable)   # HTML tables

This project uses the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database.

We start by downloading the compressed CSV file (if it is not already present) and loading it into a data frame.

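# Download the bzip2-compressed CSV only if it is not already present
if (!file.exists("storm_data.csv.bz2")) {
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                destfile = "storm_data.csv.bz2", mode = "wb")  # binary transfer
}
# read.csv() decompresses the .bz2 file transparently
storm_data <- read.csv("storm_data.csv.bz2")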
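Passing the .bz2 path straight to read.csv() works because R's file() connection detects gzip, bzip2 and xz compression when it opens a file for reading. The sketch below checks this on a small, hypothetical toy table written to a temporary file rather than on the NOAA data.

```r
# Hypothetical toy data standing in for the NOAA file
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(2, 0),
                  stringsAsFactors = FALSE)

# Write it as a bzip2-compressed CSV in a temporary location
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() opens the path with file(), which decompresses it transparently
round_trip <- read.csv(path, stringsAsFactors = FALSE)
print(round_trip)
```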

names(storm_data)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

Results

Across the United States, which types of events are most harmful with respect to population health?

We group the rows by event type (EVTYPE) and compute the total number of fatalities and injuries for each type.

# Total injuries and fatalities for each event type
storm_grouped_harm <- storm_data %>%
  group_by(EVTYPE) %>%
  summarise(TOTAL_INJURIES = sum(INJURIES), TOTAL_FATALITIES = sum(FATALITIES))
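For readers without dplyr, the same per-event totals can be sketched in base R with aggregate(). The toy data frame below is hypothetical; it only shows that rows sharing an EVTYPE are summed together.

```r
# Hypothetical toy rows with the same column names as the storm data
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "HAIL"),
  INJURIES   = c(10, 0, 5, 1),
  FATALITIES = c(3, 1, 2, 0),
  stringsAsFactors = FALSE
)

# Sum injuries and fatalities within each event type
totals <- aggregate(cbind(INJURIES, FATALITIES) ~ EVTYPE, data = toy, FUN = sum)
names(totals) <- c("EVTYPE", "TOTAL_INJURIES", "TOTAL_FATALITIES")
print(totals)
```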

We sort the event types in descending order of fatalities and of injuries, saving the results in two separate data frames. We keep only the top 5 of each to make the plots less cluttered.

# head() returns 6 rows by default, so n = 5 is passed explicitly
top5_fatalities <- head(storm_grouped_harm %>% arrange(desc(TOTAL_FATALITIES)), 5)
top5_injuries <- head(storm_grouped_harm %>% arrange(desc(TOTAL_INJURIES)), 5)

print(xtable(top5_fatalities), type="html")

   EVTYPE           TOTAL_INJURIES   TOTAL_FATALITIES
1  TORNADO                91346.00            5633.00
2  EXCESSIVE HEAT          6525.00            1903.00
3  FLASH FLOOD             1777.00             978.00
4  HEAT                    2100.00             937.00
5  LIGHTNING               5230.00             816.00
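head() returns six rows unless told otherwise, so the number of rows is worth passing explicitly when a top 5 is wanted. A base-R sketch on hypothetical totals:

```r
# Hypothetical per-event totals
totals <- data.frame(
  EVTYPE = LETTERS[1:8],
  TOTAL_FATALITIES = c(5, 80, 12, 7, 60, 3, 41, 9),
  stringsAsFactors = FALSE
)

nrow(head(totals))  # 6 (head's default n)

# Sort by fatalities, largest first, then keep exactly five rows
sorted <- totals[order(totals$TOTAL_FATALITIES, decreasing = TRUE), ]
top5 <- head(sorted, 5)
top5$EVTYPE  # "B" "E" "G" "C" "H"
```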

For both data frames, we convert EVTYPE to a factor whose levels follow the sorted order; otherwise ggplot2 would arrange the bars alphabetically.

top5_fatalities$EVTYPE <- factor(top5_fatalities$EVTYPE, levels = top5_fatalities$EVTYPE)
top5_injuries$EVTYPE <- factor(top5_injuries$EVTYPE, levels = top5_injuries$EVTYPE)
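The bar order comes from the factor's level order. The sketch below, using a few event names and fatality counts from the table above, contrasts the default alphabetical levels with levels taken from the sorted data. Base R's reorder() is shown as an alternative that sorts levels by a numeric value.

```r
ev   <- c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD")
vals <- c(5633, 1903, 978)  # fatalities, already in descending order

levels(factor(ev))               # alphabetical: "EXCESSIVE HEAT" "FLASH FLOOD" "TORNADO"
levels(factor(ev, levels = ev))  # data order: "TORNADO" "EXCESSIVE HEAT" "FLASH FLOOD"

# reorder() sorts levels by increasing value; negating gives descending order
levels(reorder(factor(ev), -vals))
```

Inside a plot call, the reorder() variant would read aes(x = reorder(EVTYPE, -TOTAL_FATALITIES), y = TOTAL_FATALITIES), which avoids modifying the data frame.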

We plot the top 5 event types in terms of fatalities.

ggplot(top5_fatalities, aes(x = EVTYPE, y = TOTAL_FATALITIES)) +
  geom_bar(stat="identity") +
  ggtitle("Top 5 most fatal event types") +
  xlab("Event type") +
  ylab("Number of fatalities")

We do the same for injuries.

ggplot(top5_injuries, aes(x = EVTYPE, y = TOTAL_INJURIES)) +
  geom_bar(stat="identity") +
  ggtitle("Top 5 event types in terms of injuries") +
  xlab("Event type") +
  ylab("Number of injuries")

Conclusion

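Tornadoes are by far the most harmful event type: they caused 5,633 fatalities and 91,346 injuries, roughly three times as many fatalities as the next event type, excessive heat (1,903). Other harmful events are those associated with heat (excessive heat and heat together account for 2,840 fatalities), flash floods, lightning, and thunderstorm wind.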

Across the United States, which types of events have the greatest economic consequences?

This time, we compute the overall cost of each event type, measured as property damage (PROPDMG).

First, we need to multiply each property damage value (PROPDMG) by the multiplier encoded in PROPDMGEXP. Let’s check which distinct multiplier codes occur in the data.

unique(storm_data$PROPDMGEXP)
##  [1] "K" "M" ""  "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"
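Before leaving some codes unscaled, it can help to check how many rows they affect. A minimal sketch on a hypothetical vector of codes; on the real data the vector would be storm_data$PROPDMGEXP.

```r
# Hypothetical sample of PROPDMGEXP codes
codes <- c("K", "K", "M", "", "B", "K", "?", "K", "M", "K")

# Codes that the analysis maps to an explicit multiplier
known <- codes %in% c("H", "K", "M", "m", "B")

table(codes[!known])  # which other codes occur, and how often
mean(!known)          # share of rows left unscaled
```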

We leave the remaining codes unscaled (a multiplier of 1): the empty code, the symbols “+”, “-” and “?”, and the digits 0–8 (which we treat as footnotes), because we do not consider them informative. We interpret the other codes in the following way:

  • “H” is 100x
  • “K” is 1,000x
  • “M” and “m” are 1,000,000x
  • “B” is 1,000,000,000x
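A named lookup vector is one compact way to encode the interpretation above, as an alternative to nested conditionals. In this sketch the helper name scale_damage() is hypothetical; any code not in the table (empty, symbols, digits, lowercase “h”) gets a multiplier of 1, matching how the analysis treats them.

```r
# Multipliers from the interpretation above, keyed by PROPDMGEXP code
multipliers <- c(H = 1e2, K = 1e3, M = 1e6, m = 1e6, B = 1e9)

# Hypothetical helper: scale damage values by their exponent codes
scale_damage <- function(value, code) {
  mult <- unname(multipliers[code])  # NA for codes missing from the table
  mult[is.na(mult)] <- 1             # unknown or empty codes stay unscaled
  value * mult
}

scale_damage(c(25, 2.5, 1.5, 3), c("K", "K", "B", "?"))
```

With dplyr loaded, it could be applied as mutate(PROPDMG = scale_damage(PROPDMG, PROPDMGEXP)).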

We multiply the property damage column according to the multipliers.

storm_data <- storm_data %>%
  mutate(PROPDMG = PROPDMG * case_when(
    PROPDMGEXP == "H"           ~ 1e2,
    PROPDMGEXP == "K"           ~ 1e3,
    PROPDMGEXP %in% c("M", "m") ~ 1e6,
    PROPDMGEXP == "B"           ~ 1e9,
    TRUE                        ~ 1    # empty codes, symbols, digits: unscaled
  ))

head(storm_data$PROPDMG, 10)
##  [1] 25000  2500 25000  2500  2500  2500  2500  2500 25000 25000

Now we are ready to sum the property damage for every event type.

storm_grouped_cost <- storm_data %>%
  group_by(EVTYPE) %>%
  summarise(TOTAL_COST = sum(PROPDMG))

print(xtable(head(storm_grouped_cost)), type="html")
   EVTYPE                 TOTAL_COST
1  HIGH SURF ADVISORY      200000.00
2  COASTAL FLOOD                0.00
3  FLASH FLOOD              50000.00
4  LIGHTNING                    0.00
5  TSTM WIND              8100000.00
6  TSTM WIND (G45)           8000.00
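These first rows are ordered by EVTYPE rather than by cost, and they hint that the event labels are free text (for example, both TSTM WIND and TSTM WIND (G45) appear). The analysis groups the labels as they are. The sketch below shows one possible normalization with trimws() and toupper() on hypothetical labels; it is an optional check, not part of the reported results, and it only merges variants that differ in case or surrounding whitespace, so labels such as TSTM WIND (G45) would need further rules.

```r
# Hypothetical raw labels and property damage values
labels <- c(" TSTM WIND", "TSTM WIND", "tstm wind", "FLOOD")
cost   <- c(8000, 100, 50, 2000)

# Strip surrounding whitespace and unify case
cleaned <- toupper(trimws(labels))

# Sum the cost per cleaned label
totals <- tapply(cost, cleaned, sum)
totals
```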

We sort the data frame by total cost in decreasing order and keep the top 5.

storm_grouped_cost <- storm_grouped_cost %>%
  arrange(desc(TOTAL_COST))

# Pass n = 5 explicitly; head() returns 6 rows by default
top5_cost <- head(storm_grouped_cost, 5)
print(xtable(top5_cost), type="html")

   EVTYPE                     TOTAL_COST
1  FLOOD                 144657709807.00
2  HURRICANE/TYPHOON      69305840000.00
3  TORNADO                56937160778.70
4  STORM SURGE            43323536000.00
5  FLASH FLOOD            16140812067.10

We convert the event type column to a factor to preserve this order in the plot.

top5_cost$EVTYPE <- factor(top5_cost$EVTYPE, levels = top5_cost$EVTYPE)

We plot the top 5 event types, expressing the y-axis in billions of dollars.

ggplot(top5_cost, aes(x = EVTYPE, y = TOTAL_COST/1e9)) +
  geom_bar(stat="identity") +
  ggtitle("Top 5 event types in terms of total cost") +
  xlab("Event type") +
  ylab("Billions of dollars")

Conclusion

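Events related to rain and wind have the greatest economic consequences, measured as property damage. Floods rank first, with about $144.7 billion in damage, roughly twice the total for hurricanes/typhoons (about $69.3 billion), followed by tornadoes (about $56.9 billion).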