Synopsis

This document aims to identify which type of natural event was most harmful to population health, and which had the greatest economic consequences, in the USA between 1950 and November 2011. The analysis uses the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which records storms and weather events in the United States, including estimates of fatalities, injuries, and property damage. The results show that tornadoes were the most harmful event to population health and that floods had the greatest economic consequences.

Data processing

The data was downloaded from the internet (https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2). The “fatalities” and “injuries” columns were summed to get the total impact on population health, and the result was labelled “people_damage”. Two columns were then created with the numeric scales of the property and crop damage (“PROPDMGFACTOR” and “CROPDMGFACTOR”). The property and crop damage values were multiplied by these factors to get the damage in USD, and the two results were summed in the column “econ_damage”. The column of natural events (“EVTYPE”) described some events with different wording even though they refer to the same event. For example, the event “AVALANCE” is believed to belong to the event “AVALANCHE”, so this kind of error was corrected. Two new data sets were then created: one aggregating the health impact (“people_damage”) by event type and the other aggregating the economic damage (“econ_damage”) by event type. The data sets were sorted in decreasing order of health impact and economic damage, respectively, and the five highest-ranked events were kept in each. The economic totals were divided by 10^6 to express them in US$ millions. Finally, the event-type columns in both data sets were converted to factors so that the bars in the plots stay in ranking order.
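The scaling step described above can also be expressed as a single lookup table. The following is only a minimal sketch on invented toy data, assuming the same code-to-multiplier mapping as this analysis (digits as powers of ten, symbols as a multiplier of 1); the names `toy`, `exp_factor`, and `damage_factor` are hypothetical and do not appear in the analysis itself.

```r
# Toy stand-in for the storm data; the values are invented for illustration.
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 10, 3),
  PROPDMGEXP = c("K", "m", "", "B"),
  stringsAsFactors = FALSE
)

# Named vector mapping each exponent code to its multiplier.
# Digits "0" to "8" mean 10^digit; the symbols are treated as a multiplier of 1.
exp_factor <- c(
  setNames(10^(0:8), as.character(0:8)),
  H = 10^2, K = 10^3, M = 10^6, B = 10^9,
  "?" = 1, "-" = 1, "+" = 1
)

# Hypothetical helper: look up the multiplier for each code.
# An empty code counts as 1; any unknown code stays NA so it can be inspected.
damage_factor <- function(code) {
  code <- toupper(as.character(code))
  out <- unname(exp_factor[code])
  out[code == ""] <- 1
  out
}

# Damage in USD for the toy rows.
toy$prop_damage <- toy$PROPDMG * damage_factor(toy$PROPDMGEXP)
toy
```

Leaving unknown codes as NA, rather than silently assigning a multiplier, makes any unexpected code in the data visible before the totals are computed.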

# load packages
suppressMessages(library(dplyr))
suppressMessages(library(ggplot2))
# set working directory
setwd("C:/Users/Kleber/Downloads/repdata_data_StormData.csv")
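# download the file (binary mode, since it is a bzip2-compressed archive)
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", "data_base.csv.bz2", mode = "wb")
# read data (read.csv decompresses the .bz2 file transparently)
storm <- read.csv("data_base.csv.bz2")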
# sum the columns "fatalities" and "injuries" in order to get the total damage to the population health
storm$people_damage <- storm$FATALITIES+storm$INJURIES
# put the columns of scale of the property and crop damage in upper case
storm$PROPDMGEXP <- toupper(storm$PROPDMGEXP)
storm$CROPDMGEXP <- toupper(storm$CROPDMGEXP)
# create a column with the numeric scale of the crop damage
storm$CROPDMGFACTOR[(storm$CROPDMGEXP == "")] <- 10^0
storm$CROPDMGFACTOR[(storm$CROPDMGEXP == "?")] <- 10^0
storm$CROPDMGFACTOR[(storm$CROPDMGEXP == "-")] <- 10^0
storm$CROPDMGFACTOR[(storm$CROPDMGEXP == "+")] <- 10^0
storm$CROPDMGFACTOR[(storm$CROPDMGEXP == "0")] <- 10^0
storm$CROPDMGFACTOR[(storm$CROPDMGEXP == "1")] <- 10^1
storm$CROPDMGFACTOR[(storm$CROPDMGEXP == "2")] <- 10^2
storm$CROPDMGFACTOR[(storm$CROPDMGEXP == "3")] <- 10^3
storm$CROPDMGFACTOR[(storm$CROPDMGEXP == "4")] <- 10^4
storm$CROPDMGFACTOR[(storm$CROPDMGEXP == "5")] <- 10^5
storm$CROPDMGFACTOR[(storm$CROPDMGEXP == "6")] <- 10^6
storm$CROPDMGFACTOR[(storm$CROPDMGEXP == "7")] <- 10^7
storm$CROPDMGFACTOR[(storm$CROPDMGEXP == "8")] <- 10^8
storm$CROPDMGFACTOR[(storm$CROPDMGEXP == "H")] <- 10^2
storm$CROPDMGFACTOR[(storm$CROPDMGEXP == "K")] <- 10^3
storm$CROPDMGFACTOR[(storm$CROPDMGEXP == "M")] <- 10^6
storm$CROPDMGFACTOR[(storm$CROPDMGEXP == "B")] <- 10^9
# create a column with the numeric scale of the property damage
storm$PROPDMGFACTOR[(storm$PROPDMGEXP == "")] <- 10^0
storm$PROPDMGFACTOR[(storm$PROPDMGEXP == "-")] <- 10^0
storm$PROPDMGFACTOR[(storm$PROPDMGEXP == "?")] <- 10^0
storm$PROPDMGFACTOR[(storm$PROPDMGEXP == "+")] <- 10^0
storm$PROPDMGFACTOR[(storm$PROPDMGEXP == "0")] <- 10^0
storm$PROPDMGFACTOR[(storm$PROPDMGEXP == "1")] <- 10^1
storm$PROPDMGFACTOR[(storm$PROPDMGEXP == "2")] <- 10^2
storm$PROPDMGFACTOR[(storm$PROPDMGEXP == "3")] <- 10^3
storm$PROPDMGFACTOR[(storm$PROPDMGEXP == "4")] <- 10^4
storm$PROPDMGFACTOR[(storm$PROPDMGEXP == "5")] <- 10^5
storm$PROPDMGFACTOR[(storm$PROPDMGEXP == "6")] <- 10^6
storm$PROPDMGFACTOR[(storm$PROPDMGEXP == "7")] <- 10^7
storm$PROPDMGFACTOR[(storm$PROPDMGEXP == "8")] <- 10^8
storm$PROPDMGFACTOR[(storm$PROPDMGEXP == "H")] <- 10^2
storm$PROPDMGFACTOR[(storm$PROPDMGEXP == "K")] <- 10^3
storm$PROPDMGFACTOR[(storm$PROPDMGEXP == "M")] <- 10^6
storm$PROPDMGFACTOR[(storm$PROPDMGEXP == "B")] <- 10^9
# create a column of the property damage using the scales
storm$prop_damage <- storm$PROPDMG*storm$PROPDMGFACTOR
# create a column of the crop damage using the scales
storm$crop_damage <- storm$CROPDMG*storm$CROPDMGFACTOR
# sum the columns of property damage and crop damage
storm$econ_damage <- storm$prop_damage+storm$crop_damage
# put all the event types in upper case
storm$EVTYPE <- toupper(storm$EVTYPE)
# replace "AVALANCE" by "AVALANCHE"
storm$EVTYPE <- gsub("AVALANCE", "AVALANCHE", storm$EVTYPE)
# replace "COASTALSTORM" by "COASTAL STORM"
storm$EVTYPE <- gsub("COASTALSTORM", "COASTAL STORM", storm$EVTYPE)
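# replace "COASTAL FLOODING/EROSION" and "COASTAL FLOOD" by "COASTAL FLOODING"
# (the optional "ING" keeps existing "COASTAL FLOODING" entries from becoming "COASTAL FLOODINGING")
storm$EVTYPE <- gsub("COASTAL FLOODING/EROSION|COASTAL FLOOD(ING)?", "COASTAL FLOODING", storm$EVTYPE)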
# replace "WINTER WEATHER MIX" by "WINTER WEATHER/MIX"
storm$EVTYPE <- gsub("WINTER WEATHER MIX", "WINTER WEATHER/MIX", storm$EVTYPE)
# replace "WINTER STORMS" by "WINTER STORM"
storm$EVTYPE <- gsub("WINTER STORMS", "WINTER STORM", storm$EVTYPE)
# replace "WINDS" by "WIND"
storm$EVTYPE <- gsub("WINDS", "WIND", storm$EVTYPE)
# replace "WILD FIRES" by "WILDFIRE"
storm$EVTYPE <- gsub("WILD FIRES", "WILDFIRE", storm$EVTYPE)
# replace "WATERSPOUT TORNADO" by "WATERSPOUT/TORNADO"
storm$EVTYPE <- gsub("WATERSPOUT TORNADO", "WATERSPOUT/TORNADO", storm$EVTYPE)
# replace "URBAN AND SMALL STREAM FLOODIN" by "URBAN/SML STREAM FLD"
storm$EVTYPE <- gsub("URBAN AND SMALL STREAM FLOODIN", "URBAN/SML STREAM FLD", storm$EVTYPE)
# replace "THUNDERSTORM WINDSS", "THUNDERSTORM WINDS" and "THUNDERSTORMS WINDS" by "THUNDERSTORM WIND"
storm$EVTYPE <- gsub("THUNDERSTORM WINDSS|THUNDERSTORM WINDS|THUNDERSTORMS WINDS", "THUNDERSTORM WIND", storm$EVTYPE)
# replace "STRONG WINDS" by "STRONG WIND"
storm$EVTYPE <- gsub("STRONG WINDS", "STRONG WIND", storm$EVTYPE)
# replace "RIVER FLOODING" by "RIVER FLOOD"
storm$EVTYPE <- gsub("RIVER FLOODING", "RIVER FLOOD", storm$EVTYPE)
# replace "RIP CURRENTS" by "RIP CURRENT"
storm$EVTYPE <- gsub("RIP CURRENTS", "RIP CURRENT", storm$EVTYPE)
# replace "RECORD/EXCESSIVE HEAT" by "RECORD HEAT"
storm$EVTYPE <- gsub("RECORD/EXCESSIVE HEAT", "RECORD HEAT", storm$EVTYPE)
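# replace "COLD/WIND" by "COLD/WIND CHILL"
# (the optional " CHILL" keeps existing "COLD/WIND CHILL" entries from gaining a second " CHILL")
storm$EVTYPE <- gsub("COLD/WIND( CHILL)?", "COLD/WIND CHILL", storm$EVTYPE)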
# replace "FLASH FLOOD/FLOOD","FLASH FLOODING","FLASH FLOODING/FLOOD" and "FLASH FLOODS" by "FLASH FLOOD"
storm$EVTYPE <- gsub("FLASH FLOOD/FLOOD|FLASH FLOODING|FLASH FLOODING/FLOOD|FLASH FLOODS", "FLASH FLOOD", storm$EVTYPE)
# replace "FLOODING" by "FLOOD"
storm$EVTYPE <- gsub("FLOODING", "FLOOD", storm$EVTYPE)
# replace "HEAT WAVES" by "HEAT WAVE"
storm$EVTYPE <- gsub("HEAT WAVES", "HEAT WAVE", storm$EVTYPE)
# replace "HEAVY RAINS" by "HEAVY RAIN"
storm$EVTYPE <- gsub("HEAVY RAINS", "HEAVY RAIN", storm$EVTYPE)
# replace "HEAVY SNOW/ICE" by "HEAVY SNOW"
storm$EVTYPE <- gsub("HEAVY SNOW/ICE", "HEAVY SNOW", storm$EVTYPE)
# replace "HIGH WIND/SEAS" by "HIGH WIND AND SEAS"
storm$EVTYPE <- gsub("HIGH WIND/SEAS", "HIGH WIND AND SEAS", storm$EVTYPE)
# replace "HYPERTHERMIA/EXPOSURE" and "HYPOTHERMIA" by "HYPOTHERMIA/EXPOSURE"
# (the optional "/EXPOSURE" keeps existing "HYPOTHERMIA/EXPOSURE" entries from gaining a second "/EXPOSURE")
storm$EVTYPE <- gsub("HYPERTHERMIA/EXPOSURE|HYPOTHERMIA(/EXPOSURE)?", "HYPOTHERMIA/EXPOSURE", storm$EVTYPE)
# replace "ICE ROADS" and  "ICY ROADS" by "ICE ON ROAD"
storm$EVTYPE <- gsub("ICE ROADS|ICY ROADS", "ICE ON ROAD", storm$EVTYPE)
# replace "LIGHTNING." by "LIGHTNING" (the dot is escaped so that it matches a literal "." only)
storm$EVTYPE <- gsub("LIGHTNING\\.", "LIGHTNING", storm$EVTYPE)
# replace "MARINE HIGH WIND" by "MARINE STRONG WIND"
storm$EVTYPE <- gsub("MARINE HIGH WIND", "MARINE STRONG WIND", storm$EVTYPE)
# replace "MUDSLIDES" by "MUDSLIDE"
storm$EVTYPE <- gsub("MUDSLIDES", "MUDSLIDE", storm$EVTYPE)
# replace "SNOW SQUALLS" by "SNOW SQUALL"
storm$EVTYPE <- gsub("SNOW SQUALLS", "SNOW SQUALL", storm$EVTYPE)
# replace "THUNDERSTORMS WIND" and "THUNDERSTORM  WIND" by "THUNDERSTORM WIND"
storm$EVTYPE <- gsub("THUNDERSTORMS WIND|THUNDERSTORM  WIND", "THUNDERSTORM WIND", storm$EVTYPE)
# replace "WILD/FOREST FIRE" by "WILDFIRE"
storm$EVTYPE <- gsub("WILD/FOREST FIRE", "WILDFIRE", storm$EVTYPE)
# replace "WINTRY MIX" by "WINTER MIX"
storm$EVTYPE <- gsub("WINTRY MIX", "WINTER MIX", storm$EVTYPE)
# group the data by event type
storm_2 <- storm %>% group_by(EVTYPE) %>% summarise(people_damage = sum(people_damage))
# change to data.frame
storm_2 <- as.data.frame(storm_2)
# filter the events with some people damage
storm_2 <- storm_2[storm_2$people_damage>0,]
# order by "people_damage"
storm_2 <- storm_2[order(storm_2$people_damage, decreasing = TRUE),]
# filter just the top 5 events
storm_3 <- storm_2[1:5,]
# put events as factor, taking the levels from the sorted data so the plot bars keep the ranking order
storm_3$EVTYPE <- factor(storm_3$EVTYPE, levels = storm_3$EVTYPE)
# group the column of economic damage by event type
storm_4 <- storm %>% group_by(EVTYPE) %>% summarise(econ_damage = sum(econ_damage))
# change to data.frame
storm_4 <- as.data.frame(storm_4)
# filter the events with some damage
storm_4 <- storm_4[storm_4$econ_damage>0,]
# order by damage
storm_4 <- storm_4[order(storm_4$econ_damage, decreasing = TRUE),]
# filter just the top 5 events
storm_4 <- storm_4[1:5,]
# put events as factor, taking the levels from the sorted data so the plot bars keep the ranking order
storm_4$EVTYPE <- factor(storm_4$EVTYPE, levels = storm_4$EVTYPE)
# divide by 10^6 to express the values in US$ millions
storm_4$econ_damage <- storm_4$econ_damage/10^6
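
One thing to watch when correcting event names with substring replacement is that a pattern can also match inside a label that is already correct. The sketch below illustrates this on a few invented labels and shows one possible alternative, replacing a label only when the whole string equals a known variant. The helper `recode_exact` is hypothetical and is not part of the analysis.

```r
# Invented sample of raw event labels.
events <- c("COASTAL FLOOD", "COASTAL FLOODING", "COASTAL FLOODING/EROSION",
            "LIGHTNING.", "LIGHTNING AND HEAVY RAIN")

# Substring replacement: "COASTAL FLOOD" also matches inside "COASTAL FLOODING",
# so the already-correct label gains a second "ING".
naive <- gsub("COASTAL FLOOD", "COASTAL FLOODING", events)
naive[2]

# Hypothetical helper: replace a label only when the whole string
# is one of the listed variants; everything else is left untouched.
recode_exact <- function(x, variants, target) {
  x[x %in% variants] <- target
  x
}

clean <- recode_exact(events,
                      c("COASTAL FLOOD", "COASTAL FLOODING/EROSION"),
                      "COASTAL FLOODING")
clean <- recode_exact(clean, "LIGHTNING.", "LIGHTNING")
clean
```

Exact matching is stricter than `gsub`: it does not touch compound labels such as "LIGHTNING AND HEAVY RAIN", and it will miss variants with stray spaces unless they are trimmed first.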

Results

Figure 1 shows the five natural events most harmful to population health in the USA between 1950 and November 2011. Based on that plot, the most harmful event was the tornado.

suppressMessages(library(ggplot2))
# make the plot of the top-5 natural events impacts to the people health
ggplot(storm_3, aes(x = EVTYPE, y = people_damage)) +
  geom_bar(stat = "identity", color = "blue", fill="blue") +
  xlab("Event type") +
  ylab("Number of people affected (fatalities and injuries)") +
  theme(plot.title = element_text(size = 10, face = "bold")) +
  ggtitle("Top-5 most harmful natural events to the population health in the USA (1950-2011)")

Additionally, Figure 2 shows the five natural events with the greatest economic consequences in the USA between 1950 and November 2011. The event with the greatest economic impact in that period was the flood.

ggplot(storm_4, aes(x = EVTYPE, y = econ_damage)) +
  geom_bar(stat = "identity", color = "brown", fill="brown") +
  xlab("Event type") +
  ylab("Economic damages (US$ millions)") +
  theme(plot.title = element_text(size = 10, face = "bold")) +
  ggtitle("Top-5 natural events with greatest economic damages for the USA (1950-2011)")
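
The bars in both figures follow the ranking because the event column is a factor whose levels are in ranking order. As a minimal sketch on invented totals (the data frame `top` and its values are hypothetical), the difference between the default alphabetical levels and levels taken from the sorted data looks like this:

```r
# Invented totals, already sorted in decreasing order.
top <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "LIGHTNING"),
  total  = c(300, 120, 45),
  stringsAsFactors = FALSE
)

# Default factor(): levels are sorted alphabetically,
# so a bar plot would draw FLOOD before TORNADO.
alphabetical <- factor(top$EVTYPE)
levels(alphabetical)

# Levels taken from the sorted column keep the ranking order.
top$EVTYPE <- factor(top$EVTYPE, levels = top$EVTYPE)
levels(top$EVTYPE)
```

Since ggplot2 places discrete x values in level order, a factor built this way keeps the bars sorted from the largest total to the smallest.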