In this report, I first downloaded the data and imported it into RStudio with the read.csv() function. I then used functions from the dplyr library to sum the fatalities and injuries for each event type, and repeated the same operations to sum the property and crop damage for each event type. In the latter case, the exponent columns must be taken into account to scale the damage figures to US dollars. I then ranked the top 10 event types by their health and economic consequences. Lastly, I used ggplot to draw bar plots of the health and economic consequences by event type.
The first step is to load the Storm data into the RStudio environment using read.csv():
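# stringsAsFactors = TRUE keeps PROPDMGEXP and CROPDMGEXP as factors (the default changed to FALSE in R 4.0.0)
data <- read.csv("repdata-data-StormData.csv.bz2", na.strings = c("", "NA"), stringsAsFactors = TRUE)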
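To make the effect of the na.strings argument concrete, here is a minimal sketch on a three-row toy CSV. The rows and values are hypothetical and are not taken from the Storm Data file; only the read.csv() arguments mirror the call above.

```r
# Toy CSV text (hypothetical values) with one empty exponent field.
csv_text <- "EVTYPE,PROPDMG,PROPDMGEXP\nTORNADO,25,K\nHAIL,0,\nFLOOD,5,M"

toy <- read.csv(text = csv_text, na.strings = c("", "NA"),
                stringsAsFactors = TRUE)

# The empty PROPDMGEXP field for HAIL is read as NA, not as a level "".
print(toy)
print(is.na(toy$PROPDMGEXP))
print(levels(toy$PROPDMGEXP))
```

Because empty strings become NA, they do not show up as an extra factor level, which matters later when the levels are mapped to multipliers.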
Next, we load the dplyr library, which provides the grouping and summarizing functions needed to address our questions:
library(dplyr)
We then group the data by event type and sum the fatalities and injuries, which together represent the health consequences:
healthdmg <- data %>%
  filter(!is.na(EVTYPE)) %>%
  group_by(EVTYPE) %>%
  # Total fatalities and injuries per event type, ignoring missing counts
  summarize(sum_fat = sum(FATALITIES, na.rm = TRUE), sum_inj = sum(INJURIES, na.rm = TRUE)) %>%
  mutate(sum_fat_inj = sum_fat + sum_inj)
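As a sanity check on what the grouped sum produces, the same aggregation can be sketched in base R on a few toy rows. The data frame below is hypothetical, and aggregate() is used here only as a dependency-free cross-check, not as the method of this report.

```r
# Hypothetical toy rows standing in for the Storm Data columns used above.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, NA, 0),
  INJURIES   = c(10, 0, 5, 4, 2)
)

# Sum both columns within each event type; na.rm = TRUE is passed on to sum().
toy_health <- aggregate(toy[c("FATALITIES", "INJURIES")],
                        by = list(EVTYPE = toy$EVTYPE),
                        FUN = sum, na.rm = TRUE)

# Combined health impact per event type.
toy_health$sum_fat_inj <- toy_health$FATALITIES + toy_health$INJURIES
print(toy_health)
```

The result has one row per event type, and the missing fatality count for HEAT contributes 0 to its total instead of turning the sum into NA.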
Next, we sort the event types by total fatalities and injuries and keep the top 10:
highest_fat_inj <- sort(healthdmg$sum_fat_inj, decreasing = TRUE)[1:10]
health_fat_inj <- healthdmg[healthdmg$sum_fat_inj %in% highest_fat_inj, ]
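One caveat of selecting rows with %in% is that it matches on values, so a tie at the cutoff can return more rows than requested, and the rows stay in their original order. The sketch below illustrates this on made-up totals, with n = 3 standing in for 10, and shows a position-based alternative using order() and head().

```r
# Hypothetical totals with a tie at the cutoff (values are made up).
toy <- data.frame(
  EVTYPE      = c("A", "B", "C", "D", "E"),
  sum_fat_inj = c(50, 7, 20, 7, 1)
)
n <- 3  # stands in for the 10 used in the report

# Value-based selection: every row whose total equals one of the top n values.
cutoff_vals <- sort(toy$sum_fat_inj, decreasing = TRUE)[1:n]
by_value <- toy[toy$sum_fat_inj %in% cutoff_vals, ]

# Position-based selection: order the rows, then keep exactly the first n.
by_rank <- head(toy[order(toy$sum_fat_inj, decreasing = TRUE), ], n)

print(nrow(by_value))  # 4, because both rows with total 7 match
print(nrow(by_rank))   # 3
```

For the plots in this report the row order does not matter, since reorder() sorts the bars, but the row count can differ if two event types share the tenth-largest total.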
Next, we address the second question, i.e., which event types are associated with the highest economic consequences in the US. The values in the PROPDMGEXP and CROPDMGEXP columns (which are factors) must first be converted to the corresponding numeric multipliers so that the losses are scaled to US dollars. For instance, ‘B’ stands for billion (1e9), ‘M’ for million (1e6), and so on. We then repeat the dplyr procedure used above for the health consequences to obtain the summed property and crop damage for each event type. Note that the totals are divided by 1e9, so they are expressed in billions of US dollars. Finally, we sort the event types by economic loss and keep the top 10.
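data2 <- data
# Both columns must be factors here. levels<- assigns by position, so each vector
# must follow the order shown by levels(data2$PROPDMGEXP) and levels(data2$CROPDMGEXP).
levels(data2$PROPDMGEXP) <- c(0,0,0,10,10,10,10,10,10,10,10,10,1e9,1e2,1e2,1e3,1e6,1e6)
levels(data2$CROPDMGEXP) <- c(0,0,0,1e9,1e3,1e3,1e6,1e6)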
economicloss <- data2 %>%
  filter(!is.na(EVTYPE)) %>%
  # Scale each damage figure by its multiplier and express it in billions of US dollars
  mutate(propdmgall = PROPDMG * as.numeric(as.character(PROPDMGEXP)) / 1e9,
         cropdmgall = CROPDMG * as.numeric(as.character(CROPDMGEXP)) / 1e9) %>%
  group_by(EVTYPE) %>%
  summarize(sum_propdmg = sum(propdmgall, na.rm = TRUE), sum_cropdmg = sum(cropdmgall, na.rm = TRUE)) %>%
  mutate(sum_economicdmg = sum_propdmg + sum_cropdmg)
highest_economicdmg <- sort(economicloss$sum_economicdmg, decreasing = TRUE)[1:10]
economicdmg_sel <- economicloss[economicloss$sum_economicdmg %in% highest_economicdmg, ]
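The positional levels() assignment depends on the order in which R sorts the factor levels, which can vary with the locale. A possibly more robust alternative is a lookup keyed by the symbol itself. The sketch below is an illustration under assumptions: the symbol-to-multiplier pairs are my reading of the PROPDMGEXP vector above against the level order - ? + 0 to 8 B h H K m M, the helper to_multiplier() is hypothetical, and the toy rows are made up. The CROPDMGEXP vector above treats digits differently, so it would need its own table.

```r
# Assumed pairing of PROPDMGEXP symbols with the multipliers used above.
prop_symbols <- c("-", "?", "+", as.character(0:8), "B", "h", "H", "K", "m", "M")
prop_values  <- c(0, 0, 0, rep(10, 9), 1e9, 1e2, 1e2, 1e3, 1e6, 1e6)
prop_lookup  <- setNames(prop_values, prop_symbols)

# Hypothetical helper: map an exponent column to numeric multipliers by name.
to_multiplier <- function(exp_col, lookup) {
  # as.character() makes this work for factor and character columns alike;
  # NA and symbols missing from the lookup come back as NA.
  unname(lookup[as.character(exp_col)])
}

# Made-up rows: 25 K, 2.5 M, 1 B, and one missing exponent.
toy <- data.frame(PROPDMG = c(25, 2.5, 1, 3),
                  PROPDMGEXP = c("K", "M", "B", NA))
toy$prop_usd     <- toy$PROPDMG * to_multiplier(toy$PROPDMGEXP, prop_lookup)
toy$prop_billion <- toy$prop_usd / 1e9
print(toy)
```

As a worked case, a PROPDMG of 25 with exponent K is 25 * 1e3 = 25,000 US dollars, or 2.5e-5 billion. Rows with a missing exponent yield NA and are dropped by sum(..., na.rm = TRUE), which matches how the pipeline above treats them.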
We first use ggplot to draw a bar plot of the total fatalities and injuries for the top 10 event types.
library(ggplot2)
ggplot(health_fat_inj, aes(x = reorder(EVTYPE, sum_fat_inj), y = sum_fat_inj)) +
labs(x = "EVENT", y = "TOTAL FATALITIES AND INJURIES") + coord_flip() +
geom_bar(stat = "identity")
We then use ggplot to draw a bar plot of the total economic losses for the top 10 event types.
ggplot(economicdmg_sel, aes(x = reorder(EVTYPE, sum_economicdmg), y = sum_economicdmg)) +
labs(x = "EVENT", y = "TOTAL ECONOMIC LOSS / billion US$") + coord_flip() +
geom_bar(stat = "identity")
The bar plots show the top 10 event types for health and economic consequences in the US according to the Storm Data. Tornadoes cause the greatest health losses, while floods cause the greatest economic losses. In particular, tornadoes cause far more fatalities and injuries than any other type of weather event.