This data analysis uses the Storm Database from the U.S. National Oceanic and Atmospheric Administration (NOAA) to answer two questions: which types of weather events are most harmful to human health, and which types have the greatest economic consequences?
The data used in this analysis only takes weather events in the United States of America into account. To find the most harmful event types, the Injuries and Fatalities columns were extracted, summed per event type and plotted. This shows that tornadoes are the most harmful events, roughly 10 times more harmful than the next most harmful event. To find the most economically damaging event types, the Property and Crop Damage amounts were extracted together with their multipliers. Plotting and analysing this data shows that floods are the most damaging, although the damage is almost entirely to property, with very little crop damage.
These are the R libraries that will be used during this analysis. We also prepare the environment as required.
options(scipen = 999)
library(dplyr)
library(ggplot2)
library(reshape2)
The NOAA Storm Database used in this analysis was downloaded from the URL shown here on 23 February 2016.
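url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# Download the archive once; read.csv() reads the bz2-compressed file directly.
if (!file.exists("stormdata.csv.bz2")) download.file(url, destfile = "stormdata.csv.bz2", mode = "wb")
stormdata <- read.csv("stormdata.csv.bz2")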
The downloaded data is now available in the stormdata object, which will be used for the rest of this analysis.
Severe weather events can have dire consequences for the populations they affect. These range from mild inconvenience, to loss of property, to injury and even death. We would like to examine which types of weather events are most harmful to human health.
The Storm Database contains Fatalities and Injuries columns for each recorded weather event. We sum these columns per event type to find the event type that causes the highest number of casualties.
casualties <- stormdata %>% select(EVTYPE, FATALITIES, INJURIES) %>% group_by(EVTYPE) %>% summarise(count_FAT = sum(FATALITIES), count_INJ = sum(INJURIES)) %>% mutate(total = count_FAT + count_INJ)
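To make the aggregation step concrete, here is a minimal sketch on a made-up four-row data frame. The `toy` object and its values are illustrative assumptions, not NOAA records; the sketch only applies the same group-and-sum pattern to data small enough to check by hand.

```r
library(dplyr)

# Hypothetical toy rows; the values are invented for illustration only.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "HEAT"),
  FATALITIES = c(2, 1, 0, 4),
  INJURIES   = c(10, 5, 3, 0)
)

# Sum fatalities and injuries per event type, then rank by total casualties.
toy_casualties <- toy %>%
  group_by(EVTYPE) %>%
  summarise(count_FAT = sum(FATALITIES), count_INJ = sum(INJURIES)) %>%
  mutate(total = count_FAT + count_INJ) %>%
  arrange(desc(total))

print(toy_casualties)
```

Each event type ends up on one row, with the two TORNADO records combined, so the first row is the event type with the most casualties.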
The most harmful event is a full order of magnitude more harmful than the next most harmful event, so plotting it alongside the others would hide how damaging those other events are. To show the rest of the picture, we take the next 10 events ranked by total casualties and display them in a stacked bar chart. Although this is not strictly required, it is interesting to see.
topten <- casualties %>% arrange(desc(total)) %>% slice(2:11) %>% select(EVTYPE, count_FAT, count_INJ)
topten <- melt(topten)
graph <- ggplot(topten, aes(x=EVTYPE, y=value, fill=variable))
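The reshaping step can be illustrated on a small invented table. `melt()` turns one column per casualty type into one row per event type and casualty type, which is the long format that the `fill` aesthetic expects. The `wide` data frame below is a hypothetical example, and `id.vars` is given explicitly here rather than left to the default.

```r
library(reshape2)

# Hypothetical wide table: one row per event type, one column per casualty type.
wide <- data.frame(
  EVTYPE    = c("FLOOD", "HEAT"),
  count_FAT = c(1, 4),
  count_INJ = c(3, 0)
)

# Long format: columns EVTYPE, variable (the former column name) and value.
long <- melt(wide, id.vars = "EVTYPE")
print(long)
```

The two event types and two casualty columns produce four rows, one per bar segment in a stacked chart.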
Storms and extreme weather events can also have severe economic consequences. These consequences are recorded in the Storm Database and can be extracted to show which types of weather events have the highest economic losses associated with them.
The Storm Database uses the columns PROPDMG and CROPDMG to record the value of the damage to property and crops respectively. In addition, the coded values stored in PROPDMGEXP and CROPDMGEXP give a scaling factor (typically thousands, millions or billions) that applies to the damage figure. The processing here first isolates these fields, converts each coded value to the equivalent power of ten (e.g. “K” becomes 3), multiplies the damage by the scaling factor (i.e. damage * 10 ^ exponent), and finally orders the results by the most damaging event type. A plot object showing the top 10 most damaging events is also prepared.
economics <- stormdata %>% select(EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP) %>% group_by(EVTYPE, PROPDMGEXP, CROPDMGEXP) %>% summarise(PROPDMG = sum(PROPDMG), CROPDMG = sum(CROPDMG))
# Convert the exponent codes to powers of ten; any other code means no scaling (10 ^ 0).
economics$PROPDMGEXP <- sapply(economics$PROPDMGEXP, function(x) {
if(x=="K" | x=="k") 3
else if(x=="M" | x=="m") 6
else if(x=="B" | x=="b") 9
else 0
})
economics$CROPDMGEXP <- sapply(economics$CROPDMGEXP, function(x) {
if(x=="K" | x=="k") 3
else if(x=="M" | x=="m") 6
else if(x=="B" | x=="b") 9
else 0
})
economics <- economics %>% mutate(totalprop = PROPDMG * 10 ^ as.numeric(as.character(PROPDMGEXP))) %>% mutate(totalcrop = CROPDMG * 10 ^ as.numeric(as.character(CROPDMGEXP))) %>% select(EVTYPE, totalprop, totalcrop) %>% ungroup() %>% group_by(EVTYPE) %>% summarise(totalprop = sum(totalprop), totalcrop = sum(totalcrop)) %>% ungroup()
propdamage <- economics %>% select(EVTYPE, totalprop) %>% arrange(desc(totalprop)) %>% slice(1:10)
cropdamage <- economics %>% select(EVTYPE, totalcrop) %>% arrange(desc(totalcrop)) %>% slice(1:10)
totaldamage <- economics %>% mutate(total = totalprop + totalcrop) %>% arrange(desc(total)) %>% slice(1:10) %>% ungroup() %>% select(EVTYPE, totalprop, totalcrop) %>% melt()
grapheco <- ggplot(totaldamage, aes(x=EVTYPE, y=value, fill=variable))
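As a worked example of the scaling formula damage * 10 ^ exponent, the sketch below decodes the exponent codes with a named lookup vector instead of an element-wise `sapply()`. The names `exp_lookup` and `decode_exponent` and the sample values are hypothetical, and the sketch assumes that any code other than K, M or B means no scaling (exponent 0).

```r
# Named lookup vector: exponent code -> power of ten.
exp_lookup <- c(K = 3, M = 6, B = 9)

# Assumption: codes other than K, M or B (in either case) map to exponent 0.
decode_exponent <- function(code) {
  power <- exp_lookup[toupper(as.character(code))]
  power[is.na(power)] <- 0
  unname(power)
}

# Invented sample values: 25 with code "K" is 25 * 10^3 = 25000.
damage <- c(25, 1.5, 2, 7)
code   <- c("K", "m", "B", "")
scaled <- damage * 10 ^ decode_exponent(code)
print(scaled)
```

Because the lookup is vectorised, it can be applied to a whole column in one call, and the case handling lives in a single `toupper()` rather than in repeated comparisons.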
This section will present the results for the most harmful event in terms of human health.
Now that the data has been well organised, we can see that the event type TORNADO causes the most casualties with a recorded 5633 fatalities and 91346 injuries.
As already discussed, the most harmful event causes so many casualties that it dwarfs every other event in a direct comparison. After removing this event, we can see the next most harmful events, as shown.
graph + geom_bar(stat="identity", position="stack") + labs(x="Event Type", y = "Casualties", title="The Top 10 Harmful Events, Excluding Tornadoes") + scale_fill_discrete(name="Casualty Type", labels=c("Fatalities", "Injuries"))+ theme(axis.text.x = element_text(angle = 45, hjust = 1))
Figure 1: The Top 10 Most Harmful Events, Excluding Tornadoes
As can be seen from Figure 1, the next most harmful events have significantly lower casualty figures than those from a TORNADO, although they are devastating in their own right.
The economic consequences of severe weather can include damage to both property and crops. The top 10 most damaging events are shown here.
grapheco + geom_bar(stat="identity", position="stack") + labs(x="Event Type", y = "Total Economic Consequences (Billions of Dollars)", title="The Top 10 Events with the highest Economic Consequences") + scale_fill_discrete(name="Damage Type", labels=c("Property Damage", "Crop Damage"))+ theme(axis.text.x = element_text(angle = 45, hjust = 1)) + scale_y_continuous(labels=function(x) x/1000000000)
Figure 2: The Top 10 Most Costly Events
As can be seen from Figure 2, events associated with floods are the most costly in terms of economic consequences. It is interesting to note that for all of these events, the vast majority of the damage affects property rather than crops.
Of the severe weather events that affect the United States of America, tornadoes cause the highest number of casualties. These casualty numbers are an order of magnitude higher than those of the next most harmful event, excessive heat. In terms of economic consequences, floods are the most damaging, and property damage is much higher than crop damage.