This study explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which tracks characteristics of major storms and weather events in the United States, including when and where they occur and estimates of any fatalities, injuries, and property damage. The events in the database start in 1950 and end in November 2011. Storms and other severe weather events can cause both public health and economic problems for communities and municipalities, and preventing such outcomes to the extent possible is a key concern. The analysis finds that tornados cause the greatest combined number of fatalities and injuries and, by the damage measure used in this report, also have the worst economic consequences.
The data for this assignment come as a comma-separated-value (CSV) file compressed with the bzip2 algorithm to reduce its size. The file was downloaded from http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2 on Friday, September 25, 2015 at 15:14. The first step is to read the data:
projdata <- read.csv("repdata-data-StormData.csv.bz2")
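No separate decompression step is needed because R's file connections detect bzip2 compression when reading. The following minimal sketch shows this on a throwaway file; the toy data frame and the temporary path are illustrative assumptions, not part of the NOAA data.

```r
# Toy data standing in for the storm data (made-up values).
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(3, 1))

# Write it as a bzip2-compressed CSV to a temporary file.
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() decompresses transparently while reading.
roundtrip <- read.csv(tmp, stringsAsFactors = FALSE)
```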
The data record the impact of each event on population health as the number of fatalities and the number of injuries it caused. This report treats the event type with the greatest combined number of fatalities and injuries as the most harmful. Because fatalities and injuries are logged separately, the two are added into a single variable:
projdata$FatalitiesAndInjuries <- projdata$FATALITIES + projdata$INJURIES
The event most harmful to population health is the one with the highest overall value of this new ‘FatalitiesAndInjuries’ variable. To find it, the ‘FatalitiesAndInjuries’ variable is summed over all records for each event type.
harmfulness <- aggregate(FatalitiesAndInjuries~EVTYPE,projdata,sum)
mostharmfulevent <- harmfulness[ which(harmfulness$FatalitiesAndInjuries==max(harmfulness$FatalitiesAndInjuries)), "EVTYPE" ]
Therefore, the most harmful event is the TORNADO. The following plot shows how its harmfulness compares with that of the other events among the 10 most harmful.
library(ggplot2)
harmfulness <- harmfulness[ rev(order(harmfulness$FatalitiesAndInjuries)), ]
harmfulness <- harmfulness[1:10,]
gg <- ggplot(harmfulness,aes(x=EVTYPE,y=FatalitiesAndInjuries,fill=EVTYPE)) + geom_bar(stat="identity")
gg <- gg + xlab("Event type") + ylab("Number of fatalities and injuries")
gg <- gg + theme(axis.text.x = element_text(angle=90))
gg <- gg + ggtitle("The 10 most harmful events\n in terms of caused fatalities and injuries")
gg <- gg + guides(fill=guide_legend(title="Event type"))
print(gg)
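Because EVTYPE is a free-text field, the same kind of event can appear under several spellings (for example with different letter case or surrounding whitespace), which splits its total across groups. The following is a minimal sketch of normalising labels before aggregating. It uses made-up rows, and the `EVTYPE_clean` column name is an assumption for illustration. The figures reported above were computed on the raw labels.

```r
# Made-up rows in which one event type appears under several spellings.
toy <- data.frame(
  EVTYPE = c("TORNADO", "Tornado", " TORNADO", "FLOOD", "flood"),
  FatalitiesAndInjuries = c(10, 5, 2, 8, 1),
  stringsAsFactors = FALSE
)

# Normalise: strip surrounding whitespace and convert to upper case.
toy$EVTYPE_clean <- toupper(trimws(toy$EVTYPE))

# Aggregating on raw labels keeps the variants apart ...
raw_totals <- aggregate(FatalitiesAndInjuries ~ EVTYPE, toy, sum)

# ... while aggregating on cleaned labels merges them.
clean_totals <- aggregate(FatalitiesAndInjuries ~ EVTYPE_clean, toy, sum)

# which.max() returns the index of the (first) largest total.
worst <- clean_totals$EVTYPE_clean[which.max(clean_totals$FatalitiesAndInjuries)]
```

This kind of cleaning only merges trivially different spellings. Genuinely different labels for the same phenomenon would need a hand-built mapping.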
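The NOAA storm database records estimates of property and crop damage in the PROPDMG and CROPDMG columns; the companion PROPDMGEXP and CROPDMGEXP columns hold magnitude codes (such as K, M, or B) for those figures. To analyse the economic consequences of the events, a new variable is created that sums the recorded property and crop damage values as they appear, without applying the magnitude codes.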
projdata$Damages <- projdata$PROPDMG + projdata$CROPDMG
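The sum above uses PROPDMG and CROPDMG as recorded and does not apply the magnitude codes in PROPDMGEXP and CROPDMGEXP. The following sketch shows one way to convert the values to dollar amounts. It assumes that only K, M, and B (in either case) are meaningful and treats any other code as a multiplier of 1. `exp_multiplier` and `DamagesUSD` are hypothetical names, and the toy rows are made up.

```r
# Map a magnitude code to a numeric multiplier.
# Assumption: K/M/B mean thousands/millions/billions; anything else -> 1.
exp_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(as.character(code))]
  mult[is.na(mult)] <- 1
  unname(mult)
}

# Made-up rows for illustration.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "HAIL"),
  PROPDMG    = c(25, 2.5, 10),
  PROPDMGEXP = c("K", "B", ""),
  CROPDMG    = c(0, 5, 1),
  CROPDMGEXP = c("", "M", "k"),
  stringsAsFactors = FALSE
)

# Unscaled sum, as in the report, versus the scaled dollar amount.
toy$Damages    <- toy$PROPDMG + toy$CROPDMG
toy$DamagesUSD <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP) +
                  toy$CROPDMG * exp_multiplier(toy$CROPDMGEXP)

top_unscaled <- toy$EVTYPE[which.max(toy$Damages)]
top_scaled   <- toy$EVTYPE[which.max(toy$DamagesUSD)]
```

In this toy example the top-ranked event differs between the two measures, so rankings based on unscaled sums may not match rankings based on dollar amounts.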
The event with the greatest economic consequences is the one that generates the greatest damage. To find it, the new ‘Damages’ variable is summed over all records for each event type.
totaldamages <- aggregate(Damages~EVTYPE,projdata,sum)
mostcostlyevent <- totaldamages[ which(totaldamages$Damages==max(totaldamages$Damages)), "EVTYPE" ]
Therefore, the event with the greatest economic consequences is the TORNADO. The following plot shows how its economic consequences compare with those of the other events among the 10 with the greatest economic consequences.
library(ggplot2)
totaldamages <- totaldamages[ rev(order(totaldamages$Damages)), ]
totaldamages <- totaldamages[1:10,]
gg <- ggplot(totaldamages,aes(x=EVTYPE,y=Damages,fill=EVTYPE)) + geom_bar(stat="identity")
gg <- gg + xlab("Event type") + ylab("Economic consequences (sum of PROPDMG and CROPDMG)")
gg <- gg + theme(axis.text.x = element_text(angle=90))
gg <- gg + ggtitle("The 10 events with the greatest economic consequences")
gg <- gg + guides(fill=guide_legend(title="Event type"))
print(gg)
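ggplot2 arranges a discrete x axis by factor level (alphabetically by default), so the bars in the plots above are not sorted by height. One option is to reorder the factor levels by the plotted value with `reorder()` before plotting. The sketch below does this on made-up totals, and the plotting call is guarded so the example also runs where ggplot2 is not installed.

```r
# Made-up totals for illustration.
toy <- data.frame(
  EVTYPE  = c("FLOOD", "HAIL", "TORNADO"),
  Damages = c(30, 10, 50),
  stringsAsFactors = FALSE
)

# reorder() sorts levels by ascending score; negating gives descending bars.
toy$EVTYPE <- reorder(toy$EVTYPE, -toy$Damages)
bar_order <- levels(toy$EVTYPE)

# Plot only if ggplot2 is available.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  gg_sorted <- ggplot(toy, aes(x = EVTYPE, y = Damages)) +
    geom_bar(stat = "identity") +
    xlab("Event type")
}
```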
The results of the analysis show that tornados cause the greatest number of fatalities and injuries, far more than any other event in the top 10. Tornados are also the event with the worst economic consequences as measured here by the summed PROPDMG and CROPDMG values, followed by flash floods with a considerably lower total.