Summary

This work is based on the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which records storms and other extreme weather events in the USA from 1950 to 2011. For each recorded event it contains information such as the event type, the number of fatalities and injuries, and the estimated property and crop damage.

The data classifies events into 985 different types, and close to 1 million events are recorded in total.

This work analyzes which types of events are the most dangerous, and how much harm they cause to people and how much damage to property. The conclusion is that tornadoes are the most harmful type of event: they account for 65% of all injuries, 37% of fatalities and 27% of total damage costs. Floods, thunderstorms and hail come next in terms of costs, while excessive heat is the second leading cause of fatalities and injuries.

Data Processing

The data was provided as a bzip2-compressed CSV file (csv.bz2). read.csv() can read it directly, without unzipping it first, and its default settings are suitable for this file.
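As a quick check of this behaviour, the sketch below writes a tiny made-up CSV through a bzfile() connection and reads it back with read.csv() using the default settings. It relies on R's file() connection detecting bzip2 compression when a file is opened for reading; the temporary file and its contents are illustrative only and are not part of the storm data.

```r
# Write a tiny, made-up CSV through a bzip2-compressing connection
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, "w")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,3", "FLOOD,1"), con)
close(con)

# read.csv() opens the path with file() in "rt" mode, which detects
# the bzip2 compression and decompresses transparently
example <- read.csv(path)
print(example)
```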

We also load the dplyr package, which we use in the analysis:

stormsData <- read.csv("repdata%2Fdata%2FStormData.csv.bz2")
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

We then calculate, for each event type, its percentage of the total number of fatalities and of injuries, and select the top 5:

totalFatalities <- sum(stormsData$FATALITIES, na.rm = TRUE)
# Percentage of all fatalities caused by each event type, keeping the top 5
topFatalityEvents <- stormsData %>%
  group_by(EVTYPE) %>%
  summarise(fatalities = sum(FATALITIES, na.rm = TRUE) * 100 / totalFatalities) %>%
  arrange(desc(fatalities)) %>%
  slice_head(n = 5)
# Vector to plot: the top 5 event types plus all other events combined
fatalitiesValues <- c(topFatalityEvents$fatalities, 100 - sum(topFatalityEvents$fatalities))
# Names displayed under the bars
names(fatalitiesValues) <- c(as.character(topFatalityEvents$EVTYPE), "All other")
# Make the plot of fatalities
barplot(fatalitiesValues, xlab = "Event type", ylab = "Percentage of total fatalities",
        cex.names = 0.5, main = "Top 5 events causing fatalities")

totalInjuries <- sum(stormsData$INJURIES, na.rm = TRUE)
# Percentage of all injuries caused by each event type, keeping the top 5
topInjuryEvents <- stormsData %>%
  group_by(EVTYPE) %>%
  summarise(injuries = sum(INJURIES, na.rm = TRUE) * 100 / totalInjuries) %>%
  arrange(desc(injuries)) %>%
  slice_head(n = 5)
# Vector to plot: the top 5 event types plus all other events combined
injuriesValues <- c(topInjuryEvents$injuries, 100 - sum(topInjuryEvents$injuries))
# Names displayed under the bars
names(injuriesValues) <- c(as.character(topInjuryEvents$EVTYPE), "All other")
# Make the plot of injuries
barplot(injuriesValues, xlab = "Event type", ylab = "Percentage of total injuries",
        cex.names = 0.5, main = "Top 5 events causing injuries")
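The fatalities and injuries calculations follow the same pattern: sum a quantity per event type, express each sum as a percentage of the overall total, keep the largest few and lump the rest into "All other". A minimal base-R sketch of that pattern as one reusable function follows; the name topShares and its arguments are hypothetical, and it uses tapply() instead of dplyr.

```r
# Hypothetical helper: percentage share of `values` per group, keeping the
# top `n` groups and combining the remaining groups into "All other".
topShares <- function(values, groups, n = 5) {
  # Total of `values` for each group (NA values are ignored)
  totals <- tapply(values, groups, sum, na.rm = TRUE)
  totals <- setNames(as.numeric(totals), names(totals))
  # Convert to percentages of the overall total, largest first
  pct <- sort(totals * 100 / sum(totals), decreasing = TRUE)
  top <- head(pct, n)
  # Append the combined share of every group outside the top n
  c(top, "All other" = 100 - sum(top))
}

shares <- topShares(c(30, 20, 20, 15, 10, 5),
                    c("TORNADO", "TORNADO", "HEAT", "FLOOD", "HAIL", "LIGHTNING"),
                    n = 2)
print(shares)
```

With the storm data loaded, barplot(topShares(stormsData$FATALITIES, stormsData$EVTYPE)) should produce essentially the same bars as the fatalities plot, since both divide each event type's total by the overall total.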

Similarly, we calculate each event type's share of the total damage, adding property damage (PROPDMG) and crop damage (CROPDMG):

totalDamage <- sum(stormsData$PROPDMG, na.rm = TRUE) + sum(stormsData$CROPDMG, na.rm = TRUE)
# Percentage of all property plus crop damage caused by each event type, top 5
topDamageEvents <- stormsData %>%
  group_by(EVTYPE) %>%
  summarise(cost = (sum(PROPDMG, na.rm = TRUE) + sum(CROPDMG, na.rm = TRUE)) * 100 / totalDamage) %>%
  arrange(desc(cost)) %>%
  slice_head(n = 5)
# Vector to plot: the top 5 event types plus all other events combined
damageValues <- c(topDamageEvents$cost, 100 - sum(topDamageEvents$cost))
# Names displayed under the bars
names(damageValues) <- c(as.character(topDamageEvents$EVTYPE), "All other")
# Make the plot of damages
barplot(damageValues, xlab = "Event type", ylab = "Percentage of total cost",
        cex.names = 0.5, main = "Top 5 events causing damage")
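In the NOAA storm data, PROPDMG and CROPDMG are accompanied by the magnitude columns PROPDMGEXP and CROPDMGEXP, which hold codes such as K, M and B; the cost computed above adds the PROPDMG and CROPDMG values as they are. A minimal sketch of converting each amount to dollars before summing follows. The helper name scaleDamage is hypothetical, and the mapping (H = 100, K = 1,000, M = 1,000,000, B = 1,000,000,000, anything else = 1) is an assumption; the data also contains other codes, which this sketch simply leaves unscaled.

```r
# Hypothetical helper: multiply each damage figure by the factor implied by
# its magnitude code. ASSUMED mapping: H = 1e2, K = 1e3, M = 1e6, B = 1e9;
# missing or unrecognised codes leave the amount unscaled (factor 1).
scaleDamage <- function(amount, code) {
  multipliers <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  # Normalise codes so that "k", " K" and "K" are treated alike
  code <- toupper(trimws(as.character(code)))
  # Look up each code by name; unknown, empty or NA codes give NA here
  mult <- unname(multipliers[code])
  mult[is.na(mult)] <- 1
  amount * mult
}

scaled <- scaleDamage(c(2.5, 3, 10, 7, 4), c("K", "m", "", NA, "5"))
print(scaled)
```

Under these assumptions, stormsData$PROPCOST <- scaleDamage(stormsData$PROPDMG, stormsData$PROPDMGEXP), together with the analogous line for crop damage, would give dollar columns that could be summed in the summarise() call in place of the raw PROPDMG and CROPDMG values.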

Results

The plots show that tornadoes are by far the most harmful type of event, both in damage and in harm to people. Any policy that aims to reduce the negative impact of weather events should concentrate on tornadoes, as they alone account for about two thirds of injuries and more than a quarter of damages.