Synopsis

The goal of this project is to explore the Storm Data of the U.S. National Oceanic and Atmospheric Administration (NOAA) for the period from 1950 to November 2011. The dataset mainly documents the occurrence of storms and other significant weather phenomena. In particular, the analysis aims to answer the following two questions:

  1. Across the United States, which types of events (as indicated in the ‘EVTYPE’ variable) are most harmful with respect to population health?

  2. Across the United States, which types of events have the greatest economic consequences?

Data Processing

The data file was downloaded from the website of the Reproducible Research course on July 7th, 2017. The following code reads the downloaded file into R. The NOAA storm dataset contains 902,297 observations of 37 variables.

# read.csv() reads the bzip2-compressed file directly; no manual decompression is needed
rawData <- read.csv('StormData.csv.bz2')
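R's `read.csv()` opens its input through `file()`, which detects bzip2 compression when reading, so the `.bz2` archive does not have to be unpacked first. The sketch below demonstrates this on a tiny hypothetical file written to a temporary path (the columns and values are illustrative, not taken from the NOAA data) and checks the result with `dim()`, which is how one would confirm the 902,297 observations of 37 variables quoted above on the full file.

```r
# Write a tiny bzip2-compressed CSV to a temporary file (illustrative data only)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
writeLines(c("EVTYPE,FATALITIES,INJURIES",
             "TORNADO,2,10",
             "HAIL,0,1"), con)
close(con)

# read.csv() decompresses transparently through file()
toy <- read.csv(tmp)

# Rows and columns; on the real StormData.csv.bz2 this is the check for 902297 x 37
print(dim(toy))
```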

To answer Question #1, we need to determine which types of events are most harmful with respect to population health. I selected the numbers of fatalities and injuries per event as the variables that best represent the impact on population health. Therefore, I summed the fatalities and the injuries per event type to find the most harmful events.

The following code sums fatalities and injuries separately per event type, drops event types with a zero total, and sorts the results in decreasing order to answer Question #1.

# Total fatalities per event type; keep types with at least one death, sorted descending
fatalities <- aggregate(FATALITIES ~ EVTYPE, data = rawData, FUN = sum)
colnames(fatalities) <- c('event', 'deaths')
fatalities <- fatalities[fatalities$deaths > 0, ]
fatalities <- fatalities[order(-fatalities$deaths), ]

# Total injuries per event type, filtered and sorted the same way
injuries <- aggregate(INJURIES ~ EVTYPE, data = rawData, FUN = sum)
colnames(injuries) <- c('event', 'harms')
injuries <- injuries[injuries$harms > 0, ]
injuries <- injuries[order(-injuries$harms), ]
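The two `aggregate()` calls above can also be written as a single call with `cbind()` on the left-hand side of the formula, which sums both columns per event type in one pass and keeps them aligned by event. The sketch below uses a small made-up data frame in place of `rawData` (`toyStorm` and its values are hypothetical) and additionally ranks events by total casualties (fatalities plus injuries), which is one possible way to combine the two measures, not the approach used in this report.

```r
# Hypothetical stand-in for rawData with the three columns used above
toyStorm <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "EXCESSIVE HEAT", "HAIL", "HAIL"),
  FATALITIES = c(3, 1, 2, 0, 0),
  INJURIES   = c(10, 5, 1, 2, 0)
)

# Sum both health measures per event type in one call
health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toyStorm, FUN = sum)
colnames(health) <- c("event", "deaths", "harms")

# Combine them into one casualty count and rank events, largest first
health$casualties <- health$deaths + health$harms
health <- health[health$casualties > 0, ]
health <- health[order(-health$casualties), ]
print(health)
```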

To answer Question #2, we need to determine which types of events have the greatest economic consequences. I selected property damage and crop damage per event as the variables that best represent the economic impact. Therefore, I summed these variables per event type to find the events with the greatest economic consequences.

The following code sums property and crop damage per event type to estimate the economic impact and answer Question #2.

# Sum property and crop damage per event type in one call so both columns stay aligned
damageByEvent <- aggregate(cbind(PROPDMG, CROPDMG) ~ EVTYPE, data = rawData, FUN = sum)

# Combined damage per event type; filter and rank by this total, not by property damage alone
dataDamage <- data.frame(event = damageByEvent$EVTYPE,
                         damage = damageByEvent$PROPDMG + damageByEvent$CROPDMG)
dataDamage <- dataDamage[dataDamage$damage > 0, ]
dataDamage <- dataDamage[order(-dataDamage$damage), ]
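The sums above add the raw `PROPDMG` and `CROPDMG` values. In the NOAA data these come with companion columns, `PROPDMGEXP` and `CROPDMGEXP`, whose letter codes give the magnitude of the amount (the Storm Data documentation uses "K" for thousands, "M" for millions and "B" for billions). A minimal sketch of converting to dollars before summing follows. It assumes that only those three codes matter (in either case) and treats every other code as a multiplier of 1, which is a simplification; `toDollars()` and the sample rows are hypothetical.

```r
# Hypothetical helper: scale damage values by their magnitude code.
# Assumption: only K/M/B (any case) are meaningful; every other code maps to x1.
toDollars <- function(value, exp) {
  multiplier <- c(K = 1e3, M = 1e6, B = 1e9)[toupper(as.character(exp))]
  multiplier[is.na(multiplier)] <- 1
  value * unname(multiplier)
}

# Illustrative rows, not taken from the real dataset
toyDamage <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "HAIL"),
  PROPDMG    = c(2.5, 10, 1.5, 5),
  PROPDMGEXP = c("M", "K", "B", ""),
  CROPDMG    = c(0, 0, 3, 20),
  CROPDMGEXP = c("", "", "M", "k")
)

# Dollar-scaled property + crop damage per row, then summed per event type
toyDamage$total <- toDollars(toyDamage$PROPDMG, toyDamage$PROPDMGEXP) +
                   toDollars(toyDamage$CROPDMG, toyDamage$CROPDMGEXP)
econ <- aggregate(total ~ EVTYPE, data = toyDamage, FUN = sum)
econ <- econ[order(-econ$total), ]
print(econ)
```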

Results

Question #1. Across the United States, which types of events (as indicated in the ‘EVTYPE’ variable) are most harmful with respect to population health?

library(ggplot2)
# Bar chart of the five event types with the most fatalities, in decreasing order
ggplot(data = fatalities[1:5, ], aes(event, deaths)) + geom_col() +
        scale_x_discrete(limits = as.character(fatalities$event[1:5])) +
        theme(axis.text.x = element_text(angle = 45, hjust = 1), plot.title = element_text(hjust = 0.5),
              plot.caption = element_text(size = 8, hjust = 0, margin = margin(t = 15))) +
        labs(x = 'Event', y = 'Fatalities', caption = 'Number of fatalities per type of severe weather event in the US.\nTornadoes are the most harmful events.') +
        ggtitle('Top 5 severe weather phenomena\nleading to fatalities in the US')

# Bar chart of the five event types with the most injuries, in decreasing order
ggplot(data = injuries[1:5, ], aes(event, harms)) + geom_col() +
        scale_x_discrete(limits = as.character(injuries$event[1:5])) +
        theme(axis.text.x = element_text(angle = 45, hjust = 1), plot.title = element_text(hjust = 0.5),
              plot.caption = element_text(size = 8, hjust = 0, margin = margin(t = 15))) +
        labs(x = 'Event', y = 'Injuries', caption = 'Number of injuries per type of severe weather event in the US.\nTornadoes are the most harmful events.') +
        ggtitle('Top 5 severe weather phenomena\nleading to injuries in the US')

Tornadoes are the most harmful weather phenomenon with respect to population health: they cause the largest numbers of both fatalities and injuries in the US. The second most harmful phenomenon in terms of fatalities is excessive heat, whose death toll is much higher than that of the remaining event types.

Question #2. Across the United States, which types of events have the greatest economic consequences?

library(ggplot2)
# Bar chart of the five event types with the largest combined property and crop damage
ggplot(data = dataDamage[1:5, ], aes(event, damage)) + geom_col() +
        scale_x_discrete(limits = as.character(dataDamage$event[1:5])) +
        theme(axis.text.x = element_text(angle = 45, hjust = 1), plot.title = element_text(hjust = 0.5),
              plot.caption = element_text(size = 8, hjust = 0, margin = margin(t = 15))) +
        labs(x = 'Event', y = 'Damage (PROPDMG + CROPDMG)', caption = 'Combined property and crop damage per type of severe weather event in the US.\nTornadoes cause the greatest damage.') +
        ggtitle('Top 5 severe weather phenomena\nleading to economic damage in the US')

Tornadoes, flash floods and TSTM WIND (thunderstorm wind) events are the weather phenomena with the greatest economic consequences. These events cause the largest combined property and crop damage in the US.
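`EVTYPE` is free text in this dataset, so the same phenomenon can appear under several labels; for example, "TSTM WIND" and "THUNDERSTORM WIND" are counted as separate event types in the rankings above. A minimal sketch of collapsing such variants before aggregating follows. The cleaning rules (upper-casing, trimming and squeezing whitespace, expanding a leading "TSTM") are an illustrative assumption rather than a complete mapping, and `normalizeEvtype()` is a hypothetical helper.

```r
# Hypothetical helper: collapse a few spelling variants of EVTYPE.
# Assumption: only case, whitespace and a leading "TSTM" abbreviation are handled;
# a real cleaning step would need a much fuller mapping.
normalizeEvtype <- function(x) {
  x <- toupper(trimws(as.character(x)))
  x <- gsub("\\s+", " ", x)               # squeeze repeated spaces
  sub("^TSTM ", "THUNDERSTORM ", x)       # expand a leading TSTM abbreviation
}

# Illustrative rows, not taken from the real dataset
toyEvents <- data.frame(
  EVTYPE  = c("TSTM WIND", "Thunderstorm Wind", " THUNDERSTORM  WIND", "TORNADO"),
  PROPDMG = c(10, 5, 1, 50)
)
toyEvents$EVTYPE <- normalizeEvtype(toyEvents$EVTYPE)

# The three wind variants now aggregate into a single event type
byEvent <- aggregate(PROPDMG ~ EVTYPE, data = toyEvents, FUN = sum)
print(byEvent)
```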