The United States is affected by several types of natural disasters throughout the year. This study summarizes the effects of the most harmful of these disasters on the country's human health and economic well-being. The analysis showed that tornadoes are by far the most hazardous in terms of human injuries and fatalities. The greatest damage to property and crops was caused by floods, making them the most harmful event in terms of economic loss.
The data has been extracted from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. The data consists of about 903,800 records, each with 37 attributes describing natural weather phenomena.
First, the data file was saved as a .csv file and loaded into R using the following code.
mydata<-read.csv("stormdata.csv")
Other relevant packages required for plotting and pre-processing were also loaded into R.
library(ggplot2)
library(reshape2)
Starting with the events that were most hazardous to human health, the relevant columns were extracted from the main data set.
hdata<-mydata[,c("EVTYPE", "FATALITIES","INJURIES")]
Since many of the event names given in the data file had inconsistent formatting, they had to be grouped under major event categories.
hdata$EVTYPE<-gsub('.*STORM.*', 'STORM',hdata$EVTYPE, ignore.case = T)
hdata$EVTYPE<-gsub('.*TORN.*', 'TORNADO',hdata$EVTYPE, ignore.case = T)
hdata$EVTYPE<-gsub('.*HURRICANE.*', 'HURRICANE',hdata$EVTYPE, ignore.case = T)
hdata$EVTYPE<-gsub('.*WIND.*', 'WIND',hdata$EVTYPE, ignore.case = T)
hdata$EVTYPE<-gsub('.*HAIL.*', 'HAIL',hdata$EVTYPE, ignore.case = T)
hdata$EVTYPE<-gsub('.*RAIN.*', 'RAIN',hdata$EVTYPE, ignore.case = T)
hdata$EVTYPE<-gsub('.*SNOW.*', 'SNOW',hdata$EVTYPE, ignore.case = T)
hdata$EVTYPE<-gsub('.*COLD.*', 'COLD',hdata$EVTYPE, ignore.case = T)
hdata$EVTYPE<-gsub('.*LOW.*TEMPER.*', 'COLD',hdata$EVTYPE, ignore.case = T)
hdata$EVTYPE<-gsub('.*FROST.*', 'COLD',hdata$EVTYPE, ignore.case = T)
hdata$EVTYPE<-gsub('.*HEAT.*', 'HEAT',hdata$EVTYPE, ignore.case = T)
hdata$EVTYPE<-gsub('.*HIGH.*TEMPER.*', 'HEAT',hdata$EVTYPE, ignore.case = T)
hdata$EVTYPE<-gsub('.*TSTM.*', 'STORM',hdata$EVTYPE, ignore.case = T)
hdata$EVTYPE<-gsub('.*FIRE.*', 'FIRE',hdata$EVTYPE, ignore.case = T)
hdata$EVTYPE<-gsub('.*FLOOD.*', 'FLOOD',hdata$EVTYPE, ignore.case = T)
hdata$EVTYPE<-gsub('.*DRY.*', 'DRYNESS',hdata$EVTYPE, ignore.case = T)
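The effect of applying these substitutions in sequence can be sketched on a small sample. The helper `group_events`, the `rules` vector, and the sample event names below are hypothetical and introduced only for illustration; the patterns are a subset of those used above, in the same order.

```r
# Hypothetical helper: apply an ordered set of pattern -> category rules.
group_events <- function(x, rules) {
  for (i in seq_along(rules)) {
    x <- gsub(names(rules)[i], rules[[i]], x, ignore.case = TRUE)
  }
  x
}

# A subset of the rules used above, in the same order.
rules <- c('.*STORM.*' = 'STORM',
           '.*TORN.*'  = 'TORNADO',
           '.*WIND.*'  = 'WIND',
           '.*TSTM.*'  = 'STORM',
           '.*FLOOD.*' = 'FLOOD')

# Illustrative event names, not taken from the data set.
sample_events <- c('Tornado F1', 'THUNDERSTORM WINDS', 'TSTM WIND',
                   'Flash Flood', 'HIGH WINDS')

grouped <- group_events(sample_events, rules)
print(data.frame(raw = sample_events, grouped = grouped))
```

Because each rule replaces the whole string, the order of the rules matters. In this sketch 'TSTM WIND' ends up in the WIND group rather than the STORM group, since the WIND rule runs before the TSTM rule.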
The numbers of fatalities and injuries from each type of event were added together to give a single quantitative measure of the effect on population health. These totals were then sorted in descending order and the 10 most harmful events were isolated.
agg.hdata<-aggregate(.~EVTYPE, hdata, FUN=sum)
agg.hdata$total<-agg.hdata$FATALITIES+agg.hdata$INJURIES
agg.hdata<-agg.hdata[order(-agg.hdata$total),]
agg.hdata<-agg.hdata[c(1:10),c(1:3)]
A further processing step was required to get the data into a form suitable for plotting a stacked bar chart.
moltenhdata<-melt(agg.hdata, id.vars='EVTYPE')
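The aggregation, ranking, and reshaping steps above can be traced on a toy table. This is a minimal sketch with made-up numbers; it uses only base R, and the long table is built by hand to show the layout that `melt` produces (one row per event type and effect), except that `melt` returns the `variable` column as a factor.

```r
# Toy records (made-up numbers, for illustration only).
toy <- data.frame(EVTYPE     = c('FLOOD', 'HEAT', 'FLOOD', 'HAIL', 'HEAT'),
                  FATALITIES = c(2, 5, 1, 0, 3),
                  INJURIES   = c(10, 20, 5, 1, 7))

# Sum every numeric column within each event type.
agg <- aggregate(. ~ EVTYPE, toy, FUN = sum)
agg$total <- agg$FATALITIES + agg$INJURIES

# Keep the two most harmful event types, then drop the helper column.
top <- agg[order(-agg$total), ][1:2, c('EVTYPE', 'FATALITIES', 'INJURIES')]

# Long layout: one row per (event type, effect) pair.
long <- data.frame(EVTYPE   = rep(top$EVTYPE, times = 2),
                   variable = rep(c('FATALITIES', 'INJURIES'), each = nrow(top)),
                   value    = c(top$FATALITIES, top$INJURIES))
print(long)
```

The long layout is what lets `ggplot` map the effect type to the fill color and stack the two effects within one bar per event.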
In a similar manner, the columns relevant to economic loss were picked from the data set. Inconsistencies in the recorded exponents for property and crop damage were removed and the total monetary values of damage to properties and crops were calculated.
edata<-mydata[,c('EVTYPE', 'PROPDMG','PROPDMGEXP','CROPDMG','CROPDMGEXP')]
# Work with the exponent codes as text so they can be compared with 'K', 'M', 'B', etc.
edata$PROPDMGEXP<-as.character(edata$PROPDMGEXP)
edata$CROPDMGEXP<-as.character(edata$CROPDMGEXP)
edata$PROPDMGEXP[(edata$PROPDMGEXP=='')|(edata$PROPDMGEXP=='+')|
(edata$PROPDMGEXP=='?')|(edata$PROPDMGEXP=='-')|(edata$PROPDMGEXP=='0')|
(edata$PROPDMGEXP=='h')|(edata$PROPDMGEXP=='H')]<-0
edata$PROPDMGEXP[(edata$PROPDMGEXP=='K')]<-3
edata$PROPDMGEXP[(edata$PROPDMGEXP=='M')|(edata$PROPDMGEXP=='m')]<-6
edata$PROPDMGEXP[(edata$PROPDMGEXP=='B')]<-9
edata$CROPDMGEXP[(edata$CROPDMGEXP=='')|(edata$CROPDMGEXP=='?')]<-0
edata$CROPDMGEXP[(edata$CROPDMGEXP=='K')|(edata$CROPDMGEXP=='k')]<-3
edata$CROPDMGEXP[(edata$CROPDMGEXP=='M')|(edata$CROPDMGEXP=='m')]<-6
edata$CROPDMGEXP[(edata$CROPDMGEXP=='B')]<-9
# Convert the cleaned codes to numbers before using them as powers of ten
edata$PROPDMGEXP<-as.numeric(edata$PROPDMGEXP)
edata$CROPDMGEXP<-as.numeric(edata$CROPDMGEXP)
edata$PROPDMG<-edata$PROPDMG*(10^edata$PROPDMGEXP)
edata$CROPDMG<-edata$CROPDMG*(10^edata$CROPDMGEXP)
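The conversion from exponent codes to monetary amounts can also be illustrated with a lookup table. This is a simplified sketch, not the procedure used in the analysis: `exp_lookup` and `to_power` are hypothetical names, the sample values are made up, and every code other than K, M, or B (in either case) is treated as a multiplier of 1, whereas the recoding above leaves digit codes as they are.

```r
# Hypothetical lookup: exponent code -> power of ten.
# Codes not listed here fall back to 0, i.e. a multiplier of 1.
exp_lookup <- c(K = 3, M = 6, B = 9)

to_power <- function(code) {
  power <- exp_lookup[toupper(as.character(code))]
  power[is.na(power)] <- 0
  unname(power)
}

# Illustrative values, not taken from the data set.
dmg  <- c(25, 2.5, 1, 10)
code <- c('K', 'm', 'B', '')

dollars <- dmg * 10^to_power(code)
print(dollars)
```

In this sketch a recorded damage of 25 with code 'K' becomes 25,000, and 2.5 with code 'm' becomes 2,500,000.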
Inconsistencies in the names of the events were also removed and the total loss incurred from each event type was calculated.
# Each substitution is applied to edata's own EVTYPE column so the groupings accumulate
edata$EVTYPE<-gsub('.*STORM.*', 'STORM',edata$EVTYPE, ignore.case = T)
edata$EVTYPE<-gsub('.*TORN.*', 'TORNADO',edata$EVTYPE, ignore.case = T)
edata$EVTYPE<-gsub('.*HURRICANE.*', 'HURRICANE',edata$EVTYPE, ignore.case = T)
edata$EVTYPE<-gsub('.*WIND.*', 'WIND',edata$EVTYPE, ignore.case = T)
edata$EVTYPE<-gsub('.*HAIL.*', 'HAIL',edata$EVTYPE, ignore.case = T)
edata$EVTYPE<-gsub('.*RAIN.*', 'RAIN',edata$EVTYPE, ignore.case = T)
edata$EVTYPE<-gsub('.*SNOW.*', 'SNOW',edata$EVTYPE, ignore.case = T)
edata$EVTYPE<-gsub('.*COLD.*', 'COLD',edata$EVTYPE, ignore.case = T)
edata$EVTYPE<-gsub('.*LOW.*TEMPER.*', 'COLD',edata$EVTYPE, ignore.case = T)
edata$EVTYPE<-gsub('.*FROST.*', 'COLD',edata$EVTYPE, ignore.case = T)
edata$EVTYPE<-gsub('.*HEAT.*', 'HEAT',edata$EVTYPE, ignore.case = T)
edata$EVTYPE<-gsub('.*HIGH.*TEMPER.*', 'HEAT',edata$EVTYPE, ignore.case = T)
edata$EVTYPE<-gsub('.*TSTM.*', 'STORM',edata$EVTYPE, ignore.case = T)
edata$EVTYPE<-gsub('.*FIRE.*', 'FIRE',edata$EVTYPE, ignore.case = T)
edata$EVTYPE<-gsub('.*FLOOD.*', 'FLOOD',edata$EVTYPE, ignore.case = T)
edata$EVTYPE<-gsub('.*DRY.*', 'DRYNESS',edata$EVTYPE, ignore.case = T)
# Sum only the damage amounts; the exponent columns are not meaningful totals
agg.edata<-aggregate(cbind(PROPDMG, CROPDMG)~EVTYPE, edata, FUN=sum)
agg.edata$total<-agg.edata$PROPDMG+agg.edata$CROPDMG
The 10 most damaging events were picked after arranging the events in descending order by the value of the loss incurred. Some further processing was done to get the data into a form suitable for making a stacked bar chart.
agg.edata<-agg.edata[order(-agg.edata$total),]
agg.edata<-agg.edata[c(1:10),c('EVTYPE','PROPDMG','CROPDMG')]
moltenedata<-melt(agg.edata, id.vars='EVTYPE')
Stacked bar charts were plotted for both the injuries and loss of life, as well as the economic loss, caused by the natural disasters. Different colors were used to distinguish each type of effect.
The first bar chart shows the number of people injured or killed by the 10 most harmful natural disasters.
ggplot(moltenhdata, aes(EVTYPE, value, fill=as.factor(variable)))+
geom_bar(stat='identity')+coord_flip()+
ggtitle('Weather Events Most Harmful to Population Health')+
xlab('Events')+ylab('No. of People')+
scale_fill_discrete('Effects')
As can be seen from the graph, there were many more injuries than fatalities in each category. Tornadoes caused the most harm in both categories, so the analysis shows that tornadoes are the most harmful natural disaster to hit the US in terms of human health.
The second bar chart shows the amount of monetary loss incurred by the 10 most destructive natural disasters in terms of property as well as crop damage.
ggplot(moltenedata, aes(EVTYPE, value, fill=as.factor(variable)))+
geom_bar(stat='identity')+coord_flip()+
ggtitle('Weather Events Most Harmful to the Economy')+
xlab('Events')+ylab('Amount of Loss')+
scale_fill_discrete('Effects',breaks=c("PROPDMG", "CROPDMG"),
labels=c("Property Damage", "Crop Damage"))
In general, more loss has resulted from property damage than from crop damage. The graph shows that floods have caused the greatest economic loss to the US through damage to property and crops.