Major weather events with significant consequences for human health and the economy

Synopsis

Storms and other severe weather events can cause both public health and economic problems across the U.S. states. Many severe events result in fatalities, injuries and property damage. This project explores the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database.

The basic goal of this report is to address the following questions:

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?

Loading and reading the raw data

From the NOAA storm database we obtained data on major storms and weather events in the United States, including when and where they occurred, as well as estimates of any fatalities, injuries and property damage. The events in the database start in 1950 and end in November 2011.

The raw data come as a bzip2-compressed CSV file (.csv.bz2), which read.csv() can read directly without a separate decompression step. The result was stored in a data frame called "data".

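# Create a local folder for the data and download the compressed CSV once
if (!file.exists("dataNOAA")) dir.create("dataNOAA")
fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
destfile <- "./dataNOAA/data.bz2"
# mode = "wb" keeps the binary archive intact on Windows
if (!file.exists(destfile)) download.file(fileUrl, destfile = destfile, mode = "wb")

# read.csv() decompresses the .bz2 file on the fly
data <- read.csv(destfile)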
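read.csv() can take the .bz2 file directly because R's file() connection detects bzip2 compression when a file is opened for reading. The sketch below checks this round trip on a tiny data frame written to a temporary .csv.bz2 file; the rows are made up for illustration and are not NOAA records.

```r
# Toy data standing in for a few StormData rows (values are made up)
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"),
                  FATALITIES = c(2, 0),
                  stringsAsFactors = FALSE)

# Write it as a bzip2-compressed CSV in a temporary file
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() decompresses transparently when reading the file back
toy_back <- read.csv(tmp, stringsAsFactors = FALSE)
print(toy_back)
```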

Data Processing

Two summary data frames were derived by selecting columns from the data frame "data": ndata2 (fatalities) and ndatab (economic losses).

ndata2: total FATALITIES per EVTYPE (event type). Event types were sorted by decreasing number of fatalities, and those whose cumulative share stays below 80% of all fatalities were kept.

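# STATE, EVTYPE and FATALITIES (columns 7, 8 and 23 of the raw data)
ndata <- data[, c("STATE", "EVTYPE", "FATALITIES")]
# Total fatalities per event type
ndata2 <- aggregate(data = ndata, FATALITIES ~ EVTYPE, sum)
# Order the factor levels by fatalities so the bar chart is sorted
ndata2 <- transform(ndata2, EVTYPE = reorder(EVTYPE, FATALITIES))
ndata2 <- ndata2[order(-ndata2$FATALITIES), ]
# Share of total fatalities and cumulative share
ndata2$p <- ndata2$FATALITIES / sum(ndata2$FATALITIES)
ndata2$s <- cumsum(ndata2$p)
# Keep the event types whose cumulative share is below 80%
ndata2 <- subset(ndata2, s < 0.8)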
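The same "sort, take shares, accumulate, cut at 80%" steps are repeated for economic losses, so they can be wrapped in one helper. This is a minimal sketch; the function name top_share and the toy totals are hypothetical, not taken from the NOAA data.

```r
# Hypothetical helper: sort by `value` (decreasing) and keep the rows whose
# cumulative share of the total stays below `threshold` (same rule as s < 0.8)
top_share <- function(df, value, threshold = 0.8) {
  df <- df[order(-df[[value]]), ]
  df$p <- df[[value]] / sum(df[[value]])   # share of the total
  df$s <- cumsum(df$p)                     # cumulative share
  df[df$s < threshold, ]
}

# Made-up totals per event type (not NOAA figures)
toy <- data.frame(EVTYPE = c("A", "B", "C", "D"),
                  FATALITIES = c(50, 25, 15, 10),
                  stringsAsFactors = FALSE)
print(top_share(toy, "FATALITIES"))
```

Here the cumulative shares are 0.50, 0.75, 0.90 and 1.00, so only A and B are kept. Because the test is strictly s < 0.8, the event that pushes the total past 80% (C) is excluded; filtering on s - p < threshold instead would include it.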

ndatab: total ECONOMIC_losses (property plus crop damage, in millions of USD) per EVTYPE. To build it, the exponent codes in PROPDMGEXP and CROPDMGEXP were converted to numeric multipliers (K = thousand, M = million, B = billion; any other code, including lower-case letters and blanks, is treated as 1) and multiplied by PROPDMG and CROPDMG respectively. Event types were then sorted by decreasing losses, and those whose cumulative share stays below 80% of all losses were kept.

# EVTYPE and the damage columns (columns 8 and 25-28 of the raw data)
ndatab <- data[, c("EVTYPE", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]

# Convert an exponent code to a multiplier; codes other than K, M and B give 1
exp_to_factor <- function(x) {
        x <- as.character(x)
        ifelse(x == "K", 1e3,
               ifelse(x == "M", 1e6,
                      ifelse(x == "B", 1e9, 1)))
}

# Property and crop damage in millions of USD
ndatab$PROPDMG_USDM <- exp_to_factor(ndatab$PROPDMGEXP) * ndatab$PROPDMG / 1e6
ndatab$CROPDMG_USDM <- exp_to_factor(ndatab$CROPDMGEXP) * ndatab$CROPDMG / 1e6

# Total economic losses; keep only the columns needed downstream
ndatab$ECONOMIC_losses <- ndatab$PROPDMG_USDM + ndatab$CROPDMG_USDM
ndatab <- ndatab[, c("EVTYPE", "ECONOMIC_losses")]

# Total losses per event type, sorted, with share and cumulative share
ndatab <- aggregate(data = ndatab, ECONOMIC_losses ~ EVTYPE, sum)
ndatab <- transform(ndatab, EVTYPE = reorder(EVTYPE, ECONOMIC_losses))
ndatab <- ndatab[order(-ndatab$ECONOMIC_losses), ]
ndatab$p <- ndatab$ECONOMIC_losses / sum(ndatab$ECONOMIC_losses)
ndatab$s <- cumsum(ndatab$p)
# Keep the event types whose cumulative share is below 80%
ndatab <- subset(ndatab, s < 0.8)
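The exponent conversion can also be written as a lookup in a named vector instead of nested if/else calls. The sketch below reproduces the rule used in this report (upper-case K, M and B; every other code becomes 1) on a few made-up rows and converts the result to millions of USD. The helper name exp_multiplier and the toy values are hypothetical.

```r
# Multipliers for the exponent codes handled in this report
multipliers <- c(K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: map codes to multipliers, defaulting to 1
exp_multiplier <- function(code) {
  m <- unname(multipliers[as.character(code)])
  m[is.na(m)] <- 1   # blank, lower-case or any other code
  m
}

# Made-up damage records (not NOAA data)
toy <- data.frame(PROPDMG = c(25, 2.5, 1.2, 10),
                  PROPDMGEXP = c("K", "M", "B", ""),
                  stringsAsFactors = FALSE)

# Damage in millions of USD: 25K -> 0.025, 2.5M -> 2.5, 1.2B -> 1200, 10 -> 1e-05
toy$PROPDMG_USDM <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP) / 1e6
print(toy)
```

Matching is case-sensitive, so a code such as "m" falls back to 1, exactly as in the if/else version; wrapping the codes in toupper() would be one way to count lower-case codes too, at the cost of changing the totals.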

Results

The following graph shows the weather events that together account for just under 80% of all recorded fatalities.

library(ggplot2)

# Fatalities for the event types selected in ndata2
g1 <- ggplot(data = ndata2, aes(x = EVTYPE, y = FATALITIES)) +
    geom_bar(stat = "identity") + coord_flip() +
    ggtitle("Main weather events measured in fatalities") +
    xlab("Event Type") + ylab("Fatalities")
print(g1)
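The transform(..., reorder(...)) step is what makes the bars appear sorted: reorder() turns EVTYPE into a factor whose levels follow the increasing value of the summarised variable, and ggplot2 draws a discrete axis in level order, so after coord_flip() the largest total sits at the top. A minimal check with made-up totals (not NOAA figures):

```r
# Made-up totals per event type (not NOAA figures)
evtype <- c("FLOOD", "TORNADO", "LIGHTNING")
total  <- c(1500, 5600, 800)

# reorder() returns a factor whose levels are sorted by increasing value,
# not alphabetically
ev <- reorder(evtype, total)
print(levels(ev))
```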

Tornadoes are by far the deadliest event type, followed by excessive heat and flash floods.

The distribution of fatalities from tornadoes, excessive heat and flash floods varies across states. Tornado fatalities are highest in AL, TX, MS, MO, AR and TN; flash flood fatalities are highest in TX, MO and MS; and excessive heat fatalities are highest in PA, IL and TX.

# Fatalities per event type and state, restricted to the three deadliest types
# (all three are among the event types selected in ndata2)
ndata3 <- aggregate(data = ndata, FATALITIES ~ EVTYPE + STATE, sum)
ndata3 <- subset(ndata3, EVTYPE %in% c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD"))
ndata3$EVTYPE <- factor(ndata3$EVTYPE)

g2 <- ggplot(data = ndata3, aes(x = STATE, y = FATALITIES)) +
    geom_bar(stat = "identity") + coord_flip() + facet_wrap(~EVTYPE) +
    xlab("State") + ylab("Fatalities")
print(g2)
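The state rankings quoted above can also be read off a table shaped like ndata3 (columns EVTYPE, STATE, FATALITIES) rather than from the plot. A sketch under that assumption; the helper top_states and the numbers are hypothetical, not NOAA figures.

```r
# Made-up state-level fatality totals (not NOAA figures)
toy <- data.frame(
  EVTYPE = rep(c("TORNADO", "FLASH FLOOD"), each = 3),
  STATE = c("AL", "TX", "KS", "TX", "MO", "OR"),
  FATALITIES = c(60, 50, 5, 40, 20, 1),
  stringsAsFactors = FALSE)

# Hypothetical helper: the n states with the most fatalities per event type
top_states <- function(df, n = 2) {
  by_type <- split(df, df$EVTYPE)
  lapply(by_type, function(d) head(d$STATE[order(-d$FATALITIES)], n))
}
print(top_states(toy))
```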

The following graph shows the weather events that together account for just under 80% of total economic losses (property plus crop damage, in millions of USD).

g3 <- ggplot(data = ndatab, aes(x = EVTYPE, y = ECONOMIC_losses)) +
        geom_bar(stat = "identity") + coord_flip() +
        ggtitle("Main weather events measured in economic losses") +
        xlab("Event Type") + ylab("Economic losses (millions of USD)")
print(g3)

Conclusions

Across the United States, tornadoes, excessive heat and flash floods were the most harmful events with respect to population health (measured by fatalities) between 1950 and 2011.

Floods, hurricanes and tornadoes were the weather events that produced the largest economic losses between 1950 and 2011.