Description

Data from the U.S. National Oceanic and Atmospheric Administration (NOAA) covering 1950 to 2011 is used to analyze the occurrence of natural events. The goal is to find which events should be given higher priority to safeguard the population and its property.

Libraries Required

The project requires four R libraries:

  • maps, which provides map outlines (used here for the U.S. states),

  • ggplot2 for the graphics,

  • dplyr to transform the data, and

  • reshape2 for its melt function.

library(maps)
library(ggplot2)
library(dplyr)
library(reshape2)

Data processing

The data is downloaded from the source and read into R.

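# mode = "wb" writes the compressed file in binary mode (needed on Windows)
download.file(url="https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",destfile = "stormdata.bz2", mode = "wb")
# read.csv() reads the bzip2-compressed file directly
data<-read.csv(file="stormdata.bz2")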

We'll use dplyr to transform the data. The dataset is grouped by event type (EVTYPE), and the number of distinct event types is counted. At 837, there are too many to compare one by one.

datad<-tbl_df(data)
datagroup<-group_by(datad,EVTYPE)
length(unique(datagroup$EVTYPE))
## [1] 837
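Part of that count comes from inconsistent labelling: the list of harmful events printed further down contains both "HEAT WAVE" and "Heat Wave", for example. The following minimal base-R sketch, run on a made-up vector rather than the real EVTYPE column, shows how upper-casing and trimming whitespace merges such variants. This cleanup is an illustration only and is not applied anywhere in this report.

```r
# Hypothetical sample of raw event labels (not taken from the real dataset)
raw <- c("HEAT WAVE", "Heat Wave", " HEAT WAVE", "TORNADO", "tornado", "HAIL")

# Count distinct labels as they are
n_raw <- length(unique(raw))

# Normalize: strip leading/trailing spaces and convert to upper case
clean <- toupper(trimws(raw))
n_clean <- length(unique(clean))

cat("distinct raw labels:  ", n_raw, "\n")
cat("distinct clean labels:", n_clean, "\n")
```

Six raw labels collapse to three once case and spacing are ignored. Variants such as "WINTER STORM" and "WINTER STORMS" would still need manual mapping.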

There are too many distinct events to analyze clearly, and the goal is to prioritize resources. For both reasons, the exploratory analysis starts from the assumption (later borne out by the data) that some natural events are far more harmful than others. To identify them, the events were ranked by two variables: injuries and fatalities.

A second assumption is that the ranking of the most harmful events may differ for injuries and for deaths. To account for this, the ranking is computed once for each variable, the top 50 events of each ranking are kept, and the two lists are merged into one without duplicates.
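The two-ranking approach can be sketched in base R on a small made-up dataset. Here aggregate() stands in for dplyr's group_by/summarize, and the cut-off n = 2 replaces the report's 50 only because the toy data is tiny.

```r
# Hypothetical toy records: one row per storm event
storms <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "HEAT", "HEAT", "HAIL", "FLOOD", "LIGHTNING"),
  INJURIES   = c(100, 50, 5, 5, 40, 30, 20),
  FATALITIES = c(10, 5, 40, 30, 0, 2, 1),
  stringsAsFactors = FALSE
)

# Total injuries and fatalities per event type
totals <- aggregate(cbind(INJURIES, FATALITIES) ~ EVTYPE, data = storms, FUN = sum)

n <- 2  # the report keeps the top 50; 2 is enough for the toy data

# Top n event types by each variable
top_inj <- as.character(totals$EVTYPE[order(-totals$INJURIES)][seq_len(n)])
top_fat <- as.character(totals$EVTYPE[order(-totals$FATALITIES)][seq_len(n)])

# Union of both lists, without duplicates
worst <- unique(c(top_inj, top_fat))
print(worst)
```

HEAT ranks low on injuries but enters the combined list through fatalities. This is exactly the case the second assumption guards against.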

ranking1<-arrange(summarize(datagroup,injuries=sum(INJURIES),fatalities=sum(FATALITIES)),desc(injuries))
ranking1f<-head(ranking1,50)

ranking2<-arrange(summarize(datagroup,injuries=sum(INJURIES),fatalities=sum(FATALITIES)),desc(fatalities))
ranking2f<-head(ranking2,50)

worstevents<-unique(c(as.character(ranking1f$EVTYPE),as.character(ranking2f$EVTYPE)))

worstevents
##  [1] "TORNADO"                    "TSTM WIND"                 
##  [3] "ICE STORM"                  "LIGHTNING"                 
##  [5] "THUNDERSTORM WINDS"         "HEAT"                      
##  [7] "BLIZZARD"                   "HAIL"                      
##  [9] "EXCESSIVE HEAT"             "HEAVY SNOW"                
## [11] "WINTER STORM"               "FLASH FLOOD"               
## [13] "FLOOD"                      "HEAT WAVE"                 
## [15] "HIGH WINDS"                 "HIGH WIND"                 
## [17] "DENSE FOG"                  "EXTREME COLD"              
## [19] "FOG"                        "EXTREME HEAT"              
## [21] "WILD FIRES"                 "ICE"                       
## [23] "DUST STORM"                 "THUNDERSTORM WIND"         
## [25] "RIP CURRENTS"               "Heat Wave"                 
## [27] "WINTER WEATHER"             "HEAVY RAIN"                
## [29] "RECORD HEAT"                "TROPICAL STORM GORDON"     
## [31] "WATERSPOUT/TORNADO"         "WILD/FOREST FIRE"          
## [33] "COLD"                       "SNOW/HIGH WINDS"           
## [35] "URBAN/SML STREAM FLD"       "STORM SURGE"               
## [37] "WATERSPOUT"                 "THUNDERSTORMW"             
## [39] "MIXED PRECIP"               "BLACK ICE"                 
## [41] "RIP CURRENT"                "FREEZING RAIN"             
## [43] "SNOW"                       "EXCESSIVE RAINFALL"        
## [45] "SNOW SQUALL"                "HIGH WIND AND SEAS"        
## [47] "AVALANCHE"                  "TSTM WIND/HAIL"            
## [49] "HURRICANE"                  "WINTER STORMS"             
## [51] "UNSEASONABLY WARM AND DRY"  "TORNADOES, TSTM WIND, HAIL"
## [53] "HIGH SURF"                  "FLASH FLOODING"            
## [55] "FLOOD/FLASH FLOOD"          "RECORD/EXCESSIVE HEAT"     
## [57] "COLD AND SNOW"              "FLASH FLOOD/FLOOD"         
## [59] "UNSEASONABLY WARM"          "EXTREME WINDCHILL"         
## [61] "WIND"                       "LOW TEMPERATURE"           
## [63] "MARINE MISHAP"              "FLOODING"                  
## [65] "GLAZE"                      "HURRICANE ERIN"            
## [67] "STRONG WIND"                "FLASH FLOODING/FLOOD"

This list of the 68 most harmful event types is used for the rest of the health analysis.

A new, smaller dataset containing only these 68 events is generated.

worstdata<-subset(ranking1,EVTYPE %in% worstevents)

datamelt<-melt(worstdata)
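melt() turns worstdata, which has separate injuries and fatalities columns, into long format with one row per event and variable. This is what lets ggplot stack both variables in a single bar through fill=variable. The following base-R sketch reproduces that reshaping on an invented two-row table. Note that melt() itself returns variable as a factor, while this sketch keeps it as character.

```r
# Hypothetical wide table: one row per event, one column per variable
wide <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT"),
  injuries   = c(150, 10),
  fatalities = c(15, 70),
  stringsAsFactors = FALSE
)

measure <- c("injuries", "fatalities")

# Long format: one row per (event, variable) pair, stacked column by column
long <- data.frame(
  EVTYPE   = rep(wide$EVTYPE, times = length(measure)),
  variable = rep(measure, each = nrow(wide)),
  value    = unlist(wide[measure], use.names = FALSE),
  stringsAsFactors = FALSE
)
print(long)
```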

Results

The data is plotted as stacked bars, so the height of each bar is the sum of injuries and fatalities and represents the harm to health in general. As the initial assumption predicted, a few natural events account for most of the harm to human lives, and tornadoes are overwhelmingly the most dangerous on both counts.

ggplot(datamelt,aes(x=reorder(EVTYPE,-value),y=value,fill=variable))+geom_bar(stat="identity")+labs(x = "EVENT", y = "INJURIES AND FATALITIES", title = "Injuries and Fatalities on Natural Events")+ theme(axis.text.x=element_text(angle=90,size=8))

How do the remaining events fare compared to tornadoes? Tornadoes are compared with the combined totals of every other event type in the dataset, not only the 68 listed above.

# Sum injuries and fatalities over every event type except TORNADO,
# the first row of ranking1 (which is sorted by injuries)
everythingelse<-cbind(EVTYPE="OTHER EVENTS", summarize(ranking1[-1,], injuries=sum(injuries),fatalities=sum(fatalities)))

rbind(worstdata[1,],everythingelse)
## Source: local data frame [2 x 3]
## 
##         EVTYPE injuries fatalities
##         (fctr)    (dbl)      (dbl)
## 1      TORNADO    72417       4216
## 2 OTHER EVENTS    16650       3337

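Tornadoes caused more than four times as many injuries as all the other event types put together (72,417 against 16,650), and about a quarter more fatalities (4,216 against 3,337). Counting both together, they are almost four times as harmful.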
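The size of the gap can be checked directly from the two rows printed above. The sketch below uses only those four numbers.

```r
# Totals copied from the table above
tornado <- c(injuries = 72417, fatalities = 4216)
others  <- c(injuries = 16650, fatalities = 3337)

# How many times larger the tornado counts are, per variable
ratio <- tornado / others
# ... and for injuries and fatalities combined
combined <- sum(tornado) / sum(others)

print(round(ratio, 2))
print(round(combined, 2))
```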

listworst<-worstdata$EVTYPE[1:10]
listworst
##  [1] TORNADO            TSTM WIND          ICE STORM         
##  [4] LIGHTNING          THUNDERSTORM WINDS HEAT              
##  [7] BLIZZARD           HAIL               EXCESSIVE HEAT    
## [10] HEAVY SNOW        
## 837 Levels:  COASTAL FLOOD  TSTM WIND ? ... WINTRY MIX

These are the ten event types that caused the most injuries, since worstdata keeps the injury-based ordering of ranking1.

Another point to consider is that tornadoes do not occur equally in all parts of the United States. Therefore, tornado injuries and fatalities were aggregated by state to look for a geographic pattern.

tornadoplaces<-filter(datad,EVTYPE=="TORNADO")
datagroup2<-group_by(tornadoplaces,STATE)

# desc() accepts a single column, so each sort key gets its own desc()
tornadosummary<-arrange(summarize(datagroup2,injuries=sum(INJURIES),fatalities=sum(FATALITIES),total=injuries+fatalities),desc(injuries),desc(fatalities))

tornadosummary<-mutate(tornadosummary,region=tolower(state.name[match(STATE,state.abb)]))
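The per-state step sums casualties by state. It then converts the two-letter STATE code into the lowercase full name used in the region column of map_data("state"). The following is a base-R sketch on hypothetical records. state.abb and state.name ship with R's datasets package and cover only the 50 states, so other codes (PR here) become NA and are dropped later by the merge.

```r
# Hypothetical per-record tornado data (abbreviations as in the STATE column)
tor <- data.frame(
  STATE      = c("TX", "TX", "AL", "OK", "PR"),
  INJURIES   = c(10, 5, 8, 3, 1),
  FATALITIES = c(1, 0, 2, 1, 0),
  stringsAsFactors = FALSE
)

# Sum by state, then add a combined total
by_state <- aggregate(cbind(INJURIES, FATALITIES) ~ STATE, data = tor, FUN = sum)
by_state$total <- by_state$INJURIES + by_state$FATALITIES

# Map abbreviations to lowercase state names; non-state codes give NA
by_state$region <- tolower(state.name[match(by_state$STATE, state.abb)])

# Sort by total, largest first
by_state <- by_state[order(-by_state$total), ]
print(by_state)
```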

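The results are drawn on a map of the United States. Notice that Texas accounts for the largest share of the injuries and fatalities.

all_states <- map_data("state")
total <- merge(all_states, tornadosummary, by="region")
# merge() sorts rows by region; restore the vertex order so the polygons draw correctly
total <- total[order(total$order), ]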

p <- ggplot()
p <- p + geom_polygon(data=total, aes(x=long, y=lat, group = group, fill=total),colour="white") +scale_fill_continuous(low = "#ece7f2", high ="#2b8cbe", guide="colorbar")
P1 <- p + theme_bw()  + labs(fill = "Total amount",title = "Injuries and Deaths by Tornadoes (1950-2011)", x="", y="")
P1 + scale_y_continuous(breaks=c()) + scale_x_continuous(breaks=c()) + theme(panel.border = element_blank())
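geom_polygon() connects the vertices of each outline in row order. However, merge() returns rows sorted by the key column (its default is sort = TRUE), which can scramble the outlines. The following small sketch with an invented two-region outline shows the reordering and how sorting on the order column from map_data() restores it.

```r
# Hypothetical polygon outline: vertices must be drawn in 'order'
outline <- data.frame(
  region = c("texas", "texas", "texas", "alabama", "alabama", "alabama"),
  order  = 1:6,
  long   = c(0, 1, 1, 5, 6, 6),
  lat    = c(0, 0, 1, 0, 0, 1),
  stringsAsFactors = FALSE
)
values <- data.frame(region = c("texas", "alabama"), total = c(16, 10),
                     stringsAsFactors = FALSE)

# merge() sorts the result by region, so alabama's vertices now come first
joined <- merge(outline, values, by = "region")

# Restore the drawing order before handing the data to geom_polygon()
joined <- joined[order(joined$order), ]
print(joined)
```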

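For economic damage, the property and crop damage variables (PROPDMG and CROPDMG) are used together with their EXP variables (PROPDMGEXP and CROPDMGEXP), which give the order of magnitude of each value. For example, a DMG (damage) value of 2.5 with an exponent of K means 2.5 × 1,000 = USD 2,500. Only the codes K, M and B (thousands, millions and billions, in either case) are converted; values with any other exponent code are set to zero.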

damagedata<-select(datad,STATE,EVTYPE,PROPDMG,PROPDMGEXP,CROPDMG,CROPDMGEXP)
# Scale by the exponent code (K, M or B, either case); any other code gives 0
damagedata2<-mutate(damagedata,
  PROPDMG=ifelse(toupper(PROPDMGEXP)=="K",PROPDMG*1e3,ifelse(toupper(PROPDMGEXP)=="B",PROPDMG*1e9,ifelse(toupper(PROPDMGEXP)=="M",PROPDMG*1e6,0))),
  CROPDMG=ifelse(toupper(CROPDMGEXP)=="K",CROPDMG*1e3,ifelse(toupper(CROPDMGEXP)=="B",CROPDMG*1e9,ifelse(toupper(CROPDMGEXP)=="M",CROPDMG*1e6,0))))

damagedata2<-select(damagedata2,STATE,EVTYPE,PROPDMG,CROPDMG)
damagedata2<-mutate(damagedata2,TOTALDMG=PROPDMG+CROPDMG)
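The nested ifelse() calls can also be expressed as a named lookup vector. The following is a base-R sketch of that alternative on made-up values, reproducing the example from the text (2.5 with K gives USD 2,500). It follows the same rule for the K, M and B codes and for other non-missing codes. It differs only for missing exponent codes, which this sketch maps to 0 while the ifelse() version yields NA.

```r
# Multiplier for each exponent code converted in the report (upper case)
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Hypothetical damage records
dmg <- c(2.5, 1.2, 3, 7)
exp_code <- c("K", "m", "B", "H")

# Look up each code case-insensitively; unknown codes give NA, which is
# replaced by 0 so that any other code counts as no damage
mult <- multiplier[toupper(exp_code)]
mult[is.na(mult)] <- 0

usd <- dmg * unname(mult)
print(usd)
```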

damagegroup<-group_by(damagedata2,EVTYPE)
damagerank<-arrange(summarize(damagegroup,damages=sum(TOTALDMG)),desc(damages))

To make the plot easier to read, damages are expressed in billions of US dollars.

damagerank2<-mutate(damagerank,billions=damages/1000000000)
damagerank2
## Source: local data frame [837 x 3]
## 
##                       EVTYPE     damages  billions
##                       (fctr)       (dbl)     (dbl)
## 1                    TORNADO 33920346470 33.920346
## 2                RIVER FLOOD 10133367500 10.133368
## 3                      FLOOD  8917402100  8.917402
## 4                  ICE STORM  5570539650  5.570540
## 5               WINTER STORM  5265396600  5.265397
## 6             HURRICANE OPAL  3191846000  3.191846
## 7                       HAIL  2885282960  2.885283
## 8                FLASH FLOOD  2875089400  2.875089
## 9  HEAVY RAIN/SEVERE WEATHER  2500000000  2.500000
## 10        THUNDERSTORM WINDS  1926607550  1.926608
## ..                       ...         ...       ...

The table ranks event types by economic damage, measured as the sum of property and crop damage.

Now we can plot the data.

ggplot(head(damagerank2,50),aes(x=reorder(EVTYPE,-billions),y=billions))+geom_bar(stat="identity",fill="#54afa3")+labs(x = "EVENT", y = "TOTAL DAMAGES IN BILLIONS OF DOLLARS (USD)", title = "Economic Losses Per Natural Event")+ theme(axis.text.x=element_text(angle=90,size=8))

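The plot shows that the three most damaging event types (tornadoes, river floods and floods) stand well above the rest, so they call for the highest priority. Tornadoes alone account for about USD 33.9 billion, more than three times the next event.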
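The claim about the top three can be checked with a few lines of arithmetic on the values printed in the ranking table.

```r
# Damages in billions of USD, copied from the ranking table above
top3 <- c(TORNADO = 33.920346, `RIVER FLOOD` = 10.133368, FLOOD = 8.917402)

# Combined damage of the three event types
top3_sum <- sum(top3)
# How many times larger tornado damage is than the second event's
gap <- top3[["TORNADO"]] / top3[["RIVER FLOOD"]]

print(round(top3_sum, 2))
print(round(gap, 2))
```

The three events together come to roughly USD 53 billion, and tornadoes alone exceed the second-ranked event by a factor of about 3.3.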