Synopsis

Floods, storms, hurricanes, earthquakes and tsunamis are natural disasters resulting from natural processes of the Earth. Severe weather events can have serious health and economic consequences, including loss of life, injuries and property damage. Communities and municipalities therefore need to anticipate and estimate this damage, prioritize resources by type of event, and establish action plans that help build disaster-resilient communities. The primary source of information for this project is the storm database of the U.S. National Oceanic and Atmospheric Administration (NOAA). It covers 1950 to 2011 and records characteristics of each storm, such as when and where it occurred, together with estimates of fatalities, injuries and damage. This report aims to identify which types of events have the most harmful consequences for population health and the greatest economic consequences:

  1. Across the United States, which types of events are most harmful with respect to population health?
  2. Across the United States, which types of events have the greatest economic consequences?

Data Processing

The data for this assignment comes in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. It can be downloaded from the course web site: Storm Data [47Mb].

The events in the database start in 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.

Loading and Cleaning Data

First, the bzip2-compressed file was downloaded from the course URL given above.

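# Download only if the file is not already present; mode = "wb" keeps the binary file intact on Windows
if (!file.exists("storm.bz2")) download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", "storm.bz2", mode = "wb")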

Once the file is available, read the CSV file into a data frame:

data_storm<-read.csv(bzfile("storm.bz2"),sep=",")

To make the dataset smaller and more manageable, keep only the variables needed to answer the two questions:

  • EVTYPE: event type (e.g. tsunami, flood, etc.)
  • FATALITIES: number of deaths
  • INJURIES: number of injuries
  • PROPDMG: property damage, scaled by PROPDMGEXP
  • PROPDMGEXP: magnitude code for PROPDMG (K = thousands, M = millions, B = billions of USD, etc.)
  • CROPDMG: crop damage, scaled by CROPDMGEXP
  • CROPDMGEXP: magnitude code for CROPDMG (K = thousands, M = millions, B = billions of USD, etc.)

storm_event<-data_storm[,c("EVTYPE","FATALITIES","INJURIES","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]

Next, convert EVTYPE, PROPDMGEXP and CROPDMGEXP to uppercase and trim surrounding whitespace, and convert PROPDMG and CROPDMG to numeric values.

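library(stringr)  # provides str_trim()

# Normalize the codes: uppercase, then strip leading/trailing whitespace
storm_event$EVTYPE<-str_trim(toupper(storm_event$EVTYPE))
storm_event$PROPDMGEXP<-str_trim(toupper(storm_event$PROPDMGEXP))
storm_event$CROPDMGEXP<-str_trim(toupper(storm_event$CROPDMGEXP))

# Make sure the damage values are numeric
storm_event$PROPDMG<-as.numeric(storm_event$PROPDMG)
storm_event$CROPDMG<-as.numeric(storm_event$CROPDMG)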
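The normalization step matters because the aggregation later groups records by the exact EVTYPE string. The toy labels below are made up for illustration (they are not taken from the dataset) and use base R's trimws() in place of stringr::str_trim(), which behaves similarly for ordinary spaces.

```r
# Hypothetical labels that differ only in case and surrounding whitespace
raw_types <- c("tstm wind", " TSTM WIND", "Tstm Wind ", "flood")

# Same idea as above: uppercase, then trim leading/trailing whitespace
clean_types <- trimws(toupper(raw_types))

n_raw   <- length(unique(raw_types))    # 4 distinct spellings
n_clean <- length(unique(clean_types))  # 2 event types after normalization
```

Case folding cannot merge genuinely different labels for the same phenomenon: the results below list both TSTM WIND and THUNDERSTORM WINDS as separate event types.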

After these steps, the reduced dataset looks like this:

summary(storm_event)
##     EVTYPE            FATALITIES         INJURIES            PROPDMG      
##  Length:356162      Min.   :  0.000   Min.   :   0.0000   Min.   :  0.00  
##  Class :character   1st Qu.:  0.000   1st Qu.:   0.0000   1st Qu.:  0.00  
##  Mode  :character   Median :  0.000   Median :   0.0000   Median :  0.00  
##                     Mean   :  0.024   Mean   :   0.2848   Mean   : 13.05  
##                     3rd Qu.:  0.000   3rd Qu.:   0.0000   3rd Qu.:  0.00  
##                     Max.   :583.000   Max.   :1700.0000   Max.   :970.00  
##   PROPDMGEXP           CROPDMG         CROPDMGEXP       
##  Length:356162      Min.   :  0.000   Length:356162     
##  Class :character   1st Qu.:  0.000   Class :character  
##  Mode  :character   Median :  0.000   Mode  :character  
##                     Mean   :  1.198                     
##                     3rd Qu.:  0.000                     
##                     Max.   :978.000

Questions to Answer

Across the United States, which types of events are most harmful with respect to population health?

First, keep only the records with a non-zero count:

injuries_events<-subset(storm_event,storm_event$INJURIES!=0)
fatalities_events<-subset(storm_event,storm_event$FATALITIES!=0)

Then build two summary datasets, one with the total injuries and one with the total fatalities per event type:

health_injuries<-aggregate(injuries_events$INJURIES,list(injuries_events$EVTYPE),sum)
health_fatalities<-aggregate(fatalities_events$FATALITIES,list(fatalities_events$EVTYPE),sum)
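When aggregate() receives a plain vector and an unnamed grouping list, it names the result columns Group.1 (the group) and x (the summed value), which is why those names appear in the outputs below. A small sketch on hypothetical toy records:

```r
# Hypothetical toy records standing in for injuries_events
toy <- data.frame(
  EVTYPE   = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
  INJURIES = c(10, 3, 5, 1, 2)
)

# Same call pattern as above: sum INJURIES within each EVTYPE group
by_type <- aggregate(toy$INJURIES, list(toy$EVTYPE), sum)
by_type  # columns: Group.1 (event type) and x (total injuries)
```

Filtering out zero counts beforehand does not change any of the sums, since zero rows add nothing; it only shrinks the input and drops event types whose total is zero.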

Now sort each result in decreasing order and keep the top 10 event types:

health_injuries<-health_injuries[order(-health_injuries$x),][1:10,]
health_injuries
##                Group.1     x
## 98             TORNADO 74570
## 20               FLOOD  6466
## 104          TSTM WIND  4988
## 65           LIGHTNING  2095
## 61           ICE STORM  1852
## 13      EXCESSIVE HEAT  1667
## 91  THUNDERSTORM WINDS   908
## 34                HEAT   878
## 33                HAIL   781
## 3             BLIZZARD   777
health_fatalities<-health_fatalities[order(-health_fatalities$x),][1:10,]
health_fatalities
##            Group.1    x
## 112        TORNADO 4359
## 46            HEAT  706
## 19  EXCESSIVE HEAT  555
## 116      TSTM WIND  364
## 25     FLASH FLOOD  334
## 78       LIGHTNING  322
## 30           FLOOD  193
## 47       HEAT WAVE  172
## 22    EXTREME COLD  129
## 23    EXTREME HEAT   96
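Indexing with [1:10, ] works here because there are far more than 10 event types. On a summary with fewer groups it pads the result with NA rows, whereas head() returns at most n rows. The three-row table below is hypothetical:

```r
# Hypothetical summary with only three event types
few <- data.frame(Group.1 = c("FLOOD", "HAIL", "TORNADO"), x = c(5, 1, 15))

# [1:10, ] always returns 10 rows, filling the missing ones with NA
padded <- few[order(-few$x), ][1:10, ]

# head() stops at the available rows; order(..., decreasing = TRUE) avoids negating x
top <- head(few[order(few$x, decreasing = TRUE), ], 10)
```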

The two results can now be plotted.
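The plotting code is not included in this report. As a hypothetical sketch, base R's barplot() can chart the top-10 injury totals; the values below are copied from the health_injuries output above, and pdf(NULL) opens a graphics device that draws without writing a file (replace it with png() or an interactive device to see the chart).

```r
# Top-10 injury totals copied from the health_injuries output above
inj <- data.frame(
  Group.1 = c("TORNADO", "FLOOD", "TSTM WIND", "LIGHTNING", "ICE STORM",
              "EXCESSIVE HEAT", "THUNDERSTORM WINDS", "HEAT", "HAIL", "BLIZZARD"),
  x       = c(74570, 6466, 4988, 2095, 1852, 1667, 908, 878, 781, 777)
)

pdf(NULL)                  # null output device: nothing is written to disk
par(mar = c(10, 5, 3, 1))  # extra bottom margin for the rotated event names
mids <- barplot(inj$x, names.arg = inj$Group.1, las = 2,
                main = "Injuries by event type (top 10)",
                ylab = "Total injuries")
invisible(dev.off())
```

The same pattern applies to health_fatalities by swapping in its values.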

Across the United States, which types of events have the greatest economic consequences?

First, subset the data to the records with non-zero damage values:

crop_event<-subset(storm_event,storm_event$CROPDMG!=0)
prop_event<-subset(storm_event,storm_event$PROPDMG!=0)

For this question we need to quantify the monetary damage for each type of event. Since the magnitude of each amount is stored in a separate column (PROPDMGEXP or CROPDMGEXP) as a letter code (H = hundred, K = thousand, M = million, B = billion), computing the dollar amount requires an extra transformation.
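Written as a formula, the transformation applied by the function below multiplies each damage value by the factor m that corresponds to its magnitude code, and the total damage of a record is the sum of its property and crop parts:

```latex
\[
m(\mathrm{EXP}) =
\begin{cases}
10^{2} & \mathrm{EXP} = \mathrm{H}\\
10^{3} & \mathrm{EXP} = \mathrm{K}\\
10^{6} & \mathrm{EXP} = \mathrm{M}\\
10^{9} & \mathrm{EXP} = \mathrm{B}\\
1 & \text{otherwise}
\end{cases}
\qquad
\mathrm{AMOUNT} = \mathrm{PROPDMG}\cdot m(\mathrm{PROPDMGEXP}) + \mathrm{CROPDMG}\cdot m(\mathrm{CROPDMGEXP})
\]
```

For example, a record with PROPDMG = 2.5, PROPDMGEXP = K and no crop damage gives 2.5 × 10^3 = 2500 USD, matching the first rows of econ_subset shown later. The "otherwise" case (blank, digits or symbols) reflects how the function treats those codes; the document does not establish what they were meant to encode.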

First, create a function that applies the multiplier corresponding to each letter; any other code leaves the value unscaled:

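# Convert a damage value to USD using its magnitude code
# (note: the name "factor" masks base::factor() in the global environment)
factor<-function(val,amnt){
  if(val=="B") return(amnt*10^9)        # billions
  else if(val=="M") return(amnt*10^6)   # millions
  else if(val=="K") return(amnt*10^3)   # thousands
  else if(val=="H") return(amnt*100)    # hundreds
  else return(amnt)                     # any other code: left unscaled
}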
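mapply() calls the function once per record, and the dataset has hundreds of thousands of rows. A vectorized alternative looks up all multipliers at once in a named vector. This is a sketch: the names multipliers and dmg_amount are hypothetical, chosen partly so as not to mask base::factor().

```r
# Hypothetical lookup table of multipliers per magnitude code
multipliers <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Vectorized version of factor(): codes not in the table give NA, replaced by 1
dmg_amount <- function(amnt, exp_code) {
  mult <- unname(multipliers[exp_code])  # one lookup for the whole vector
  mult[is.na(mult)] <- 1                 # same fallback as factor(): unscaled
  amnt * mult
}

dmg_amount(c(25, 2.5, 3, 7), c("K", "K", "M", ""))
```

With this helper, the per-record step could be written as crop_event$CROPAMNT <- dmg_amount(crop_event$CROPDMG, crop_event$CROPDMGEXP), which yields the same amounts as the mapply() call.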

Next, apply the function to every record to compute its dollar amount:

crop_event$CROPAMNT<-mapply(factor,crop_event$CROPDMGEXP,crop_event$CROPDMG)
prop_event$PROPAMNT<-mapply(factor,prop_event$PROPDMGEXP,prop_event$PROPDMG)

Finally, sum the amounts per event type:

econ_crop<-aggregate(crop_event$CROPAMNT,list(crop_event$EVTYPE),sum)
econ_prop<-aggregate(prop_event$PROPAMNT,list(prop_event$EVTYPE),sum)

Then keep the top 10 event types for both kinds of economic consequences, ordered by amount:

econ_crop<-econ_crop[order(-econ_crop$x),][1:10,]
econ_crop
##         Group.1          x
## 66  RIVER FLOOD 5029459000
## 62    ICE STORM 5013448500
## 8       DROUGHT 3533141000
## 20        FLOOD 1378429050
## 55    HURRICANE 1252055000
## 14 EXTREME COLD 1140078000
## 33         HAIL 1117642273
## 16  FLASH FLOOD  501789100
## 26       FREEZE  403375000
## 42         HEAT  401285000
econ_prop<-econ_prop[order(-econ_prop$x),][1:10,]
econ_prop
##                       Group.1           x
## 276                   TORNADO 35576611369
## 49                      FLOOD  9710265457
## 332              WINTER STORM  5279478601
## 196               RIVER FLOOD  5118945500
## 146                 HURRICANE  5057822000
## 36                FLASH FLOOD  3525223957
## 81                       HAIL  3361966033
## 152            HURRICANE OPAL  3172846000
## 102 HEAVY RAIN/SEVERE WEATHER  2500000000
## 288                 TSTM WIND  2093300205

The results can now be plotted.

Results

To reach a conclusion for each question (health and economic), the two measures in each area are combined into a single overall summary.

Health Consequences

Calculate the combined number of injuries and fatalities per event type:

health_general<-aggregate(storm_event$INJURIES+storm_event$FATALITIES,list(storm_event$EVTYPE),sum)
health_general<-health_general[order(-health_general$x),][1:10,]
health_general
##                Group.1     x
## 686            TORNADO 78929
## 127              FLOOD  6659
## 706          TSTM WIND  5352
## 374          LIGHTNING  2417
## 95      EXCESSIVE HEAT  2222
## 348          ICE STORM  1907
## 211               HEAT  1584
## 111        FLASH FLOOD  1037
## 639 THUNDERSTORM WINDS   972
## 15            BLIZZARD   859
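health_general adds injuries and fatalities with equal weight, so an event with many deaths can rank below one with many minor injuries (for example, HEAT has more fatalities than FLOOD in the outputs above but fewer combined cases). A sketch that keeps both counts visible uses aggregate()'s formula interface with cbind(); the toy records are hypothetical.

```r
# Hypothetical toy records standing in for storm_event
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  INJURIES   = c(10, 3, 5, 0),
  FATALITIES = c(2, 0, 1, 4)
)

# One pass: per-event sums of both outcomes, with readable column names
health_both <- aggregate(cbind(INJURIES, FATALITIES) ~ EVTYPE, data = toy, FUN = sum)
health_both$TOTAL <- health_both$INJURIES + health_both$FATALITIES

# Rank by the combined count, as health_general does
ranked <- health_both[order(health_both$TOTAL, decreasing = TRUE), ]
ranked
```

The formula method drops rows with NA in any of the listed variables by default (na.action = na.omit), unlike the vector call used above.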

Economic Consequences

Calculate the total damage to property and crops per event type:

econ_subset<-subset(storm_event[,c("EVTYPE","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")],storm_event$CROPDMG!=0 | storm_event$PROPDMG!=0)

econ_subset$CROPAMNT<-mapply(factor,econ_subset$CROPDMGEXP,econ_subset$CROPDMG)
econ_subset$PROPAMNT<-mapply(factor,econ_subset$PROPDMGEXP,econ_subset$PROPDMG)
econ_subset$AMOUNT<-econ_subset$CROPAMNT+econ_subset$PROPAMNT
head(econ_subset)
##    EVTYPE PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP CROPAMNT PROPAMNT AMOUNT
## 1 TORNADO    25.0          K       0                   0    25000  25000
## 2 TORNADO     2.5          K       0                   0     2500   2500
## 3 TORNADO    25.0          K       0                   0    25000  25000
## 4 TORNADO     2.5          K       0                   0     2500   2500
## 5 TORNADO     2.5          K       0                   0     2500   2500
## 6 TORNADO     2.5          K       0                   0     2500   2500
econ_general<-aggregate(econ_subset$AMOUNT,list(econ_subset$EVTYPE),sum)
econ_general<-econ_general[order(-econ_general$x),][1:10,]
econ_general
##            Group.1           x
## 294        TORNADO 35748425129
## 55           FLOOD 11088694507
## 209    RIVER FLOOD 10148404500
## 158      HURRICANE  6309877000
## 173      ICE STORM  6014393540
## 355   WINTER STORM  5305769601
## 88            HAIL  4479608306
## 42     FLASH FLOOD  4027013057
## 26         DROUGHT  3732546000
## 164 HURRICANE OPAL  3191846000

Conclusions

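In conclusion, tornadoes are the most harmful type of event for both population health and the economy. They rank first in both combined summaries, with 78,929 injuries and fatalities (more than ten times the next event type, floods) and about $35.7 billion in property and crop damage (more than three times floods).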