Synopsis

This report explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database.

This database tracks the characteristics of major storms and weather events in the United States, including when and where they occur. The events in the database start in the year 1950 and end in November 2011.

This analysis addresses the following questions:

  1. Across the United States, which types of events are most harmful with respect to population health?

  2. Across the United States, which types of events have the greatest economic consequences?

Data Processing

Loading the library

library(plyr)

Loading data from the web site

download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile = "stormdata.bz2")
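Re-running the report repeats the download every time. A minimal sketch of one way to skip it when the file is already on disk; the helper name `download_if_missing` is hypothetical and not part of the original analysis:

```r
# Hypothetical helper (not in the original report): download only when
# the destination file does not exist yet.
download_if_missing <- function(url, destfile) {
  if (!file.exists(destfile)) {
    # mode = "wb" keeps the compressed file intact on Windows
    download.file(url, destfile = destfile, mode = "wb")
  }
  invisible(destfile)
}
```

It could be called with the same URL and `destfile = "stormdata.bz2"` used in the `download.file` call above.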

Reading the file created above and putting the values in the “storm” variable

storm = read.csv("stormdata.bz2", header = T, sep = ",")

Now we can inspect the data and its variables

head(storm)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
## 4        2     2.5          K       0                                    
## 5        2     2.5          K       0                                    
## 6        6     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6
colnames(storm)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

Analysis

The analysis first addresses the question below:

Across the United States, which types of events are most harmful with respect to population health?

To answer this question, the code below groups the records by event type and sums fatalities plus injuries for each type, because these two variables indicate which events are most harmful with respect to population health

h = aggregate(storm$FATALITIES+storm$INJURIES ~ storm$EVTYPE, sum, data = storm)
colnames(h) = c("EVTYPE", "SUM_ESTIMATES")
head(h)
##                  EVTYPE SUM_ESTIMATES
## 1    HIGH SURF ADVISORY             0
## 2         COASTAL FLOOD             0
## 3           FLASH FLOOD             0
## 4             LIGHTNING             0
## 5             TSTM WIND             0
## 6       TSTM WIND (G45)             0
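To make the grouping step easier to follow, here is a small sketch of the same `aggregate` pattern on a toy data frame. The values in `toy` are made up for illustration and are not taken from the storm database:

```r
# Toy data (made up for illustration; not from the storm database).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(1, 0, 2, 3, 0),
  INJURIES   = c(10, 1, 5, 0, 0)
)

# Sum fatalities plus injuries within each event type.
toy_h <- aggregate(FATALITIES + INJURIES ~ EVTYPE, data = toy, FUN = sum)
colnames(toy_h) <- c("EVTYPE", "SUM_ESTIMATES")
toy_h
```

Each distinct `EVTYPE` value becomes one row, so labels that differ only in spelling or spacing are counted as separate event types.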

Now the code below sorts the data in decreasing order and keeps the 10 most harmful events

h = head(h[order(h$SUM_ESTIMATES, decreasing = T),],10)
h
##                EVTYPE SUM_ESTIMATES
## 834           TORNADO         96979
## 130    EXCESSIVE HEAT          8428
## 856         TSTM WIND          7461
## 170             FLOOD          7259
## 464         LIGHTNING          6046
## 275              HEAT          3037
## 153       FLASH FLOOD          2755
## 427         ICE STORM          2064
## 760 THUNDERSTORM WIND          1621
## 972      WINTER STORM          1527
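The table above lists “TSTM WIND” and “THUNDERSTORM WIND” as separate rows, and the earlier output shows labels with leading spaces. The sketch below shows one possible way such labels could be normalised before aggregating. The toy labels and the rule that “TSTM” abbreviates “THUNDERSTORM” are assumptions for illustration; the original analysis does not apply any such cleanup:

```r
# Toy labels (made up) mimicking the kinds of variants seen in EVTYPE.
labels <- c("   HIGH SURF ADVISORY", "TSTM WIND", "Tstm Wind", "THUNDERSTORM WIND")

# Trim surrounding whitespace and convert to upper case.
clean <- toupper(trimws(labels))

# Assumed mapping, for illustration only: expand a leading "TSTM".
clean <- sub("^TSTM", "THUNDERSTORM", clean)

unique(clean)
```

Whether and how to merge event types is a judgment call, and any such mapping would change the totals reported here.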

The code below creates a barplot of the 10 most harmful events

barplot(h$SUM_ESTIMATES, 
        names.arg = h$EVTYPE, 
        cex.names = 0.4, 
        main = "Events most harmful with respect to population health",
        xlab = "Type of Event", 
        ylab = "Sum of Injuries and Fatalities")

The analysis now turns to the second question:

Across the United States, which types of events have the greatest economic consequences?

To answer this question the variables PROPDMG, CROPDMG, PROPDMGEXP and CROPDMGEXP were used.

The variables PROPDMG and CROPDMG hold the property and crop damage values for each event.

The variables PROPDMGEXP and CROPDMGEXP hold alphabetical characters that signify the magnitude of those values: “K” for thousands, “M” for millions, and “B” for billions.

Therefore, PROPDMG and CROPDMG are multiplied by 1000, 1000000, or 1000000000 for “K”, “M”, and “B” respectively. Any other code is treated as a multiplier of 1.

storm$PROPDMGEXP_NEW = ifelse(storm$PROPDMGEXP== "K", 1000, 
        ifelse(storm$PROPDMGEXP== "M", 1000000, 
               ifelse(storm$PROPDMGEXP == "B", 1000000000, 1)
               )
        )

storm$CROPDMGEXP_NEW = ifelse(storm$CROPDMGEXP== "K", 1000, 
        ifelse(storm$CROPDMGEXP== "M", 1000000, 
               ifelse(storm$CROPDMGEXP == "B", 1000000000, 1)
               )
        )

storm$PROPTOTALDMG = as.numeric(storm$PROPDMGEXP_NEW) * storm$PROPDMG
storm$CROPTOTALDMG = as.numeric(storm$CROPDMGEXP_NEW) * storm$CROPDMG
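The nested `ifelse` calls above can also be written as a lookup table. The sketch below is an alternative formulation on toy values, not the method used in this report. Treating lowercase codes as equivalent to uppercase is an assumption, and it differs from the code above, which gives lowercase codes a multiplier of 1:

```r
# Toy exponent codes (made up for illustration), including a lowercase
# code, an empty string, and an unrecognised symbol.
exp_code <- c("K", "M", "B", "k", "", "?")
dmg      <- c(25, 2.5, 1, 10, 3, 7)

# Named vector acting as a lookup table from code to multiplier.
multipliers <- c(K = 1e3, M = 1e6, B = 1e9)

# Upper-case the codes first (assumption: lowercase codes carry the same
# meaning). Codes with no match give NA, which is replaced with 1.
mult <- unname(multipliers[toupper(exp_code)])
mult[is.na(mult)] <- 1

dmg * mult
```

A lookup vector keeps the code-to-multiplier mapping in one place, which makes it easier to extend if more codes need handling.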

Lastly, the variables PROPTOTALDMG and CROPTOTALDMG were added to obtain the total damage for each record.

storm$TOTALDMG = storm$PROPTOTALDMG + storm$CROPTOTALDMG

Now the code below groups the records by event type and sums the total damage for each type

h = aggregate(storm$TOTALDMG~storm$EVTYPE  , sum, data = storm)
colnames(h) = c("EVTYPE", "SUM_TOTALDMG")
head(h)
##                  EVTYPE SUM_TOTALDMG
## 1    HIGH SURF ADVISORY       200000
## 2         COASTAL FLOOD            0
## 3           FLASH FLOOD        50000
## 4             LIGHTNING            0
## 5             TSTM WIND      8100000
## 6       TSTM WIND (G45)         8000

The code below sorts the data in decreasing order and keeps the 10 events with the greatest economic consequences

h = head(h[order(h$SUM_TOTALDMG, decreasing = T),],10)
h
##                EVTYPE SUM_TOTALDMG
## 170             FLOOD 150319678257
## 411 HURRICANE/TYPHOON  71913712800
## 834           TORNADO  57340614060
## 670       STORM SURGE  43323541000
## 244              HAIL  18752904943
## 153       FLASH FLOOD  17562129167
## 95            DROUGHT  15018672000
## 402         HURRICANE  14610229010
## 590       RIVER FLOOD  10148404500
## 427         ICE STORM   8967041360

Now the code below creates a barplot of the 10 events with the greatest economic consequences

barplot(h$SUM_TOTALDMG, 
         names.arg = h$EVTYPE, 
         cex.names = 0.4, 
         main = "Events with the greatest economic consequences",
         xlab = "Type of Event", 
         ylab = "Sum of damage ($USD)")

Results

Across the United States, which types of events are most harmful with respect to population health?

The first chart above shows that tornadoes are by far the most harmful event type, with 96979 combined fatalities and injuries, followed by excessive heat with 8428.

Across the United States, which types of events have the greatest economic consequences?

The second chart above shows that floods have the greatest economic consequences, with about 150.3 billion USD in combined property and crop damage, followed by hurricanes/typhoons with about 71.9 billion USD.