Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. The National Oceanic and Atmospheric Administration’s (NOAA) storm database contains information about severe weather events that occurred in the United States from 1950 to 2011. This analysis uses that database to identify which event types caused the most injuries, fatalities, property damage and crop damage.

This project has two main objectives:

  1. Determine which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health.

  2. Determine which types of events have the greatest economic consequences.

Data Processing

The data are distributed as a bzip2-compressed CSV file to reduce its size. You can download the file from the course web site:

Data Processing I: Reading and Sorting Data by Event Type
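## Reads the bzip2-compressed .csv file directly; read.csv() decompresses it on the fly
## Assumes the file has already been downloaded into the working directory set below
setwd("~/R/COURSERA5_STORM")
stormdata <- read.csv("repdata-data-StormData.csv.bz2")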

head(stormdata)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
## 4        2     2.5          K       0                                    
## 5        2     2.5          K       0                                    
## 6        6     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6
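The call above passes the compressed file straight to `read.csv()`; R's `file()` connection detects bzip2 compression when a file is opened for reading, so no separate decompression step is needed. A small self-contained check of that behaviour, using a made-up two-row table written to a temporary `.csv.bz2` file (the data are illustrative, not taken from the NOAA file):

```r
# Write a tiny illustrative table to a bzip2-compressed temporary file.
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(1, 0)),
          con, row.names = FALSE)
close(con)

# read.csv() opens the path with file(), which decompresses bzip2 transparently.
small <- read.csv(tmp, stringsAsFactors = FALSE)
print(small)
```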
Data Processing II: Summing Injuries, Fatalities, Property Damage and Crop Damage by Event Type

The following code creates “stormdata_sums”, which contains the sums of FATALITIES, INJURIES, PROPDMG (property damage) and CROPDMG (crop damage) for each event type. These per-event sums show which event types caused the most fatalities, injuries or damage.

## Loads plyr, which provides ddply() to split the data by Event Type (EVTYPE) and apply sum() to each subset
library(plyr)

## Splits the data by Event Type (EVTYPE) and sums FATALITIES, INJURIES, PROPDMG and CROPDMG within each group
## Note: PROPDMG and CROPDMG are summed as recorded, without applying the PROPDMGEXP/CROPDMGEXP multipliers
stormdata_sums <- ddply(stormdata, ~EVTYPE, summarise,
                        FATALITIES = sum(FATALITIES),
                        INJURIES = sum(INJURIES),
                        PROPDMG = sum(PROPDMG),
                        CROPDMG = sum(CROPDMG))

head(stormdata_sums)
##                 EVTYPE FATALITIES INJURIES PROPDMG CROPDMG
## 1                    ?          0        0       5    0.00
## 2       ABNORMALLY DRY          0        0       0    0.00
## 3       ABNORMALLY WET          0        0       0    0.00
## 4      ABNORMAL WARMTH          0        0       0    0.00
## 5 ACCUMULATED SNOWFALL          0        0       0    0.00
## 6  AGRICULTURAL FREEZE          0        0       0   28.82
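For comparison, the same per-event sums can be sketched in base R with `aggregate()`, without loading plyr. The three-row data frame below is made up for illustration. One behavioural difference to keep in mind: the formula interface of `aggregate()` drops rows containing `NA` by default, whereas `sum()` inside `ddply()` would return `NA` for the affected group.

```r
# Illustrative rows only; the real input would be the full stormdata frame.
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL", "TORNADO"),
                  FATALITIES = c(2, 0, 3),
                  INJURIES = c(10, 1, 5),
                  stringsAsFactors = FALSE)

# Sum both columns within each EVTYPE group; groups come back sorted by EVTYPE.
toy_sums <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
print(toy_sums)
```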

Results

Results I: Sorting the Top 10 Events according to:
a. Most Injuries Caused
b. Most Fatalities Caused
c. Most Property Damage Caused
d. Most Crop Damage Caused

The following code sorts “stormdata_sums” by FATALITIES, INJURIES, PROPDMG and CROPDMG and keeps the 10 event types with the highest total for each.
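The four statements that follow repeat a single pattern: order by one column, keep EVTYPE and that column, take the first rows. A minimal sketch of that pattern as a reusable function; `top_events()` is a hypothetical name, and the small data frame with made-up values stands in for `stormdata_sums`:

```r
# Hypothetical helper: return the n rows with the largest values of `column`,
# keeping only EVTYPE and that column.
top_events <- function(sums, column, n = 10) {
  ord <- order(sums[[column]], decreasing = TRUE)
  head(sums[ord, c("EVTYPE", column)], n)
}

# Illustrative stand-in for stormdata_sums (values are made up).
sums <- data.frame(EVTYPE = c("HAIL", "TORNADO", "FLOOD"),
                   FATALITIES = c(1, 9, 4),
                   stringsAsFactors = FALSE)
top2 <- top_events(sums, "FATALITIES", n = 2)
print(top2)
```

With the real data this would read `fatalities <- top_events(stormdata_sums, "FATALITIES")`. Using `head()` instead of `[1:10, ]` also avoids rows of `NA` if a table ever has fewer than `n` event types.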

## Sorts by each total in decreasing order and keeps the top 10 event types
fatalities <- stormdata_sums[order(stormdata_sums$FATALITIES, decreasing = TRUE), c("EVTYPE", "FATALITIES")][1:10, ]

injuries <- stormdata_sums[order(stormdata_sums$INJURIES, decreasing = TRUE), c("EVTYPE", "INJURIES")][1:10, ]

property_damage <- stormdata_sums[order(stormdata_sums$PROPDMG, decreasing = TRUE), c("EVTYPE", "PROPDMG")][1:10, ]

crop_damage <- stormdata_sums[order(stormdata_sums$CROPDMG, decreasing = TRUE), c("EVTYPE", "CROPDMG")][1:10, ]

fatalities
##             EVTYPE FATALITIES
## 830        TORNADO       5633
## 123 EXCESSIVE HEAT       1903
## 147    FLASH FLOOD        978
## 269           HEAT        937
## 452      LIGHTNING        816
## 854      TSTM WIND        504
## 164          FLOOD        470
## 581    RIP CURRENT        368
## 354      HIGH WIND        248
## 11       AVALANCHE        224
injuries
##                EVTYPE INJURIES
## 830           TORNADO    91346
## 854         TSTM WIND     6957
## 164             FLOOD     6789
## 123    EXCESSIVE HEAT     6525
## 452         LIGHTNING     5230
## 269              HEAT     2100
## 424         ICE STORM     1975
## 147       FLASH FLOOD     1777
## 759 THUNDERSTORM WIND     1488
## 238              HAIL     1361
property_damage
##                 EVTYPE   PROPDMG
## 830            TORNADO 3212258.2
## 147        FLASH FLOOD 1420124.6
## 854          TSTM WIND 1335965.6
## 164              FLOOD  899938.5
## 759  THUNDERSTORM WIND  876844.2
## 238               HAIL  688693.4
## 452          LIGHTNING  603351.8
## 783 THUNDERSTORM WINDS  446293.2
## 354          HIGH WIND  324731.6
## 972       WINTER STORM  132720.6
crop_damage
##                 EVTYPE   CROPDMG
## 238               HAIL 579596.28
## 147        FLASH FLOOD 179200.46
## 164              FLOOD 168037.88
## 854          TSTM WIND 109202.60
## 830            TORNADO 100018.52
## 759  THUNDERSTORM WIND  66791.45
## 88             DROUGHT  33898.62
## 783 THUNDERSTORM WINDS  18684.93
## 354          HIGH WIND  17283.21
## 284         HEAVY RAIN  11122.80
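The PROPDMG and CROPDMG totals in the tables above are sums of the raw recorded values; the dataset stores their order of magnitude separately in PROPDMGEXP and CROPDMGEXP (the `head()` output shows `K` for the first rows). A minimal sketch of scaling values before summing, assuming `K`, `M` and `B` stand for thousands, millions and billions of dollars and treating every other code as a multiplier of 1. `exp_multiplier()` is a hypothetical helper, and the rows are made up for illustration:

```r
# Hypothetical helper. Assumption: "K" = 1e3, "M" = 1e6, "B" = 1e9 (any case);
# every other code, including blanks, is treated as a multiplier of 1.
exp_multiplier <- function(code) {
  code <- toupper(trimws(as.character(code)))
  mult <- c(K = 1e3, M = 1e6, B = 1e9)[code]
  mult[is.na(mult)] <- 1
  unname(mult)
}

# Illustrative rows only.
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD", "FLOOD"),
                  PROPDMG = c(25, 2.5, 10),
                  PROPDMGEXP = c("K", "M", ""),
                  stringsAsFactors = FALSE)

# Scale each value to dollars, then sum by event type.
toy$PROPDMG_USD <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP)
usd_sums <- aggregate(PROPDMG_USD ~ EVTYPE, data = toy, FUN = sum)
print(usd_sums)
```

The same helper would apply to CROPDMG and CROPDMGEXP. With the real data, scaled totals could rank event types differently from the unscaled sums used in the rest of this report.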
Results II: Graphing Results

These lines of code graph the results obtained in the previous section.

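## Sets a wide bottom margin for the rotated event labels and the axis-label spacing
par(mar = c(12, 4, 4, 2), mgp = c(0, 2, 2))
## Places the next two plots side by side
par(mfrow = c(1, 2))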
    
## Makes barplot for the events that caused the most injuries
injuries_graph <- barplot(injuries$INJURIES,
                          names.arg = injuries$EVTYPE,
                          main = "Event Type with Most Injuries in US 1950 - 2011\n(INJURIES)",
                          xlab = "Event Type", ylab = "Injuries", las = 2,
                          cex.main = 0.75, cex.axis = 0.6, cex.names = 0.6)

## Makes barplot for the events that caused the most fatalities
## (labels come from fatalities$EVTYPE so they match the plotted values)
fatalities_graph <- barplot(fatalities$FATALITIES,
                            names.arg = fatalities$EVTYPE,
                            main = "Event Type with Most Fatalities in US 1950 - 2011\n(FATALITIES)",
                            xlab = "Event Type", ylab = "Fatalities", las = 2,
                            cex.main = 0.75, cex.axis = 0.6, cex.names = 0.6)

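The plots show that tornadoes cause the most injuries and fatalities of all event types by a large margin: 5,633 fatalities, nearly three times the 1,903 of the next event type (EXCESSIVE HEAT), and 91,346 injuries, about 13 times the 6,957 of TSTM WIND.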
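Another way to express that margin is the tornado share of each top-10 total. The values below are copied from the fatalities and injuries tables printed earlier:

```r
# Top-10 totals copied from the fatalities and injuries tables above.
top_fatalities <- c(5633, 1903, 978, 937, 816, 504, 470, 368, 248, 224)
top_injuries <- c(91346, 6957, 6789, 6525, 5230, 2100, 1975, 1777, 1488, 1361)

# TORNADO is the first entry in both tables.
fatal_share <- top_fatalities[1] / sum(top_fatalities)
injury_share <- top_injuries[1] / sum(top_injuries)
round(100 * c(fatalities = fatal_share, injuries = injury_share), 1)
```

Tornadoes account for about 46.6% of the 12,081 fatalities and 72.8% of the 125,548 injuries summed across the top 10 event types.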

par(mar = c(12, 4, 4, 2), mgp = c(0, 2, 2))
par(mfrow = c(1, 1))
## Makes barplot for the events that caused the most property damage
## PROPDMG values are summed as recorded, not scaled by PROPDMGEXP, so they are not consistent dollar amounts
propertydmg_graph <- barplot(property_damage$PROPDMG,
                             names.arg = property_damage$EVTYPE,
                             main = "Event Type with Most Property Damage in US 1950 - 2011\n(PROPDMG, not scaled by PROPDMGEXP)",
                             xlab = "Event Type", ylab = "Property Damage", las = 2,
                             cex.axis = 0.6, cex.names = 0.6)

As with injuries and fatalities, tornadoes account for the largest summed property damage (PROPDMG) of all event types.

par(mar = c(12, 4, 4, 2), mgp = c(0, 2, 2))
## Makes barplot for the events that caused the most crop damage
## CROPDMG values are summed as recorded, not scaled by CROPDMGEXP
cropdmg_graph <- barplot(crop_damage$CROPDMG,
                         names.arg = crop_damage$EVTYPE,
                         main = "Event Type with Most Crop Damage in US 1950 - 2011\n(CROPDMG, not scaled by CROPDMGEXP)",
                         xlab = "Event Type", ylab = "Crop Damage", las = 2,
                         cex.axis = 0.6, cex.names = 0.6)

For crop damage, hail causes the most damage of all event types, with a summed CROPDMG of 579,596, more than three times that of FLASH FLOOD (179,200).