Jonathan Mallia
17th December, 2016

Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This report explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database and aims to answer the following questions:

1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

2. Across the United States, which types of events have the greatest economic consequences?

The results show that, over the past 60 years, tornadoes have been the most harmful events with respect to population health, causing 5633 deaths and 91346 injuries, while floods have had the greatest economic consequences, causing over 150 billion dollars in economic losses.

Data Processing

First, load the data, which is stored as a .bz2-compressed file, into a data frame named Data in R.
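The loading code itself is not shown in this report. The following is only a minimal sketch of how such a file could be read, relying on the fact that read.csv() reads bzip2-compressed files directly. The helper name load_storm_data and the tiny stand-in file are assumptions made for illustration, not the author's code or the real data set.

```r
# Hypothetical helper (not from the report): read a bzip2-compressed CSV.
# read.csv() opens the path through file(), which detects the compression.
load_storm_data <- function(path) {
  read.csv(path, stringsAsFactors = FALSE)
}

# Build a tiny stand-in file so the sketch runs without the real data set.
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "wt")
writeLines(c("EVTYPE,FATALITIES,INJURIES",
             "TORNADO,0,15",
             "FLOOD,1,0"), con)
close(con)

# Load the stand-in file and check its shape (rows, columns).
Demo <- load_storm_data(tmp)
print(dim(Demo))
```

With the real file, the same call would be made on its actual path and the result assigned to Data.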

The analysis uses the following library:

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Verify the number of rows and columns in the entire data set

dim(Data)
## [1] 902297     37

View the first 10 rows of the data set

head(Data, 10)
##    STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1        1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2        1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3        1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4        1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5        1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6        1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
## 7        1 11/16/1951 0:00:00     0100       CST      9     BLOUNT    AL
## 8        1  1/22/1952 0:00:00     0900       CST    123 TALLAPOOSA    AL
## 9        1  2/13/1952 0:00:00     2000       CST    125 TUSCALOOSA    AL
## 10       1  2/13/1952 0:00:00     2000       CST     57    FAYETTE    AL
##     EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1  TORNADO         0                                               0
## 2  TORNADO         0                                               0
## 3  TORNADO         0                                               0
## 4  TORNADO         0                                               0
## 5  TORNADO         0                                               0
## 6  TORNADO         0                                               0
## 7  TORNADO         0                                               0
## 8  TORNADO         0                                               0
## 9  TORNADO         0                                               0
## 10 TORNADO         0                                               0
##    COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1          NA         0                      14.0   100 3   0          0
## 2          NA         0                       2.0   150 2   0          0
## 3          NA         0                       0.1   123 2   0          0
## 4          NA         0                       0.0   100 2   0          0
## 5          NA         0                       0.0   150 2   0          0
## 6          NA         0                       1.5   177 2   0          0
## 7          NA         0                       1.5    33 2   0          0
## 8          NA         0                       0.0    33 1   0          0
## 9          NA         0                       3.3   100 3   0          1
## 10         NA         0                       2.3   100 3   0          0
##    INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1        15    25.0          K       0                                    
## 2         0     2.5          K       0                                    
## 3         2    25.0          K       0                                    
## 4         2     2.5          K       0                                    
## 5         2     2.5          K       0                                    
## 6         6     2.5          K       0                                    
## 7         1     2.5          K       0                                    
## 8         0     2.5          K       0                                    
## 9        14    25.0          K       0                                    
## 10        0    25.0          K       0                                    
##    LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1      3040      8812       3051       8806              1
## 2      3042      8755          0          0              2
## 3      3340      8742          0          0              3
## 4      3458      8626          0          0              4
## 5      3412      8642          0          0              5
## 6      3450      8748          0          0              6
## 7      3405      8631          0          0              7
## 8      3255      8558          0          0              8
## 9      3334      8740       3336       8738              9
## 10     3336      8738       3337       8737             10

Extract the unique values for EVTYPE (event types)

EventTypes_Unique <- unique(Data$EVTYPE)

1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

Pre-analysis data preparation

The two fields in the data set that measure the effect on population health are INJURIES and FATALITIES. We add the two fields together and sum the result by event type to find the most harmful type of event.

TotalCasualties <- with(Data, aggregate(INJURIES + FATALITIES ~ EVTYPE, FUN = "sum"))

Adjust the field names appropriately

names(TotalCasualties) <- c('Event Type', 'Total Casualties')

Sort the data frame by Total Casualties in descending order

TotalCasualties_Ordered <- TotalCasualties[ order(TotalCasualties$`Total Casualties`,decreasing = TRUE), ]

Since there are many different events, we will only show the top 10 contributors to this analysis

TotalCasualties_Top10 <- head(TotalCasualties_Ordered, 10)
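The aggregate, sort, and truncate steps above can be checked on a small data frame before running them on the full data set. This is a sketch only: the Toy data frame and its values are invented for illustration and are not taken from the NOAA data.

```r
# Toy stand-in for the storm data; the values are made up for illustration.
Toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  INJURIES   = c(15, 0, 2, 4, 1),
  FATALITIES = c(0, 1, 1, 3, 0),
  stringsAsFactors = FALSE
)

# Sum injuries plus fatalities for each event type.
ToyCasualties <- aggregate(INJURIES + FATALITIES ~ EVTYPE, data = Toy, FUN = sum)
names(ToyCasualties) <- c("Event Type", "Total Casualties")

# Sort in descending order of casualties and keep the top 2 event types.
ToyOrdered <- ToyCasualties[order(ToyCasualties$`Total Casualties`,
                                  decreasing = TRUE), ]
ToyTop2 <- head(ToyOrdered, 2)
print(ToyTop2)
```

Here TORNADO totals 18 casualties, HEAT 7, and FLOOD 2, so the top two rows are TORNADO and HEAT.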

Results

with( 
    TotalCasualties_Top10, 
    barplot(
            `Total Casualties`, 
            names.arg = `Event Type`,
            main = "Top 10 events most harmful to population health", 
            xlab = "Event Type",
            ylab = "Total Number of casualties",
            cex.axis=0.7, cex.names=0.75,
            col = heat.colors(12),
            legend.text = `Event Type`,
            args.legend = list(x = "topright")
            ) 
    )

This analysis shows that tornadoes are the most harmful events with respect to population health in the United States.

2. Across the United States, which types of events have the greatest economic consequences?

Pre-analysis data preparation

Create a new data frame holding the crop damage and property damage fields for each event. Only rows in which at least one of the exponent fields (CROPDMGEXP or PROPDMGEXP) is H, K, M, or B are kept.

# Keep only the columns needed for the damage estimate
Data_EconomicExpenses <- Data[, c('EVTYPE', 'PROPDMG', 'PROPDMGEXP', 'CROPDMG', 'CROPDMGEXP')]

# Keep rows where at least one exponent is a recognized code.
# The match is case-sensitive, so only upper-case H, K, M, and B qualify.
Data_EconomicExpenses <- filter(
    Data_EconomicExpenses,
    CROPDMGEXP %in% c("H", "K", "M", "B") |
    PROPDMGEXP %in% c("H", "K", "M", "B")
)

Since the damage amounts are recorded in different units (H for hundreds, K for thousands, M for millions, and B for billions), we need to convert all the figures to plain dollars.

Data_EconomicExpenses <- mutate(
    Data_EconomicExpenses,
    # Scale each amount by its exponent code; any other code leaves it unchanged
    CropDamage = ifelse(CROPDMGEXP == "M", CROPDMG * 1000000,
                 ifelse(CROPDMGEXP == "B", CROPDMG * 1000000000,
                 ifelse(CROPDMGEXP == "K", CROPDMG * 1000,
                 ifelse(CROPDMGEXP == "H", CROPDMG * 100, CROPDMG)))),
    PropDamage = ifelse(PROPDMGEXP == "M", PROPDMG * 1000000,
                 ifelse(PROPDMGEXP == "B", PROPDMG * 1000000000,
                 ifelse(PROPDMGEXP == "K", PROPDMG * 1000,
                 ifelse(PROPDMGEXP == "H", PROPDMG * 100, PROPDMG))))
)
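The unit conversion above could also be written as a lookup table, which keeps the multipliers in one place. The following is a sketch of that alternative, not the code this report ran: the names exp_multiplier and scale_damage are hypothetical, and the sample amounts are invented. As in the nested ifelse() calls, an empty or unrecognized code leaves the amount unchanged.

```r
# Multipliers for the four exponent codes handled in this report.
exp_multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: scale each amount by the multiplier for its code.
# Codes that are empty or not in the table get a multiplier of 1.
scale_damage <- function(amount, exp_code) {
  multiplier <- exp_multiplier[as.character(exp_code)]
  multiplier[is.na(multiplier)] <- 1
  unname(amount * multiplier)
}

# Invented sample amounts with one code of each kind, plus an empty code.
Scaled <- scale_damage(c(25, 2.5, 1.2, 3, 7), c("K", "M", "B", "H", ""))
print(Scaled)
```

Inside mutate(), the helper would be called once per damage field, for example CropDamage = scale_damage(CROPDMG, CROPDMGEXP).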

Create a new field named TotalDamage, which is the sum of CropDamage and PropDamage

Data_EconomicExpenses <- mutate( Data_EconomicExpenses, TotalDamage = CropDamage+PropDamage )

Aggregate Total Damage by Event Type

TotalEconomicConsequences <- with(Data_EconomicExpenses, aggregate(TotalDamage ~ EVTYPE, FUN = "sum"))

Create more appropriate column names

names(TotalEconomicConsequences) <- c('Event Type', 'Total Economic Damage')

Sort the data frame by Total Economic Damage in descending order

TotalEconomicConsequences_Ordered <- TotalEconomicConsequences[ order(TotalEconomicConsequences$`Total Economic Damage`, decreasing = TRUE), ]

Since there are many different events, we will only show the top 10 contributors to this analysis

TotalEconomicConsequences_Top10 <- head(TotalEconomicConsequences_Ordered, 10)

Results

with( 
    TotalEconomicConsequences_Top10, 
    barplot(
        `Total Economic Damage`/ 1000000000, 
        names.arg = `Event Type`,
        main = "Top 10 events most harmful to the economy", 
        xlab = "Event Type",
        ylab = "Total Damages (in Billions of Dollars)",
        cex.axis=0.7, cex.names=0.75,
        col = heat.colors(12),
        legend.text = `Event Type`,
        args.legend = list(x = "topright")
    ) 
)

This analysis shows that floods have the greatest economic consequences in the United States.