Synopsis

In this report I investigate the socio-economic effects of weather events in the United States between 1950 and 2011. I examined which types of events were most costly, both economically and in terms of human injuries and fatalities. I found that during this period tornadoes were, by far, the most harmful to population health. Economically, floods, hurricanes/typhoons and storm surges were the most costly.

Data Processing

From the National Weather Service I obtained data about storms that occurred between 1950 and 2011. The compressed file was read directly with the read.csv function. I checked whether the fatalities or injuries columns contained missing values; they did not.

library(dplyr)
## 
## Attaching package: 'dplyr'
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
NOAA <- read.csv("repdata_data_StormData.csv.bz2", header = TRUE,
                 nrows = 902297)
mean(is.na(NOAA$FATALITIES))
## [1] 0
mean(is.na(NOAA$INJURIES))
## [1] 0
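The following is a minimal sketch of why no separate decompression step is needed: read.csv opens bzip2-compressed files transparently. The toy data frame and the temporary file are assumptions made for illustration and are not part of the storm data.

```r
# Toy stand-in for the storm data (hypothetical values).
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"),
                  FATALITIES = c(3, 0),
                  INJURIES = c(12, 1),
                  stringsAsFactors = FALSE)

# Write the toy table to a temporary bzip2-compressed CSV file.
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() detects the compression when it opens the file.
back <- read.csv(path, header = TRUE, stringsAsFactors = FALSE)

# Share of missing values per column; 0 means nothing is missing.
naShare <- colMeans(is.na(back[, c("FATALITIES", "INJURIES")]))
naShare
```

Checking both columns in one call with colMeans gives the same information as the two separate mean(is.na(...)) calls.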

Results

Effects on population health

To analyse the effects of various types of weather events on population health, I gathered the data for injuries and fatalities and grouped them by event type. I arbitrarily chose the top 20 event types for each category.

##Make datatable for each category, rank them and extract the top 20
##of each
Inj <- NOAA %>%
        group_by(EVTYPE) %>%
        summarise(Injuries = sum(INJURIES)) %>%
        arrange(desc(Injuries))
InjTop20 <- Inj[1:20,] 

Fat <- NOAA %>%
        group_by(EVTYPE) %>% 
        summarise(Fatalities = sum(FATALITIES)) %>%
        arrange(desc(Fatalities))
FatTop20 <- Fat[1:20,]
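As an illustration of the group, sum and rank steps, here is a small base R sketch on made-up numbers. It uses aggregate and order rather than the dplyr pipeline above, and the toy data frame is hypothetical.

```r
# Hypothetical records: several rows can share one event type.
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  INJURIES = c(50, 3, 20, 9, 4),
  stringsAsFactors = FALSE
)

# Sum injuries within each event type.
totals <- aggregate(INJURIES ~ EVTYPE, data = toy, FUN = sum)

# Sort from most to fewest injuries and keep the top two.
ranked <- totals[order(-totals$INJURIES), ]
top2 <- head(ranked, 2)
top2
```

One design note: head returns at most the rows that exist, whereas indexing with 1:20 on a table with fewer than 20 rows would pad the result with NA rows.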

To identify the most harmful events, I plotted the top 20 weather events causing injuries or fatalities and listed the totals below each plot. In both cases, tornadoes ranked first, dramatically higher than all other events. To keep the remaining events distinguishable, I scaled the x axis so that it does not include the values for tornadoes.

library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.1.3
gi <- ggplot(InjTop20, aes(x = Injuries, y = reorder(EVTYPE, Injuries)))
##Named pInj rather than pi so the built-in constant pi is not masked
pInj <- gi + geom_point(size = 3) +
        coord_cartesian(xlim = c(0, 7500)) +
        labs(title = "Top 20 Weather Events Causing Injuries") +
        labs(y = "Event Type")
print(pInj)

InjTop20
## Source: local data frame [20 x 2]
## 
##                EVTYPE Injuries
## 1             TORNADO    91346
## 2           TSTM WIND     6957
## 3               FLOOD     6789
## 4      EXCESSIVE HEAT     6525
## 5           LIGHTNING     5230
## 6                HEAT     2100
## 7           ICE STORM     1975
## 8         FLASH FLOOD     1777
## 9   THUNDERSTORM WIND     1488
## 10               HAIL     1361
## 11       WINTER STORM     1321
## 12  HURRICANE/TYPHOON     1275
## 13          HIGH WIND     1137
## 14         HEAVY SNOW     1021
## 15           WILDFIRE      911
## 16 THUNDERSTORM WINDS      908
## 17           BLIZZARD      805
## 18                FOG      734
## 19   WILD/FOREST FIRE      545
## 20         DUST STORM      440
gf <- ggplot(FatTop20, aes(x = Fatalities, y = reorder(EVTYPE, Fatalities)))
pf <- gf + geom_point(size = 3) +
        coord_cartesian(xlim = c(0, 2000)) +
        labs(title = "Top 20 Weather Events Causing Fatalities") +
        labs(y = "Event Type")
print(pf)

FatTop20
## Source: local data frame [20 x 2]
## 
##                     EVTYPE Fatalities
## 1                  TORNADO       5633
## 2           EXCESSIVE HEAT       1903
## 3              FLASH FLOOD        978
## 4                     HEAT        937
## 5                LIGHTNING        816
## 6                TSTM WIND        504
## 7                    FLOOD        470
## 8              RIP CURRENT        368
## 9                HIGH WIND        248
## 10               AVALANCHE        224
## 11            WINTER STORM        206
## 12            RIP CURRENTS        204
## 13               HEAT WAVE        172
## 14            EXTREME COLD        160
## 15       THUNDERSTORM WIND        133
## 16              HEAVY SNOW        127
## 17 EXTREME COLD/WIND CHILL        125
## 18             STRONG WIND        103
## 19                BLIZZARD        101
## 20               HIGH SURF        101

These data show that tornadoes are the most harmful weather events with respect to population health.

Analysing the event types that have the greatest economic consequences

To determine which types of events have the greatest economic consequences, I examined property damage and crop damage.

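##Extract columns for event type and property and crop damages
##(EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
damage <- NOAA[, c("EVTYPE", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]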

Damage values are recorded together with a magnitude code, so a figure may be in thousands, millions or billions of dollars. I selected only those events whose damage was reported in the billions.
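To make the encoding concrete, here is a small sketch of how a damage figure and its magnitude code combine, and what filtering on "B" keeps. The toy rows are hypothetical, and the K and M codes are an assumption here, since only "B" is used in this analysis.

```r
# Hypothetical rows: a damage figure plus its magnitude code.
toy <- data.frame(
  EVTYPE = c("FLOOD", "HAIL", "TORNADO"),
  PROPDMG = c(2.5, 25, 300),
  PROPDMGEXP = c("B", "K", "M"),
  stringsAsFactors = FALSE
)

# Assumed meaning of the codes: K = thousands, M = millions, B = billions.
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Damage in dollars = figure times the multiplier for its code.
toy$dollars <- unname(toy$PROPDMG * multiplier[toy$PROPDMGEXP])

# Keeping only the "B" rows leaves figures that are already in billions.
billions <- toy[toy$PROPDMGEXP == "B", ]
billions
```

Restricting to "B" rows means PROPDMG can be read directly as billions of dollars, at the price of leaving out events recorded in thousands or millions.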

##Select events in which property or crop damage costs are reported
##in the billions of dollars
propDamBill <- damage[damage$PROPDMGEXP == "B",1:3]
cropDamBill <- damage[damage$CROPDMGEXP == "B",c(1, 4:5)]
##Merge the two data frames together
totDamBill <- merge(propDamBill, cropDamBill, all = TRUE)

To calculate the total cost for each event, I added the columns for property damage and crop damage together into a variable called cost. NA values were replaced with 0 before this operation was performed.

##Replace missing damage values with 0
totDamBill$PROPDMG[is.na(totDamBill$PROPDMG)] <- 0
totDamBill$CROPDMG[is.na(totDamBill$CROPDMG)] <- 0
##Add together costs for each category for each event 
totDamBill$cost <- totDamBill$PROPDMG + totDamBill$CROPDMG
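The sketch below shows, on hypothetical figures, why NA values appear after merging with all = TRUE and why they must be replaced before adding: an event type present in only one table has no value in the other table's column, and NA plus a number is NA.

```r
# Hypothetical damage tables, already in billions of dollars.
prop <- data.frame(EVTYPE = c("FLOOD", "TORNADO"),
                   PROPDMG = c(5, 1.6),
                   stringsAsFactors = FALSE)
crop <- data.frame(EVTYPE = c("FLOOD", "DROUGHT"),
                   CROPDMG = c(0.5, 1.5),
                   stringsAsFactors = FALSE)

# Full outer join on the shared EVTYPE column; unmatched rows get NA.
both <- merge(prop, crop, all = TRUE)

# Treat a missing damage value as zero damage.
both$PROPDMG[is.na(both$PROPDMG)] <- 0
both$CROPDMG[is.na(both$CROPDMG)] <- 0

# Total cost per row.
both$cost <- both$PROPDMG + both$CROPDMG
both
```

The toy keeps one row per event type on each side, because merge joins on the shared EVTYPE column and pairs every matching row.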

I then grouped the data by event type and added the costs for each.

##Group the event types and sum the costs for each event
econCost <- totDamBill %>%
        group_by(EVTYPE) %>%
        summarise(cost = sum(cost)) %>%
        arrange(desc(cost))

The following plot and table show the data.

ge <- ggplot(econCost, aes(x = cost, y = reorder(EVTYPE, cost)))
pe <- ge + geom_point(size = 3) +
        labs(title = "Most expensive types of weather events") +
        labs(x = "Cost (billions of dollars)") +
        labs(y = "Event Type")
print(pe)

as.data.frame(econCost)
##                        EVTYPE   cost
## 1                       FLOOD 122.50
## 2           HURRICANE/TYPHOON  83.62
## 3                 STORM SURGE  42.56
## 4                 RIVER FLOOD  10.00
## 5                   HURRICANE   5.70
## 6                     TORNADO   5.30
## 7              TROPICAL STORM   5.15
## 8                   ICE STORM   5.00
## 9                WINTER STORM   5.00
## 10           STORM SURGE/TIDE   4.00
## 11             HURRICANE OPAL   3.10
## 12  HEAVY RAIN/SEVERE WEATHER   2.50
## 13                       HAIL   1.80
## 14 TORNADOES, TSTM WIND, HAIL   1.60
## 15                    DROUGHT   1.50
## 16           WILD/FOREST FIRE   1.50
## 17                  HIGH WIND   1.30
## 18        SEVERE THUNDERSTORM   1.20
## 19                   WILDFIRE   1.04
## 20                FLASH FLOOD   1.00
## 21                       HEAT   0.40
## 22                     FREEZE   0.20
## 23  HURRICANE OPAL/HIGH WINDS   0.10