Synopsis

This report explores the U.S. National Oceanic and Atmospheric Administration (NOAA) Storm Database to answer two basic questions about severe weather events. All code used in the analysis is shown, and the results are presented as tables and bar plots. The analysis uses the following R packages: dplyr, reshape2, data.table and ggplot2.

Questions

The data analysis addresses the following questions:

  1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
  2. Across the United States, which types of events have the greatest economic consequences?

The results show that tornadoes kill more people than any other weather event type. Tornadoes also cause the most property damage, and hail causes the most crop damage. Both damage rankings use the unscaled PROPDMG and CROPDMG values in this analysis.

Data Processing

Here we read the compressed data file into R with read.csv and store it in the data frame storm.

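# Adjust this path to the directory that contains the data file
setwd("~/courses/05_ReproducibleResearch")
# read.csv() reads the bzip2-compressed CSV directly
storm <- read.csv("repdata-data-StormData.csv.bz2")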
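read.csv() opens its file through R's file() connection, which detects gzip, bzip2 and xz compression when reading. The .bz2 file therefore does not need to be unpacked first. The sketch below checks this on a tiny file written to a temporary path. The rows are made up for illustration and are not NOAA data.

```r
# Write a tiny bzip2-compressed CSV to a temporary file (made-up rows, not NOAA data)
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, open = "wt")
writeLines(c("EVTYPE,FATALITIES,INJURIES",
             "TORNADO,2,10",
             "HAIL,0,1"), con)
close(con)

# read.csv() decompresses the file on the fly; no manual unzip step is needed
storm_demo <- read.csv(path)
str(storm_demo)
```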

Process Negative Health Outcome Data

Here we use dplyr to subset the data and reshape2 to put it into long format so that we can create a bar plot in ggplot2. First we select the variables of interest:
  1. Weather event type
  2. Fatalities
  3. Injuries

We then use group_by, summarize and mutate from dplyr to build a dataset with the total fatalities and injuries for each weather event type. It also includes a variable (Sum) with the combined total of both negative health outcomes. After ranking by fatalities and keeping the top 15 event types, we use melt from reshape2 to convert this dataset into long format.

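library(dplyr)
# Keep only the event type and the two health outcome columns
Health <- select(storm, EVTYPE, FATALITIES, INJURIES)
Health <- as_tibble(Health)
# Total fatalities and injuries per event type, plus their sum
TotalHealth <- Health %>%
    group_by(EVTYPE) %>%
    summarize(FATALITIES = sum(FATALITIES), INJURIES = sum(INJURIES)) %>%
    mutate(Sum = FATALITIES + INJURIES)
# Rank by fatalities and keep the top 15 event types
TotalHealth <- TotalHealth[order(TotalHealth$FATALITIES, decreasing = TRUE), ]
TotalHealth <- TotalHealth[1:15, ]
library(reshape2)
# Long format: one row per (event type, outcome) pair
TotalHealth1 <- melt(TotalHealth, id = c("Sum", "EVTYPE"))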
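The sketch below makes the group-and-sum and long-format steps concrete. It applies the same logic in base R to a five-row stand-in for storm, and the counts are invented for illustration. aggregate() plays the role of group_by() plus summarize(). stack() produces the same kind of long layout that melt() returns: one value column plus a column naming the original variable.

```r
# Made-up stand-in for storm[, c("EVTYPE", "FATALITIES", "INJURIES")]
events <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "TORNADO", "HAIL", "FLOOD"),
  FATALITIES = c(2, 0, 1, 0, 1),
  INJURIES   = c(10, 1, 5, 0, 2)
)

# Total fatalities and injuries per event type (like group_by + summarize)
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = events, FUN = sum)
totals$Sum <- totals$FATALITIES + totals$INJURIES   # like mutate(Sum = ...)

# Rank by fatalities and keep at most the top 15 rows
totals <- totals[order(totals$FATALITIES, decreasing = TRUE), ]
totals <- head(totals, 15)

# Wide -> long: one row per (event type, outcome), like melt(id = c("Sum", "EVTYPE"))
long <- data.frame(
  EVTYPE = rep(totals$EVTYPE, times = 2),
  Sum    = rep(totals$Sum, times = 2),
  stack(totals[c("FATALITIES", "INJURIES")])
)
print(long)
```

In the long table, `values` and `ind` correspond to melt()'s `value` and `variable` columns.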

Process Negative Economic Outcome Data

Here again we use dplyr to subset the data and reshape2 to put it into long format so that we can create a bar plot in ggplot2. First we select the variables of interest:
  1. Weather event type
  2. Property damage (PROPDMG)
  3. Crop damage (CROPDMG)

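We then use group_by, summarize and mutate from dplyr to build a dataset with the total property and crop damage for each weather event type, plus a variable (Sum) with their combined total. We rank by property damage and keep the top 15 event types. We then rename the damage columns with setnames from data.table and use melt from reshape2 to convert the dataset into long format. Note that PROPDMG and CROPDMG are summed as recorded. The magnitude codes in PROPDMGEXP and CROPDMGEXP (K for thousands, M for millions, B for billions) are not applied, so the totals are not dollar amounts.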

Damage <- select(storm, EVTYPE, PROPDMG, CROPDMG)
Damage <- as_tibble(Damage)
# Total property and crop damage per event type, plus their sum
TotalDamage <- Damage %>%
    group_by(EVTYPE) %>%
    summarize(PROPDMG = sum(PROPDMG), CROPDMG = sum(CROPDMG)) %>%
    mutate(Sum = PROPDMG + CROPDMG)
# Rank by property damage
TotalDamage <- TotalDamage[order(TotalDamage$PROPDMG, decreasing = TRUE), ]
library(data.table)
# Friendlier column names for the plot legend
setnames(TotalDamage, c("PROPDMG", "CROPDMG"), c("Property", "Crop"))
TotalDamage <- TotalDamage[1:15, ]
# data.table also defines melt(), so call the reshape2 version explicitly
TotalDamage1 <- reshape2::melt(TotalDamage, id = c("Sum", "EVTYPE"))
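The sketch below applies the PROPDMGEXP and CROPDMGEXP magnitude codes before summing, so that damage is expressed in dollars. The original analysis does not do this. The helper exp_multiplier() is hypothetical. Treating H as hundreds and every other code (blank, +, -, ?, digits) as a multiplier of 1 is an assumption. The rows are made up.

```r
# Hypothetical helper: convert NOAA magnitude codes to numeric multipliers.
# Assumption: K/M/B (any case) = 1e3/1e6/1e9, H = 1e2, and every other
# code (blank, "+", "-", "?", digits, NA) is treated as a multiplier of 1.
exp_multiplier <- function(code) {
  code <- toupper(trimws(as.character(code)))
  mult <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)[code]
  mult[is.na(mult)] <- 1
  unname(mult)
}

# Made-up rows in the raw NOAA column layout (not real storm records)
damage <- data.frame(
  EVTYPE     = c("FLOOD", "TORNADO", "HAIL"),
  PROPDMG    = c(2.5, 250, 10),
  PROPDMGEXP = c("B", "K", "m"),
  CROPDMG    = c(5, 0, 1),
  CROPDMGEXP = c("M", "", "k")
)

# Dollar amount = recorded number x magnitude multiplier
damage$PROPDMG_USD <- damage$PROPDMG * exp_multiplier(damage$PROPDMGEXP)
damage$CROPDMG_USD <- damage$CROPDMG * exp_multiplier(damage$CROPDMGEXP)

# Raw PROPDMG ranks TORNADO first here; dollar amounts rank FLOOD first
damage[order(damage$PROPDMG_USD, decreasing = TRUE),
       c("EVTYPE", "PROPDMG_USD", "CROPDMG_USD")]
```

In the report, the two `*_USD` columns would replace PROPDMG and CROPDMG in the summarize() step.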

Results

Health Outcomes

Here we see the 15 weather event types with the most fatalities, in descending order of fatalities. Tornadoes cause by far the most harm, with 5,633 deaths and 91,346 injuries.

TotalHealth
## Source: local data frame [15 x 4]
## 
##               EVTYPE FATALITIES INJURIES   Sum
##               (fctr)      (dbl)    (dbl) (dbl)
## 1            TORNADO       5633    91346 96979
## 2     EXCESSIVE HEAT       1903     6525  8428
## 3        FLASH FLOOD        978     1777  2755
## 4               HEAT        937     2100  3037
## 5          LIGHTNING        816     5230  6046
## 6          TSTM WIND        504     6957  7461
## 7              FLOOD        470     6789  7259
## 8        RIP CURRENT        368      232   600
## 9          HIGH WIND        248     1137  1385
## 10         AVALANCHE        224      170   394
## 11      WINTER STORM        206     1321  1527
## 12      RIP CURRENTS        204      297   501
## 13         HEAT WAVE        172      309   481
## 14      EXTREME COLD        160      231   391
## 15 THUNDERSTORM WIND        133     1488  1621

The bar plot below shows fatalities and injuries for the same 15 event types.

library(ggplot2)
ggplot(TotalHealth1, aes(x = EVTYPE, y = value, fill = variable)) +
    geom_bar(position = position_dodge(), stat = "identity",
             colour = "black",   # black bar outlines
             linewidth = 0.1) +  # thin outlines (use size = 0.1 before ggplot2 3.4.0)
    scale_fill_brewer(palette = "Set1") +
    xlab("Weather Event") +
    ylab("Deaths, Injuries") +
    labs(fill = "Outcome") +
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
    ggtitle("Deaths and Injuries from Weather Events")
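EVTYPE is a factor whose levels are in alphabetical order, so ggplot2 draws the bars alphabetically rather than in the ranked order of the table. A common fix is to reorder the factor levels with base R's reorder() before plotting. The sketch below does this on a made-up two-event table shaped like TotalHealth1. Applying the same line to TotalHealth1 or TotalDamage1 before the ggplot() call is a suggested change, not part of the original analysis.

```r
# Made-up long-format table in the same shape as TotalHealth1 (not real totals)
long <- data.frame(
  EVTYPE   = c("HEAT", "TORNADO", "HEAT", "TORNADO"),
  Sum      = c(50, 500, 50, 500),
  variable = c("FATALITIES", "FATALITIES", "INJURIES", "INJURIES"),
  value    = c(10, 40, 40, 460)
)

# Order the x-axis levels by descending Sum instead of alphabetically
long$EVTYPE <- reorder(long$EVTYPE, -long$Sum)
levels(long$EVTYPE)

# The plot call itself is unchanged, e.g.
# ggplot(long, aes(x = EVTYPE, y = value, fill = variable)) +
#     geom_col(position = "dodge")
```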

Economic Outcomes

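Here we see the 15 weather event types with the largest property damage, in descending order of property damage. Tornadoes again top the list, followed by flash floods, thunderstorm winds (TSTM WIND) and floods. Among these event types, hail causes the most crop damage.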

TotalDamage
## Source: local data frame [15 x 4]
## 
##                EVTYPE   Property      Crop        Sum
##                (fctr)      (dbl)     (dbl)      (dbl)
## 1             TORNADO 3212258.16 100018.52 3312276.68
## 2         FLASH FLOOD 1420124.59 179200.46 1599325.05
## 3           TSTM WIND 1335965.61 109202.60 1445168.21
## 4               FLOOD  899938.48 168037.88 1067976.36
## 5   THUNDERSTORM WIND  876844.17  66791.45  943635.62
## 6                HAIL  688693.38 579596.28 1268289.66
## 7           LIGHTNING  603351.78   3580.61  606932.39
## 8  THUNDERSTORM WINDS  446293.18  18684.93  464978.11
## 9           HIGH WIND  324731.56  17283.21  342014.77
## 10       WINTER STORM  132720.59   1978.99  134699.58
## 11         HEAVY SNOW  122251.99   2165.72  124417.71
## 12           WILDFIRE   84459.34   4364.20   88823.54
## 13          ICE STORM   66000.67   1688.95   67689.62
## 14        STRONG WIND   62993.81   1616.90   64610.71
## 15         HIGH WINDS   55625.00   1759.60   57384.60
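Both tables split what is arguably a single event type across several EVTYPE labels, which spreads its totals over several rows. Examples are TSTM WIND, THUNDERSTORM WIND and THUNDERSTORM WINDS; HIGH WIND and HIGH WINDS; and RIP CURRENT and RIP CURRENTS. The sketch below shows one way to merge such labels before grouping. The alias table and normalize_evtype() are hypothetical and cover only the pairs visible in these two tables. A full cleanup would need a mapping onto the official event type list in NOAA's Storm Data documentation. The fatality counts are made up.

```r
# Hypothetical alias table built only from labels that appear in the tables above
aliases <- c(
  "TSTM WIND"          = "THUNDERSTORM WIND",
  "THUNDERSTORM WINDS" = "THUNDERSTORM WIND",
  "HIGH WINDS"         = "HIGH WIND",
  "RIP CURRENTS"       = "RIP CURRENT"
)

# Upper-case and trim each label, then replace any label that has a known alias
normalize_evtype <- function(x) {
  x <- toupper(trimws(as.character(x)))
  hit <- x %in% names(aliases)
  x[hit] <- unname(aliases[x[hit]])
  x
}

# Made-up per-label totals (not NOAA figures)
totals <- data.frame(
  EVTYPE     = c("TSTM WIND", "THUNDERSTORM WIND", "thunderstorm winds ",
                 "HIGH WINDS", "HAIL"),
  FATALITIES = c(5, 2, 1, 1, 0)
)
totals$EVTYPE <- normalize_evtype(totals$EVTYPE)

# Re-aggregate so each merged label appears once
merged <- aggregate(FATALITIES ~ EVTYPE, data = totals, FUN = sum)
merged
```

Because the normalization runs before aggregation, it could be applied to storm$EVTYPE ahead of the group_by() steps in the Data Processing section.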

The bar plot below shows property and crop damage for the same 15 event types.

g <- ggplot(TotalDamage1, aes(x = EVTYPE, y = value, fill = variable))
g <- g + geom_bar(position = position_dodge(), stat = "identity",
                  colour = "black",   # black bar outlines
                  linewidth = 0.1)    # thin outlines (use size = 0.1 before ggplot2 3.4.0)
g <- g + scale_fill_brewer(palette = "Set3")
g <- g + xlab("Weather Event")
g <- g + ylab("Property, Crop Damage")
g <- g + theme(axis.text.x = element_text(angle = 90, hjust = 1))
g <- g + labs(fill = "Damage Type")
g <- g + ggtitle("Economic Impact of Weather Events")
g