Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This analysis involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

This analysis addresses two questions:

  1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

  2. Across the United States, which types of events have the greatest economic consequences?

Data Processing

I used the dplyr package to wrangle the data and the ggplot2 package for visualization. The comments in the code explain each processing step. The processing is split into two parts: part 1 prepares the data on public health impact, and part 2 prepares the data on economic impact. I also filtered and rescaled the data where needed.

library(dplyr)
## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
# download storm data from U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database - uncomment below code to download the storm data from the website
# download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile='stormdata.csv.bz2', mode = "wb")

# read the compressed .csv file into a data frame named sd
sd <- read.csv("stormdata.csv.bz2", sep=",")

# 1. select public health data, ph = public health
ph <- select(sd, EVTYPE, FATALITIES, INJURIES )

# 2. select economic data impact econ = economic
econ <- select(sd, EVTYPE, PROPDMG, CROPDMG )

# 1. group by event type and sum public health data
# (ph already holds only EVTYPE, FATALITIES and INJURIES, so no further select is needed)
phdata <- ph %>%
  group_by(EVTYPE) %>%
  summarise(
    FATALITIES = sum(FATALITIES, na.rm = TRUE),
    INJURIES = sum(INJURIES, na.rm = TRUE)
  )

# 2. group by event type and sum economic data
# (econ already holds only EVTYPE, PROPDMG and CROPDMG, so no further select is needed)
econdata <- econ %>%
  group_by(EVTYPE) %>%
  summarise(
    PROPDMG = sum(PROPDMG, na.rm = TRUE),
    CROPDMG = sum(CROPDMG, na.rm = TRUE)
  )


# 1. sum public health data and add FATALITIES and INJURIES for total
phdata <- mutate(phdata,
       TOTAL = FATALITIES + INJURIES)

# 2. sum econ and add PROPDMG and CROPDMG for total
econdata <- mutate(econdata,
                 TOTAL = PROPDMG + CROPDMG)

# 1. order by TOTAL on phdata
phdata <- arrange(phdata, desc(TOTAL))

# 2. order by TOTAL on econdata
econdata <- arrange(econdata, desc(TOTAL))

# 1. keep event types with more than 99 combined FATALITIES + INJURIES
phdataTotal <- filter(phdata, TOTAL>99)

# 2. keep event types with more than 49999 combined PROPDMG + CROPDMG
econdataTotal <- filter(econdata, TOTAL>49999)

# 1. Set EVTYPE factor levels in ascending order of TOTAL so that, after coord_flip(), the largest bar is at the top of the phdataTotal plot
phdataTotal$EVTYPE <- factor(phdataTotal$EVTYPE, levels = phdataTotal$EVTYPE[order(phdataTotal$TOTAL)])

# 2. Set EVTYPE factor levels in ascending order of TOTAL so that, after coord_flip(), the largest bar is at the top of the econdataTotal plot
econdataTotal$EVTYPE <- factor(econdataTotal$EVTYPE, levels = econdataTotal$EVTYPE[order(econdataTotal$TOTAL)])

# 1. find the 'TORNADO' EVTYPE and divide its TOTAL by 10 so the other bars stay readable in the plot
phdataTotal <- mutate(phdataTotal, TOTAL = ifelse(EVTYPE == 'TORNADO', TOTAL / 10, TOTAL))

# 2. Divide TOTAL by 100,000 to rescale the values for the plot
econdataTotal <- mutate(econdataTotal, TOTAL100K =TOTAL / 100000)
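The economic totals above add up the raw PROPDMG and CROPDMG values. The storm data also carries magnitude codes in the PROPDMGEXP and CROPDMGEXP columns, which the code above does not use. The following is a minimal sketch of how such codes could be applied before summing. The sample rows, the helper name `exp_multiplier`, and the code-to-multiplier mapping are assumptions for illustration; the real columns may contain other codes whose meaning should be checked against the NOAA documentation.

```r
# Hypothetical sample rows mimicking the storm data's property damage columns.
sample_sd <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "HAIL"),
  PROPDMG    = c(25, 2.5, 1.2, 500),
  PROPDMGEXP = c("K", "M", "B", ""),
  stringsAsFactors = FALSE
)

# Assumed mapping from magnitude code to multiplier. Codes not listed here
# (including blanks) are treated as 1, which is a simplification.
exp_multiplier <- function(code) {
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(code)]
  mult[is.na(mult)] <- 1
  unname(mult)
}

# Scale each damage value by its multiplier, then sum by event type.
sample_sd$PROPDMG_USD <- sample_sd$PROPDMG * exp_multiplier(sample_sd$PROPDMGEXP)
prop_by_type <- aggregate(PROPDMG_USD ~ EVTYPE, data = sample_sd, FUN = sum)
prop_by_type
```

The same approach would apply to CROPDMG with CROPDMGEXP. Scaling before summing can change the ranking of event types, because a single record coded in billions outweighs many records coded in thousands.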

Results - Weather Impact on Public Health

The table below lists the ten event types with the greatest impact on public health. TORNADO* is the most harmful event type with respect to population health, with a total of 96,979 fatalities and injuries from 1950 through November 2011, followed by EXCESSIVE HEAT and TSTM WIND with 8,428 and 7,461, respectively.

*Note: the TORNADO total was divided by 10 to keep the plot readable, so the table and plot show 9,697.9 for TORNADO instead of 96,979.

library(ggplot2)
# 1. Top 10 event types that cause the most FATALITIES + INJURIES  combined
top_n(phdataTotal,10,TOTAL)
## Source: local data frame [10 x 4]
## 
##               EVTYPE FATALITIES INJURIES  TOTAL
## 1            TORNADO       5633    91346 9697.9
## 2     EXCESSIVE HEAT       1903     6525 8428.0
## 3          TSTM WIND        504     6957 7461.0
## 4              FLOOD        470     6789 7259.0
## 5          LIGHTNING        816     5230 6046.0
## 6               HEAT        937     2100 3037.0
## 7        FLASH FLOOD        978     1777 2755.0
## 8          ICE STORM         89     1975 2064.0
## 9  THUNDERSTORM WIND        133     1488 1621.0
## 10      WINTER STORM        206     1321 1527.0
# 1. plot event type impact to public health 
phTotal <- ggplot(phdataTotal, aes(EVTYPE, TOTAL, fill=TOTAL)) + 
  geom_bar(stat="identity")+
  labs(title="Total Fatalities & Injuries by Weather Events (1950-November 2011)", x="", y="")+
  coord_flip()
phTotal 
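Dividing the TORNADO total by 10 keeps the other bars visible, but it also means the plot no longer shows the true value. One possible alternative, sketched here under the assumption that a logarithmic axis is acceptable for this audience, is to leave the totals untouched and plot them as points on a log scale. The data frame `totals` below is a hypothetical stand-in that copies the five largest totals from the table above, with TORNADO at its undivided value.

```r
library(ggplot2)

# Five largest totals from the table above; TORNADO is shown undivided.
totals <- data.frame(
  EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND", "FLOOD", "LIGHTNING"),
  TOTAL  = c(96979, 8428, 7461, 7259, 6046)
)

# Order the event types by TOTAL so the largest value ends up on top after coord_flip().
totals$EVTYPE <- reorder(totals$EVTYPE, totals$TOTAL)

# Points rather than bars, because bar length is hard to interpret on a log axis.
p <- ggplot(totals, aes(EVTYPE, TOTAL)) +
  geom_point(size = 3) +
  scale_y_log10() +
  labs(title = "Total Fatalities & Injuries by Weather Events (log scale)",
       x = "", y = "Fatalities + injuries") +
  coord_flip()
p
```

With this approach the table and the plot show the same numbers, at the cost of a less intuitive axis.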

Results - Weather Impact on Economy

The table below lists the ten event types with the greatest economic impact. TORNADO causes the most damage, with a summed PROPDMG and CROPDMG total of 3,312,277 from 1950 through November 2011, followed by FLASH FLOOD and TSTM WIND with totals of 1,599,325 and 1,445,168, respectively. These totals add the raw PROPDMG and CROPDMG values without applying the PROPDMGEXP and CROPDMGEXP magnitude codes, so they should be read as relative rankings rather than exact dollar amounts.

# 2. Top 10 event types that cause the most damage
top_n(econdataTotal,10,TOTAL)
## Source: local data frame [10 x 5]
## 
##                EVTYPE   PROPDMG   CROPDMG     TOTAL TOTAL100K
## 1             TORNADO 3212258.2 100018.52 3312276.7 33.122767
## 2         FLASH FLOOD 1420124.6 179200.46 1599325.1 15.993251
## 3           TSTM WIND 1335965.6 109202.60 1445168.2 14.451682
## 4                HAIL  688693.4 579596.28 1268289.7 12.682897
## 5               FLOOD  899938.5 168037.88 1067976.4 10.679764
## 6   THUNDERSTORM WIND  876844.2  66791.45  943635.6  9.436356
## 7           LIGHTNING  603351.8   3580.61  606932.4  6.069324
## 8  THUNDERSTORM WINDS  446293.2  18684.93  464978.1  4.649781
## 9           HIGH WIND  324731.6  17283.21  342014.8  3.420148
## 10       WINTER STORM  132720.6   1978.99  134699.6  1.346996
# 2. plot event type impact to economy 
econTotal <- ggplot(econdataTotal, aes(EVTYPE, TOTAL100K, fill=TOTAL100K)) + 
  geom_bar(stat="identity")+
  labs(title="Total Cost to Property & Crop Damage by Weather Events (1950-November 2011)", x="", y="")+
  coord_flip()
econTotal 

*Note: TOTAL100K is TOTAL divided by 100,000 and is used only to rescale the values for the plot.
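Both tables list TSTM WIND, THUNDERSTORM WIND and, in the economic table, THUNDERSTORM WINDS as separate event types, although they appear to describe the same kind of event. The sketch below shows one way such labels could be merged before grouping. The helper `normalize_evtype` and its two rewrite rules are hypothetical and cover only these variants; the full EVTYPE column would likely need more rules.

```r
# Hypothetical helper: collapse spelling variants of the same event type.
normalize_evtype <- function(evtype) {
  x <- toupper(trimws(evtype))
  # Assumed rule: TSTM is an abbreviation of THUNDERSTORM.
  x <- gsub("\\bTSTM\\b", "THUNDERSTORM", x, perl = TRUE)
  # Assumed rule: drop the trailing plural on WINDS.
  x <- sub("WINDS$", "WIND", x)
  x
}

# TOTAL values for the three variants, copied from the economic table above.
wind <- data.frame(
  EVTYPE = c("TSTM WIND", "THUNDERSTORM WIND", "THUNDERSTORM WINDS"),
  TOTAL  = c(1445168.2, 943635.6, 464978.1),
  stringsAsFactors = FALSE
)

# Sum the totals under the merged label.
wind$EVTYPE <- normalize_evtype(wind$EVTYPE)
merged <- tapply(wind$TOTAL, wind$EVTYPE, sum)
merged
```

In the full pipeline, a step like this would run on `sd$EVTYPE` before `group_by(EVTYPE)`, so that the merged label competes for rank as a single event type.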