Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

This project will answer two basic questions regarding the NOAA storm database:
1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
2. Across the United States, which types of events have the greatest economic consequences?

Source Data

The data for this assignment are in a comma-separated-value file compressed via the bzip2 algorithm. The original data can be downloaded from the course web site:

NOAA Storm Data [47Mb]

There is also some documentation of the database available. The documentation provides information on how some of the variables are constructed/defined.

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.

Data Processing

The first step is to set up the environment for the analysis. This involves setting the working directory and loading the needed R packages. In this case, I need the sqldf, dplyr, and rockchalk packages to do some processing on the data, and the ggplot2 and gridExtra packages for plotting the data.

Set the working environment

## Set the working directory
setwd("~/R/Coursera/Data Science/Course 5/Assignment 2")

## Load needed packages
library(sqldf)
## Loading required package: gsubfn
## Loading required package: proto
## Loading required package: RSQLite
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(rockchalk)
## 
## Attaching package: 'rockchalk'
## The following object is masked from 'package:dplyr':
## 
##     summarize
library(ggplot2)
library(gridExtra)
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine

If the original data from the course bzip2 archive aren’t already present, they are downloaded from the course site.

Download the data

## Download the original zipped data if it doesn't already exist
destfile="NOAAStormData.bz2"
fileURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
if (!file.exists(destfile)) {
    download.file(fileURL,destfile,method="auto")
}

The source data file is a comma delimited file. The data are loaded into R using the read.csv function. After loading, I check the contents of the file to ensure that the download, extraction, and conversion to R data worked as desired.

Unzip and load the data

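## Read the CSV file. The level merging below relies on EVTYPE being a factor,
## so request factors explicitly (R >= 4.0 reads strings as character by default).
stormdata <- read.csv("NOAAStormData.bz2", stringsAsFactors = TRUE)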

## Describe the dataset
str(stormdata)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
##  $ BGN_TIME  : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
##  $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : Factor w/ 35 levels "","  N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_DATE  : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_TIME  : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ WFO       : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ ZONENAMES : Factor w/ 25112 levels "","                                                                                                               "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : Factor w/ 436781 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

Since the data have numerous unneeded variables and the analysis is basic, I restrict the data to only those variables that are needed to answer the assignment questions. The event type, fatalities, injuries, crop damage, and property damage variables are kept for the analysis.

Transform and explore the data for the analysis

## Limit data to needed variables
stormdata <- subset(stormdata,
                    select=c("EVTYPE","FATALITIES","INJURIES","CROPDMG","CROPDMGEXP","PROPDMG","PROPDMGEXP"))

## Determine unique values of event type
events <- sqldf("select distinct EVTYPE as 'Events' from stormdata order by EVTYPE")
head(events,n=25)
##                         Events
## 1           HIGH SURF ADVISORY
## 2                COASTAL FLOOD
## 3                  FLASH FLOOD
## 4                    LIGHTNING
## 5                    TSTM WIND
## 6              TSTM WIND (G45)
## 7                   WATERSPOUT
## 8                         WIND
## 9                            ?
## 10             ABNORMAL WARMTH
## 11              ABNORMALLY DRY
## 12              ABNORMALLY WET
## 13        ACCUMULATED SNOWFALL
## 14         AGRICULTURAL FREEZE
## 15               APACHE COUNTY
## 16      ASTRONOMICAL HIGH TIDE
## 17       ASTRONOMICAL LOW TIDE
## 18                    AVALANCE
## 19                   AVALANCHE
## 20                BEACH EROSIN
## 21               BEACH EROSION
## 22 BEACH EROSION/COASTAL FLOOD
## 23                 BEACH FLOOD
## 24  BELOW NORMAL PRECIPITATION
## 25           BITTER WIND CHILL

After looking at the contents of event type (EVTYPE), it is apparent that the same event type has multiple entries that vary in spelling, capitalization, and punctuation. I converted all event types to uppercase and combined event types with similar spelling. Note that due to the volume of output produced by the combineLevels function, its output has been suppressed. The code has been included for completeness.

Process/transform the EVTYPE data for the analysis

## Convert event type variable to all upper case to merge same event types with differing case
levels(stormdata$EVTYPE) <- toupper(levels(stormdata$EVTYPE))

## Collapse levels of EVTYPE with differing spelling & punctuation
stormdata$EVTYPE <- combineLevels(stormdata$EVTYPE,levs = c("BEACH EROSION","BEACH EROSIN"), newLabel = c("BEACH EROSION"))
stormdata$EVTYPE <- combineLevels(stormdata$EVTYPE,levs = c("BITTER WIND CHILL","BITTER WIND CHILL TEMPERATURES"), newLabel = c("BITTER WIND CHILL"))
stormdata$EVTYPE <- combineLevels(stormdata$EVTYPE,levs = c("BLOWING SNOW & EXTREME WIND CH","BLOWING SNOW- EXTREME WIND CHI","BLOWING SNOW/EXTREME WIND CHIL"), newLabel = c("BLOWING SNOW/EXTREME WIND CHIL"))
stormdata$EVTYPE <- combineLevels(stormdata$EVTYPE,levs = c("BLOW-OUT TIDE","BLOW-OUT TIDES"), newLabel = c("BLOW-OUT TIDES"))
stormdata$EVTYPE <- combineLevels(stormdata$EVTYPE,levs = c("COLD TEMPERATURE","COLD TEMPERATURES"), newLabel = c("COLD TEMPERATURES"))
stormdata$EVTYPE <- combineLevels(stormdata$EVTYPE,levs = c("DUST DEVEL","DUST DEVIL"), newLabel = c("DUST DEVIL"))
stormdata$EVTYPE <- combineLevels(stormdata$EVTYPE,levs = c("FROST/FREEZE","FROST\\FREEZE"), newLabel = c("FROST/FREEZE"))
stormdata$EVTYPE <- combineLevels(stormdata$EVTYPE,levs = c("FUNNEL CLOUD","FUNNEL CLOUD."), newLabel = c("FUNNEL CLOUD"))
stormdata$EVTYPE <- combineLevels(stormdata$EVTYPE,levs = c("HAIL 0.75","HAIL 075","HAIL 75","HAIL(0.75)"), newLabel = c("HAIL 0.75"))
stormdata$EVTYPE <- combineLevels(stormdata$EVTYPE,levs = c("HAIL 0.88","HAIL 088","HAIL 88"), newLabel = c("HAIL 0.88"))
stormdata$EVTYPE <- combineLevels(stormdata$EVTYPE,levs = c("HAIL 1.75","HAIL 1.75)"), newLabel = c("HAIL 1.75"))
stormdata$EVTYPE <- combineLevels(stormdata$EVTYPE,levs = c("HEAVY PRECIPATATION","HEAVY PRECIPITATION"), newLabel = c("HEAVY PRECIPITATION"))
stormdata$EVTYPE <- combineLevels(stormdata$EVTYPE,levs = c("HEAVY SNOW/HIGH","HEAVY SNOW/HIGH WIND","HEAVY SNOW/HIGH WINDS"), newLabel = c("HEAVY SNOW/HIGH WINDS"))
stormdata$EVTYPE <- combineLevels(stormdata$EVTYPE,levs = c("HEAVY RAIN","HEAVY RAINFALL","HEAVY RAINS","HVY RAIN"), newLabel = c("HEAVY RAIN"))
stormdata$EVTYPE <- combineLevels(stormdata$EVTYPE,levs = c("LIGHTING","LIGHTNING","LIGHTNING."), newLabel = c("LIGHTNING"))
stormdata$EVTYPE <- combineLevels(stormdata$EVTYPE,levs = c("THUDERSTORM WINDS","THUNDEERSTORM WINDS","THUNDERESTORM WINDS","THUNDERSTORM  WINDS","THUNDERSTORM W INDS","THUNDERSTORM WIND",
                                                            "THUNDERSTORM WIND.","THUNDERSTORM WINDS","THUNDERSTORM WINDS      LE CEN","THUNDERSTORM WINDS.","THUNDERSTORM WINDSS",
                                                            "THUNDERSTORM WINS","THUNDERSTORMS WIND","THUNDERSTORMS WINDS","THUNDERSTORMW","THUNDERSTORMW WINDS","THUNDERSTORMWINDS",
                                                            "THUNDERSTROM WIND","THUNDERSTROM WINDS","THUNDERTORM WINDS","THUNDERTSORM WIND","THUNDESTORM WINDS","THUNERSTORM WINDS",
                                                            "TSTM","TSTM WIND","TSTM WIND","TSTM WIND DAMAGE","TSTM WINDS","TSTM WND","TSTMW","TUNDERSTORM WIND"), newLabel = c("THUNDERSTORM WINDS"))
stormdata$EVTYPE <- combineLevels(stormdata$EVTYPE,levs = c("THUNDERSTORM DAMAGE","THUNDERSTORM DAMAGE TO"), newLabel = c("THUNDERSTORM DAMAGE"))
stormdata$EVTYPE <- combineLevels(stormdata$EVTYPE,levs = c("TORNADO","TORNADOES","TORNADOS","TORNDAO"), newLabel = c("TORNADO"))
stormdata$EVTYPE <- combineLevels(stormdata$EVTYPE,levs = c("WATER SPOUT","WATERSPOUT","WATERSPOUT-","WATERSPOUT/","WATERSPOUTS","WAYTERSPOUT"), newLabel = c("WATERSPOUTS"))
stormdata$EVTYPE <- combineLevels(stormdata$EVTYPE,levs = c("WIND","WINDS","WND"), newLabel = c("WINDS"))
stormdata$EVTYPE <- combineLevels(stormdata$EVTYPE,levs = c("WINTER STORM","WINTER STORMS"), newLabel = c("WINTER STORMS"))
stormdata$EVTYPE <- combineLevels(stormdata$EVTYPE,levs = c("WINTER STORM HIGH WINDS","WINTER STORM/HIGH WIND","WINTER STORM/HIGH WINDS"), newLabel = c("WINTER STORM/HIGH WINDS"))
stormdata$EVTYPE <- combineLevels(stormdata$EVTYPE,levs = c("WINTER MIX","WINTER WEATHER MIX","WINTER WEATHER/MIX","WINTERY MIX","WINTRY MIX"), newLabel = c("WINTERY MIX"))
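
The merges above are enumerated by hand. As a complementary idea, some of the variation (case, padding, repeated spaces, trailing punctuation) could be removed with a few string rules before any manual merging. The following is a minimal sketch under that assumption. The helper name normalize_evtype and the sample labels are hypothetical and are not part of the original analysis, and pattern rules like these would still need to be checked against the full list of event types.

```r
## Hypothetical helper: normalize event labels with a few string rules
normalize_evtype <- function(x) {
    x <- toupper(trimws(as.character(x)))   ## drop case differences and padding
    x <- gsub("[[:space:]]+", " ", x)       ## collapse runs of whitespace
    x <- gsub("[./-]+$", "", x)             ## strip trailing ".", "/" or "-"
    x
}

## Sample labels in the style of those seen in EVTYPE
raw <- c("   HIGH SURF ADVISORY", "Lightning.", "WATERSPOUT-",
         "THUNDERSTORM  WINDS", "Tstm Wind")
clean <- normalize_evtype(raw)

## Illustrative pattern rule: fold the "TSTM" abbreviation into one label
clean[grepl("^TSTM WINDS?$", clean)] <- "THUNDERSTORM WINDS"
clean
```

Rules like these do not catch misspellings such as "THUNDERSTROM", so a manual list along the lines of the one above would still be needed for those.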

The property and crop damage values are expressed with “exponents”. For example, 32K is used to represent 32,000. These “exponents” need to be converted into numerics so that calculations can be carried out on the damage values.

Explore property and crop damage “exponents”

## Determine unique values of property damage "exponent"
unique(stormdata$PROPDMGEXP)
##  [1] K M   B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels:  - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
## Determine unique values of crop damage "exponent"
unique(stormdata$CROPDMGEXP)
## [1]   M K m B ? 0 k 2
## Levels:  ? 0 2 B k K m M

Process/transform the PROPDMG & CROPDMG data for the analysis

## Change the damage "exponents" to uppercase
stormdata$PROPDMGEXP <- toupper(stormdata$PROPDMGEXP)
stormdata$CROPDMGEXP <- toupper(stormdata$CROPDMGEXP)

## Convert the damage "exponents" into numeric values
stormdata$PROPEXP[stormdata$PROPDMGEXP == "H"] <- 100
stormdata$PROPEXP[stormdata$PROPDMGEXP == "K"] <- 1000
stormdata$PROPEXP[stormdata$PROPDMGEXP == "M"] <- 1000000
stormdata$PROPEXP[stormdata$PROPDMGEXP == "B"] <- 1000000000
stormdata$PROPEXP[stormdata$PROPDMGEXP == ""]  <- 1
stormdata$PROPEXP[stormdata$PROPDMGEXP == "0"] <- 1
stormdata$PROPEXP[stormdata$PROPDMGEXP == "1"] <- 10
stormdata$PROPEXP[stormdata$PROPDMGEXP == "2"] <- 100
stormdata$PROPEXP[stormdata$PROPDMGEXP == "3"] <- 1000
stormdata$PROPEXP[stormdata$PROPDMGEXP == "4"] <- 10000
stormdata$PROPEXP[stormdata$PROPDMGEXP == "5"] <- 100000
stormdata$PROPEXP[stormdata$PROPDMGEXP == "6"] <- 1000000
stormdata$PROPEXP[stormdata$PROPDMGEXP == "7"] <- 10000000
stormdata$PROPEXP[stormdata$PROPDMGEXP == "8"] <- 100000000
stormdata$CROPEXP[stormdata$CROPDMGEXP == "H"] <- 100
stormdata$CROPEXP[stormdata$CROPDMGEXP == "K"] <- 1000
stormdata$CROPEXP[stormdata$CROPDMGEXP == "M"] <- 1000000
stormdata$CROPEXP[stormdata$CROPDMGEXP == "B"] <- 1000000000
stormdata$CROPEXP[stormdata$CROPDMGEXP == ""]  <- 1
stormdata$CROPEXP[stormdata$CROPDMGEXP == "0"] <- 1
stormdata$CROPEXP[stormdata$CROPDMGEXP == "1"] <- 10
stormdata$CROPEXP[stormdata$CROPDMGEXP == "2"] <- 100
stormdata$CROPEXP[stormdata$CROPDMGEXP == "3"] <- 1000
stormdata$CROPEXP[stormdata$CROPDMGEXP == "4"] <- 10000
stormdata$CROPEXP[stormdata$CROPDMGEXP == "5"] <- 100000
stormdata$CROPEXP[stormdata$CROPDMGEXP == "6"] <- 1000000
stormdata$CROPEXP[stormdata$CROPDMGEXP == "7"] <- 10000000
stormdata$CROPEXP[stormdata$CROPDMGEXP == "8"] <- 100000000

## Convert invalid "exponents" to zero (0)
stormdata$PROPEXP[stormdata$PROPDMGEXP == "+"] <- 0
stormdata$PROPEXP[stormdata$PROPDMGEXP == "-"] <- 0
stormdata$PROPEXP[stormdata$PROPDMGEXP == "?"] <- 0
stormdata$CROPEXP[stormdata$CROPDMGEXP == "+"] <- 0
stormdata$CROPEXP[stormdata$CROPDMGEXP == "-"] <- 0
stormdata$CROPEXP[stormdata$CROPDMGEXP == "?"] <- 0

## Compute numeric damage values
stormdata$PROPDMGVAL <- stormdata$PROPDMG * stormdata$PROPEXP
stormdata$CROPDMGVAL <- stormdata$CROPDMG * stormdata$CROPEXP
stormdata$TOTLDMGVAL <- stormdata$PROPDMGVAL + stormdata$CROPDMGVAL
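
The same exponent mapping can be written more compactly as a lookup table. The sketch below uses the multipliers from the code above. The names exp_lookup and exp_to_multiplier are hypothetical, and the blank exponent is handled separately because an empty string cannot be used as a lookup name in R.

```r
## Multipliers keyed by the upper-cased exponent code
exp_lookup <- c("H" = 1e2, "K" = 1e3, "M" = 1e6, "B" = 1e9,
                "0" = 1,   "1" = 1e1, "2" = 1e2, "3" = 1e3, "4" = 1e4,
                "5" = 1e5, "6" = 1e6, "7" = 1e7, "8" = 1e8,
                "+" = 0,   "-" = 0,   "?" = 0)

## Hypothetical helper: translate exponent codes into numeric multipliers
exp_to_multiplier <- function(code) {
    code <- toupper(as.character(code))
    mult <- unname(exp_lookup[code])   ## NA for anything not in the table
    mult[code == ""] <- 1              ## blank exponent: use the value as recorded
    mult
}

## Worked example from the text: 32K represents 32,000
32 * exp_to_multiplier("K")
exp_to_multiplier(c("K", "m", "", "?", "5"))
```

With such a helper, the damage value would be computed as, for example, PROPDMG * exp_to_multiplier(PROPDMGEXP). Codes missing from the table come back as NA rather than silently becoming zero, which makes unexpected values easier to spot.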

Both injuries and fatalities are “harmful to population health”. I am going to look at fatalities and injuries separately, but also combine them to see whether the ranking of harmful event types changes. Thus, I need to create a combined “population harm” variable that is the sum of injuries and fatalities.

Process/transform the FATALITIES & INJURIES data for the analysis

## Create "population harm" variable that is the sum of injuries and fatalities
stormdata$POPHARM <- stormdata$FATALITIES + stormdata$INJURIES

Aggregate the human and economic consequences by weather event type for the analysis

## Aggregate population harm by event type
popharm    <- aggregate(POPHARM ~ EVTYPE, data=stormdata, FUN=sum)

## Aggregate fatalities by event type
fatalities <- aggregate(FATALITIES ~ EVTYPE, data=stormdata, FUN=sum)

## Aggregate injuries by event type
injuries   <- aggregate(INJURIES ~ EVTYPE, data=stormdata, FUN=sum)

## Aggregate total damages by event type
totldamage <- aggregate(TOTLDMGVAL ~ EVTYPE, data=stormdata, FUN=sum)

## Aggregate property damages by event type
propdamage <- aggregate(PROPDMGVAL ~ EVTYPE, data=stormdata, FUN=sum)

## Aggregate crop damages by event type
cropdamage <- aggregate(CROPDMGVAL ~ EVTYPE, data=stormdata, FUN=sum)
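
Several outcomes can also be summed in a single aggregate call by binding them together on the left-hand side of the formula. The sketch below shows the idea on a small made-up data frame. The toy values are assumptions for illustration, not storm data. One caveat: with the formula interface, a row with a missing value in any of the bound columns is dropped from all of them by default.

```r
## Made-up events for illustration only
toy <- data.frame(
    EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "FLOOD", "HAIL"),
    FATALITIES = c(2, 0, 1, 1, 0),
    INJURIES   = c(10, 3, 5, 0, 1)
)

## Sum both outcomes per event type in one pass
harm <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

## Combined "population harm" derived from the aggregated columns
harm$POPHARM <- harm$FATALITIES + harm$INJURIES
harm
```

This yields one data frame with a row per event type instead of a separate data frame per outcome.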

Exploratory Analysis

To start, basic descriptive statistics (minimum, 1st quartile, median, mean, 3rd quartile, maximum) for the numeric variables are produced. The basic statistics give a high-level view of the scope of the consequences of the events contained in the database.

Basic descriptive statistics for the numeric variables

## Calculate basic descriptive statistics
summary(stormdata$POPHARM)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
##    0.0000    0.0000    0.0000    0.1725    0.0000 1742.0000
summary(stormdata$FATALITIES)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##   0.0000   0.0000   0.0000   0.0168   0.0000 583.0000
summary(stormdata$INJURIES)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
##    0.0000    0.0000    0.0000    0.1557    0.0000 1700.0000
summary(stormdata$TOTLDMGVAL)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## 0.00e+00 0.00e+00 0.00e+00 5.29e+05 1.00e+03 1.15e+11
summary(stormdata$PROPDMGVAL)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 0.000e+00 0.000e+00 0.000e+00 4.746e+05 5.000e+02 1.150e+11
summary(stormdata$CROPDMGVAL)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 0.000e+00 0.000e+00 0.000e+00 5.442e+04 0.000e+00 5.000e+09

The next thing that I want to explore is the effect event type has on the population and economic consequences. I explored the top 10 events for each of the consequences calculated above.

Top 10 event types for each consequence

## Determine top 10 event types for population harm
top10popharm <- popharm[order(popharm$POPHARM, decreasing=TRUE), ][1:10, ]

## Determine top 10 event types for fatalities
top10fatalities <- fatalities[order(fatalities$FATALITIES, decreasing=TRUE), ][1:10, ]

## Determine top 10 event types for injuries
top10injuries <- injuries[order(injuries$INJURIES, decreasing=TRUE), ][1:10, ]

## Determine top 10 event types for total damage
top10totldamage <- totldamage[order(totldamage$TOTLDMGVAL, decreasing=TRUE), ][1:10, ]

## Determine top 10 event types for property damage
top10propdamage <- propdamage[order(propdamage$PROPDMGVAL, decreasing=TRUE), ][1:10, ]

## Determine top 10 event types for crop damage
top10cropdamage <- cropdamage[order(cropdamage$CROPDMGVAL, decreasing=TRUE), ][1:10, ]

## Display top 10 for each
top10popharm
##                 EVTYPE POPHARM
## 823            TORNADO   96979
## 821 THUNDERSTORM WINDS   10097
## 103     EXCESSIVE HEAT    8428
## 141              FLOOD    7259
## 820          LIGHTNING    6047
## 217               HEAT    3037
## 125        FLASH FLOOD    2755
## 352          ICE STORM    2064
## 826      WINTER STORMS    1554
## 286          HIGH WIND    1385
top10fatalities
##                 EVTYPE FATALITIES
## 823            TORNADO       5633
## 103     EXCESSIVE HEAT       1903
## 125        FLASH FLOOD        978
## 217               HEAT        937
## 820          LIGHTNING        817
## 821 THUNDERSTORM WINDS        702
## 141              FLOOD        470
## 486        RIP CURRENT        368
## 286          HIGH WIND        248
## 19           AVALANCHE        224
top10injuries
##                 EVTYPE INJURIES
## 823            TORNADO    91346
## 821 THUNDERSTORM WINDS     9395
## 141              FLOOD     6789
## 103     EXCESSIVE HEAT     6525
## 820          LIGHTNING     5230
## 217               HEAT     2100
## 352          ICE STORM     1975
## 125        FLASH FLOOD     1777
## 195               HAIL     1361
## 826      WINTER STORMS     1338
top10totldamage
##                 EVTYPE   TOTLDMGVAL
## 141              FLOOD 150319678257
## 338  HURRICANE/TYPHOON  71913712800
## 823            TORNADO  57362335387
## 561        STORM SURGE  43323541000
## 195               HAIL  18761221986
## 125        FLASH FLOOD  18243991079
## 73             DROUGHT  15018672000
## 329          HURRICANE  14610229010
## 821 THUNDERSTORM WINDS  11075628264
## 491        RIVER FLOOD  10148404500
top10propdamage
##                 EVTYPE   PROPDMGVAL
## 141              FLOOD 144657709807
## 338  HURRICANE/TYPHOON  69305840000
## 823            TORNADO  56947381117
## 561        STORM SURGE  43323536000
## 125        FLASH FLOOD  16822673979
## 195               HAIL  15735267513
## 329          HURRICANE  11868319010
## 821 THUNDERSTORM WINDS   9916049526
## 705     TROPICAL STORM   7703890550
## 826      WINTER STORMS   6688997251
top10cropdamage
##                 EVTYPE  CROPDMGVAL
## 73             DROUGHT 13972566000
## 141              FLOOD  5661968450
## 491        RIVER FLOOD  5029459000
## 352          ICE STORM  5022113500
## 195               HAIL  3025954473
## 329          HURRICANE  2741910000
## 338  HURRICANE/TYPHOON  2607872800
## 125        FLASH FLOOD  1421317100
## 112       EXTREME COLD  1312973000
## 821 THUNDERSTORM WINDS  1159578738
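
The six top-10 selections above all follow the same order-then-subset pattern, so they could be wrapped in one small function. The following is a minimal sketch. The helper name top_events and the toy numbers are hypothetical and not part of the original analysis.

```r
## Hypothetical helper: return the n rows with the largest values in value_col
top_events <- function(df, value_col, n = 10) {
    ranked <- df[order(df[[value_col]], decreasing = TRUE), , drop = FALSE]
    head(ranked, n)
}

## Made-up totals for illustration only
toy <- data.frame(EVTYPE = c("FLOOD", "HAIL", "TORNADO", "DROUGHT"),
                  TOTLDMGVAL = c(150, 19, 57, 15))
top2 <- top_events(toy, "TOTLDMGVAL", n = 2)
top2
```

Unlike indexing with [1:10, ], head() does not pad the result with NA rows when fewer than n event types are available.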

Many of the same event types appear on the top 10 for the various population and economic consequences.

To answer the questions posed for the assignment, I will take a look at plots of population and economic harm by event type.

Results

Across the United States, which types of events are most harmful with respect to population health?

## Plot fatalities
plotfatalities <- ggplot() + geom_bar(data = top10fatalities[1:5,], aes(x = EVTYPE, y = FATALITIES,
    fill = interaction(FATALITIES, EVTYPE)), stat = "identity", show.legend = FALSE) + 
    theme(axis.text.x = element_text(angle = 30, hjust = 1)) + xlab("Fatalities") + 
    ylab("Number")

## Plot injuries
plotinjuries <- ggplot() + geom_bar(data = top10injuries[1:5,], aes(x = EVTYPE, y = INJURIES, 
    fill = interaction(INJURIES, EVTYPE)), stat = "identity", show.legend = FALSE) + 
    theme(axis.text.x = element_text(angle = 30, hjust = 1)) + xlab("Injuries") + 
    ylab("Number")

## Display both plots in two panels
grid.arrange(plotfatalities, plotinjuries, ncol=2, top="Top 5 event types impacting population health")
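
One detail the plotting code leaves to the defaults is bar order: a discrete x axis in ggplot2 follows the factor level order, so the bars are not necessarily sorted by size. A minimal sketch of one way to sort them, assuming base R's reorder function is acceptable here, is shown below. The data frame repeats the five fatality totals reported in the table above, and the plot is only built if ggplot2 is installed.

```r
## Top 5 fatality totals as reported in top10fatalities above
top5 <- data.frame(
    EVTYPE     = c("EXCESSIVE HEAT", "FLASH FLOOD", "HEAT", "LIGHTNING", "TORNADO"),
    FATALITIES = c(1903, 978, 937, 817, 5633)
)

## Reorder the event types so the largest total comes first
top5$EVTYPE <- reorder(top5$EVTYPE, -top5$FATALITIES)
levels(top5$EVTYPE)

## Build the bar chart only when ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
    library(ggplot2)
    p <- ggplot(top5, aes(x = EVTYPE, y = FATALITIES)) +
        geom_bar(stat = "identity") +
        theme(axis.text.x = element_text(angle = 30, hjust = 1)) +
        xlab("Event type") + ylab("Fatalities")
    print(p)
}
```

Sorting the bars makes the ranking readable directly from the plot without consulting the tables.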

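Tornadoes are by far the most harmful to population health, whether the harm is injury or death (5,633 fatalities and 91,346 injuries in the database). Excessive heat also ranks in the top five for both fatalities and injuries, as does flooding (flash floods for fatalities, floods for injuries).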

Across the United States, which types of events have the greatest economic consequences?

## Plot property damage
plotproperty <- ggplot() + geom_bar(data = top10propdamage[1:5,], aes(x = EVTYPE, y = PROPDMGVAL,
    fill = interaction(PROPDMGVAL, EVTYPE)), stat = "identity", show.legend = FALSE) +
    theme(axis.text.x = element_text(angle = 30, hjust = 1)) + xlab("Property damage") + 
    ylab("Damage amount")

## Plot crop damage
plotcrop <- ggplot() + geom_bar(data = top10cropdamage[1:5,], aes(x = EVTYPE, y = CROPDMGVAL,
    fill = interaction(CROPDMGVAL, EVTYPE)), stat = "identity", show.legend = FALSE) + 
    theme(axis.text.x = element_text(angle = 30, hjust = 1)) + xlab("Crop damage") + 
    ylab("Damage amount")

## Display both plots in two panels
grid.arrange(plotproperty, plotcrop, ncol=2, top="Top 5 event types causing economic loss")

As one would expect, property damage is largely caused by water (storms and flooding). Similarly, it is intuitive that drought causes the most crop damage, with water (storms and flooding) also a prominent cause of crop damage.