Impact of Severe Weather Events on Public Health and Economy in the United States


Synopsis

This document is the final report for the Peer Assessment 2 project of Coursera’s Reproducible Research course, part of the Data Science Specialization. It was written in RStudio using knitr and is meant to be published in HTML format.
Data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database was used for the analysis. Data is available from 1950 onwards, but only a subset (1995 - 2011) was used because reporting is more complete in those years.
The analysis below shows that the largest numbers of fatalities in the U.S. due to severe weather conditions come from Excessive Heat and Tornadoes. Tornadoes also account for by far the largest share of the injuries reported. While the most injuries are reported in the states of Texas and Missouri, the most fatalities are reported in Illinois.
When the focus of the analysis turns to the economic impact of the weather events, Flood, Hurricane/Typhoon and Storm Surge are responsible for most of the damage to property in the U.S., while Drought is the main cause of crop damage.


Introduction

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.


Environment Preparation

This small routine clears the workspace before the data is loaded and loads the packages needed for the data analysis and plotting.
It also sets “echo=TRUE” as the default for knitr and turns off scientific notation for numbers.

rm(list=ls())                # clear the workspace before the data sets are loaded
library(knitr)
library(ggplot2)
library(gridExtra)
library(dplyr)
library(tidyr)
opts_chunk$set(echo = TRUE)
options(scipen = 999)         # switches off the scientific notation

Information about the system on which the analysis was run:

sessionInfo()
## R version 3.2.1 (2015-06-18)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 7 x64 (build 7601) Service Pack 1
## 
## locale:
## [1] LC_COLLATE=Portuguese_Brazil.1252  LC_CTYPE=Portuguese_Brazil.1252   
## [3] LC_MONETARY=Portuguese_Brazil.1252 LC_NUMERIC=C                      
## [5] LC_TIME=Portuguese_Brazil.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] tidyr_0.3.1     dplyr_0.4.3     gridExtra_2.0.0 ggplot2_1.0.1  
## [5] knitr_1.10.5   
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.0      magrittr_1.5     MASS_7.3-40      munsell_0.4.2   
##  [5] colorspace_1.2-6 R6_2.1.1         stringr_1.0.0    plyr_1.8.3      
##  [9] tools_3.2.1      parallel_3.2.1   grid_3.2.1       gtable_0.1.2    
## [13] DBI_0.3.1        htmltools_0.2.6  yaml_2.1.13      digest_0.6.8    
## [17] assertthat_0.1   reshape2_1.4.1   formatR_1.2      evaluate_0.7    
## [21] rmarkdown_0.7    stringi_0.5-5    scales_0.3.0     proto_0.3-10


Data Processing


Loading the Data

The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. The file was downloaded from the course web site:

Some documentation of the database is also available, describing how some of the variables are constructed/defined.

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
The data file was downloaded, unzipped and saved in a local research directory. To reproduce this analysis, the same procedure should be followed after setting the working directory with the ‘setwd()’ function. The full dataset was loaded into the data frame stormData.

stormData <- read.csv("repdata-data-StormData.csv", header = TRUE, sep = ",")
                
# check the data
dim(stormData)
## [1] 902297     37
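As a hedged aside to the loading step above: read.csv() can usually read a bzip2-compressed file directly, because it opens the path through file(), which detects the compression on its own. The sketch below shows this on a tiny throwaway file; the file contents and the names tmpFile and demoData are illustrative assumptions, not part of the storm data.

```r
# Write a small bzip2-compressed CSV to a temporary file (illustrative data only)
tmpFile <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmpFile, open = "w")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,1", "HAIL,0"), con)
close(con)

# read.csv() opens the path via file(), which detects the compression itself,
# so no separate unzip step is needed
demoData <- read.csv(tmpFile, header = TRUE, stringsAsFactors = FALSE)
dim(demoData)
```

If this holds on the system in use, the compressed storm file could be passed to read.csv() in the same way, which would remove the manual unzip step from the procedure.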

The data frame has 902297 observations of 37 variables. An overview of the data can be seen below:

head(stormData, n=2)
##   STATE__          BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1 4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1 4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                        14   100 3   0          0
## 2         NA         0                         2   150 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2

And the dataset details are described by:

str(stormData)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
##  $ BGN_TIME  : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
##  $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : Factor w/ 35 levels "","  N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_DATE  : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_TIME  : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ WFO       : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ ZONENAMES : Factor w/ 25112 levels "","                                                                                                                               "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : Factor w/ 436774 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...


Preparing the Datasets


Understanding the Total Dataset

To get an initial visual overview of the data, a histogram of the total number of weather events per year is plotted.

stormData$year <- as.numeric(format(as.Date(stormData$BGN_DATE, 
                             format = "%m/%d/%Y %H:%M:%S"), "%Y"))
ggplot(stormData, aes(x = year)) + 
      geom_histogram(binwidth=1,colour="#999999",fill="#000099") + 
      ggtitle("Total Weather Events per Year")+
      xlab("year")+
      ylab("weather events")

As previously described, the dataset gets more consistent and complete after 1995. The accuracy of the data has not been checked, nor has any trend of increasing events; both are left for later studies.
With this in mind, a subset of the data containing only records from 1995 onwards was loaded into the data frame stormDataSET. This data frame is used for the rest of this analysis.
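The effect of the 1995 cut-off can also be expressed as a number rather than read off the plot. The following is a minimal sketch of that check, using the same date parsing as above on a few made-up dates; bgnDate, eventYear and shareFrom1995 are hypothetical names, and the dates are not real records.

```r
# Illustrative begin dates in the same format as BGN_DATE (not real records)
bgnDate <- c("4/18/1950 0:00:00", "1/5/1995 0:00:00",
             "7/12/1995 0:00:00", "11/28/2011 0:00:00")

# Extract the year exactly as done for stormData$year
eventYear <- as.numeric(format(as.Date(bgnDate,
                               format = "%m/%d/%Y %H:%M:%S"), "%Y"))

# Number of events per year, and the share of events kept by the cut-off
eventsPerYear <- table(eventYear)
shareFrom1995 <- mean(eventYear >= 1995)
shareFrom1995
```

Applied to the full data, the same ratio follows from the reported dimensions: 681500 of 902297 records, roughly three quarters of all events, fall in 1995 or later.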


Subsetting the Data

The subset of data from 1995 onwards was taken as follows:

stormDataSET <- subset(stormData, year >= 1995)

# standardises the exponent codes of the cost variables (upper case, as character)
stormDataSET$PROPDMGEXP <- as.character(toupper(stormDataSET$PROPDMGEXP))
stormDataSET$CROPDMGEXP <- as.character(toupper(stormDataSET$CROPDMGEXP))

# check the new dataset
dim(stormDataSET)
## [1] 681500     38

The new dataset now has 681500 registered events. The number of registered events per event type is given by:

totalEvents<- stormDataSET %>% 
              group_by(EVTYPE) %>% 
              summarize(total=n()) %>% 
              arrange(desc(total))
head(totalEvents,15)
## Source: local data frame [15 x 2]
## 
##                EVTYPE  total
##                (fctr)  (int)
## 1                HAIL 215932
## 2           TSTM WIND 128923
## 3   THUNDERSTORM WIND  81745
## 4         FLASH FLOOD  52673
## 5               FLOOD  24641
## 6             TORNADO  24335
## 7           HIGH WIND  19956
## 8          HEAVY SNOW  14710
## 9           LIGHTNING  14280
## 10         HEAVY RAIN  11621
## 11       WINTER STORM  11372
## 12 THUNDERSTORM WINDS  10041
## 13     WINTER WEATHER   6973
## 14       FUNNEL CLOUD   6357
## 15   MARINE TSTM WIND   6175

Hail and Thunderstorm Winds are the most reported severe weather events.
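Note that the table lists thunderstorm winds under three separate labels (TSTM WIND, THUNDERSTORM WIND and THUNDERSTORM WINDS), which together add up to 220709 events, more than HAIL. The analysis in this report keeps the labels as recorded. As a hedged sketch only, one possible way to merge such variants before grouping is shown below; normaliseEvtype is a hypothetical helper, and it covers just the variants visible in the table above.

```r
# Hypothetical helper: harmonise a few EVTYPE spelling variants.
# Only the variants visible in the table above are handled here.
normaliseEvtype <- function(evtype) {
  label <- toupper(trimws(as.character(evtype)))   # drop padding, unify case
  label <- gsub("\\s+", " ", label)                # collapse repeated spaces
  label <- sub("^TSTM WINDS?$", "THUNDERSTORM WIND", label)
  label <- sub("^THUNDERSTORM WINDS$", "THUNDERSTORM WIND", label)
  label
}

normaliseEvtype(c("TSTM WIND", "THUNDERSTORM WINDS",
                  "   HIGH SURF ADVISORY", "HAIL"))
```

The patterns are anchored with ^ and $ on purpose, so that distinct event types such as MARINE TSTM WIND are left untouched.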


Results

Analysing the Impact on Public Health

The impact of severe weather events on public health can be assessed through the variables FATALITIES and INJURIES, whose names are self-explanatory.

impactFatalities <- stormDataSET %>% 
                    group_by(EVTYPE) %>% 
                    summarize(totFatalities=sum(FATALITIES)) %>%
                    arrange(desc(totFatalities))
impactFatalities <- impactFatalities[1:15,]
head(impactFatalities,15)
## Source: local data frame [15 x 2]
## 
##               EVTYPE totFatalities
##               (fctr)         (dbl)
## 1     EXCESSIVE HEAT          1903
## 2            TORNADO          1545
## 3        FLASH FLOOD           934
## 4               HEAT           924
## 5          LIGHTNING           729
## 6              FLOOD           423
## 7        RIP CURRENT           360
## 8          HIGH WIND           241
## 9          TSTM WIND           241
## 10         AVALANCHE           223
## 11      RIP CURRENTS           204
## 12      WINTER STORM           195
## 13         HEAT WAVE           161
## 14 THUNDERSTORM WIND           131
## 15      EXTREME COLD           126

Excessive Heat and Tornadoes are responsible for the most fatalities due to weather events in the U.S. since 1995.

impactInjuries <- stormDataSET %>% 
                  group_by(EVTYPE) %>% 
                  summarize(totInjuries=sum(INJURIES)) %>%
                  arrange(desc(totInjuries))
impactInjuries <- impactInjuries[1:15,]
head(impactInjuries,15)
## Source: local data frame [15 x 2]
## 
##               EVTYPE totInjuries
##               (fctr)       (dbl)
## 1            TORNADO       21765
## 2              FLOOD        6769
## 3     EXCESSIVE HEAT        6525
## 4          LIGHTNING        4631
## 5          TSTM WIND        3630
## 6               HEAT        2030
## 7        FLASH FLOOD        1734
## 8  THUNDERSTORM WIND        1426
## 9       WINTER STORM        1298
## 10 HURRICANE/TYPHOON        1275
## 11         HIGH WIND        1093
## 12              HAIL         916
## 13          WILDFIRE         911
## 14        HEAVY SNOW         751
## 15               FOG         718

Tornadoes are the weather event that causes the most injuries.
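The fatalities and injuries rankings above follow the same pattern: sum one measure per event type, sort in descending order and keep the top rows. A minimal base-R sketch of that pattern as a reusable function is given below; topEventsBy and the toy data frame are illustrative assumptions and are not used elsewhere in this report.

```r
# Hypothetical helper: total of one measure per EVTYPE, largest first
topEventsBy <- function(data, measure, n = 15) {
  totals <- aggregate(data[[measure]],
                      by = list(EVTYPE = data$EVTYPE), FUN = sum)
  names(totals)[2] <- "total"
  totals <- totals[order(-totals$total), ]   # sort descending by total
  rownames(totals) <- NULL
  head(totals, n)
}

# Toy data (not real records) to show the call
toy <- data.frame(EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD"),
                  FATALITIES = c(2, 5, 1, 0),
                  stringsAsFactors = FALSE)
topFatal <- topEventsBy(toy, "FATALITIES", n = 2)
topFatal
```

Under these assumptions, a call such as topEventsBy(stormDataSET, "INJURIES") would be expected to give the same ranking as the dplyr pipeline above.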

To get an overview of how the most important weather events seen above affect each state, the data was filtered to the events “TORNADO”, “EXCESSIVE HEAT”, “FLASH FLOOD”, “HEAT”, “FLOOD”, “LIGHTNING” and “TSTM WIND”, which are common to the two analyses above.

targetEvents <- c("TORNADO","EXCESSIVE HEAT","FLASH FLOOD",
                  "HEAT","FLOOD","LIGHTNING","TSTM WIND")
stateData <- filter(stormDataSET, EVTYPE %in% targetEvents )
impactStates <- stateData %>% 
                select(STATE,EVTYPE,FATALITIES,INJURIES) %>%
                group_by(STATE) %>% 
                summarize(totFATAL=sum(FATALITIES),totINJUR=sum(INJURIES)) %>%
                mutate(totalImpact=totFATAL+totINJUR) %>%
                arrange(desc(totalImpact))
impactStates <- impactStates[1:15,]
head(impactStates,15)
## Source: local data frame [15 x 4]
## 
##     STATE totFATAL totINJUR totalImpact
##    (fctr)    (dbl)    (dbl)       (dbl)
## 1      TX      639     8743        9382
## 2      MO      555     6501        7056
## 3      AL      416     3797        4213
## 4      TN      309     2410        2719
## 5      OK      210     2245        2455
## 6      IL     1045     1384        2429
## 7      GA      151     1848        1999
## 8      AR      213     1603        1816
## 9      FL      214     1600        1814
## 10     PA      495     1012        1507
## 11     NC      146     1156        1302
## 12     MS      133     1074        1207
## 13     MI       57     1083        1140
## 14     MD      132      853         985
## 15     VA       78      831         909

In total (fatalities plus injuries), Texas and Missouri are the most impacted states. They are also the states with the most reported injuries.

On the other hand, Illinois is the state with the most fatalities.

The total impact per event type can be better seen in the following two graphs:
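Because the state table is sorted by total impact, the fatalities ranking has to be read across rows. The short sketch below re-sorts the first six rows of that table by fatalities; the figures are copied from the output above, while impactTop and byFatalities are hypothetical names.

```r
# First six rows of the state table above (values copied from the output)
impactTop <- data.frame(
  STATE    = c("TX", "MO", "AL", "TN", "OK", "IL"),
  totFATAL = c(639, 555, 416, 309, 210, 1045),
  totINJUR = c(8743, 6501, 3797, 2410, 2245, 1384),
  stringsAsFactors = FALSE
)

# Re-rank by fatalities instead of total impact
byFatalities <- impactTop[order(-impactTop$totFATAL), ]
byFatalities$STATE[1]
```

With the real data frame, the same effect could be obtained by sorting on totFATAL instead of totalImpact.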

# total fatalities graph (one bar per event type, ordered by total)
gbar01<- ggplot(impactFatalities,aes(x=reorder(EVTYPE,-totFatalities),
                                     y=totFatalities, fill=EVTYPE)) +
                geom_bar(stat = "identity") + 
                ggtitle(expression(atop("Total Fatalities per Event Type",
                                        atop(italic("1995 - 2011"),"")))) +
                xlab("events") +
                ylab("number of fatalities") + 
                theme(axis.text.x=element_text(angle=45,hjust=1,vjust=1),
                      legend.position="none")

# total injuries graph (one bar per event type, ordered by total)
gbar02<- ggplot(impactInjuries,aes(x=reorder(EVTYPE,-totInjuries),
                                     y=totInjuries, fill=EVTYPE)) +
                geom_bar(stat = "identity") + 
                ggtitle(expression(atop("Total Injuries per Event Type",
                                        atop(italic("1995 - 2011"),"")))) +
                xlab("events") +
                ylab("number of injuries") + 
                theme(axis.text.x=element_text(angle=45,hjust=1,vjust=1),
                      legend.position="none")


grid.arrange(gbar01, gbar02, ncol = 2)


Analysing the Impact on Economy

The economic impact of those severe weather events on society and government is given by the losses reported in the variables PROPDMG and CROPDMG, which hold the cost of the damage to property and crops respectively.

Before calculating such costs, the values have to be scaled by the magnitude coded in PROPDMGEXP and CROPDMGEXP (H for hundreds, K for thousands, M for millions, B for billions), with a small procedure:

# transforms the property damage costs
stormDataSET$PROPDMGEXP = gsub("\\-|\\+|\\?","0", stormDataSET$PROPDMGEXP)
stormDataSET$PROPDMGEXP = gsub("B", "9", stormDataSET$PROPDMGEXP)
stormDataSET$PROPDMGEXP = gsub("M", "6", stormDataSET$PROPDMGEXP)
stormDataSET$PROPDMGEXP = gsub("K", "3", stormDataSET$PROPDMGEXP)
stormDataSET$PROPDMGEXP = gsub("H", "2", stormDataSET$PROPDMGEXP)
stormDataSET$PROPDMGEXP <- as.numeric(stormDataSET$PROPDMGEXP)
stormDataSET$PROPDMGEXP[is.na(stormDataSET$PROPDMGEXP)] = 0
stormDataSET$PropDMGTransf<- stormDataSET$PROPDMG * 10^stormDataSET$PROPDMGEXP

# transforms the crop damage costs
stormDataSET$CROPDMGEXP = gsub("\\-|\\+|\\?","0", stormDataSET$CROPDMGEXP)
stormDataSET$CROPDMGEXP = gsub("B", "9", stormDataSET$CROPDMGEXP)
stormDataSET$CROPDMGEXP = gsub("M", "6", stormDataSET$CROPDMGEXP)
stormDataSET$CROPDMGEXP = gsub("K", "3", stormDataSET$CROPDMGEXP)
stormDataSET$CROPDMGEXP = gsub("H", "2", stormDataSET$CROPDMGEXP)
stormDataSET$CROPDMGEXP <- as.numeric(stormDataSET$CROPDMGEXP)
stormDataSET$CROPDMGEXP[is.na(stormDataSET$CROPDMGEXP)] = 0
stormDataSET$CropDMGTransf<- stormDataSET$CROPDMG * 10^stormDataSET$CROPDMGEXP
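The conversion above can also be written as a single lookup function shared by both damage columns. The following is a sketch under the same assumptions as the gsub() steps (H, K, M and B map to exponents 2, 3, 6 and 9, single digits are used as exponents directly, and anything else counts as exponent 0); expToMultiplier is a hypothetical name, not part of the original analysis.

```r
# Hypothetical helper: turn an exponent code into a numeric multiplier,
# mirroring the rules of the gsub() procedure above
expToMultiplier <- function(code) {
  code <- toupper(as.character(code))
  lookup <- c(H = 2, K = 3, M = 6, B = 9)
  exponent <- unname(lookup[code])        # NA for codes not in the lookup
  isDigit <- grepl("^[0-9]$", code)
  exponent[isDigit] <- as.numeric(code[isDigit])
  exponent[is.na(exponent)] <- 0          # "", "-", "+", "?" and others
  10^exponent
}

# Example: a damage value of 2.5 with code "K" is 2500 US$
2.5 * expToMultiplier("K")
expToMultiplier(c("K", "m", "B", "", "?", "2", "h"))
```

With such a helper, each transformed column would reduce to one line, for example PROPDMG * expToMultiplier(PROPDMGEXP), which avoids repeating the same block for property and crop damage.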

Then the total costs of damage can finally be calculated.

The cost of Property Damage (in US$) is calculated with:

impactProperty <- stormDataSET %>% 
                  group_by(EVTYPE) %>% 
                  summarize(totPropImpact=sum(PropDMGTransf)) %>%
                  arrange(desc(totPropImpact))
impactProperty <- impactProperty[1:15,]
head(impactProperty,15)
## Source: local data frame [15 x 2]
## 
##               EVTYPE totPropImpact
##               (fctr)         (dbl)
## 1              FLOOD  144022037057
## 2  HURRICANE/TYPHOON   69305840000
## 3        STORM SURGE   43193536000
## 4            TORNADO   24935939545
## 5        FLASH FLOOD   16047794571
## 6               HAIL   15048722103
## 7          HURRICANE   11812819010
## 8     TROPICAL STORM    7653335550
## 9          HIGH WIND    5259785375
## 10          WILDFIRE    4759064000
## 11  STORM SURGE/TIDE    4641188000
## 12         TSTM WIND    4482361440
## 13         ICE STORM    3643555810
## 14 THUNDERSTORM WIND    3399282992
## 15    HURRICANE OPAL    3172846000

The highest property damage costs (above US$ 10 billion) are due to FLOOD, HURRICANE/TYPHOON, STORM SURGE, TORNADO, FLASH FLOOD, HAIL and HURRICANE.

The cost of Crop Damage (in US$) is calculated with:

impactCrop <- stormDataSET %>% 
              group_by(EVTYPE) %>% 
              summarize(totCropImpact=sum(CropDMGTransf)) %>%
              arrange(desc(totCropImpact))
impactCrop <- impactCrop[1:15,]
head(impactCrop,15)
## Source: local data frame [15 x 2]
## 
##               EVTYPE totCropImpact
##               (fctr)         (dbl)
## 1            DROUGHT   13922066000
## 2              FLOOD    5422810400
## 3          HURRICANE    2741410000
## 4               HAIL    2614127070
## 5  HURRICANE/TYPHOON    2607872800
## 6        FLASH FLOOD    1343915000
## 7       EXTREME COLD    1292473000
## 8       FROST/FREEZE    1094086000
## 9         HEAVY RAIN     728399800
## 10    TROPICAL STORM     677836000
## 11         HIGH WIND     633561300
## 12         TSTM WIND     553947350
## 13    EXCESSIVE HEAT     492402000
## 14 THUNDERSTORM WIND     414354000
## 15              HEAT     401411500

Similarly, the highest crop damage costs (above US$ 1 billion) are due to DROUGHT, FLOOD, HURRICANE, HAIL, HURRICANE/TYPHOON, FLASH FLOOD, EXTREME COLD and FROST/FREEZE.

A combined view, with property and crop damage added together, is shown below.

impactTotalCost <- stormDataSET %>% 
                   group_by(EVTYPE) %>% 
                   summarize(totCrop=sum(CropDMGTransf),
                             totProp=sum(PropDMGTransf)) %>%
                   mutate(totCost=totCrop+totProp) %>%
                   arrange(desc(totCost))
impactTotalCost <- impactTotalCost[1:15,]
head(impactTotalCost,15)
## Source: local data frame [15 x 4]
## 
##               EVTYPE     totCrop      totProp      totCost
##               (fctr)       (dbl)        (dbl)        (dbl)
## 1              FLOOD  5422810400 144022037057 149444847457
## 2  HURRICANE/TYPHOON  2607872800  69305840000  71913712800
## 3        STORM SURGE        5000  43193536000  43193541000
## 4            TORNADO   296595770  24935939545  25232535315
## 5               HAIL  2614127070  15048722103  17662849173
## 6        FLASH FLOOD  1343915000  16047794571  17391709571
## 7            DROUGHT 13922066000   1046106000  14968172000
## 8          HURRICANE  2741410000  11812819010  14554229010
## 9     TROPICAL STORM   677836000   7653335550   8331171550
## 10         HIGH WIND   633561300   5259785375   5893346675
## 11          WILDFIRE   295472800   4759064000   5054536800
## 12         TSTM WIND   553947350   4482361440   5036308790
## 13  STORM SURGE/TIDE      850000   4641188000   4642038000
## 14 THUNDERSTORM WIND   414354000   3399282992   3813636992
## 15         ICE STORM    15660000   3643555810   3659215810

Grouping the same damage figures by state instead of event type, the most impacted states in terms of cost are:

totalCostState <- stormDataSET %>% 
                  group_by(STATE) %>% 
                  summarize(totCropSt=sum(CropDMGTransf),
                            totPropSt=sum(PropDMGTransf)) %>%
                  mutate(totCostSt=totCropSt+totPropSt) %>%
                  arrange(desc(totCostSt))
totalCostState <- totalCostState[1:15,]
head(totalCostState,15)
## Source: local data frame [15 x 4]
## 
##     STATE  totCropSt    totPropSt    totCostSt
##    (fctr)      (dbl)        (dbl)        (dbl)
## 1      CA 3461941700 122421713252 125883654952
## 2      LA 1178141000  59029677250  60207818250
## 3      FL 3850241400  38834725270  42684966670
## 4      MS 1609752600  28696098452  30305851052
## 5      TX 7280847900  22648381290  29929229190
## 6      AL  539759240  10825484400  11365243640
## 7      NC 2048807400   7849194782   9898002182
## 8      IA 4590075660   3216192200   7806267860
## 9      MO  731448200   5869876270   6601324470
## 10     TN   19543500   6053418445   6072961945
## 11     ND  473010000   5222936400   5695946400
## 12     OH  361301900   4814507743   5175809643
## 13     OK 1106393550   3790507245   4896900795
## 14     MN  268853150   4256860720   4525713870
## 15     NY  160658100   4122348240   4283006340

California, Louisiana, Florida, Mississippi and Texas suffered the highest damage costs from severe weather events.

A summary of the damages can be seen in the following graphs.

# property damage graph (one bar per event type, ordered by cost)
gbar04<- ggplot(impactProperty,aes(x=reorder(EVTYPE,-totPropImpact),
                                     y=totPropImpact,fill=EVTYPE)) +
                geom_bar(stat="identity") + 
                ggtitle(expression(atop("Property Damage Cost",
                                        atop(italic("1995 - 2011"),"")))) +
                xlab("event") +
                ylab("total cost (in US$)") + 
                theme(axis.text.x=element_text(angle=45,hjust=1,vjust=1),
                      legend.position="none")

# crop damage graph (one bar per event type, ordered by cost)
gbar05<- ggplot(impactCrop,aes(x=reorder(EVTYPE,-totCropImpact),
                                     y=totCropImpact,fill=EVTYPE)) +
                geom_bar(stat="identity") + 
                ggtitle(expression(atop("Crop Damage Cost",
                                        atop(italic("1995 - 2011"),"")))) +
                xlab("event") +
                ylab("total cost (in US$)") + 
                theme(axis.text.x=element_text(angle=45,hjust=1,vjust=1),
                      legend.position="none")
                      

grid.arrange(gbar04, gbar05, ncol = 2)                      
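Since scientific notation is switched off, the cost axes of these graphs show raw US$ values with up to twelve digits. As an optional sketch, a small labelling function could express them in billions; formatBillions is a hypothetical helper, shown here on two totals taken from the tables above.

```r
# Hypothetical helper: format a US$ amount in billions with one decimal
formatBillions <- function(x) sprintf("%.1f B", x / 1e9)

# Totals for FLOOD (property) and DROUGHT (crop) from the tables above
formatBillions(c(144022037057, 13922066000))
```

Such a function could be passed to the plots through scale_y_continuous(labels = formatBillions), assuming the axis label is adjusted to say the values are in billions of US$.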


Conclusion

Answering the questions proposed for this Peer Assessment, we can summarise the conclusions as follows:

Question 01:
Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

  • Excessive Heat and Tornadoes are responsible for the most fatalities due to weather events in the U.S. since 1995.
  • Tornadoes are the weather event that causes the most injuries.
  • In total (fatalities plus injuries), Texas and Missouri are the most impacted states. They are also the states with the most reported injuries.
  • Illinois is the state with the most fatalities.

Question 02:
Across the United States, which types of events have the greatest economic consequences?

  • The highest property damage costs (above US$ 10 billion) are due to FLOOD, HURRICANE/TYPHOON, STORM SURGE, TORNADO, FLASH FLOOD, HAIL and HURRICANE.
  • The highest crop damage costs (above US$ 1 billion) are due to DROUGHT, FLOOD, HURRICANE, HAIL, HURRICANE/TYPHOON, FLASH FLOOD, EXTREME COLD and FROST/FREEZE.
  • California, Louisiana, Florida, Mississippi and Texas suffered the highest damage costs from severe weather events.