Introduction

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries and property damage, and preventing such outcomes to the extent possible is a key concern. Using the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which tracks characteristics of major storms and weather events in the United States, this assignment aims to:
* Identify the types of events that are most harmful to population health
* Identify the types of events that have the greatest economic consequences

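The dataset for this analysis is available at https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2 (a bzip2-compressed CSV file).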
The National Weather Service Storm Data Documentation provides information about the data collected.

Synopsis

The events in the database start in 1950 and end in November 2011. Fewer events were generally recorded in the earlier years, so I excluded observations from the years in which the data volume is low and not representative; the histogram of observations by year (Step 4) shows that data from 1950 to 1994 can be excluded.
To speed up processing, I kept only the variables relevant to the study. In addition, the raw dataset contains 985 unique event types, whereas the official event type table contains only 48 entries. I therefore tidied up the data as follows:
* Fix typos, remove leading, trailing and repeated spaces, and combine obviously similar event types
* Reassign the most frequent event types that do not match the official event type list to an official category where one is easily identifiable, and group the remaining ones as ‘UNCATEGORIZED’.
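
As an aside, part of the typo fixing could be automated with approximate string matching. The sketch below is an alternative, not the approach used in this report: it uses base R's adist() (generalised Levenshtein distance) to suggest the closest official event type for each raw value. suggest_official() and the max_dist cut-off of 2 are hypothetical choices for illustration; the stringdist package loaded in Step 1 offers similar functionality through amatch().

```r
# Hypothetical helper: suggest the closest official event type for each raw value,
# or NA when even the closest one is more than max_dist edits away.
suggest_official <- function(raw, official, max_dist = 2) {
  d <- adist(raw, official)                    # rows = raw values, columns = official types
  best <- apply(d, 1, which.min)               # index of the closest official type per row
  best_dist <- d[cbind(seq_along(raw), best)]  # the corresponding edit distance
  ifelse(best_dist <= max_dist, official[best], NA_character_)
}

# Illustrative values, not a sample drawn from the dataset
official <- c("THUNDERSTORM WIND", "LIGHTNING", "FLASH FLOOD", "HAIL")
raw <- c("THUNERSTORM WIND", "LIGHTING", "FLASH FLOOD", "URBAN/SML STREAM FLOOD")
suggest_official(raw, official)
```

A cut-off this small catches one- or two-letter typos such as LIGHTING, but it leaves URBAN/SML STREAM FLOOD unmatched, so grouping by meaning still needs explicit rules like the ones applied in Step 6.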

Data Processing

STEP 1: Setting the environment

Note: change the working directory in the setwd() call below before running the script.

setwd("C:/RAGNIMY1/datasciencecoursera/RepData_PeerAssessment2")
rm(list = ls())
Sys.setlocale("LC_TIME", "English")
library(ggplot2)
library(tidyr)
library(dplyr)

library(stringr)
library(stringdist)
library(data.table)

library(R.utils) #### required for fread to be able to read bz2 files

library(gridExtra) #### used to arrange plots on a page

STEP 2: Downloading and reading the raw data

SrcFileURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
DataFileZip <- "repdata_data_StormData.csv.bz2"

#### Check if zipped data file was already downloaded, if not, download the file
if (!file.exists(DataFileZip)){
        download.file(SrcFileURL, destfile=DataFileZip)
}
#### Read the raw dataset
StormDataFull <- fread(DataFileZip,stringsAsFactors = FALSE)
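
The script reads all 37 columns and only subsets them in Step 4. Unneeded columns can also be dropped while reading: data.table's fread() accepts a select argument for this, and in base R read.csv() skips any column whose colClasses entry is "NULL". The sketch below shows the base-R variant; read_selected() is a hypothetical helper, and the temporary file holds made-up content.

```r
# Hypothetical helper: read only the named columns of a CSV file.
# colClasses "NULL" skips a column; NA lets read.csv() pick the type as usual.
read_selected <- function(path, keep) {
  header  <- names(read.csv(path, nrows = 1, check.names = FALSE))
  classes <- ifelse(header %in% keep, NA, "NULL")
  read.csv(path, colClasses = classes, check.names = FALSE,
           stringsAsFactors = FALSE)
}

# Example on a small temporary file with made-up content
tmp <- tempfile(fileext = ".csv")
writeLines(c("BGN_DATE,EVTYPE,FATALITIES,REMARKS",
             "4/18/1950 0:00:00,TORNADO,0,some remark"), tmp)
str(read_selected(tmp, c("BGN_DATE", "EVTYPE", "FATALITIES")))
```

The header is read twice, which is a negligible cost compared with parsing the full file.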

STEP 3: Analysing the data structure

str(StormDataFull)

## Classes 'data.table' and 'data.frame':   902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
##  - attr(*, ".internal.selfref")=<externalptr>

length(table(StormDataFull$EVTYPE))

## [1] 985

head(StormDataFull, n=10)

##     STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
##  1:       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
##  2:       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
##  3:       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
##  4:       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
##  5:       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
##  6:       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##  7:       1 11/16/1951 0:00:00     0100       CST      9     BLOUNT    AL
##  8:       1  1/22/1952 0:00:00     0900       CST    123 TALLAPOOSA    AL
##  9:       1  2/13/1952 0:00:00     2000       CST    125 TUSCALOOSA    AL
## 10:       1  2/13/1952 0:00:00     2000       CST     57    FAYETTE    AL
##      EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
##  1: TORNADO         0                                               0
##  2: TORNADO         0                                               0
##  3: TORNADO         0                                               0
##  4: TORNADO         0                                               0
##  5: TORNADO         0                                               0
##  6: TORNADO         0                                               0
##  7: TORNADO         0                                               0
##  8: TORNADO         0                                               0
##  9: TORNADO         0                                               0
## 10: TORNADO         0                                               0
##     COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
##  1:         NA         0                      14.0   100 3   0          0
##  2:         NA         0                       2.0   150 2   0          0
##  3:         NA         0                       0.1   123 2   0          0
##  4:         NA         0                       0.0   100 2   0          0
##  5:         NA         0                       0.0   150 2   0          0
##  6:         NA         0                       1.5   177 2   0          0
##  7:         NA         0                       1.5    33 2   0          0
##  8:         NA         0                       0.0    33 1   0          0
##  9:         NA         0                       3.3   100 3   0          1
## 10:         NA         0                       2.3   100 3   0          0
##     INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC
##  1:       15    25.0          K       0                          
##  2:        0     2.5          K       0                          
##  3:        2    25.0          K       0                          
##  4:        2     2.5          K       0                          
##  5:        2     2.5          K       0                          
##  6:        6     2.5          K       0                          
##  7:        1     2.5          K       0                          
##  8:        0     2.5          K       0                          
##  9:       14    25.0          K       0                          
## 10:        0    25.0          K       0                          
##     ZONENAMES LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
##  1:               3040      8812       3051       8806              1
##  2:               3042      8755          0          0              2
##  3:               3340      8742          0          0              3
##  4:               3458      8626          0          0              4
##  5:               3412      8642          0          0              5
##  6:               3450      8748          0          0              6
##  7:               3405      8631          0          0              7
##  8:               3255      8558          0          0              8
##  9:               3334      8740       3336       8738              9
## 10:               3336      8738       3337       8737             10

STEP 4: Subsetting the raw dataset

#### Extracting the variables relevant for the analysis
StormData <-  StormDataFull[ ,c("BGN_DATE","EVTYPE","FATALITIES","INJURIES","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]
#### Parse the begin date and extract the year
StormData$Year <- as.numeric(format(as.Date(StormData$BGN_DATE, "%m/%d/%Y %H:%M:%S"), "%Y"))
#### Analyse the volume of data collected by year
with(StormData,hist(Year, breaks=50, xlab="Year", main="Nr of observations by year"))

#### Keep the data from 1995 onward, normalise EVTYPE and drop summaries and rows without any impact
StormData <- StormData %>%
        filter(Year >= 1995) %>%
        mutate(EVTYPE=str_trim(toupper(EVTYPE))) %>%
        filter(!str_detect(EVTYPE, "SUMMARY")) %>%
        filter(FATALITIES>0|INJURIES>0|PROPDMG>0|CROPDMG>0)
nrEVTYPE <- length(table(StormData$EVTYPE))

Based on the volume of data collected by year, I kept only the observations from 1995 onward, as the data volume before that year is low. I also converted the event types to upper case and removed leading and trailing spaces. In addition, I excluded all observations:
* which contain ‘SUMMARY’ in the event type, as these are clearly invalid entries
* for which neither an impact on population health nor on the economy was reported.
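
The date parsing and the three filtering rules can be checked on a handful of rows. A minimal base-R sketch, assuming a toy data frame with made-up values (toy is a hypothetical name, not part of the script):

```r
# Toy rows standing in for StormData (made-up values, not from the dataset)
toy <- data.frame(
  BGN_DATE   = c("4/18/1950 0:00:00", "1/6/1996 0:00:00", "7/1/2000 0:00:00", "3/3/2005 0:00:00"),
  EVTYPE     = c("tornado", " Summary of June ", "TSTM WIND", "HEAT"),
  FATALITIES = c(0, 0, 0, 0),
  INJURIES   = c(15, 0, 2, 0),
  PROPDMG    = c(25, 0, 0, 0),
  CROPDMG    = c(0, 0, 0, 0),
  stringsAsFactors = FALSE
)

# Parse the month/day/year part of BGN_DATE and keep the year as a number
toy$Year <- as.numeric(format(as.Date(toy$BGN_DATE, format = "%m/%d/%Y %H:%M:%S"), "%Y"))
# Normalise the event type as the script does (upper case, trimmed)
toy$EVTYPE <- trimws(toupper(toy$EVTYPE))

# Keep rows from 1995 onward that are not summaries and report at least one impact
keep <- toy$Year >= 1995 &
  !grepl("SUMMARY", toy$EVTYPE) &
  (toy$FATALITIES > 0 | toy$INJURIES > 0 | toy$PROPDMG > 0 | toy$CROPDMG > 0)
kept <- toy[keep, ]
kept[, c("Year", "EVTYPE")]
```

Only the 2000 thunderstorm row survives: the 1950 row is too early, the SUMMARY row is an invalid entry, and the 2005 HEAT row reports no impact.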

STEP 5: Cleaning the data - Phase 1

After this initial cleaning phase, the reworked dataset still contains 330 event types (variable EVTYPE), whereas the storm data event table provided by NOAA contains only 48 distinct categories. I therefore fixed typos and combined obviously similar event types.

#### Phase 1: normalise separators, remove digits and fix abbreviations/typos in EVTYPE
StormData$EVTYPE <- gsub("-"," ",StormData$EVTYPE, fixed=TRUE)
StormData$EVTYPE <- gsub("[[:digit:]]","",StormData$EVTYPE)
StormData$EVTYPE <- gsub("TSTM|TUNDERSTORM|THUNDERTSORM|THUNDEERSTORM|THUNDERESTORM|THUNERSTORM","THUNDERSTORM",StormData$EVTYPE)
StormData$EVTYPE <- gsub("LIGHTING","LIGHTNING",StormData$EVTYPE)
StormData$EVTYPE <- gsub("WINDS|WND|WINS","WIND",StormData$EVTYPE)
StormData$EVTYPE <- gsub("RAINS","RAIN",StormData$EVTYPE)
StormData$EVTYPE <- gsub("TREES","TREE",StormData$EVTYPE)
StormData$EVTYPE <- gsub("WATERSPOUTS|WATER SPOUT","WATERSPOUT",StormData$EVTYPE)
StormData$EVTYPE <- gsub("HVY","HEAVY",StormData$EVTYPE)
StormData$EVTYPE <- gsub("FLD|FLOODING|FLOODS","FLOOD",StormData$EVTYPE)
StormData$EVTYPE <- gsub("MPH","",StormData$EVTYPE)
StormData$EVTYPE <- gsub("CSTL","COASTAL",StormData$EVTYPE)
StormData$EVTYPE <- gsub("CURRENTS","CURRENT",StormData$EVTYPE)
StormData$EVTYPE <- gsub("(G)","",StormData$EVTYPE,fixed=TRUE)
StormData$EVTYPE <- gsub("()","",StormData$EVTYPE,fixed=TRUE)
StormData$EVTYPE <- gsub(".","",StormData$EVTYPE,fixed=TRUE)
StormData$EVTYPE <- gsub(" )","",StormData$EVTYPE,fixed=TRUE)
StormData$EVTYPE <- gsub("/ ","/",StormData$EVTYPE,fixed=TRUE)
StormData$EVTYPE <- gsub("&","/",StormData$EVTYPE,fixed=TRUE)
StormData$EVTYPE <- gsub(" / ","/",StormData$EVTYPE,fixed=TRUE)
StormData$EVTYPE <- gsub("\\s+", " ",StormData$EVTYPE)

#### Group event types by keyword; the outermost matching condition wins
StormData$EVTYPE <- 
        ifelse(grepl("FIRE", StormData$EVTYPE),"WILDFIRE",
               ifelse(grepl("HEAT|DROUGHT|DRY|WARM|HOT", StormData$EVTYPE),"HEAT",
                      ifelse(grepl("AVALANCHE", StormData$EVTYPE),"AVALANCHE",
                             ifelse(grepl("TORNADO",StormData$EVTYPE),"TORNADO",
                                    ifelse(grepl("HURRICANE|TYPHOON", StormData$EVTYPE),"HURRICANE (TYPHOON)",StormData$EVTYPE)
                             ))))
StormData$EVTYPE <- str_trim(StormData$EVTYPE)
nrEVTYPE <- length(table(StormData$EVTYPE))
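
To see what these substitutions do, here is a condensed sketch that wraps a subset of them in a helper, keeping their relative order, and applies it to a few sample strings. normalize_evtype() is a hypothetical name, and the sample values are illustrative rather than taken from the dataset.

```r
# Hypothetical helper: a subset of the Phase 1 substitutions, in the script's order
normalize_evtype <- function(x) {
  x <- toupper(x)
  x <- gsub("-", " ", x, fixed = TRUE)           # hyphens become spaces
  x <- gsub("[[:digit:]]", "", x)                # drop digits such as wind speeds
  x <- gsub("TSTM|THUNERSTORM", "THUNDERSTORM", x)
  x <- gsub("WINDS|WND", "WIND", x)
  x <- gsub("RAINS", "RAIN", x)
  x <- gsub("HVY", "HEAVY", x)
  x <- gsub("FLD|FLOODING|FLOODS", "FLOOD", x)
  x <- gsub("(G)", "", x, fixed = TRUE)          # gust marker left after digit removal
  x <- gsub("\\s+", " ", x)                      # collapse repeated whitespace
  trimws(x)                                      # drop leading/trailing spaces
}

normalize_evtype(c("TSTM WIND (G45)", "Hvy Rains", "FLASH FLOODING", "Thunderstorm Winds"))
```

Each sample collapses to an official name (THUNDERSTORM WIND, HEAVY RAIN, FLASH FLOOD). The order of the calls matters: digits are removed first, which turns "(G45)" into the "(G)" marker that a later call strips.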

STEP 6: Cleaning the data - Phase 2

After the previous cleaning activity, the number of event types is reduced to 230. The next step is to match those with the official event table.

#### Create the list of official event types as per the documentation and match the unique event types in the data file against it
EVTREF_TABLE <- toupper(c("Astronomical Low Tide", "Avalanche", "Blizzard", "Coastal Flood", "Cold/Wind Chill", "Debris Flow", "Dense Fog", "Dense Smoke", "Drought", "Dust Devil", "Dust Storm", "Excessive Heat", "Extreme Cold/Wind Chill", "Flash Flood", "Flood", "Frost/Freeze", "Funnel Cloud", "Freezing Fog", "Hail", "Heat", "Heavy Rain", "Heavy Snow", "High Surf", "High Wind", "Hurricane (Typhoon)", "Ice Storm", "Lake-Effect Snow", "Lakeshore Flood", "Lightning", "Marine Hail", "Marine High Wind", "Marine Strong Wind", "Marine Thunderstorm Wind", "Rip Current", "Seiche", "Sleet", "Storm Surge/Tide", "Strong Wind", "Thunderstorm Wind", "Tornado", "Tropical Depression", "Tropical Storm", "Tsunami", "Volcanic Ash", "Waterspout", "Wildfire", "Winter Storm", "Winter Weather"))

EVTYPE <-unique(StormData$EVTYPE)
VECT_MATCH <- match(EVTYPE,EVTREF_TABLE) #### returns a vector of the positions of the matching Event type in the reference table; NA if not found
DF_MATCH <- data.frame(EVTYPE,VECT_MATCH)

#### DFFINAL_MATCH: for each (unique) event type in the reworked dataset, provides the position of the event in the reference table (NA if not found) and the number of observations 
EVTTYPE_FREQ <- as.data.frame(table(StormData$EVTYPE))
names(EVTTYPE_FREQ) <- c("EVTYPE","NR_OBS")
DFFINAL_MATCH <- arrange(merge(DF_MATCH,EVTTYPE_FREQ), desc(NR_OBS))
names(DFFINAL_MATCH) <- c("EVTYPE","POS_REFTABLE","NR_OBS")
head(DFFINAL_MATCH,n=20)

##                    EVTYPE POS_REFTABLE NR_OBS
## 1       THUNDERSTORM WIND           39 110244
## 2                    HAIL           19  23873
## 3             FLASH FLOOD           14  19959
## 4                 TORNADO           40  13018
## 5               LIGHTNING           29  12027
## 6                   FLOOD           15   9736
## 7               HIGH WIND           24   5730
## 8             STRONG WIND           38   3418
## 9            WINTER STORM           47   1479
## 10                   HEAT           20   1310
## 11               WILDFIRE           46   1242
## 12             HEAVY SNOW           22   1134
## 13             HEAVY RAIN           21   1085
## 14 URBAN/SML STREAM FLOOD           NA    702
## 15              ICE STORM           26    653
## 16            RIP CURRENT           34    630
## 17 THUNDERSTORM WIND/HAIL           NA    444
## 18         TROPICAL STORM           42    412
## 19         WINTER WEATHER           48    407
## 20              AVALANCHE            2    266
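
The lookup above relies on match(), which returns the position of the first exact match in the reference table, or NA. A toy illustration (values chosen for the example) also shows a side effect of Phase 1: hyphens were replaced by spaces in EVTYPE but not in EVTREF_TABLE, so LAKE EFFECT SNOW, listed among the unmatched event types in this step, cannot match the official LAKE-EFFECT SNOW. Normalising the reference table with the same rule is one way to recover it; this is a suggestion, not part of the original script.

```r
ref      <- toupper(c("Hail", "Lake-Effect Snow", "Thunderstorm Wind"))
observed <- c("HAIL", "LAKE EFFECT SNOW", "TSTM WIND")

# Position in the reference table, or NA when there is no exact match
match(observed, ref)               # 1 NA NA

# Apply the Phase 1 hyphen rule to the reference table as well
ref_normalized <- gsub("-", " ", ref, fixed = TRUE)
match(observed, ref_normalized)    # 1  2 NA
```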

#### Split DFFINAL_MATCH into two dataframes: one containing the EVTYPES not in the reference table and one with the EVTYPES in the reference table
DFFINAL_MATCH_NOTNA <- DFFINAL_MATCH %>%
        subset(!is.na(POS_REFTABLE)) %>%
        mutate(EVTYPE_NEW=EVTYPE)

DFFINAL_MATCH_NA <- DFFINAL_MATCH %>%
        subset(is.na(POS_REFTABLE))

countNA <- count(DFFINAL_MATCH_NA)
head(DFFINAL_MATCH_NA, n=10)

##                    EVTYPE POS_REFTABLE NR_OBS
## 14 URBAN/SML STREAM FLOOD           NA    702
## 17 THUNDERSTORM WIND/HAIL           NA    444
## 23       LAKE EFFECT SNOW           NA    198
## 24              LANDSLIDE           NA    193
## 26           EXTREME COLD           NA    179
## 27            STORM SURGE           NA    169
## 29             LIGHT SNOW           NA    141
## 30     WINTER WEATHER/MIX           NA    139
## 34            RIVER FLOOD           NA    109
## 35                    FOG           NA    104

There are still 186 event types that do not match the official table. As a final clean-up step, I reviewed those with the highest number of observations and assigned them to an official category where one was easily identifiable; all remaining unmatched event types are labelled UNCATEGORIZED.
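
The nested ifelse() calls that follow implement a first-match-wins rule list. The same logic can be sketched as an ordered table of patterns, which makes the precedence explicit and is easier to extend. rules and recategorize() are hypothetical names; the patterns mirror those in the script, with the HIGH SURF pattern written as "HIGH +SURF".

```r
# Ordered rules: regular expression -> category. Earlier rules take precedence.
rules <- c(
  "FLASH +FLOOD"                      = "FLASH FLOOD",
  "COASTAL +FLOOD"                    = "COASTAL FLOOD",
  "^THUNDERSTORM WIND"                = "THUNDERSTORM WIND",
  "FLOOD"                             = "FLOOD",
  "HIGH +SURF"                        = "HIGH SURF",
  "COLD|FROST|FREEZE|FREEZING|WINTER" = "COLD/WIND CHILL"
)

# Assign each event type the category of the first rule whose pattern matches;
# event types matched by no rule get the default category.
recategorize <- function(evtype, rules, default = "UNCATEGORIZED") {
  out <- rep(NA_character_, length(evtype))
  for (pattern in names(rules)) {
    hit <- is.na(out) & grepl(pattern, evtype)  # skip rows already assigned
    out[hit] <- rules[[pattern]]
  }
  out[is.na(out)] <- default
  out
}

recategorize(c("URBAN/SML STREAM FLOOD", "THUNDERSTORM WIND/HAIL",
               "EXTREME COLD", "LANDSLIDE"), rules)
```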

#### Rules are checked in order: the first matching pattern determines the new category
DFFINAL_MATCH_NA$EVTYPE_NEW <- 
        ifelse(grepl("FLASH +FLOOD", DFFINAL_MATCH_NA$EVTYPE),"FLASH FLOOD",
               ifelse(grepl("COASTAL +FLOOD", DFFINAL_MATCH_NA$EVTYPE),"COASTAL FLOOD",
                      ifelse(grepl("^THUNDERSTORM WIND",DFFINAL_MATCH_NA$EVTYPE),"THUNDERSTORM WIND",
                             ifelse(grepl("FLOOD", DFFINAL_MATCH_NA$EVTYPE),"FLOOD",
                                    ifelse(grepl("HIGH +SURF",DFFINAL_MATCH_NA$EVTYPE),"HIGH SURF",
                                           ifelse(grepl("COLD|FROST|FREEZE|FREEZING|WINTER", DFFINAL_MATCH_NA$EVTYPE),"COLD/WIND CHILL","UNCATEGORIZED")
                                    )))))
#### Combine all event types again; the final category is stored in the EVTYPE_NEW field
EVTYPE_FINAL <- rbind(DFFINAL_MATCH_NOTNA,DFFINAL_MATCH_NA)
EVTYPE_FINAL$EVTYPE_NEW <- as.character(EVTYPE_FINAL$EVTYPE_NEW)
EVTYPE_FINAL <- select(EVTYPE_FINAL,-NR_OBS, -POS_REFTABLE )
StormDataFinal <- merge(StormData, EVTYPE_FINAL, all.x = TRUE) ##211775

STEP 7: Calculating the financial impact on property and crops from the magnitude codes in PROPDMGEXP and CROPDMGEXP

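The exponent codes are converted into multipliers as follows:
* H/h/2 -> 100 (hundreds of $)
* K/k/3 -> 1,000 (thousands of $)
* 4 -> 10,000; 5 -> 100,000; 7 -> 10,000,000
* M/m/6 -> 1,000,000 (millions of $)
* B -> 1,000,000,000 (billions of $)
* "" (no exponent) -> 1 (value already in $)
* "0" and exotic codes (punctuation characters such as +, - and ?) -> 0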
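
The same mapping can be sketched as a single lookup vector instead of a chain of gsub() calls, which removes any dependence on the order of the replacements and handles upper and lower case in one place. exp_multiplier and to_multiplier() are hypothetical names; as in the script, codes not in the table (including "0") map to 0.

```r
# Multiplier for each exponent code (upper-case keys; lower case handled below)
exp_multiplier <- c(
  "H" = 1e2, "K" = 1e3, "M" = 1e6, "B" = 1e9,
  "2" = 1e2, "3" = 1e3, "4" = 1e4, "5" = 1e5, "6" = 1e6, "7" = 1e7
)

to_multiplier <- function(code) {
  code <- toupper(trimws(code))
  out <- unname(exp_multiplier[code])  # NA for codes not in the table (and for "")
  out[code == ""] <- 1                 # no exponent: the value is already in $
  out[is.na(out)] <- 0                 # "0" and punctuation codes count as 0
  out
}

to_multiplier(c("K", "m", "", "+", "B", "5", "0", "?"))
```

Damage in dollars would then be PROPDMG * to_multiplier(PROPDMGEXP), and likewise for crops.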

unique(StormDataFinal$PROPDMGEXP)

##  [1] ""  "K" "M" "+" "7" "0" "B" "5" "4" "H" "m" "-" "6" "2" "3"

unique(StormDataFinal$CROPDMGEXP)

## [1] "M" ""  "K" "k" "?" "B" "0" "m"

#### Property damage: replace each single-character exponent code by its multiplier
StormDataFinal$PROPDMGEXP <- gsub("K|3","1000",StormDataFinal$PROPDMGEXP)
StormDataFinal$PROPDMGEXP <- gsub("M|m|6","1000000",StormDataFinal$PROPDMGEXP)
StormDataFinal$PROPDMGEXP <- gsub("B","1000000000",StormDataFinal$PROPDMGEXP)
StormDataFinal$PROPDMGEXP <- gsub("H|2","100",StormDataFinal$PROPDMGEXP)
StormDataFinal$PROPDMGEXP <- gsub("4","10000",StormDataFinal$PROPDMGEXP)
StormDataFinal$PROPDMGEXP <- gsub("5","100000",StormDataFinal$PROPDMGEXP)
StormDataFinal$PROPDMGEXP <- gsub("7","10000000",StormDataFinal$PROPDMGEXP)
#### Punctuation codes (+, -) become 0 and an empty code becomes 1; "0" stays 0
StormDataFinal$PROPDMGEXP <- gsub("[[:punct:]]","0",StormDataFinal$PROPDMGEXP)
StormDataFinal$PROPDMGEXP <- gsub("^$","1",StormDataFinal$PROPDMGEXP)
StormDataFinal$PROPDMGEXP <- as.numeric(StormDataFinal$PROPDMGEXP)
#### Crop damage: same conversion for the codes found in CROPDMGEXP
StormDataFinal$CROPDMGEXP <- gsub("K|k","1000",StormDataFinal$CROPDMGEXP)
StormDataFinal$CROPDMGEXP <- gsub("B","1000000000",StormDataFinal$CROPDMGEXP)
StormDataFinal$CROPDMGEXP <- gsub("M|m","1000000",StormDataFinal$CROPDMGEXP)
StormDataFinal$CROPDMGEXP <- gsub("[[:punct:]]","0",StormDataFinal$CROPDMGEXP)
StormDataFinal$CROPDMGEXP <- gsub("^$","1",StormDataFinal$CROPDMGEXP)
StormDataFinal$CROPDMGEXP <- as.numeric(StormDataFinal$CROPDMGEXP)
#### Damage in $ = recorded value x multiplier
StormDataFinal <- StormDataFinal %>%
        mutate(PROPDMGTOT=PROPDMG*PROPDMGEXP,CROPDMGTOT=CROPDMG*CROPDMGEXP)

Results

Question 1: Across the United States, which types of events are most harmful with respect to population health?

IMPACT_HEALTH <- StormDataFinal %>%
        group_by(EVENT=EVTYPE_NEW) %>%
        summarise(FATALITIES=sum(FATALITIES), INJURIES=sum(INJURIES), TOTAL=sum(FATALITIES,INJURIES)) %>%
        arrange(desc(TOTAL))

Graph_FATALITIES <- top_n(IMPACT_HEALTH, n=5, FATALITIES) %>%
 ggplot(.,aes(x=reorder(EVENT,-FATALITIES),y=FATALITIES)) + 
        geom_bar(fill="blue", stat="identity") +         
        ggtitle("Fatalities - Top 5 Events") +
        labs(x = "Event Type", y = "Nr of Fatalities") +
        theme(axis.text.x=element_text(size=9, angle=20, hjust=1),plot.title=element_text(size=11, hjust=0.5))

Graph_INJURIES <- top_n(IMPACT_HEALTH, n=5, INJURIES) %>%
 ggplot(.,aes(x=reorder(EVENT,-INJURIES),y=INJURIES)) + 
        geom_bar(fill="blue", stat="identity") +
        ggtitle("Injuries - Top 5 Events") +
        labs(x = "Event Type", y = "Nr of Injuries")  +
        theme(axis.text.x=element_text(size=9, angle=20, hjust=1),plot.title=element_text(size=11, hjust=0.5))

Graph_HEALTH <- top_n(IMPACT_HEALTH, n=5, TOTAL) %>%
 ggplot(.,aes(x=reorder(EVENT,-TOTAL),y=TOTAL)) + 
        geom_bar(fill="purple", stat="identity") +
        ggtitle("Impact on population health\n(Injuries & Fatalities)") +
        labs(x = "Event Type", y = "Nr of Injuries & Fatalities")  +
        theme(axis.text.x=element_text(size=9, angle=20, hjust=1),plot.title=element_text(size=11, hjust=0.5))

grid.arrange(Graph_FATALITIES,Graph_INJURIES,Graph_HEALTH,ncol=3, top="Weather events with the highest impact on population health in the US from 1995 to 2011")

The plot above shows the event types most harmful to population health in the United States from 1995 to 2011.
With injuries and fatalities combined, the five most harmful event types are TORNADO, HEAT, FLOOD, THUNDERSTORM WIND and LIGHTNING.
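
The ranking adds both counts per event type before sorting. A minimal base-R sketch of the same aggregation, on a toy data frame with made-up numbers (obs, totals and ranked are hypothetical names):

```r
# Toy per-observation counts (made-up numbers, not from the storm data)
obs <- data.frame(
  EVENT      = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT", "LIGHTNING"),
  FATALITIES = c(2, 5, 1, 0, 3, 1),
  INJURIES   = c(20, 4, 10, 6, 1, 2),
  stringsAsFactors = FALSE
)

# Sum both counts per event type, add them, and sort by the combined total
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVENT, data = obs, FUN = sum)
totals$TOTAL <- totals$FATALITIES + totals$INJURIES
ranked <- totals[order(totals$TOTAL, decreasing = TRUE), ]
head(ranked, 3)   # exactly three rows: TORNADO, HEAT, FLOOD
```

In the dplyr code above, TOTAL=sum(FATALITIES,INJURIES) works because summarise() evaluates its arguments in order, so FATALITIES and INJURIES already hold the per-event sums at that point. Note also that top_n() keeps tied rows and can therefore return more than five event types, whereas head() on the ordered table always returns exactly the requested number.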

Question 2: Across the United States, which types of events have the greatest economic consequences?

IMPACT_DAMAGE <- StormDataFinal%>%
        group_by(EVENT=EVTYPE_NEW) %>%
        summarise(PROPDMGTOT=sum(PROPDMGTOT)/1000000000, CROPDMGTOT=sum(CROPDMGTOT)/1000000000, TOTALDMG=sum(PROPDMGTOT,CROPDMGTOT)) %>%
        arrange(desc(TOTALDMG))

Graph_PROPDAMAGE <- top_n(IMPACT_DAMAGE , n=5, PROPDMGTOT) %>%
 ggplot(.,aes(x=reorder(EVENT,-PROPDMGTOT),y=PROPDMGTOT)) + 
        geom_bar(fill="blue", stat="identity") +         
        ggtitle("Property damage - Financial impact\nTop 5 event types") +
        labs(x = "Event Type", y = "Property Damage in Billions of $") +
        theme(axis.text.x=element_text(size=9, angle=20, hjust=1),plot.title=element_text(size=11, hjust=0.5))

Graph_CORPDAMAGE <- top_n(IMPACT_DAMAGE, n=5, CROPDMGTOT) %>%
 ggplot(.,aes(x=reorder(EVENT,-CROPDMGTOT),y=CROPDMGTOT)) + 
        geom_bar(fill="blue", stat="identity") +
        ggtitle("Crop damage - Financial impact\nTop 5 event types") +
        labs(x = "Event Type", y = "Crop Damage in Billions of $")  +
        theme(axis.text.x=element_text(size=9, angle=20, hjust=1),plot.title=element_text(size=11, hjust=0.5))

Graph_TOTALDAMAGE <- top_n(IMPACT_DAMAGE, n=5, TOTALDMG) %>%
 ggplot(.,aes(x=reorder(EVENT,-TOTALDMG),y=TOTALDMG)) + 
        geom_bar(fill="purple", stat="identity") +
        ggtitle("Total Financial impact\n(Crop & Property damage)\nTop 5 event types") +
        labs(x = "Event Type", y = "Property & Crop Damage in Billions of $")  +
        theme(axis.text.x=element_text(size=9, angle=20, hjust=1),plot.title=element_text(size=11, hjust=0.5))

grid.arrange(Graph_PROPDAMAGE,Graph_CORPDAMAGE,Graph_TOTALDAMAGE,ncol=3, top="Weather events with the greatest economic impact in the US from 1995 to 2011")

The plot above shows that FLOOD, HURRICANE (TYPHOON), TORNADO and FLASH FLOOD have the greatest economic impact.

Reproducible Research: Peer Assessment 2 Impact of severe weather events on health and on the economy in the United States

Myriam Ragni

29 Sept. 2019