Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern. Based on the data collected in the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, tracking characteristics of major storms and weather events in the United States, the objective of this assignment is to:
* Identify the types of events which are the most harmful to population health
* Identify which types of events have the greatest economic consequences.
The dataset for this analysis is downloaded from the URL assigned to SrcFileURL in the data-loading code below.
The National Weather Service Storm Data Documentation provides information about the data collected.
The events in the database start in 1950 and end in November 2011. Far fewer events were recorded in the earlier years, so I excluded the years for which the volume of data is low and not representative (the histogram of observations by year suggests that the data from 1950 to 1994 can be excluded).
To speed up processing, I extracted only the variables relevant to the study. In addition, the raw dataset contains 985 unique event types, whereas the official event types table contains only 48 entries. Consequently, I tidied up the data as follows:
* Fix typos, remove leading, trailing and repeated spaces, and combine obviously similar event types
* Re-categorize the most frequent event types that do not match the official event type list into an ‘official category’ where one is easily identifiable. The remaining event types are combined as ‘UNCATEGORIZED’.
Note: Before running the script, change the working directory in the setwd() call below to match your environment.
setwd("C:/RAGNIMY1/datasciencecoursera/RepData_PeerAssessment2")
rm(list = ls())
Sys.setlocale("LC_TIME", "English")
library(ggplot2)
library(tidyr)
library(dplyr)
library(stringr)
library(stringdist)
library(data.table)
library(R.utils) #### required for fread to be able to read bz2 files
library(gridExtra) #### used to arrange plots on a page
SrcFileURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
DataFileZip <- "repdata_data_StormData.csv.bz2"
#### Download the compressed data file only if it is not already present
#### (mode = "wb" keeps the bz2 archive binary-safe on Windows)
if (!file.exists(DataFileZip)) {
  download.file(SrcFileURL, destfile = DataFileZip, mode = "wb")
}
#### Read the raw dataset
StormDataFull <- fread(DataFileZip,stringsAsFactors = FALSE)
str(StormDataFull)
## Classes 'data.table' and 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr "" "" "" "" ...
## $ BGN_LOCATI: chr "" "" "" "" ...
## $ END_DATE : chr "" "" "" "" ...
## $ END_TIME : chr "" "" "" "" ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr "" "" "" "" ...
## $ END_LOCATI: chr "" "" "" "" ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## $ WFO : chr "" "" "" "" ...
## $ STATEOFFIC: chr "" "" "" "" ...
## $ ZONENAMES : chr "" "" "" "" ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr "" "" "" "" ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
## - attr(*, ".internal.selfref")=<externalptr>
length(table(StormDataFull$EVTYPE))
## [1] 985
head(StormDataFull, n=10)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1: 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2: 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## 3: 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL
## 4: 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL
## 5: 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL
## 6: 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL
## 7: 1 11/16/1951 0:00:00 0100 CST 9 BLOUNT AL
## 8: 1 1/22/1952 0:00:00 0900 CST 123 TALLAPOOSA AL
## 9: 1 2/13/1952 0:00:00 2000 CST 125 TUSCALOOSA AL
## 10: 1 2/13/1952 0:00:00 2000 CST 57 FAYETTE AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1: TORNADO 0 0
## 2: TORNADO 0 0
## 3: TORNADO 0 0
## 4: TORNADO 0 0
## 5: TORNADO 0 0
## 6: TORNADO 0 0
## 7: TORNADO 0 0
## 8: TORNADO 0 0
## 9: TORNADO 0 0
## 10: TORNADO 0 0
## COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1: NA 0 14.0 100 3 0 0
## 2: NA 0 2.0 150 2 0 0
## 3: NA 0 0.1 123 2 0 0
## 4: NA 0 0.0 100 2 0 0
## 5: NA 0 0.0 150 2 0 0
## 6: NA 0 1.5 177 2 0 0
## 7: NA 0 1.5 33 2 0 0
## 8: NA 0 0.0 33 1 0 0
## 9: NA 0 3.3 100 3 0 1
## 10: NA 0 2.3 100 3 0 0
## INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC
## 1: 15 25.0 K 0
## 2: 0 2.5 K 0
## 3: 2 25.0 K 0
## 4: 2 2.5 K 0
## 5: 2 2.5 K 0
## 6: 6 2.5 K 0
## 7: 1 2.5 K 0
## 8: 0 2.5 K 0
## 9: 14 25.0 K 0
## 10: 0 25.0 K 0
## ZONENAMES LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1: 3040 8812 3051 8806 1
## 2: 3042 8755 0 0 2
## 3: 3340 8742 0 0 3
## 4: 3458 8626 0 0 4
## 5: 3412 8642 0 0 5
## 6: 3450 8748 0 0 6
## 7: 3405 8631 0 0 7
## 8: 3255 8558 0 0 8
## 9: 3334 8740 3336 8738 9
## 10: 3336 8738 3337 8737 10
#### Extracting the variables relevant for the analysis
StormData <- StormDataFull[ ,c("BGN_DATE","EVTYPE","FATALITIES","INJURIES","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]
#### Format the date and extract the year
StormData$Year <- as.numeric(format(as.Date(StormData$BGN_DATE, format = "%m/%d/%Y %H:%M:%S"), "%Y"))
#### Analysis of the volume of data collected by year
with(StormData,hist(Year, breaks=50, xlab="Year", main="Nr of observations by year"))
#### Extract the data collected from 1995 upward
StormData <- StormData %>%
  filter(Year >= 1995) %>%
  mutate(EVTYPE = str_trim(toupper(EVTYPE))) %>%
  filter(!str_detect(EVTYPE, "SUMMARY")) %>%
  filter(FATALITIES > 0 | INJURIES > 0 | PROPDMG > 0 | CROPDMG > 0)
nrEVTYPE <- length(table(StormData$EVTYPE))
Based on the volume of data collected by year, I kept only the observations from 1995 onward, because the data volume before that year is low. In addition, I excluded all observations:
* whose event type contains ‘SUMMARY’, as these are clearly invalid entries
* for which no impact on either population health or the economy was reported.
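The year cut-off can also be checked numerically instead of being read off the histogram. The sketch below is only an illustration: toy_dates is a hypothetical vector of made-up dates in the same format as BGN_DATE, not records from the dataset. It counts observations per year with table() and shows which rows a 1995 cut-off would keep.

```r
# Hypothetical dates in the BGN_DATE format; not real records.
toy_dates <- c("4/18/1950 0:00:00", "6/8/1995 0:00:00",
               "7/1/1995 0:00:00", "11/28/2011 0:00:00")

# Only the date part is parsed; the trailing time is ignored by the format.
toy_year <- as.numeric(format(as.Date(toy_dates, format = "%m/%d/%Y"), "%Y"))

# Observations per year, then the subset kept by a 1995 cut-off.
obs_by_year <- table(toy_year)
kept <- toy_year[toy_year >= 1995]

print(obs_by_year)
print(length(kept))
```

Applied to the real Year column, the same table() call gives the exact counts behind the histogram.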
After this initial data cleaning phase, 330 event types (variable EVTYPE) remain in the reworked dataset, whereas the storm data event table provided by NOAA contains only 48 distinct categories. Therefore, I fixed typos and combined obviously similar event types.
StormData$EVTYPE <- gsub("-"," ",StormData$EVTYPE, fixed=TRUE)
StormData$EVTYPE <- gsub("[[:digit:]]","",StormData$EVTYPE)
StormData$EVTYPE <- gsub("TSTM|TUNDERSTORM|THUNDERTSORM|THUNDEERSTORM|THUNDERESTORM|THUNERSTORM","THUNDERSTORM",StormData$EVTYPE)
StormData$EVTYPE <- gsub("LIGHTING","LIGHTNING",StormData$EVTYPE)
StormData$EVTYPE <- gsub("WINDS|WND|WINS","WIND",StormData$EVTYPE)
StormData$EVTYPE <- gsub("RAINS","RAIN",StormData$EVTYPE)
StormData$EVTYPE <- gsub("TREES","TREE",StormData$EVTYPE)
StormData$EVTYPE <- gsub("WATERSPOUTS|WATER SPOUT","WATERSPOUT",StormData$EVTYPE)
StormData$EVTYPE <- gsub("HVY","HEAVY",StormData$EVTYPE)
StormData$EVTYPE <- gsub("FLD|FLOODING|FLOODS","FLOOD",StormData$EVTYPE)
StormData$EVTYPE <- gsub("MPH","",StormData$EVTYPE)
StormData$EVTYPE <- gsub("CSTL","COASTAL",StormData$EVTYPE)
StormData$EVTYPE <- gsub("CURRENTS","CURRENT",StormData$EVTYPE)
StormData$EVTYPE <- gsub("(G)","",StormData$EVTYPE,fixed=TRUE)
StormData$EVTYPE <- gsub("()","",StormData$EVTYPE,fixed=TRUE)
StormData$EVTYPE <- gsub(".","",StormData$EVTYPE,fixed=TRUE)
StormData$EVTYPE <- gsub(" )","",StormData$EVTYPE,fixed=TRUE)
StormData$EVTYPE <- gsub("/ ","/",StormData$EVTYPE,fixed=TRUE)
StormData$EVTYPE <- gsub("&","/",StormData$EVTYPE,fixed=TRUE)
StormData$EVTYPE <- gsub(" / ","/",StormData$EVTYPE,fixed=TRUE)
StormData$EVTYPE <- gsub("\\s+", " ",StormData$EVTYPE)
StormData$EVTYPE <-
ifelse(grepl("FIRE", StormData$EVTYPE),"WILDFIRE",
ifelse(grepl("HEAT|DROUGHT|DRY|WARM|HOT", StormData$EVTYPE),"HEAT",
ifelse(grepl("AVALANCHE", StormData$EVTYPE),"AVALANCHE",
ifelse(grepl("TORNADO",StormData$EVTYPE),"TORNADO",
ifelse(grepl("HURRICANE|TYPHOON", StormData$EVTYPE),"HURRICANE (TYPHOON)",StormData$EVTYPE)
))))
StormData$EVTYPE <- str_trim(StormData$EVTYPE)
nrEVTYPE <- length(table(StormData$EVTYPE))
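The chain of gsub() calls above could also be written as a table of rules applied in a loop, which may be easier to review and extend. The following is a minimal sketch of that idea, not the code used for this report: the names cleanup_rules and apply_rules are hypothetical, and only a few of the patterns from above are included.

```r
# Hypothetical rule table: names are regex patterns, values are replacements.
cleanup_rules <- c(
  "TSTM"                = "THUNDERSTORM",
  "LIGHTING"            = "LIGHTNING",
  "WINDS|WND|WINS"      = "WIND",
  "FLD|FLOODING|FLOODS" = "FLOOD",
  "\\s+"                = " "
)

# Apply every rule in order to a character vector of event types.
apply_rules <- function(x, rules) {
  for (pattern in names(rules)) {
    x <- gsub(pattern, rules[[pattern]], x)
  }
  trimws(x)
}

cleaned <- apply_rules(c("TSTM WND", "LIGHTING", "URBAN  FLOODING "), cleanup_rules)
print(cleaned)
```

The rules run in the order listed, so a rule that depends on an earlier substitution has to come after it, exactly as in the sequential gsub() version.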
After the previous cleaning activity, the number of event types is reduced to 230. The next step is to match those with the official event table.
#### Create the list of official event types as per the documentation and match the unique event types in the data file against that list
EVTREF_TABLE <- toupper(c("Astronomical Low Tide", "Avalanche", "Blizzard", "Coastal Flood", "Cold/Wind Chill", "Debris Flow", "Dense Fog", "Dense Smoke", "Drought", "Dust Devil", "Dust Storm", "Excessive Heat", "Extreme Cold/Wind Chill", "Flash Flood", "Flood", "Frost/Freeze", "Funnel Cloud", "Freezing Fog", "Hail", "Heat", "Heavy Rain", "Heavy Snow", "High Surf", "High Wind", "Hurricane (Typhoon)", "Ice Storm", "Lake-Effect Snow", "Lakeshore Flood", "Lightning", "Marine Hail", "Marine High Wind", "Marine Strong Wind", "Marine Thunderstorm Wind", "Rip Current", "Seiche", "Sleet", "Storm Surge/Tide", "Strong Wind", "Thunderstorm Wind", "Tornado", "Tropical Depression", "Tropical Storm", "Tsunami", "Volcanic Ash", "Waterspout", "Wildfire", "Winter Storm", "Winter Weather"))
EVTYPE <-unique(StormData$EVTYPE)
VECT_MATCH <- match(EVTYPE,EVTREF_TABLE) #### returns a vector of the positions of the matching Event type in the reference table; NA if not found
DF_MATCH <- data.frame(EVTYPE,VECT_MATCH)
#### DFFINAL_MATCH: for each (unique) event type in the reworked dataset, provides the position of the event in the reference table (NA if not found) and the number of observations
EVTTYPE_FREQ <- as.data.frame(table(StormData$EVTYPE))
names(EVTTYPE_FREQ) <- c("EVTYPE","NR_OBS")
DFFINAL_MATCH <- arrange(merge(DF_MATCH,EVTTYPE_FREQ), desc(NR_OBS))
names(DFFINAL_MATCH) <- c("EVTYPE","POS_REFTABLE","NR_OBS")
head(DFFINAL_MATCH,n=20)
## EVTYPE POS_REFTABLE NR_OBS
## 1 THUNDERSTORM WIND 39 110244
## 2 HAIL 19 23873
## 3 FLASH FLOOD 14 19959
## 4 TORNADO 40 13018
## 5 LIGHTNING 29 12027
## 6 FLOOD 15 9736
## 7 HIGH WIND 24 5730
## 8 STRONG WIND 38 3418
## 9 WINTER STORM 47 1479
## 10 HEAT 20 1310
## 11 WILDFIRE 46 1242
## 12 HEAVY SNOW 22 1134
## 13 HEAVY RAIN 21 1085
## 14 URBAN/SML STREAM FLOOD NA 702
## 15 ICE STORM 26 653
## 16 RIP CURRENT 34 630
## 17 THUNDERSTORM WIND/HAIL NA 444
## 18 TROPICAL STORM 42 412
## 19 WINTER WEATHER 48 407
## 20 AVALANCHE 2 266
#### Split DFFINAL_MATCH into two dataframes: one containing the EVTYPES not in the reference table and one with the EVTYPES in the reference table
DFFINAL_MATCH_NOTNA <- DFFINAL_MATCH %>%
subset(!is.na(POS_REFTABLE)) %>%
mutate(EVTYPE_NEW=EVTYPE)
DFFINAL_MATCH_NA <- DFFINAL_MATCH %>%
subset(is.na(POS_REFTABLE))
countNA <- count(DFFINAL_MATCH_NA)
head(DFFINAL_MATCH_NA, n=10)
## EVTYPE POS_REFTABLE NR_OBS
## 14 URBAN/SML STREAM FLOOD NA 702
## 17 THUNDERSTORM WIND/HAIL NA 444
## 23 LAKE EFFECT SNOW NA 198
## 24 LANDSLIDE NA 193
## 26 EXTREME COLD NA 179
## 27 STORM SURGE NA 169
## 29 LIGHT SNOW NA 141
## 30 WINTER WEATHER/MIX NA 139
## 34 RIVER FLOOD NA 109
## 35 FOG NA 104
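Some of the unmatched labels differ from an official category only slightly. For example, LAKE EFFECT SNOW in the output above differs from the official LAKE-EFFECT SNOW only by the hyphen that was replaced with a space during cleaning. As a review aid, the sketch below lists the closest official label by edit distance using base R's adist(). It is an illustration under assumptions, not part of the analysis: the vectors official and observed are small hypothetical samples, and a small distance is only a hint that still needs manual confirmation.

```r
# Small hypothetical samples of official and observed labels.
official <- c("THUNDERSTORM WIND", "LAKE-EFFECT SNOW", "STORM SURGE/TIDE", "FLOOD")
observed <- c("THUNDERSTORM WIND", "LAKE EFFECT SNOW", "STORM SURGE", "LANDSLIDE")

# Exact matching, as in the report: NA where no official label matches.
exact <- match(observed, official)

# Edit-distance matrix (rows: observed labels, columns: official labels).
d <- adist(observed, official)

# Closest official label and its distance for each observed label.
nearest  <- official[apply(d, 1, which.min)]
distance <- apply(d, 1, min)

print(data.frame(observed, exact, nearest, distance))
```

Sorting such a table by distance puts near-misses such as the hyphen case first, which helps decide which labels are worth recoding by hand.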
There are still 186 event types which do not match the official table. As a last cleanup step, I looked at the ones with the highest number of observations and assigned them to an official category where one was easily identifiable.
#### The first matching pattern wins, so the specific flood patterns come before the generic FLOOD pattern
DFFINAL_MATCH_NA$EVTYPE_NEW <-
  ifelse(grepl("FLASH +FLOOD", DFFINAL_MATCH_NA$EVTYPE), "FLASH FLOOD",
  ifelse(grepl("COASTAL +FLOOD", DFFINAL_MATCH_NA$EVTYPE), "COASTAL FLOOD",
  ifelse(grepl("^THUNDERSTORM WIND", DFFINAL_MATCH_NA$EVTYPE), "THUNDERSTORM WIND",
  ifelse(grepl("FLOOD", DFFINAL_MATCH_NA$EVTYPE), "FLOOD",
  ifelse(grepl("HIGH +SURF", DFFINAL_MATCH_NA$EVTYPE), "HIGH SURF",
  ifelse(grepl("COLD|FROST|FREEZE|FREEZING|WINTER", DFFINAL_MATCH_NA$EVTYPE), "COLD/WIND CHILL", "UNCATEGORIZED")
  )))))
#### Combine all event types again; the final category is in the EVTYPE_NEW field
EVTYPE_FINAL <- rbind(DFFINAL_MATCH_NOTNA,DFFINAL_MATCH_NA)
EVTYPE_FINAL$EVTYPE_NEW <- as.character(EVTYPE_FINAL$EVTYPE_NEW)
EVTYPE_FINAL <- select(EVTYPE_FINAL,-NR_OBS, -POS_REFTABLE )
StormDataFinal <- merge(StormData, EVTYPE_FINAL, all.x = TRUE) ##211775
unique(StormDataFinal$PROPDMGEXP)
## [1] "" "K" "M" "+" "7" "0" "B" "5" "4" "H" "m" "-" "6" "2" "3"
unique(StormDataFinal$CROPDMGEXP)
## [1] "M" "" "K" "k" "?" "B" "0" "m"
StormDataFinal$PROPDMGEXP <- gsub("K|3","1000",StormDataFinal$PROPDMGEXP)
StormDataFinal$PROPDMGEXP <- gsub("M|m|6","1000000",StormDataFinal$PROPDMGEXP)
StormDataFinal$PROPDMGEXP <- gsub("B","1000000000",StormDataFinal$PROPDMGEXP)
StormDataFinal$PROPDMGEXP <- gsub("H|2","100",StormDataFinal$PROPDMGEXP)
StormDataFinal$PROPDMGEXP <- gsub("4","10000",StormDataFinal$PROPDMGEXP)
StormDataFinal$PROPDMGEXP <- gsub("5","100000",StormDataFinal$PROPDMGEXP)
StormDataFinal$PROPDMGEXP <- gsub("7","10000000",StormDataFinal$PROPDMGEXP)
StormDataFinal$PROPDMGEXP <- gsub("[[:punct:]]","0",StormDataFinal$PROPDMGEXP)
StormDataFinal$PROPDMGEXP <- gsub("^$","1",StormDataFinal$PROPDMGEXP)
StormDataFinal$PROPDMGEXP <- as.numeric(StormDataFinal$PROPDMGEXP)
StormDataFinal$CROPDMGEXP <- gsub("K|k","1000",StormDataFinal$CROPDMGEXP)
StormDataFinal$CROPDMGEXP <- gsub("B","1000000000",StormDataFinal$CROPDMGEXP)
StormDataFinal$CROPDMGEXP <- gsub("M|m","1000000",StormDataFinal$CROPDMGEXP)
StormDataFinal$CROPDMGEXP <- gsub("[[:punct:]]","0",StormDataFinal$CROPDMGEXP)
StormDataFinal$CROPDMGEXP <- gsub("^$","1",StormDataFinal$CROPDMGEXP)
StormDataFinal$CROPDMGEXP<- as.numeric(StormDataFinal$CROPDMGEXP)
StormDataFinal <- StormDataFinal %>%
mutate(PROPDMGTOT=PROPDMG*PROPDMGEXP,CROPDMGTOT=CROPDMG*CROPDMGEXP)
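The exponent conversion above turns each code in PROPDMGEXP and CROPDMGEXP into a multiplier: H is 100, K is 1,000, M is 1,000,000, B is 1,000,000,000, a digit n from 2 to 7 is 10^n, a blank is 1, and anything else (including "0" and punctuation) is 0. The same mapping can be sketched as a single lookup function. The names exp_lookup and to_multiplier are hypothetical, and the function is only meant to mirror the rules above, not to replace them.

```r
# Hypothetical lookup for the letter codes (case-insensitive).
exp_lookup <- c("H" = 1e2, "K" = 1e3, "M" = 1e6, "B" = 1e9)

# Convert a vector of exponent codes into numeric multipliers.
to_multiplier <- function(code) {
  code <- toupper(code)
  mult <- unname(exp_lookup[code])           # NA for codes not in the lookup
  digit <- grepl("^[1-9]$", code)            # single digits are powers of ten
  mult[digit] <- 10^as.numeric(code[digit])
  mult[code == ""] <- 1                      # blank: value is taken as-is
  mult[is.na(mult)] <- 0                     # "0", "+", "-", "?" and others
  mult
}

multipliers <- to_multiplier(c("K", "m", "B", "", "5", "+", "0"))
print(multipliers)
```

Keeping the mapping in one place makes the treatment of ambiguous codes explicit, and it avoids any risk that a later substitution rewrites digits produced by an earlier one.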
Question 1: Across the United States, which types of events are most harmful with respect to population health?
IMPACT_HEALTH <- StormDataFinal %>%
group_by(EVENT=EVTYPE_NEW) %>%
summarise(FATALITIES=sum(FATALITIES), INJURIES=sum(INJURIES), TOTAL=sum(FATALITIES,INJURIES)) %>%
arrange(desc(TOTAL))
Graph_FATALITIES <- top_n(IMPACT_HEALTH, n=5, FATALITIES) %>%
ggplot(.,aes(x=reorder(EVENT,-FATALITIES),y=FATALITIES)) +
geom_bar(fill="blue", stat="identity") +
ggtitle("Fatalities - Top 5 Events") +
labs(x = "Event Type", y = "Nr of Fatalities") +
theme(axis.text.x=element_text(size=9, angle=20, hjust=1),plot.title=element_text(size=11, hjust=0.5))
Graph_INJURIES <- top_n(IMPACT_HEALTH, n=5, INJURIES) %>%
ggplot(.,aes(x=reorder(EVENT,-INJURIES),y=INJURIES)) +
geom_bar(fill="blue", stat="identity") +
ggtitle("Injuries - Top 5 Events") +
labs(x = "Event Type", y = "Nr of Injuries") +
theme(axis.text.x=element_text(size=9, angle=20, hjust=1),plot.title=element_text(size=11, hjust=0.5))
Graph_HEALTH <- top_n(IMPACT_HEALTH, n=5, TOTAL) %>%
ggplot(.,aes(x=reorder(EVENT,-TOTAL),y=TOTAL)) +
geom_bar(fill="purple", stat="identity") +
ggtitle("Impact on population health\n(Injuries & Fatalities)") +
labs(x = "Event Type", y = "Nr of Injuries & Fatalities") +
theme(axis.text.x=element_text(size=9, angle=20, hjust=1),plot.title=element_text(size=11, hjust=0.5))
grid.arrange(Graph_FATALITIES,Graph_INJURIES,Graph_HEALTH,ncol=3, top="Weather events with the highest impact on population health in the US from 1995 to 2011")
The above plot highlights the most harmful event types with respect to population health in the United States from 1995 to 2011.
Injuries and fatalities combined, my analysis shows that the top five events are: TORNADO, HEAT, FLOOD, THUNDERSTORM WIND and LIGHTNING.
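The ranking behind these plots is a group-wise sum followed by a sort. The sketch below reproduces that logic in base R on a made-up data frame. The data frame toy and its numbers are hypothetical and unrelated to the real totals.

```r
# Hypothetical records; the figures are invented for illustration only.
toy <- data.frame(
  EVENT      = c("TORNADO", "TORNADO", "HEAT", "FLOOD", "HEAT"),
  FATALITIES = c(2, 0, 5, 1, 1),
  INJURIES   = c(30, 4, 10, 0, 2)
)

# Sum fatalities and injuries per event type.
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVENT, data = toy, FUN = sum)

# Combined impact, sorted from most to least harmful.
totals$TOTAL <- totals$FATALITIES + totals$INJURIES
totals <- totals[order(-totals$TOTAL), ]

print(head(totals, 2))
```

In the report itself the same steps are done with group_by(), summarise() and top_n() from dplyr.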
Question 2: Across the United States, which types of events have the greatest economic consequences?
IMPACT_DAMAGE <- StormDataFinal%>%
group_by(EVENT=EVTYPE_NEW) %>%
summarise(PROPDMGTOT=sum(PROPDMGTOT)/1000000000, CROPDMGTOT=sum(CROPDMGTOT)/1000000000, TOTALDMG=sum(PROPDMGTOT,CROPDMGTOT)) %>%
arrange(desc(TOTALDMG))
Graph_PROPDAMAGE <- top_n(IMPACT_DAMAGE , n=5, PROPDMGTOT) %>%
ggplot(.,aes(x=reorder(EVENT,-PROPDMGTOT),y=PROPDMGTOT)) +
geom_bar(fill="blue", stat="identity") +
ggtitle("Property damage - Financial impact\nTop 5 event types") +
labs(x = "Event Type", y = "Property Damage in Billions of $") +
theme(axis.text.x=element_text(size=9, angle=20, hjust=1),plot.title=element_text(size=11, hjust=0.5))
Graph_CORPDAMAGE <- top_n(IMPACT_DAMAGE, n=5, CROPDMGTOT) %>%
ggplot(.,aes(x=reorder(EVENT,-CROPDMGTOT),y=CROPDMGTOT)) +
geom_bar(fill="blue", stat="identity") +
ggtitle("Crop damage - Financial impact\nTop 5 event types") +
labs(x = "Event Type", y = "Crop Damage in Billions of $") +
theme(axis.text.x=element_text(size=9, angle=20, hjust=1),plot.title=element_text(size=11, hjust=0.5))
Graph_TOTALDAMAGE <- top_n(IMPACT_DAMAGE, n=5, TOTALDMG) %>%
ggplot(.,aes(x=reorder(EVENT,-TOTALDMG),y=TOTALDMG)) +
geom_bar(fill="purple", stat="identity") +
ggtitle("Total Financial impact\n(Crop & Property damage)\nTop 5 event types") +
labs(x = "Event Type", y = "Property & Crop Damage in Billions of $") +
theme(axis.text.x=element_text(size=9, angle=20, hjust=1),plot.title=element_text(size=11, hjust=0.5))
grid.arrange(Graph_PROPDAMAGE,Graph_CORPDAMAGE,Graph_TOTALDAMAGE,ncol=3, top="Weather events with the greatest economic impact in the US from 1995 to 2011")
The above plot shows that FLOOD, HURRICANE (TYPHOON), TORNADO and FLASH FLOOD have the greatest economic impact.