Note: This is a slightly enhanced version of the Reproducible Research project assignment 2, containing more than the allowed three figures as well as a map showing the frequency of weather events by state.
This report aims at identifying the most harmful weather events in the U.S. as measured from 1950 to 2011. It shows which events have had the greatest impact on public health and the economy. The report is based on the Storm Data published by the National Oceanic and Atmospheric Administration (NOAA).
These data record the number of fatalities and injuries as well as the value of damage to property and crops for each event. Specifically, the report addresses the following two questions:
Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?
It suggests that, overall, floods and tornadoes are the most harmful weather events in the U.S.
Note: We created the sub-directory “ds-project-52” for this project.
Load the packages used for this report.
library(R.utils)
## Loading required package: R.oo
## Loading required package: R.methodsS3
## R.methodsS3 v1.7.1 (2016-02-15) successfully loaded. See ?R.methodsS3 for help.
## R.oo v1.20.0 (2016-02-17) successfully loaded. See ?R.oo for help.
##
## Attaching package: 'R.oo'
## The following objects are masked from 'package:methods':
##
## getClasses, getMethods
## The following objects are masked from 'package:base':
##
## attach, detach, gc, load, save
## R.utils v2.3.0 (2016-04-13) successfully loaded. See ?R.utils for help.
##
## Attaching package: 'R.utils'
## The following object is masked from 'package:utils':
##
## timestamp
## The following objects are masked from 'package:base':
##
## cat, commandArgs, getOption, inherits, isOpen, parse, warnings
library(ggplot2)
library(gridExtra)
library(plyr)
library(ggmap)
We obtain the data from the course web site, Storm Data.
#Download and unzip zip file, read in the data
if (!file.exists("./ds-project-52/repdata_dat_StormData.csv")) {
getzip <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
ziparch <- "./ds-project-52/repdata_dat_StormData.csv.bz2"
download.file(getzip, destfile = ziparch)
bunzip2(ziparch, destname = "./ds-project-52/repdata_dat_StormData.csv")
}
dataraw <- read.csv("./ds-project-52/repdata_dat_StormData.csv", header=TRUE, stringsAsFactors=FALSE)
str(dataraw)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr "" "" "" "" ...
## $ BGN_LOCATI: chr "" "" "" "" ...
## $ END_DATE : chr "" "" "" "" ...
## $ END_TIME : chr "" "" "" "" ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr "" "" "" "" ...
## $ END_LOCATI: chr "" "" "" "" ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## $ WFO : chr "" "" "" "" ...
## $ STATEOFFIC: chr "" "" "" "" ...
## $ ZONENAMES : chr "" "" "" "" ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr "" "" "" "" ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
#Checking for NA
sum (is.na(dataraw))
## [1] 1745947
For the questions addressed here we need only the following variables: EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP.
We do not want to lose the time and space dimensions completely, so we also keep the variables BGN_DATE and STATE.
We reduce the data set to these variables.
datared<-dataraw[, c("BGN_DATE", "STATE", "EVTYPE","FATALITIES","INJURIES","PROPDMG", "PROPDMGEXP","CROPDMG","CROPDMGEXP")]
In order to answer the questions of this report the variable EVTYPE is crucial. So let’s have a closer look at it. First, we look at how many types there are.
types_num0 <- length(sort(unique(datared$EVTYPE)))
There are 985 types, which seems far too many.
A closer look at all of these types shows that they are named rather incoherently: leading spaces, mixed upper and lower case (“Beach Erosion”, “BEACH EROSION”), typos (“BEACH EROSIN”), abbreviations (“TSTM”, “WND”), singular/plural variants (“THUNDERSTORM WIND”, “THUNDERSTORM WINDS”) and inconsistent spacing (“WILD FIRES”, “WILDFIRES”). Displaying all types would unduly inflate this report, so we will just list the first 40 types for illustration.
evtype <- sort(unique(datared$EVTYPE))
head(evtype, 40)
## [1] " HIGH SURF ADVISORY" " COASTAL FLOOD"
## [3] " FLASH FLOOD" " LIGHTNING"
## [5] " TSTM WIND" " TSTM WIND (G45)"
## [7] " WATERSPOUT" " WIND"
## [9] "?" "ABNORMAL WARMTH"
## [11] "ABNORMALLY DRY" "ABNORMALLY WET"
## [13] "ACCUMULATED SNOWFALL" "AGRICULTURAL FREEZE"
## [15] "APACHE COUNTY" "ASTRONOMICAL HIGH TIDE"
## [17] "ASTRONOMICAL LOW TIDE" "AVALANCE"
## [19] "AVALANCHE" "BEACH EROSIN"
## [21] "Beach Erosion" "BEACH EROSION"
## [23] "BEACH EROSION/COASTAL FLOOD" "BEACH FLOOD"
## [25] "BELOW NORMAL PRECIPITATION" "BITTER WIND CHILL"
## [27] "BITTER WIND CHILL TEMPERATURES" "Black Ice"
## [29] "BLACK ICE" "BLIZZARD"
## [31] "BLIZZARD AND EXTREME WIND CHIL" "BLIZZARD AND HEAVY SNOW"
## [33] "Blizzard Summary" "BLIZZARD WEATHER"
## [35] "BLIZZARD/FREEZING RAIN" "BLIZZARD/HEAVY SNOW"
## [37] "BLIZZARD/HIGH WIND" "BLIZZARD/WINTER STORM"
## [39] "BLOW-OUT TIDE" "BLOW-OUT TIDES"
We will try to eliminate the most obvious redundancies by hand.
#Normalize the strings: remove leading/trailing spaces and convert to upper case.
datared$EVTYPE <- trimws(datared$EVTYPE)
datared$EVTYPE <- toupper(datared$EVTYPE)
datared$EVTYPE <- gsub("WND", "WIND", datared$EVTYPE)
datared$EVTYPE <- gsub("WINDS", "WIND", datared$EVTYPE)
datared$EVTYPE <- gsub("WIND CH", "WIND CHIL", datared$EVTYPE)
datared$EVTYPE <- gsub("TSTM", "THUNDERSTORM", datared$EVTYPE)
datared$EVTYPE <- gsub("TIDES", "TIDE", datared$EVTYPE)
datared$EVTYPE <- gsub("AVALANCE", "AVALANCHE", datared$EVTYPE)
datared$EVTYPE <- gsub("EROSIN", "EROSION", datared$EVTYPE)
datared$EVTYPE <- gsub(" & ", "/", datared$EVTYPE)
datared$EVTYPE <- gsub("FIRES", "FIRE", datared$EVTYPE)
datared$EVTYPE <- gsub("WILDFIRES", "WILD FIRE", datared$EVTYPE)
datared$EVTYPE <- gsub("FLOODING", "FLOOD", datared$EVTYPE)
datared$EVTYPE <- gsub("CSTL", "COASTAL", datared$EVTYPE)
datared$EVTYPE <- gsub("COASTALFLOOD", "COASTAL FLOOD", datared$EVTYPE)
datared$EVTYPE <- gsub("COASTALSTORM", "COASTAL STORM", datared$EVTYPE)
datared$EVTYPE <- gsub("FUNNELS", "FUNNEL", datared$EVTYPE)
datared$EVTYPE <- gsub("TEMPERATURES", "TEMPERATURE", datared$EVTYPE)
datared$EVTYPE <- gsub("MUDSLIDES", "MUDSLIDE", datared$EVTYPE)
datared$EVTYPE <- gsub("THUNDERSTORMW", "THUNDERSTORM", datared$EVTYPE)
datared$EVTYPE <- gsub("THUNERSTORM", "THUNDERSTORM", datared$EVTYPE)
datared$EVTYPE <- gsub("THUNDERSTROM", "THUNDERSTORM", datared$EVTYPE)
datared$EVTYPE <- gsub("THUNDERTSORM", "THUNDERSTORM", datared$EVTYPE)
datared$EVTYPE <- gsub("THUNDERSTORMS", "THUNDERSTORM", datared$EVTYPE)
datared$EVTYPE <- gsub("HUNDERSTORMWINDS", "THUNDERSTORM WIND", datared$EVTYPE)
datared$EVTYPE <- gsub("TORNADOES", "TORNADO", datared$EVTYPE)
datared$EVTYPE <- gsub("TORNADOS", "TORNADO", datared$EVTYPE)
datared$EVTYPE <- gsub("WATERSPROUT", "WATER SPROUT", datared$EVTYPE)
types_num1 <- length(sort(unique(datared$EVTYPE)))
So we could reduce the number of types from 985 to 827. This is still far too many compared to the 48 types defined in the Storm Data Documentation. However, it is beyond the scope of this analysis to do more cleaning. As we will focus on the few types with the highest impact, we assume that the effect of these incoherencies will be limited. Still, when interpreting the results, we have to bear in mind that there is a problem here.
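A possible further step, not pursued here, would be to match the cleaned types against the 48 official categories listed in the Storm Data Documentation. The sketch below only illustrates the idea with a small, hand-picked subset of those categories (the vector official is illustrative, not the full official list).
#Sketch only: share of records whose cleaned type already matches one of a few official categories exactly.
official <- c("TORNADO", "FLOOD", "FLASH FLOOD", "THUNDERSTORM WIND", "HAIL",
              "LIGHTNING", "HEAT", "EXCESSIVE HEAT", "HIGH WIND", "WINTER STORM")
mean(datared$EVTYPE %in% official)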
Finally, we extract the year from the BGN_DATE variable.
datared$YEAR <- as.numeric(format(as.Date(datared$BGN_DATE, format = "%m/%d/%Y %H:%M:%S"), "%Y"))
Checking for NA
sum(is.na(datared))
## [1] 0
We don’t have to worry about missing values.
What are the main characteristics of the data?
str(datared)
## 'data.frame': 902297 obs. of 10 variables:
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## $ YEAR : num 1950 1950 1951 1951 1951 ...
head(datared)
## BGN_DATE STATE EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP
## 1 4/18/1950 0:00:00 AL TORNADO 0 15 25.0 K
## 2 4/18/1950 0:00:00 AL TORNADO 0 0 2.5 K
## 3 2/20/1951 0:00:00 AL TORNADO 0 2 25.0 K
## 4 6/8/1951 0:00:00 AL TORNADO 0 2 2.5 K
## 5 11/15/1951 0:00:00 AL TORNADO 0 2 2.5 K
## 6 11/15/1951 0:00:00 AL TORNADO 0 6 2.5 K
## CROPDMG CROPDMGEXP YEAR
## 1 0 1950
## 2 0 1950
## 3 0 1951
## 4 0 1951
## 5 0 1951
## 6 0 1951
Let’s see how the measurements are spread in time and space.
#Create a histogram for events by year.
yplot <- ggplot(data=datared, aes(x=YEAR)) + geom_histogram(breaks=seq(1950, 2011, by = 1), col="red", fill="orange", alpha = .5) + labs(title="Weather Events in Time") + labs(x="YEAR", y="Measured Weather Events")
#Create a histogram for events by most affected states.
states <- count(datared, vars = "STATE")
states <- states[order(states$freq, decreasing = TRUE), ]
sttop <- states[1:30, ]
sttop$STATE <- factor(sttop$STATE, levels = sttop$STATE)
sttop$STATE <- factor(sttop$STATE, levels = rev(levels(sttop$STATE)))
stplot <- ggplot(sttop, aes(x=factor(STATE), y=freq))+ geom_bar(stat ="identity", fill="green") + ggtitle("30 States most affected by Weather Events") + xlab("States") + ylab("Number of Weather Events") + scale_y_continuous(breaks = seq(0,80000, by = 10000))
grid.arrange(yplot, stplot, ncol=1, bottom="Figure 1: Extreme Weather Events in Time and Space")
From figure 1 we see that the majority of the measurements falls within the period 1995 to 2011 and that Texas is by far the state most affected by extreme weather events, followed by Kansas and Oklahoma.
What does the spatial component look like on a map? We display the frequency of events by state on a map of the US core territory.
We will use Google Maps. The latitudes and longitudes given in the Storm Data set do not match the coordinates used by Google Maps, however. So, as an approximate substitute, we will just locate the events by state. For this we will download the latitudes and longitudes of the states from the website Average Latitude and Longitude for US States.
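As an aside, a rough conversion of the raw event coordinates is sketched below. It assumes that LATITUDE and LONGITUDE encode degrees and minutes (e.g. 3040 would be 30 degrees 40 minutes); this encoding is an assumption on our part, and the conversion is not used in the rest of the analysis.
#Sketch only, not used further: convert the raw coordinates to decimal degrees,
#assuming they encode degrees and minutes (e.g. 3040 = 30 degrees 40 minutes).
lat_dec <- floor(dataraw$LATITUDE / 100) + (dataraw$LATITUDE %% 100) / 60
lon_dec <- -(floor(dataraw$LONGITUDE / 100) + (dataraw$LONGITUDE %% 100) / 60)
head(cbind(lat_dec, lon_dec))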
#Get longitude/latitude of US States from http://dev.maxmind.com/geoip/legacy/codes/state_latlon/
if (!file.exists("./ds-project-52/state_latlon.csv")) {
getfile <- "https://dev.maxmind.com/static/csv/codes/state_latlon.csv"
download.file(getfile, destfile = "./ds-project-52/state_latlon.csv")
}
#Read in the csv-file.
statelalo <- read.csv("./ds-project-52/state_latlon.csv", header = TRUE, stringsAsFactors = FALSE)
names(statelalo) <- c("STATE", "LAT", "LON")
#Merge dataframes
stateP <- merge(states, statelalo, by = "STATE")
#Check for NAs
sum(is.na(stateP))
## [1] 0
#Create a simple map representing the frequency of weather events by state.
map <- get_map(location = "USA", zoom = 4)
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=USA&zoom=4&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=USA&sensor=false
mapPoints <- ggmap(map) + geom_point(aes(x = LON, y = LAT, size = freq), data = stateP, alpha = .5) + ggtitle("Map 1: Extreme Weather Events in the US") + scale_size("Frequency")
mapPoints
## Warning: Removed 5 rows containing missing values (geom_point).
#A warning is shown that 5 rows have been removed. This is because these rows concern points outside the US mainland.
The map shows that there are more extreme weather events in the eastern than in the western part of the US and that the central southern region is most affected.
Question: Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
The variables FATALITIES and INJURIES are relevant for answering this question.
#Aggregate FATALITIES and INJURIES to variable on population health.
datared$PHEALTH <- datared$FATALITIES + datared$INJURIES
phealth <- aggregate (PHEALTH~EVTYPE, datared, sum)
phealth <- phealth[order(phealth$PHEALTH, decreasing=TRUE), ]
unname(unlist(subset(phealth, PHEALTH >= 20, select = PHEALTH)))
## [1] 96979 10055 8428 7267 6046 3037 2782 2064 1722 1527 1376
## [12] 1339 1148 986 906 796 600 557 551 501 462 431
## [23] 412 398 395 393 360 349 260 251 223 162 153
## [34] 149 143 111 107 107 107 101 100 90 90 86
## [45] 78 68 53 52 51 51 48 45 45 40 37
## [56] 36 36 36 36 32 32 31 30 29 28 28
## [67] 27 25 25 23 23 22
We see that the impact decreases sharply within the first few cases. In order to detect the most harmful types we can concentrate on the top cases (the 20 most harmful overall and the top 10 each for fatalities and injuries) without losing too much information.
fatality <- aggregate (FATALITIES~EVTYPE, datared, sum)
fatality <- fatality[order(fatality$FATALITIES, decreasing=TRUE), ]
injury <- aggregate (INJURIES~EVTYPE, datared, sum)
injury <- injury[order(injury$INJURIES, decreasing=TRUE), ]
#We want to use ggplot2 for plotting. This requires the EVTYPE variable to be set to factor, so that ggplot arranges the events in decreasing order.
phtop <- phealth[1:20, ]
phtop$EVTYPE <- factor(phtop$EVTYPE, levels = phtop$EVTYPE)
phtop$EVTYPE <- factor(phtop$EVTYPE, levels = rev(levels(phtop$EVTYPE)))
fataltop <- fatality[1:10, ]
fataltop$EVTYPE <- factor(fataltop$EVTYPE , levels = fataltop$EVTYPE)
fataltop$EVTYPE <- factor(fataltop$EVTYPE, levels = rev(levels(fataltop$EVTYPE)))
injtop <- injury[1:10, ]
injtop$EVTYPE <- factor(injtop$EVTYPE , levels = injtop$EVTYPE)
injtop$EVTYPE <- factor(injtop$EVTYPE, levels = rev(levels(injtop$EVTYPE)))
Creating the figures of the most harmful events for population health.
phplot <- ggplot(phtop, aes(x=factor(EVTYPE), y=PHEALTH)) + geom_bar(stat ="identity", fill="blue") + ggtitle("Figure 2: Number of Victims: Top 20 Events") + xlab("Weather Events") + ylab("Total public health damage (number of victims)") + coord_flip() + scale_y_continuous(breaks = seq(0,90000, by = 10000))
phplot
fatalplot <- ggplot(fataltop, aes(x=factor(EVTYPE), y=FATALITIES))+ geom_bar(stat ="identity", fill="black") + ggtitle("Fatalities: Top 10 Events") + xlab("Weather Events") + ylab("Total number of Fatalities") + coord_flip() + scale_y_continuous(breaks = seq(0,5500, by = 500))
injplot <- ggplot(injtop, aes(x=factor(EVTYPE), y=INJURIES))+ geom_bar(stat ="identity", fill="green") + ggtitle("Injuries: Top 10 Events") + xlab("Weather Events") + ylab("Total number of Injuries") + coord_flip() + scale_y_continuous(breaks = seq(0,90000, by = 10000))
grid.arrange(injplot, fatalplot, ncol=1, bottom="Figure 3: Victims of Extreme Weather Events broken down by Fatalities and Injuries")
The figures show that tornadoes are by far the most dangerous weather events for public health, both in total and broken down into fatalities and injuries.
Question: Across the United States, which types of events have the greatest economic consequences?
We will calculate the economic damage by aggregating property and crop damages. The value of each of these damages is represented by two variables: one recording a number rounded to three significant digits, the other indicating the order of magnitude with a letter (e.g. “K” for thousands, “M” for millions, “B” for billions). For example, PROPDMG = 25 with PROPDMGEXP = “K” stands for USD 25,000. In order to do any calculation, we have to convert these letters into their numeric values. The product of the number and the converted magnitude will be used as a new variable for calculating the economic damage.
#Transform the magnitude information into numeric.
#Create a vector with the magnitude indications.
dimprop <- unique(datared$PROPDMGEXP)
dimcrop <- unique(datared$CROPDMGEXP)
dim <- sort(unique(c(dimprop, dimcrop)))
dim
## [1] "" "-" "?" "+" "0" "1" "2" "3" "4" "5" "6" "7" "8" "B" "h" "H" "k"
## [18] "K" "m" "M"
#Create a numerical vector with the respective magnitude values.
dimnum <- c(1, 0, 0, 0, 1, 10, 10^2, 10^3, 10^4, 10^5, 10^6, 10^7, 10^8, 10^9, 10^2, 10^2, 10^3, 10^3, 10^6, 10^6)
#Create exponential variables for PROPDMG and CROPDMG
for (i in 1:length(dim)) {
datared$PROPEXP[datared$PROPDMGEXP == dim[i]] <- dimnum[i]
datared$CROPEXP[datared$CROPDMGEXP == dim[i]] <- dimnum[i]
}
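As a side note, the loop above could also be written as a vectorized lookup. The sketch below should produce the same PROPEXP and CROPEXP values by matching each magnitude code against the dim vector.
#Equivalent vectorized alternative to the loop above (sketch).
datared$PROPEXP <- dimnum[match(datared$PROPDMGEXP, dim)]
datared$CROPEXP <- dimnum[match(datared$CROPDMGEXP, dim)]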
datared$PROPDMG_VAL <- datared$PROPDMG * datared$PROPEXP
datared$CROPDMG_VAL <- datared$CROPDMG * datared$CROPEXP
datared$ECONDMG <- datared$PROPDMG_VAL + datared$CROPDMG_VAL
Now, we can calculate the economic damage (property and crop damages) by type of event based on the created variable ECONDMG.
econdmg <- aggregate(ECONDMG~EVTYPE, datared, sum)
econdmg <- econdmg[order(econdmg$ECONDMG, decreasing = TRUE), ]
unname(unlist(subset(econdmg, ECONDMG >= 10^8, select = ECONDMG)))
## [1] 150437379757 71913712800 57362334887 43323541000 18761221986
## [6] 18566870733 15018672000 14610229010 11072776564 10292723500
## [11] 8967041360 8382236550 6715441251 6557661893 5161586800
## [16] 4642038000 3191846000 3108658330 2500000000 1602500000
## [21] 1427647890 1380710400 1224560000 1104666000 1067412240
## [26] 942471520 771273950 624100000 601055000 500155700
## [31] 456930000 403258500 394110000 392592060 344613000
## [36] 304230000 274930006 269095007 247627740 241000000
## [41] 144082000 142000000 117500000 112800000 110000000
## [46] 109427250 105000000
We see that we can focus on the first 20 cases.
#Top 20 total economic damage events.
econtop <- econdmg[1:20, ]
#Top 10 property damage events.
propdmg <- aggregate (PROPDMG_VAL~EVTYPE, datared, sum)
propdmg <- propdmg[order(propdmg$PROPDMG_VAL, decreasing = TRUE), ]
proptop <- propdmg[1:10, ]
#Top 10 crop damage events.
cropdmg <- aggregate (CROPDMG_VAL~EVTYPE, datared, sum)
cropdmg <- cropdmg[order(cropdmg$CROPDMG_VAL, decreasing = TRUE), ]
croptop <- cropdmg[1:10, ]
#Set EVTYPE variable to factor, so that ggplot arranges the events in decreasing order.
econtop$EVTYPE <- factor(econtop$EVTYPE , levels = econtop$EVTYPE)
econtop$EVTYPE <- factor(econtop$EVTYPE, levels = rev(levels(econtop$EVTYPE)))
proptop$EVTYPE <- factor(proptop$EVTYPE , levels = proptop$EVTYPE)
proptop$EVTYPE <- factor(proptop$EVTYPE, levels = rev(levels(proptop$EVTYPE)))
croptop$EVTYPE <- factor(croptop$EVTYPE , levels = croptop$EVTYPE)
croptop$EVTYPE <- factor(croptop$EVTYPE, levels = rev(levels(croptop$EVTYPE)))
#Create figure 3 with panel plot for total economic, property and crop damages.
econplot <- ggplot(data=econtop, aes(x=EVTYPE, y=ECONDMG/10^9)) +
geom_bar(stat ="identity", fill = "blue") + ggtitle("Figure 4: Total Economic Damage: Top 20 Events") +
xlab("Wheater Events") + ylab("Total Damage in billion USD") + coord_flip() +
scale_y_continuous(breaks = seq(0,150, by = 20))
econplot
propplot <- ggplot(data=proptop, aes(x=EVTYPE, y=PROPDMG_VAL/10^9)) +
geom_bar(stat ="identity", fill = "red") + ggtitle("Property Damage: Top 10 Events") +
xlab("Wheater Events") + ylab("Total Damage in billion USD") + coord_flip() +
scale_y_continuous(breaks = seq(0,150, by = 20))
cropplot <- ggplot(data=croptop, aes(x=EVTYPE, y=CROPDMG_VAL/10^9)) +
geom_bar(stat ="identity", fill="green" ) + ggtitle("Crop Damage: Top 10 Events") +
xlab("Wheater Events") + ylab("Total Damage in billion USD") + coord_flip() +
scale_y_continuous(breaks = seq(0,15, by = 2))
grid.arrange(propplot, cropplot, ncol=1, bottom="Figure 5: Events with greatest economic impact (Property, Crop)")
The figures show that flood, hurricane/typhoon, tornado and storm surge are the four most damaging event types in economic terms. The economic damage is dominated by property damage; crop damage accounts for only a relatively small part. For crops, drought is the most harmful event.
From figures 2 to 5 we can infer that floods and tornadoes are by far the most harmful weather events. While tornadoes were responsible for most human victims, floods caused the highest economic damage. Moreover, both of these top events also ranked high in the other domain (tornado 3rd in the economic damage ranking, flood 4th in the public health ranking).