Author: Philip Abraham
Date: September 3, 2016
The aim of this report is to address the question of which types of weather events in the United States (U.S.) cause the greatest harm to the population health and have the greatest economic consequences.
This assignment involved analyzing the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
The NOAA database lists weather events starting in 1950 and ending in November 2011.
Cumulatively across the U.S., tornadoes caused the greatest harm to human health, accounting for about 97,000 fatalities and injuries combined.
Floods resulted in the highest property and/or crop damage in the U.S., with a total incurred cost of about $178 billion.
The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. The dataset was obtained from the course web site: StormData.
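Base R can read a bzip2-compressed CSV without a separate decompression step, because read.csv() opens the path through a connection that detects the compression. A minimal sketch on a small made-up data frame (the temporary file and the toy values are illustrative assumptions, not part of the NOAA data):

```r
# toy data standing in for the storm file (illustrative values only)
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"),
                  FATALITIES = c(2, 0),
                  stringsAsFactors = FALSE)

# write the toy data through a bzip2 connection to a temporary file
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() decompresses transparently when given the file path
toy_back <- read.csv(tmp, stringsAsFactors = FALSE)
```

The same call pattern applies to the downloaded StormData file, which is why no explicit unzip step is needed.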
Loading the Data (i.e. read.csv()).
# https (Secure) URL to the repdata%2Fdata%2FStormData.csv.bz2 file.
require("readr") || install.packages("readr")
## Loading required package: readr
## [1] TRUE
library(readr)
url_csv <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# mode = "wb" keeps the compressed binary file intact on Windows
download.file(url_csv, destfile = "repdata%2Fdata%2FStormData.csv.bz2", mode = "wb")
dateDownloaded <- date()
dateDownloaded
## [1] "Fri Sep 02 20:07:00 2016"
# read csv file into memory
storm <- read.csv("repdata%2Fdata%2FStormData.csv.bz2",stringsAsFactors=FALSE)
dim(storm)
## [1] 902297 37
names(storm)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
Processed/transformed the data into a format suitable for the analysis.
As shown above, the downloaded NOAA storm database contains 902,297 observations of 37 variables. Since the project objective is to identify the weather events causing the most damage to the U.S. population in terms of health and financial loss, the storm dataset was subset to only the variables required for that analysis.
To reduce computation time and memory usage, the dataset was further subset to rows with values greater than zero for the population health counts or the economic damage amounts.
Several weather event types appear in the dataset under more than one name (for example, TSTM WIND and THUNDERSTORM WIND); these variants were consolidated under a single name per event.
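One way to find candidate names for consolidation is to normalise case and then list every distinct label that mentions a keyword. The sketch below uses made-up labels and the keyword "HEAT" purely as an illustration; it is not the exact procedure used to build the mappings in this report.

```r
# made-up event labels with the kinds of variants described above
ev <- c("Heat", "HEAT WAVE", "EXCESSIVE HEAT", "Tornado", "TSTM WIND", "heat")

# normalise case first so "Heat" and "heat" collapse to one label
ev <- toupper(ev)

# list the distinct labels mentioning HEAT to decide what to merge
heat_variants <- sort(unique(grep("HEAT", ev, value = TRUE)))

# fold every variant into the single canonical label "HEAT"
ev[ev %in% heat_variants] <- "HEAT"
counts <- table(ev)
```

Reviewing the variant list by eye before merging matters, since a keyword match can also pick up labels that should stay separate.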
The PROPDMG and CROPDMG columns in the raw NOAA dataset contain the damage amounts, and the PROPDMGEXP and CROPDMGEXP columns give the unit of each amount: K (thousand), M (million), or B (billion) dollars. In the processed dataset, the PROPDMG and CROPDMG values were converted to actual dollar values. For example, if a row had a value of 25 in the PROPDMG column and a “K” in the PROPDMGEXP column, the assigned PROPDMG cost for that row is 25 x 1,000 = $25,000, replacing the 25.
Note that a few of the property and crop damage amounts in the raw dataset carried units other than the standard K, M, or B designations. For those rows, the unknown units were replaced with a “?” character, and the property and crop damage values were used “as is”. For example, if a row had a value of 25 in the PROPDMG column and a “?” in the PROPDMGEXP column, the assigned PROPDMG cost for that row is 25 x 1 = $25, in essence keeping the same value as before. This might introduce some error in the final cost tally, but the rows marked “?” are few relative to the overall number of rows in the dataset.
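The conversion rule described above, including the "?" fallback, can be sketched with a named multiplier vector. The rows below are made up to mirror the worked examples in the text; the names unit_mult, PROPDMG_USD and share_unknown are hypothetical helpers introduced only for this illustration.

```r
# toy rows, not real NOAA records; "m" and "5" stand in for messy unit codes
toy <- data.frame(PROPDMG = c(25, 2.5, 1.2, 25),
                  PROPDMGEXP = c("K", "m", "B", "5"),
                  stringsAsFactors = FALSE)

# normalise case, then mark every code other than K, M or B as unknown
toy$PROPDMGEXP <- toupper(toy$PROPDMGEXP)
toy$PROPDMGEXP[!toy$PROPDMGEXP %in% c("K", "M", "B")] <- "?"

# multipliers per unit code; "?" keeps the value as is (the assumption stated above)
unit_mult <- c(K = 1e3, M = 1e6, B = 1e9, "?" = 1)
toy$PROPDMG_USD <- toy$PROPDMG * unname(unit_mult[toy$PROPDMGEXP])

# share of rows that fell back to the "?" rule
share_unknown <- mean(toy$PROPDMGEXP == "?")
```

Computing a share like share_unknown on the real data would be one way to check how much the "as is" rule could affect the cost tally.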
In the final processed dataset, the fatality and injury counts were added together to form a single value for each row, representing the population health damage count for each weather event. Similarly, the property and crop damage costs were summed to represent the economic loss for each weather event as a single value for each row. For analysis purposes, new columns were added to the dataframe to better reflect the population health and economic damage estimates in the dataset.
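The per-event totals described above amount to a row-wise sum followed by a grouped sum. A minimal sketch on made-up counts (the HEALTH column name is a hypothetical choice for this example):

```r
# toy records, not real NOAA rows
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
                  FATALITIES = c(1, 0, 2, 3),
                  INJURIES = c(10, 4, 0, 0),
                  stringsAsFactors = FALSE)

# one health damage count per row
toy$HEALTH <- toy$FATALITIES + toy$INJURIES

# total per event type, sorted from most to least harmful
by_event <- aggregate(HEALTH ~ EVTYPE, data = toy, FUN = sum)
by_event <- by_event[order(by_event$HEALTH, decreasing = TRUE), ]
```

The economic totals follow the same shape, with the converted property and crop amounts in place of the fatality and injury counts.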
Population Health Damage Computations.
## Population Health Damage
# subset dataframe for event type and population health variables
storm_health <- storm[,c("EVTYPE","FATALITIES", "INJURIES")]
# subset further to not include zero for fatalities and injuries
storm_health <- subset(storm_health, storm_health$FATALITIES>0 |
storm_health$INJURIES>0)
# Clean up datasets from duplicates
#converts all event type column values to upper case to get rid of duplicates
storm_health$EVTYPE <- toupper(storm_health$EVTYPE)
# clean up duplicate weather events: map each variant name to one canonical name
# (assigning the literal name avoids recycling warnings and does not depend on
# rows that already carry the canonical name)
health_map <- c(
"EXCESSIVE HEAT" = "HEAT",
"HEAT WAVE" = "HEAT",
"EXTREME HEAT" = "HEAT",
"RECORD HEAT" = "HEAT",
"TSTM WIND" = "THUNDERSTORM WIND",
"THUNDERSTORM WINDS" = "THUNDERSTORM WIND",
"FLASH FLOOD" = "FLOOD",
"URBAN/SML STREAM FLD" = "FLOOD",
"ICE STORM" = "WINTER STORM",
"HEAVY SNOW" = "WINTER STORM",
"BLIZZARD" = "WINTER STORM",
"WILD/FOREST FIRE" = "WILDFIRE",
"WILD FIRES" = "WILDFIRE",
"WIND" = "HIGH WIND",
"HIGH WINDS" = "HIGH WIND",
"STRONG WIND" = "HIGH WIND",
"HURRICANE" = "HURRICANE/TYPHOON",
"DENSE FOG" = "FOG",
"RIP CURRENT" = "RIP CURRENTS",
"EXTREME COLD/WIND CHILL" = "EXTREME COLD",
"COLD/WIND CHILL" = "EXTREME COLD",
"HEAVY SURF/HIGH SURF" = "HIGH SURF",
"HEAVY SURF" = "HIGH SURF",
"TROPICAL STORM GORDON" = "TROPICAL STORM",
"WINTER WEATHER/MIX" = "WINTER WEATHER",
"WINTRY MIX" = "WINTER WEATHER",
"WINTER WEATHER MIX" = "WINTER WEATHER",
"COLD" = "EXTREME COLD",
"EXCESSIVE RAINFALL" = "HEAVY RAIN",
"HEAVY RAINS" = "HEAVY RAIN",
"TORRENTIAL RAINFALL" = "HEAVY RAIN"
)
# rename only the rows whose event type is one of the listed variants
is_variant <- storm_health$EVTYPE %in% names(health_map)
storm_health$EVTYPE[is_variant] <- unname(health_map[storm_health$EVTYPE[is_variant]])
# split the population health variables by event type combining fatalities and injuries
# in each row
storm_health_a<-aggregate(storm_health$FATALITIES+storm_health$INJURIES~
storm_health$EVTYPE, storm_health,sum)
# change column names
colnames(storm_health_a) <- c("Event_Type", "HealthDamage_Qty")
# sort dataset in decreasing order for health damage numbers
rank <- order(storm_health_a$HealthDamage_Qty, decreasing = TRUE)
storm_health_a <- storm_health_a[rank,]
Economic Consequences Computations.
## Economic Consequences
# subset dataframe for event type and economic consequences variables
storm_econ <- storm[,c("EVTYPE","PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
# subset further to not include zero expense rows for property and crops
storm_econ <- subset(storm_econ, storm_econ$PROPDMG>0 |
storm_econ$CROPDMG>0)
# cleaning up data
#converts all event type column values to upper case to get rid of duplicates
storm_econ$EVTYPE <- toupper(storm_econ$EVTYPE)
unique(storm_econ$PROPDMGEXP)
## [1] "K" "M" "B" "m" "" "+" "0" "5" "6" "4" "h" "2" "7" "3" "H" "-"
# convert all PROPDMGEXP column values to upper case to get rid of duplicates
storm_econ$PROPDMGEXP <- toupper(storm_econ$PROPDMGEXP)
# replace every unit code other than K, M or B
# ("" "+" "0" "5" "6" "4" "H" "2" "7" "3" "-") with "?"
storm_econ$PROPDMGEXP[!storm_econ$PROPDMGEXP %in% c("K", "M", "B")] <- "?"
unique(storm_econ$PROPDMGEXP)
## [1] "K" "M" "B" "?"
unique(storm_econ$CROPDMGEXP)
## [1] "" "M" "K" "m" "B" "?" "0" "k"
# convert all CROPDMGEXP column values to upper case to get rid of duplicates
storm_econ$CROPDMGEXP <- toupper(storm_econ$CROPDMGEXP)
# replace every unit code other than K, M or B ("0" and "") with "?"
storm_econ$CROPDMGEXP[!storm_econ$CROPDMGEXP %in% c("K", "M", "B")] <- "?"
unique(storm_econ$CROPDMGEXP)
## [1] "?" "M" "K" "B"
# convert values in "PROPDMG" and "CROPDMG" to dollar amounts based on the
# "PROPDMGEXP" & "CROPDMGEXP" entries; "?" leaves the value unchanged
unit_mult <- c("K" = 1e3, "M" = 1e6, "B" = 1e9, "?" = 1)
storm_econ_rev <- storm_econ
storm_econ_rev$PROPDMG <- storm_econ_rev$PROPDMG *
unname(unit_mult[storm_econ_rev$PROPDMGEXP])
storm_econ_rev$CROPDMG <- storm_econ_rev$CROPDMG *
unname(unit_mult[storm_econ_rev$CROPDMGEXP])
storm_econ_rev <- storm_econ_rev[, c(1,2,4)]
head(storm_econ_rev,3)
## EVTYPE PROPDMG CROPDMG
## 1 TORNADO 25000 0
## 2 TORNADO 2500 0
## 3 TORNADO 25000 0
# clean up duplicate weather events: map each variant name to one canonical name
# (assigning the literal name avoids recycling warnings and does not depend on
# rows that already carry the canonical name)
econ_map <- c(
"FLASH FLOOD" = "FLOOD",
"RIVER FLOOD" = "FLOOD",
"HURRICANE" = "HURRICANE/TYPHOON",
"HURRICANE OPAL" = "HURRICANE/TYPHOON",
"HURRICANE ERIN" = "HURRICANE/TYPHOON",
"TYPHOON" = "HURRICANE/TYPHOON",
"TSTM WIND" = "THUNDERSTORM WIND",
"THUNDERSTORM WINDS" = "THUNDERSTORM WIND",
"ICE STORM" = "WINTER STORM",
"WILD/FOREST FIRE" = "WILDFIRE",
"WILD FIRES" = "WILDFIRE",
"STORM SURGE/TIDE" = "STORM SURGE",
"HIGH WINDS" = "HIGH WIND",
"BLIZZARD" = "WINTER STORM",
"SEVERE THUNDERSTORM" = "HEAVY RAIN/SEVERE WEATHER",
"TORNADOES, TSTM WIND, HAIL" = "HEAVY RAIN/SEVERE WEATHER",
"FREEZE" = "FROST/FREEZE",
"HEAT" = "EXCESSIVE HEAT"
)
# rename only the rows whose event type is one of the listed variants
is_variant <- storm_econ_rev$EVTYPE %in% names(econ_map)
storm_econ_rev$EVTYPE[is_variant] <- unname(econ_map[storm_econ_rev$EVTYPE[is_variant]])
# split the economic variables by event type
storm_econ_rev_a<-aggregate(storm_econ_rev$PROPDMG+storm_econ_rev$CROPDMG~storm_econ_rev$EVTYPE,storm_econ_rev,sum)
# change column names
colnames(storm_econ_rev_a) <- c("Event_Type", "DamageCost")
# sort dataset in decreasing order for damage costs
rank2 <- order(storm_econ_rev_a$DamageCost, decreasing = TRUE)
storm_econ_rev_a <- storm_econ_rev_a[rank2,]
# total of all population health damage numbers
options(scipen=999) # removes scientific notation in values
health_tot <- sum(storm_health_a$HealthDamage_Qty)
There were 155673 fatalities and/or injuries caused by weather events in the U.S. for the years 1950 to 2011.
# % break-down of population health damage numbers
require("scales") || install.packages("scales")
## Loading required package: scales
##
## Attaching package: 'scales'
## The following objects are masked from 'package:readr':
##
## col_factor, col_numeric
library(scales)
storm_health_a$percent_of_total <- percent((storm_health_a$HealthDamage_Qty)/health_tot)
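The percentage column is each event's total divided by the overall total. As a cross-check that does not need the scales package, the share can be reproduced in base R from totals reported in this report; formatting to one decimal place with sprintf() is an assumption chosen to match the displayed values.

```r
health_tot <- 155673   # overall total of fatalities and injuries reported in this report
tornado_tot <- 96979   # tornado total reported in this report

# share of the overall total, formatted as a percentage with one decimal place
tornado_share <- sprintf("%.1f%%", 100 * tornado_tot / health_tot)
```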
Shown below are the summary statistics for the population health damage counts.
summary(storm_health_a)
## Event_Type HealthDamage_Qty percent_of_total
## Length:174 Min. : 1.0 Length:174
## Class :character 1st Qu.: 1.0 Class :character
## Mode :character Median : 4.0 Mode :character
## Mean : 894.7
## 3rd Qu.: 26.5
## Max. :96979.0
quantile(storm_health_a$HealthDamage_Qty,c(.1,.25,.5,.75,.9,.95,.98,.99,1))
## 10% 25% 50% 75% 90% 95% 98% 99%
## 1.00 1.00 4.00 26.50 389.80 1400.50 8210.32 10714.46
## 100%
## 96979.00
storm_health_a <- subset(storm_health_a, storm_health_a$HealthDamage_Qty>=
quantile(storm_health_a$HealthDamage_Qty,c(.9)))
For the period 1950 to 2011, the weather event that caused the greatest harm to human health in the United States was the tornado. Tornadoes caused 96,979 fatalities and injuries combined, or 62.3% of all weather-related fatalities and injuries in the United States.
require("knitr") || install.packages("knitr")
## Loading required package: knitr
library(knitr)
kable(storm_health_a[1,])
| | Event_Type | HealthDamage_Qty | percent_of_total |
|---|---|---|---|
| 148 | TORNADO | 96979 | 62.3% |
Selecting only event types at or above the 90th percentile of health damage counts, the weather events causing the greatest harm to human health are:
kable(storm_health_a)
| | Event_Type | HealthDamage_Qty | percent_of_total |
|---|---|---|---|
| 148 | TORNADO | 96979 | 62.3% |
| 52 | HEAT | 12319 | 7.9% |
| 32 | FLOOD | 10121 | 6.5% |
| 138 | THUNDERSTORM WIND | 10054 | 6.5% |
| 94 | LIGHTNING | 6046 | 3.9% |
| 171 | WINTER STORM | 5645 | 3.6% |
| 68 | HIGH WIND | 2214 | 1.4% |
| 168 | WILDFIRE | 1696 | 1.1% |
| 82 | HURRICANE/TYPHOON | 1446 | 0.9% |
| 50 | HAIL | 1376 | 0.9% |
| 37 | FOG | 1156 | 0.7% |
| 117 | RIP CURRENTS | 1101 | 0.7% |
| 25 | EXTREME COLD | 735 | 0.5% |
| 174 | WINTER WEATHER | 677 | 0.4% |
| 22 | DUST STORM | 462 | 0.3% |
| 152 | TROPICAL STORM | 449 | 0.3% |
| 64 | HIGH SURF | 398 | 0.3% |
| 2 | AVALANCHE | 394 | 0.3% |
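The percentile cutoff behind the table above keeps only the event types whose total is at or above a chosen quantile of all totals. A sketch on made-up totals (the event names and values are illustrative; quantile() is used with its default interpolation, as in the report's code):

```r
# 20 made-up event totals spanning several orders of magnitude
toy <- data.frame(Event_Type = paste0("EVENT_", 1:20),
                  HealthDamage_Qty = 2^(0:19),
                  stringsAsFactors = FALSE)

# value at the 90th percentile of the totals
cutoff <- quantile(toy$HealthDamage_Qty, 0.9)

# keep only the event types at or above the cutoff
top <- subset(toy, HealthDamage_Qty >= cutoff)
```

With 20 event types, a 90th-percentile cutoff keeps the top two; on the report's 174 event types it keeps the 18 rows listed above.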
2-D histogram showing the base-10 log of the population health damage count for each weather event.
# Install ggplot2 package for plotting
require("ggplot2") || install.packages("ggplot2")
## Loading required package: ggplot2
library(ggplot2)
ggplot(storm_health_a, aes(reorder(Event_Type,log10(HealthDamage_Qty)),
log10(HealthDamage_Qty))) +
stat_bin2d(bins = 11, colour = "white")+labs(x="Weather Event",
y="Log10(Total Number of People Killed and/or Injured)",
title="Weather Events Causing the Highest
Harm to Population Health for the Years 1950 to 2011") +
theme(axis.text.x=element_text(angle=90, size=10, vjust=0.5)) +
theme(legend.position="none") +
theme(panel.background = element_rect(fill = 'black'))
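The log10 transform in the plot is useful because the totals span several orders of magnitude, so on a linear axis the smaller events would be nearly invisible next to tornadoes. A short sketch using two totals from the table above:

```r
# totals taken from the health damage table in this report
counts <- c(TORNADO = 96979, AVALANCHE = 394)

# on a linear scale the largest total dwarfs the smallest
ratio <- counts[["TORNADO"]] / counts[["AVALANCHE"]]

# on a log10 scale the same gap is the difference of the logs
log_counts <- log10(counts)
log_gap <- log_counts[["TORNADO"]] - log_counts[["AVALANCHE"]]
```

Here log_gap equals log10(ratio), so a gap of more than two units on the plotted axis corresponds to a more than hundredfold difference in counts.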
# total of all damage costs
damage_tot <- sum(storm_econ_rev_a$DamageCost)
The total cost of property and crop damage caused by weather events in the United States for the years 1950 to 2011 is $476,422,842,480 (about $476 billion).
# % break-down of costs
storm_econ_rev_a$percent_of_total <- percent((storm_econ_rev_a$DamageCost)/damage_tot)
Shown below are the summary statistics for the damage costs.
summary(storm_econ_rev_a)
## Event_Type DamageCost percent_of_total
## Length:379 Min. : 0 Length:379
## Class :character 1st Qu.: 12750 Class :character
## Mode :character Median : 200000 Mode :character
## Mean : 1257052355
## 3rd Qu.: 5000000
## Max. :178030211924
quantile(storm_econ_rev_a$DamageCost,c(.1,.25,.5,.75,.9,.95,.98,.99,1))
## 10% 25% 50% 75% 90%
## 3000 12750 200000 5000000 76219040
## 95% 98% 99% 100%
## 325052989 9704214708 25183840166 178030211924
storm_econ_rev_a <- subset(storm_econ_rev_a, storm_econ_rev_a$DamageCost >=
quantile(storm_econ_rev_a$DamageCost, c(.95)))
For the period 1950 to 2011, the weather event that caused the greatest economic loss in the United States was the flood.
Floods caused a total of $178,030,211,924 (about $178 billion) in property and crop damage, or 37.4% of all weather-related property and crop damage in the United States.
kable(storm_econ_rev_a[1,])
| | Event_Type | DamageCost | percent_of_total |
|---|---|---|---|
| 64 | FLOOD | 178030211924 | 37.4% |
Selecting only event types at or above the 95th percentile of damage costs, the weather events with the greatest economic consequences are:
kable(storm_econ_rev_a)
| | Event_Type | DamageCost | percent_of_total |
|---|---|---|---|
| 64 | FLOOD | 178030211924 | 37.4% |
| 172 | HURRICANE/TYPHOON | 90710952810 | 19.0% |
| 312 | TORNADO | 57352114049 | 12.0% |
| 262 | STORM SURGE | 47965579000 | 10.1% |
| 98 | HAIL | 18758221521 | 3.9% |
| 373 | WINTER STORM | 16453756561 | 3.5% |
| 34 | DROUGHT | 15018672000 | 3.2% |
| 273 | THUNDERSTORM WIND | 10863543990 | 2.3% |
| 365 | WILDFIRE | 8793313130 | 1.8% |
| 320 | TROPICAL STORM | 8382236550 | 1.8% |
| 153 | HIGH WIND | 6557661943 | 1.4% |
| 123 | HEAVY RAIN/SEVERE WEATHER | 5308060000 | 1.1% |
| 84 | FROST/FREEZE | 1561596000 | 0.3% |
| 119 | HEAVY RAIN | 1427647890 | 0.3% |
| 47 | EXTREME COLD | 1380710400 | 0.3% |
| 129 | HEAVY SNOW | 1067242242 | 0.2% |
| 196 | LIGHTNING | 940751537 | 0.2% |
| 43 | EXCESSIVE HEAT | 903414200 | 0.2% |
| 187 | LANDSLIDE | 344613000 | 0.1% |
2-D histogram showing the Base-10 Log of the top damage costs and the associated weather events.
# See the damage costs by 2d histograms
ggplot(storm_econ_rev_a, aes(reorder(Event_Type,log10(DamageCost)),
log10(DamageCost))) +
stat_bin2d(bins = 11, colour = "white")+labs(x="Weather Event",
y="Log10(Damage Costs-Property and/or Crops)",
title="Weather Events Causing the Greatest
Economic Consequences for the Years 1950 to 2011") +
theme(axis.text.x=element_text(angle=90, size=10, vjust=0.5)) +
theme(legend.position="none") +
theme(panel.background = element_rect(fill = 'black'))