In this report we investigate which weather events cause the greatest damage, focusing on harm to the population and on economic consequences. To do so, we obtained data on major storms and weather events in the United States between 1950 and the end of November 2011 from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which includes estimates of fatalities, injuries, and property damage. From this data we found that, across the U.S., tornadoes caused the greatest harm to the population in both fatalities and injuries. Regarding economic effects, floods caused the most property damage, while drought did the most harm to crops. Overall, floods had the greatest economic consequences.
We obtained data on major storms and weather events in the United States between 1950 and the end of November 2011 from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database.
First, we define constants for the data URL and for the directory and path of the file that will be downloaded:
dataURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
dataDir <- "data"
filePath <- file.path(dataDir, "StormData.csv.bz2")
Then, we check whether the data is already loaded. Only if it is not,
we create the data directory (if it does not already exist) and download
the file into it. We then read the compressed CSV data (.csv.bz2) with
read.csv(), which can read bzip2-compressed files directly.
if(!exists('data') || !is.data.frame(get('data'))) {
  if(!file.exists(filePath)) {
    if(!dir.exists(dataDir)) dir.create(dataDir)
    download.file(dataURL, filePath, method = "curl")
  }
  data <- read.csv(filePath)
}
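To illustrate why no explicit decompression step is needed, here is a minimal sketch that writes a tiny bzip2-compressed CSV to a temporary file and reads it back with read.csv(). The toy data frame and the names tmp, con and roundTrip are assumptions made for this illustration and are not part of the analysis.

```r
# Write a two-row data frame to a bzip2-compressed CSV in a temporary file
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(
  data.frame(EVTYPE = c("TORNADO", "FLOOD"), INJURIES = c(15, 0)),
  con,
  row.names = FALSE
)
close(con)

# read.csv() opens the path with file(), which detects the compression
roundTrip <- read.csv(tmp)
unlink(tmp)

roundTrip
```

The same mechanism is what lets read.csv(filePath) work on StormData.csv.bz2 without unpacking it first.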
head(data)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE EVTYPE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL TORNADO
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL TORNADO
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL TORNADO
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL TORNADO
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL TORNADO
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL TORNADO
## BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1 0 0 NA
## 2 0 0 NA
## 3 0 0 NA
## 4 0 0 NA
## 5 0 0 NA
## 6 0 0 NA
## END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1 0 14.0 100 3 0 0 15 25.0
## 2 0 2.0 150 2 0 0 0 2.5
## 3 0 0.1 123 2 0 0 2 25.0
## 4 0 0.0 100 2 0 0 2 2.5
## 5 0 0.0 150 2 0 0 2 2.5
## 6 0 1.5 177 2 0 0 6 2.5
## PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1 K 0 3040 8812
## 2 K 0 3042 8755
## 3 K 0 3340 8742
## 4 K 0 3458 8626
## 5 K 0 3412 8642
## 6 K 0 3450 8748
## LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3051 8806 1
## 2 0 0 2
## 3 0 0 3
## 4 0 0 4
## 5 0 0 5
## 6 0 0 6
In order to measure the sum of injuries for each type of event, we
use tapply() to get a vector of the total number of
injuries for each weather event type.
injuries <- tapply(data$INJURIES, data$EVTYPE, sum)
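As a small illustration of what tapply() does here, the following sketch applies the same call to a made-up five-row data frame. The toy values and the names toy and toyInjuries are assumptions for the example only.

```r
# Toy stand-in for the storm data: one row per recorded event
toy <- data.frame(
  EVTYPE   = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
  INJURIES = c(15, 2, 6, 0, 1)
)

# Group INJURIES by EVTYPE and sum within each group;
# the result is a named vector with one entry per event type
toyInjuries <- tapply(toy$INJURIES, toy$EVTYPE, sum)

# Largest totals first, as done for the real data
sort(toyInjuries, decreasing = TRUE)
```

Each name in the result is an event type and each value is the sum over all rows with that type.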
Then we sort this injuries vector in
descending order to determine which events caused the most
injuries.
head(sort(injuries, decreasing = TRUE), 10)
## TORNADO TSTM WIND FLOOD EXCESSIVE HEAT
## 91346 6957 6789 6525
## LIGHTNING HEAT ICE STORM FLASH FLOOD
## 5230 2100 1975 1777
## THUNDERSTORM WIND HAIL
## 1488 1361
From this data, we can see that only 5 types of events caused more than 5000 injuries.
Using the same method we used to calculate injuries per event type, we
compute the vector fatalities, which we use to find the
event types with the most fatalities:
fatalities <- tapply(data$FATALITIES, data$EVTYPE, sum)
head(sort(fatalities, decreasing = TRUE), 10)
## TORNADO EXCESSIVE HEAT FLASH FLOOD HEAT LIGHTNING
## 5633 1903 978 937 816
## TSTM WIND FLOOD RIP CURRENT HIGH WIND AVALANCHE
## 504 470 368 248 224
From this data, we can see that only 5 types of events caused more than 800 fatalities.
However, what we are really interested in finding out is the total
amount of human population damage that is caused by weather events. So
we sum the 2 previous vectors injuries and
fatalities to get total population damage.
populationDMG <- injuries + fatalities
head(sort(populationDMG, decreasing = TRUE), 10)
## TORNADO EXCESSIVE HEAT TSTM WIND FLOOD
## 96979 8428 7461 7259
## LIGHTNING HEAT FLASH FLOOD ICE STORM
## 6046 3037 2755 2064
## THUNDERSTORM WIND WINTER STORM
## 1621 1527
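Adding the two vectors directly works because both come from tapply() over the same EVTYPE column, so their names are in the same order. R adds vectors by position, not by name, which the sketch below illustrates with two made-up vectors (a, b, positional and aligned are names assumed for this example).

```r
# Two named vectors whose names are in different orders
a <- c(FLOOD = 3, TORNADO = 21)
b <- c(TORNADO = 2, FLOOD = 1)

# Element-wise addition pairs values by position and keeps the names of a,
# so FLOOD is paired with b's TORNADO value here
positional <- a + b

# Reordering b by the names of a pairs each event type with itself
aligned <- a + b[names(a)]

# A cheap guard before adding: FALSE here, TRUE for vectors built from the same factor
identical(names(a), names(b))
```

For injuries and fatalities as computed above, the same check on their names should return TRUE.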
From this data, we can see that the same 5 event types that caused the most injuries also caused the most harm to the population overall. We plot this data in a bar plot:
par(oma = c(2, 0, 0, 0))
barplot(populationDMG[populationDMG > 5000])
mtext("Total Harm to Population (Injuries + Fatalities) by Weather Event Type", side = 1, line = 1, outer = TRUE)
box("inner")
We can see that tornadoes caused the most harm to the population, with about 97 thousand people affected, followed by excessive heat, TSTM wind, floods and lightning, which affected about 8.4K, 7.5K, 7.3K and 6K people respectively.
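The bars are drawn in whatever order the vector has, which for tapply() output is the sorted order of the event-type labels rather than the size of the totals. One possible variation, sketched here on four of the totals printed above, is to sort the filtered vector before plotting. The names toyDMG and topDMG are assumptions for this example.

```r
# Four of the population-damage totals reported above
toyDMG <- c(FLOOD = 7259, `ICE STORM` = 2064, LIGHTNING = 6046, TORNADO = 96979)

# Keep event types above the threshold, then order bars from largest to smallest
topDMG <- sort(toyDMG[toyDMG > 5000], decreasing = TRUE)

# Plot in thousands of people, with vertical labels so long names stay readable
barplot(topDMG / 1000, las = 2, ylab = "People affected (thousands)")
```

Sorting does not change the values, only the left-to-right order of the bars.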
Effects on the economy are measured by estimated property damage and crop damage.
head(data[,c("EVTYPE", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")], 10)
## EVTYPE PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO 25.0 K 0
## 2 TORNADO 2.5 K 0
## 3 TORNADO 25.0 K 0
## 4 TORNADO 2.5 K 0
## 5 TORNADO 2.5 K 0
## 6 TORNADO 2.5 K 0
## 7 TORNADO 2.5 K 0
## 8 TORNADO 2.5 K 0
## 9 TORNADO 25.0 K 0
## 10 TORNADO 25.0 K 0
Looking at this data, we can tell that the columns
PROPDMG and CROPDMG hold only the significant
digits of each damage estimate. To get the full dollar amount, this
number must be multiplied by a multiplier determined by the
character in the corresponding PROPDMGEXP or
CROPDMGEXP column, which denotes the exponent.
We explore the unique characters (exponents) in the
PROPDMGEXP and CROPDMGEXP columns:
unique(data$PROPDMGEXP)
## [1] "K" "M" "" "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"
unique(data$CROPDMGEXP)
## [1] "" "M" "K" "m" "B" "?" "0" "k" "2"
We can see in the unique characters set: “K”, “M”, “B” which denote “Kilo”, “Million”, “Billion” respectively, so the multiplier would be a thousand, a million, and a billion respectively.
We also see the character “H” which probably denotes the prefix “hecto”, so the multiplier would be a hundred.
Digits are interpreted as the power-of-ten exponent itself, and any other character is treated as a multiplier of 1 (exponent 0).
So we create the following function to convert the exponent character to an integer multiplier:
convertStrMultiplier <- function(x) {
  x <- toupper(x)
  intExp <- strtoi(x)
  mult <- c("K", "M", "B")
  if(x == "H") exponent <- 2
  else if(x %in% mult) exponent <- 3 * which(mult == x)
  else if(!is.na(intExp)) exponent <- intExp
  else exponent <- 0
  10 ^ exponent
}
Then we create 2 new variables fullPropDMG and
fullCropDMG which represent total damage estimates after
multiplying by the output of the function
convertStrMultiplier().
data$fullPropDMG <- data$PROPDMG * sapply(data$PROPDMGEXP, convertStrMultiplier)
data$fullCropDMG <- data$CROPDMG * sapply(data$CROPDMGEXP, convertStrMultiplier)
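Because sapply() calls the function once per row, this step can be slow on a data set of this size. A possible vectorized alternative is sketched below. The function name expToMultiplier is hypothetical, and the sketch assumes that numeric exponents are single digits, as in the unique values listed above.

```r
# Hypothetical vectorized counterpart to convertStrMultiplier()
expToMultiplier <- function(x) {
  x <- toupper(x)
  exponent <- rep(0, length(x))      # default: exponent 0, i.e. multiplier 1
  exponent[x == "H"] <- 2            # hundred
  exponent[x == "K"] <- 3            # thousand
  exponent[x == "M"] <- 6            # million
  exponent[x == "B"] <- 9            # billion
  isDigit <- grepl("^[0-9]$", x)     # single-digit exponents are used as is
  exponent[isDigit] <- as.numeric(x[isDigit])
  10 ^ exponent
}

# One example of each kind of exponent character
multipliers <- expToMultiplier(c("K", "m", "", "B", "5", "?", "h"))
multipliers

# Worked example from the first rows above: 25.0 with "K" is 25,000 dollars
25.0 * expToMultiplier("K")
```

With this variant, the two assignments above could be written as data$PROPDMG * expToMultiplier(data$PROPDMGEXP) and similarly for crops, without sapply().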
Re-applying the same technique from the injuries calculation above,
we use tapply() to get the total sum of property damage per
event type.
property <- tapply(data$fullPropDMG, data$EVTYPE, sum)
Then we sort this property vector in
descending order to determine which events caused the most
property damage.
head(sort(property, decreasing = TRUE), 10)
## FLOOD HURRICANE/TYPHOON TORNADO STORM SURGE
## 144657709807 69305840000 56947380677 43323536000
## FLASH FLOOD HAIL HURRICANE TROPICAL STORM
## 16822673979 15735267513 11868319010 7703890550
## WINTER STORM HIGH WIND
## 6688497251 5270046295
From this data, we can see that only 7 types of events caused more than 10 billion dollars' worth of property damage.
Using the same method again, we calculate the vector
crop, which we use to find the event types with the most crop
damage:
crop <- tapply(data$fullCropDMG, data$EVTYPE, sum)
head(sort(crop, decreasing = TRUE), 10)
## DROUGHT FLOOD RIVER FLOOD ICE STORM
## 13972566000 5661968450 5029459000 5022113500
## HAIL HURRICANE HURRICANE/TYPHOON FLASH FLOOD
## 3025954473 2741910000 2607872800 1421317100
## EXTREME COLD FROST/FREEZE
## 1292973000 1094086000
From this data, we can see that only 7 types of events caused more than 2 billion dollars' worth of crop damage.
We calculate the total amount of economic damage by adding property damage to crop damage.
economicDMG <- property + crop
head(sort(economicDMG, decreasing = TRUE), 10)
## FLOOD HURRICANE/TYPHOON TORNADO STORM SURGE
## 150319678257 71913712800 57362333947 43323541000
## HAIL FLASH FLOOD DROUGHT HURRICANE
## 18761221986 18243991079 15018672000 14610229010
## RIVER FLOOD ICE STORM
## 10148404500 8967041360
From this data, we can see that the same top 3 event types that caused the most property damage also caused the greatest economic consequences overall.
We plot property damage, crop damage and overall economic consequences as bar plots, after rescaling the values so the y-axis shows billions of dollars:
property <- property / 10 ^ 9
crop <- crop / 10 ^ 9
economicDMG <- economicDMG / 10 ^ 9
par(mfrow = c(2, 2), mar = c(10, 4, 4, 2), oma = c(2, 0, 0, 0))
barplot(property[property > 10], main = "Total Property Damage in Billion Dollars", las = 2)
barplot(crop[crop > 2], main = "Total Crop Damage in Billion Dollars", las = 2)
barplot(economicDMG[economicDMG > 50], main = "Total Economic Consequences in Billion Dollars")
mtext("Economic Consequences of Weather Events", side = 1, line = 1, outer = TRUE)
box("inner")
We can see that floods caused the most damage to property, with losses estimated at about 145 billion dollars, followed by hurricanes/typhoons, tornadoes and storm surges.
With respect to crop damage, drought topped the list with about 14 billion dollars of losses, followed by floods, river floods and ice storms.
Overall, floods caused the greatest economic consequences with about 150 billion dollars of losses, followed by hurricanes/typhoons with about 72 billion dollars, then tornadoes with about 57 billion.