Synopsis

In this report we investigate which weather events cause the greatest damage, focusing on harm to the population and on economic consequences. To do so, we obtained data about major storms and weather events in the United States between 1950 and the end of November 2011 from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which includes estimates of fatalities, injuries, and property damage. We found that, across the U.S., tornadoes caused the greatest harm to the population in terms of both fatalities and injuries. Regarding the economy, floods caused the most property damage, while drought did the most harm to crops. Overall, floods had the greatest economic consequences.

Data Processing

We obtained data on major storms and weather events in the United States between 1950 and the end of November 2011 from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database.

Defining constants

First, we define constants for the data URL and for the directory and path of the file that will be downloaded.

dataURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
dataDir <- "data"
filePath <- file.path(dataDir, "StormData.csv.bz2")

Reading in the data

Then we check whether the data is already loaded. Only if it is not do we create the data directory (if it does not already exist) and download the file into it. We then read the compressed CSV file (.csv.bz2) with read.csv(), which can read compressed files directly.

if(!exists('data') || !is.data.frame(get('data'))) {
  if(!file.exists(filePath)) {
    if(!dir.exists(dataDir)) dir.create(dataDir)
    download.file(dataURL, filePath, method = "curl")
  }
  data <- read.csv(filePath)
}
head(data)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE  EVTYPE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL TORNADO
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL TORNADO
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL TORNADO
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL TORNADO
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL TORNADO
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL TORNADO
##   BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1         0                                               0         NA
## 2         0                                               0         NA
## 3         0                                               0         NA
## 4         0                                               0         NA
## 5         0                                               0         NA
## 6         0                                               0         NA
##   END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1         0                      14.0   100 3   0          0       15    25.0
## 2         0                       2.0   150 2   0          0        0     2.5
## 3         0                       0.1   123 2   0          0        2    25.0
## 4         0                       0.0   100 2   0          0        2     2.5
## 5         0                       0.0   150 2   0          0        2     2.5
## 6         0                       1.5   177 2   0          0        6     2.5
##   PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1          K       0                                         3040      8812
## 2          K       0                                         3042      8755
## 3          K       0                                         3340      8742
## 4          K       0                                         3458      8626
## 5          K       0                                         3412      8642
## 6          K       0                                         3450      8748
##   LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1       3051       8806              1
## 2          0          0              2
## 3          0          0              3
## 4          0          0              4
## 5          0          0              5
## 6          0          0              6
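
To illustrate why no explicit decompression step is needed, here is a minimal sketch that writes a tiny bzip2-compressed CSV to a temporary file and reads it back with read.csv(). The values and the names tmp, con and toyStorm are made up for illustration and are not part of the analysis.

```r
# Write a small, made-up table to a bzip2-compressed CSV file.
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "wt")
write.csv(data.frame(EVTYPE = c("TORNADO", "FLOOD"), INJURIES = c(15, 0)),
          con, row.names = FALSE)
close(con)

# read.csv() opens the path through file(), which detects the compression
# from the file contents, so no manual decompression is required.
toyStorm <- read.csv(tmp)
print(toyStorm)

# Remove the temporary file.
unlink(tmp)
```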

Results

Effects on population

Injuries

To total the injuries for each type of event, we use tapply(), which returns a vector holding the sum of injuries for each weather event type.

injuries <- tapply(data$INJURIES, data$EVTYPE, sum)

Then we sort this injuries vector in descending order to determine which events caused the most injuries.

head(sort(injuries, decreasing = TRUE), 10)
##           TORNADO         TSTM WIND             FLOOD    EXCESSIVE HEAT 
##             91346              6957              6789              6525 
##         LIGHTNING              HEAT         ICE STORM       FLASH FLOOD 
##              5230              2100              1975              1777 
## THUNDERSTORM WIND              HAIL 
##              1488              1361

From this data, we can see that only 5 types of events caused more than 5,000 injuries each.
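
The grouping step may be easier to follow on a small example. The sketch below applies the same tapply() call to a made-up data frame (toyEvents and toyInjuries are hypothetical names, and the numbers are invented) and then counts how many event types exceed a threshold.

```r
# Hypothetical toy data: four events of two types.
toyEvents <- data.frame(
  EVTYPE   = c("TORNADO", "FLOOD", "TORNADO", "FLOOD"),
  INJURIES = c(15, 2, 6, 1)
)

# tapply() splits INJURIES by EVTYPE and applies sum() to each group,
# returning a named vector with one element per event type.
toyInjuries <- tapply(toyEvents$INJURIES, toyEvents$EVTYPE, sum)
print(sort(toyInjuries, decreasing = TRUE))

# Count how many event types exceed a threshold (here, 5 injuries).
print(sum(toyInjuries > 5))
```

In the report's session, the analogous count would be sum(injuries > 5000).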

Fatalities

Using the same method as for injuries, we compute the vector fatalities and use it to find the event types with the most fatalities:

fatalities <- tapply(data$FATALITIES, data$EVTYPE, sum)
head(sort(fatalities, decreasing = TRUE), 10)
##        TORNADO EXCESSIVE HEAT    FLASH FLOOD           HEAT      LIGHTNING 
##           5633           1903            978            937            816 
##      TSTM WIND          FLOOD    RIP CURRENT      HIGH WIND      AVALANCHE 
##            504            470            368            248            224

From this data, we can see that only 5 types of events caused more than 800 fatalities each.

Total population harm

However, what we are really interested in is the total harm to the population caused by weather events. So we add the two previous vectors, injuries and fatalities, to get the total population harm.

populationDMG <- injuries + fatalities
head(sort(populationDMG, decreasing = TRUE), 10)
##           TORNADO    EXCESSIVE HEAT         TSTM WIND             FLOOD 
##             96979              8428              7461              7259 
##         LIGHTNING              HEAT       FLASH FLOOD         ICE STORM 
##              6046              3037              2755              2064 
## THUNDERSTORM WIND      WINTER STORM 
##              1621              1527

From this data, we can see that the same 5 event types that caused the most injuries also caused the most harm to the population overall, although their order differs. We plot this data in a bar plot:

par(oma = c(2, 0, 0, 0))
barplot(populationDMG[populationDMG > 5000])
mtext("Total Harm to Population (Injuries + Fatalities) by Weather Event Type", side = 1, line = 1, outer = TRUE)
box("inner")

We can see that tornadoes caused the most harm to the population, with about 97 thousand people killed or injured, followed by excessive heat, TSTM wind, floods and lightning, which affected about 8.4, 7.5, 7.3 and 6.0 thousand people respectively.
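
The sum injuries + fatalities used for this total is computed position by position, so it relies on both vectors listing the event types in the same order. That holds here because both come from tapply() over the same EVTYPE column. A minimal sketch of an explicit check, using made-up vectors (toyInj, toyFat and toyHarm are hypothetical names):

```r
# Two made-up named vectors with the same names in the same order,
# as tapply() would produce over a common grouping column.
toyInj <- c(FLOOD = 3, TORNADO = 21)
toyFat <- c(FLOOD = 1, TORNADO = 4)

# Guard against silently adding values for mismatched event types.
stopifnot(identical(names(toyInj), names(toyFat)))

# Element-wise sum; the names of the first vector are kept.
toyHarm <- toyInj + toyFat
print(sort(toyHarm, decreasing = TRUE))
```

In the report's session, the equivalent guard would be stopifnot(identical(names(injuries), names(fatalities))).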

Effects on the Economy

Effects on the economy are measured by estimated property damage and crop damage.

head(data[,c("EVTYPE", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")], 10)
##     EVTYPE PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1  TORNADO    25.0          K       0           
## 2  TORNADO     2.5          K       0           
## 3  TORNADO    25.0          K       0           
## 4  TORNADO     2.5          K       0           
## 5  TORNADO     2.5          K       0           
## 6  TORNADO     2.5          K       0           
## 7  TORNADO     2.5          K       0           
## 8  TORNADO     2.5          K       0           
## 9  TORNADO    25.0          K       0           
## 10 TORNADO    25.0          K       0

Looking at this data, we can tell that the columns PROPDMG and CROPDMG hold only the significant digits of each estimate. Each value must be multiplied by a power of ten determined by the character in the corresponding PROPDMGEXP or CROPDMGEXP column, which denotes the exponent.

We explore the unique characters (exponents) in the PROPDMGEXP and CROPDMGEXP columns:

unique(data$PROPDMGEXP)
##  [1] "K" "M" ""  "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"
unique(data$CROPDMGEXP)
## [1] ""  "M" "K" "m" "B" "?" "0" "k" "2"

In the set of unique characters we see “K”, “M” and “B”, which denote thousand (kilo), million and billion respectively, so the multipliers are a thousand, a million and a billion.

We also see the character “H” which probably denotes the prefix “hecto”, so the multiplier would be a hundred.

A digit is taken as the exponent itself (for example, “5” gives a multiplier of 10^5), and any other character is treated as a multiplier of 1 (exponent 0).

So we create the following function to convert the exponent character to an integer multiplier:

convertStrMultiplier <- function(x) {
  x <- toupper(x)
  intExp <- strtoi(x)
  mult <- c("K", "M", "B")
  
  if(x == "H") exponent <- 2
  else if(x %in% mult) exponent <- 3 * which(mult == x)
  else if(!is.na(intExp)) exponent <- intExp
  else exponent <- 0

  10 ^ exponent
}

Then we create two new variables, fullPropDMG and fullCropDMG, which hold the full damage estimates obtained by multiplying each value by the output of convertStrMultiplier().

data$fullPropDMG <- data$PROPDMG * sapply(data$PROPDMGEXP, convertStrMultiplier)
data$fullCropDMG <- data$CROPDMG * sapply(data$CROPDMGEXP, convertStrMultiplier)
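
Calling the function once per row with sapply() can be slow on a dataset of this size. As an alternative sketch, not the method used in this report, the same rules can be written as a named lookup vector and applied to a whole column at once. The names expLookup and convertVecMultiplier are hypothetical.

```r
# Exponent for each recognised symbol: H = 10^2, K = 10^3, M = 10^6,
# B = 10^9, and a digit d = 10^d.
expLookup <- c(H = 2, K = 3, M = 6, B = 9,
               setNames(0:9, as.character(0:9)))

convertVecMultiplier <- function(x) {
  # Look every symbol up at once; symbols that are not in the table
  # ("", "+", "-", "?") come back as NA.
  exponent <- unname(expLookup[toupper(as.character(x))])
  # Treat unrecognised symbols as exponent 0, i.e. a multiplier of 1.
  exponent[is.na(exponent)] <- 0
  10 ^ exponent
}

print(convertVecMultiplier(c("K", "m", "B", "h", "5", "", "?")))
```

As a worked example, a row with PROPDMG equal to 25.0 and PROPDMGEXP equal to “K” evaluates to 25.0 * convertVecMultiplier("K"), that is 25,000 dollars.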

Properties

Applying the same technique as in the injuries calculation above, we use tapply() to get the total property damage per event type.

property <- tapply(data$fullPropDMG, data$EVTYPE, sum)

Then we sort this property vector in descending order to determine which events caused the most property damage.

head(sort(property, decreasing = TRUE), 10)
##             FLOOD HURRICANE/TYPHOON           TORNADO       STORM SURGE 
##      144657709807       69305840000       56947380677       43323536000 
##       FLASH FLOOD              HAIL         HURRICANE    TROPICAL STORM 
##       16822673979       15735267513       11868319010        7703890550 
##      WINTER STORM         HIGH WIND 
##        6688497251        5270046295

From this data, we can see that only 7 types of events each caused more than 10 billion dollars of property damage.

Crops

Using the same method again, we compute the vector crop and use it to find the event types with the most crop damage:

crop <- tapply(data$fullCropDMG, data$EVTYPE, sum)
head(sort(crop, decreasing = TRUE), 10)
##           DROUGHT             FLOOD       RIVER FLOOD         ICE STORM 
##       13972566000        5661968450        5029459000        5022113500 
##              HAIL         HURRICANE HURRICANE/TYPHOON       FLASH FLOOD 
##        3025954473        2741910000        2607872800        1421317100 
##      EXTREME COLD      FROST/FREEZE 
##        1292973000        1094086000

From this data, we can see that only 7 types of events each caused more than 2 billion dollars of crop damage.

Total Economic Consequences

We calculate the total amount of economic damage by adding property damage to crop damage.

economicDMG <- property + crop
head(sort(economicDMG, decreasing = TRUE), 10)
##             FLOOD HURRICANE/TYPHOON           TORNADO       STORM SURGE 
##      150319678257       71913712800       57362333947       43323541000 
##              HAIL       FLASH FLOOD           DROUGHT         HURRICANE 
##       18761221986       18243991079       15018672000       14610229010 
##       RIVER FLOOD         ICE STORM 
##       10148404500        8967041360

From this data, we can see that the top 3 event types for property damage are also the top 3 for economic consequences overall, in the same order.

We plot property damage, crop damage and overall economic consequences as bar plots, after rescaling the values so that the y-axis shows billions of dollars:

property <- property / 10 ^ 9
crop <- crop / 10 ^ 9
economicDMG <- economicDMG / 10 ^ 9

par(mfrow = c(2, 2), mar = c(10, 4, 4, 2), oma = c(2, 0, 0, 0))
barplot(property[property > 10], main = "Total Property Damage in Billion Dollars", las = 2)
barplot(crop[crop > 2], main = "Total Crop Damage in Billion Dollars", las = 2)
barplot(economicDMG[economicDMG > 50], main = "Total Economic Consequences in Billion Dollars")
mtext("Economic Consequences of Weather Events", side = 1, line = 1, outer = TRUE)
box("inner")

We can see that floods caused the most damage to property, with losses estimated at about 145 billion dollars, followed by hurricanes/typhoons, tornadoes and storm surges.

With respect to crop damage, drought topped the list with about 14 billion dollars of losses, followed by floods, river floods and ice storms.

Overall, floods caused the greatest economic consequences with about 150 billion dollars of losses, followed by hurricanes/typhoons with about 72 billion dollars, then tornadoes with about 57 billion dollars.