An overview of the impacts of storms and severe weather events on population health and economic problems.

Synopsis

Weather events can harm population health and can also have negative economic consequences. I address the following questions:

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

Across the United States, which types of events have the greatest economic consequences?

This report uses the Pareto principle to focus on the small set of weather event types that have the largest negative effect on population health (fatalities and injuries) and on the economy (property and crop damage).

Excessive heat and tornadoes cause the most harm to population health, followed by different types of flooding and by lightning. Floods and heavy storms are the most economically damaging.

Note by the author

The results cannot be taken at face value until all the event types are properly organised and labelled. I chose not to perform that part of the data processing, because doing it completely would take too long and doing it partially would bias the results.

The author was instructed to work with this dataset and has neither searched for similar datasets nor validated that this dataset is the most appropriate one for the topic at hand. This report should only be used to guide decisions at the national level. It does not break down results by geographical area, season or other criteria, although the dataset does allow for such analysis.

1 Data Processing

The data available for this report comes from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. The data can be downloaded from the URL used in section 1.1.

1.1 Loading the data

Check whether the file is already in the local directory; if not, download it. Then load the data into R.

if(!file.exists("repdata_data_StormData.csv.bz2")) {
  fileURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
  download.file(fileURL, destfile = "repdata_data_StormData.csv.bz2")
}
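# stringsAsFactors = TRUE (the default before R 4.0) keeps the text columns as factors, which the levels() calls below rely on
rawData <- read.csv("repdata_data_StormData.csv.bz2", stringsAsFactors = TRUE)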

1.2 Cleaning the data

library(dplyr)

We are interested in the types of events that are harmful to population health and the economy. First we need to identify the event types and decide which metrics describe population health and economic damage. The data set contains the following columns.

names(rawData)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

We wish to keep the following variables: EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP. BGN_DATE is also kept for now, because it is needed to filter by year.

eventData <- select( rawData, BGN_DATE, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)

Remove data based on year

Fewer events were recorded in earlier years, so including those years would skew the results. First we take the year component of the begin date.

eventData$year <- as.numeric(format(as.Date(eventData$BGN_DATE, format = "%m/%d/%Y %H:%M:%S"), "%Y"))
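
To see what the conversion does, here is a minimal sketch on a single timestamp. The string below is a made-up example that assumes the month/day/year plus time layout implied by the format string; it is not copied from the data.

```r
# Hypothetical timestamp in the layout that the format string expects
exampleDate <- "4/18/1950 0:00:00"

# as.Date() parses the date part; format(..., "%Y") extracts the four-digit year
exampleYear <- as.numeric(format(as.Date(exampleDate, format = "%m/%d/%Y %H:%M:%S"), "%Y"))
exampleYear
```

The result is a plain number, which is what makes the later comparison with 1996 possible.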

Plot the events per year to choose a cut-off point of data.

nbins <- max(eventData$year) - min(eventData$year)
hist(eventData$year, breaks = nbins, xlab = "Year", main = "Plot 1: Histogram of number of events per year")

Based on this plot I will remove any data from before 1996. The BGN_DATE variable is no longer needed, so it is removed first.

eventData <- select(eventData, -BGN_DATE)
eventData <- eventData[eventData$year >= 1996, ]

Cleaning Event Data

First we will check how many types of weather events are documented.

numEvents <- length(levels(eventData$EVTYPE))
numEvents
## [1] 985

There are 985 types of events. To get an idea about the events let’s look at a random sample of event names:

set.seed(144)
sample(levels(eventData$EVTYPE), size = 20)
##  [1] "BRUSH FIRE"                     "Summary of April 13"           
##  [3] "Summary of June 24"             "Microburst"                    
##  [5] "HEAVY SNOW   FREEZING RAIN"     "Sml Stream Fld"                
##  [7] "BLIZZARD/HEAVY SNOW"            "Summary of June 18"            
##  [9] "COLD WAVE"                      "THUNDERSTORM WINDS LIGHTNING"  
## [11] "RECORD COLD"                    "FLOOD/FLASH"                   
## [13] "PROLONG COLD/SNOW"              "EXTREME HEAT"                  
## [15] "DRY MICROBURST 50"              "Coastal Flood"                 
## [17] "RAIN AND WIND"                  "MONTHLY TEMPERATURE"           
## [19] "THUNDERSTORM WINDS/FLASH FLOOD" "HIGH WIND/BLIZZARD/FREEZING RA"

The event names are inconsistent: they mix upper and lower case, abbreviations, combined events and even summary rows. They would have to be cleaned before the results can be relied on. Due to time constraints I will not perform this operation for this project. Cleaning all event names would take too long, and cleaning only some of them would bias the results in favour of the part that I cleaned.
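
Purely as an illustration of the kind of normalisation that would be needed (it is not applied anywhere in this report), a first pass could look like the sketch below. The input names are hand-picked spellings of the kind seen in this data, and the single plural rule is an assumption made for the example only.

```r
# Hand-picked spellings for illustration; not a complete list
rawNames <- c("Coastal Flood", "COASTAL FLOOD", "  Microburst ",
              "Summary of June 24", "RIP CURRENT", "RIP CURRENTS")

cleanNames <- toupper(trimws(rawNames))                   # unify case, strip outer spaces
cleanNames <- cleanNames[!grepl("^SUMMARY", cleanNames)]  # drop summary rows, which are not events
cleanNames <- sub("CURRENTS$", "CURRENT", cleanNames)     # merge one known plural (assumed rule)

uniqueNames <- unique(cleanNames)
uniqueNames
```

Six raw spellings collapse to three event types here, which shows how much the ranking could shift once all names are cleaned.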

Cleaning Fatality and Injury Data

Fatality and injury data are both numeric. Let’s look at NA values and at the minimum and maximum values.

sum(is.na(eventData$FATALITIES))
## [1] 0
sum(is.na(eventData$INJURIES))
## [1] 0
summary(eventData$FATALITIES)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
##   0.00000   0.00000   0.00000   0.01336   0.00000 158.00000
summary(eventData$INJURIES)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
##    0.0000    0.0000    0.0000    0.0887    0.0000 1150.0000

No problems are expected from this data.

Calculate the total number of recorded fatalities and injuries.

TotFat <- sum(eventData$FATALITIES)
TotInj <- sum(eventData$INJURIES)

The total number of fatalities is 8732 and the total number of injuries is 57975. We need these totals to express the harm caused by each event type as a share of the total.

Cleaning Property and Crop Damage Data

The property and crop damage figures are each split over two columns. The amount is in PROPDMG and CROPDMG, and a multiplier for that amount is in PROPDMGEXP and CROPDMGEXP. We want a single column for property damage and a single column for crop damage, each holding the full numerical value of the damage.

First of all look at the levels of PROPDMGEXP and CROPDMGEXP.

levels(eventData$PROPDMGEXP)
##  [1] ""  "-" "?" "+" "0" "1" "2" "3" "4" "5" "6" "7" "8" "B" "h" "H" "K"
## [18] "m" "M"
levels(eventData$CROPDMGEXP)
## [1] ""  "?" "0" "2" "B" "k" "K" "m" "M"

The levels are translated into powers of ten as follows:
- any number value remains
- h or H is 100
- k or K is 1,000
- m or M is 1,000,000
- b or B is 1,000,000,000
- other levels get exponent 0, so the amount is kept unchanged

eventData$PROPDMGEXP <- gsub("[Hh]", "2", eventData$PROPDMGEXP)
eventData$PROPDMGEXP <- gsub("[Kk]", "3", eventData$PROPDMGEXP)
eventData$PROPDMGEXP <- gsub("[Mm]", "6", eventData$PROPDMGEXP)
eventData$PROPDMGEXP <- gsub("[Bb]", "9", eventData$PROPDMGEXP)
# "+", "-" and "?" carry no usable exponent, so map each of them to 0
eventData$PROPDMGEXP <- gsub("[-+?]", "0", eventData$PROPDMGEXP)
eventData$PROPDMGEXP <- as.numeric(eventData$PROPDMGEXP)
eventData$PROPDMGEXP[is.na(eventData$PROPDMGEXP)] <- 0

eventData$CROPDMGEXP <- gsub("[Hh]", "2", eventData$CROPDMGEXP)
eventData$CROPDMGEXP <- gsub("[Kk]", "3", eventData$CROPDMGEXP)
eventData$CROPDMGEXP <- gsub("[Mm]", "6", eventData$CROPDMGEXP)
eventData$CROPDMGEXP <- gsub("[Bb]", "9", eventData$CROPDMGEXP)
# "+", "-" and "?" carry no usable exponent, so map each of them to 0
eventData$CROPDMGEXP <- gsub("[-+?]", "0", eventData$CROPDMGEXP)
eventData$CROPDMGEXP <- as.numeric(eventData$CROPDMGEXP)
eventData$CROPDMGEXP[is.na(eventData$CROPDMGEXP)] <- 0

Now combine the numbers with the exponents.

eventData$PROPDMG <- eventData$PROPDMG * (10 ^ eventData$PROPDMGEXP)
eventData$CROPDMG <- eventData$CROPDMG * (10 ^ eventData$CROPDMGEXP)
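
The conversion can be checked on a handful of made-up values. The sketch below uses a lookup table instead of repeated gsub calls. `expToPower` is a hypothetical helper written for this illustration, not part of the processing above, and the amounts and codes are invented.

```r
# Hypothetical helper: translate an exponent code into a power of ten
expToPower <- function(code) {
  code <- toupper(as.character(code))
  lookup <- c(H = 2, K = 3, M = 6, B = 9)
  power <- unname(lookup[code])                 # NA for anything that is not a letter code
  isDigit <- grepl("^[0-9]$", code)
  power[isDigit] <- as.numeric(code[isDigit])   # a digit is used directly as the exponent
  power[is.na(power)] <- 0                      # "", "?", "+", "-" fall back to exponent 0
  power
}

# Invented amounts and exponent codes
amounts <- c(25, 2.5, 10, 7)
codes   <- c("K", "M", "", "?")

damage <- amounts * 10 ^ expToPower(codes)
damage
```

An amount of 25 with code K becomes 25000, while an amount with an empty or unknown code is left as it is.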

Discard the unnecessary variables.

eventData <- select(eventData, EVTYPE, FATALITIES, INJURIES, PROPDMG, CROPDMG)

Calculate total economic consequences for property damage and crop damage.

TotPROPDMG <- sum(eventData$PROPDMG)
TotCROPDMG <- sum(eventData$CROPDMG)

Total property damage: 366767615380 US dollars. Total crop damage: 34752728730 US dollars.

1.3 Core data processing

In this section we create summaries of the cleaned data that can be used for meaningful visualisations in the results section.

1.3.1 Population health

First we ‘bin’ the fatalities and injuries: for each event type we sum the number of injuries and fatalities.

FatPerEve <- tapply(eventData$FATALITIES, eventData$EVTYPE, sum)
InjPerEve <- tapply(eventData$INJURIES, eventData$EVTYPE, sum)

Next we create a new data frame that has a row for each event type. It has three columns: the total fatalities for that event type, the total injuries for that event type and the event type name. We immediately keep only the event types that have at least one fatality and at least one injury. Note that this is stricter than removing only the event types for which both counts are 0. We store that data in PHDF, which stands for population health data frame.

PHDFAll <- data.frame(FatPerEve, InjPerEve, levels(eventData$EVTYPE))
PHDF <- subset(PHDFAll, FatPerEve > 0 & InjPerEve > 0)
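
The effect of the filter condition can be checked on a small made-up data frame. This is only a sketch with invented numbers: it shows that `&` keeps rows where both counts are positive, while `|` would keep rows where at least one count is positive.

```r
# Invented counts for four hypothetical event types
toy <- data.frame(
  type = c("A", "B", "C", "D"),
  fat  = c(3, 2, 0, 0),
  inj  = c(5, 0, 4, 0),
  stringsAsFactors = FALSE
)

bothPositive   <- subset(toy, fat > 0 & inj > 0)  # same condition as used for PHDF
eitherPositive <- subset(toy, fat > 0 | inj > 0)  # drops only rows where both are 0

bothPositive$type
eitherPositive$type
```

With `&`, type B (fatalities but no injuries) and type C (injuries but no fatalities) are dropped along with type D.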

Next we use arrange from the dplyr package to order the event types from most to least fatalities. Event types with an equal number of fatalities are ordered by number of injuries.

PHDFord <- arrange(PHDF, desc(FatPerEve), desc(InjPerEve))

Next we calculate each event type’s fatalities as a percentage of the total number of fatalities and store it in the FatPerc variable. We also calculate the cumulative sum of fatalities row by row (FatCumSum variable) and the percentage of the total that this sum corresponds to (FatCumPerc variable). We do the same for injuries.
The reason for the CumSum variables is to see the percentage of total harm accounted for as we add the next worst event type. This will make more sense when we print the data frame.

for(i in 1:dim(PHDFord)[1]) {
  PHDFord$FatPerc[i] <- round(100*PHDFord$FatPerEve[i]/TotFat, digits = 2)
  PHDFord$FatCumSum[i] <- sum(head(PHDFord$FatPerEve,i))
  PHDFord$FatCumPerc[i] <- round(100*PHDFord$FatCumSum[i]/TotFat)
  PHDFord$InjPerc[i] <- round(100*PHDFord$InjPerEve[i]/TotInj, digits = 2)
  PHDFord$InjCumSum[i] <- sum(head(PHDFord$InjPerEve,i))
  PHDFord$InjCumPerc[i] <- round(100*PHDFord$InjCumSum[i]/TotInj)
}
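
The same three quantities can also be computed without a loop by using cumsum. The sketch below works on made-up totals rather than on PHDFord. The names `toyFat` and `toyTable` are invented for the illustration, and the total here is simply the sum of the four toy values.

```r
# Made-up fatality totals per event type, already sorted from most to least
toyFat <- c(HEAT = 50, TORNADO = 30, FLOOD = 15, OTHER = 5)
toyTotal <- sum(toyFat)

toyTable <- data.frame(
  FatPerEve  = unname(toyFat),
  FatPerc    = round(100 * toyFat / toyTotal, digits = 2),  # share of the total per type
  FatCumSum  = cumsum(toyFat),                              # running total down the rows
  FatCumPerc = round(100 * cumsum(toyFat) / toyTotal)       # running share of the total
)
toyTable
```

Reading down FatCumPerc shows how quickly a few event types add up to most of the total, which is the Pareto-style view used in the rest of this section.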

Now we will print the top 20 event types.

head(PHDFord, n=20)
##    FatPerEve InjPerEve levels.eventData.EVTYPE. FatPerc FatCumSum
## 1       1797      6391           EXCESSIVE HEAT   20.58      1797
## 2       1511     20667                  TORNADO   17.30      3308
## 3        887      1674              FLASH FLOOD   10.16      4195
## 4        651      4141                LIGHTNING    7.46      4846
## 5        414      6758                    FLOOD    4.74      5260
## 6        340       209              RIP CURRENT    3.89      5600
## 7        241      3629                TSTM WIND    2.76      5841
## 8        237      1222                     HEAT    2.71      6078
## 9        235      1083                HIGH WIND    2.69      6313
## 10       223       156                AVALANCHE    2.55      6536
## 11       202       294             RIP CURRENTS    2.31      6738
## 12       191      1292             WINTER STORM    2.19      6929
## 13       130      1400        THUNDERSTORM WIND    1.49      7059
## 14       125        24  EXTREME COLD/WIND CHILL    1.43      7184
## 15       113        79             EXTREME COLD    1.29      7297
## 16       107       698               HEAVY SNOW    1.23      7404
## 17       103       278              STRONG WIND    1.18      7507
## 18        95        12          COLD/WIND CHILL    1.09      7602
## 19        94       230               HEAVY RAIN    1.08      7696
## 20        87       146                HIGH SURF    1.00      7783
##    FatCumPerc InjPerc InjCumSum InjCumPerc
## 1          21   11.02      6391         11
## 2          38   35.65     27058         47
## 3          48    2.89     28732         50
## 4          55    7.14     32873         57
## 5          60   11.66     39631         68
## 6          64    0.36     39840         69
## 7          67    6.26     43469         75
## 8          70    2.11     44691         77
## 9          72    1.87     45774         79
## 10         75    0.27     45930         79
## 11         77    0.51     46224         80
## 12         79    2.23     47516         82
## 13         81    2.41     48916         84
## 14         82    0.04     48940         84
## 15         84    0.14     49019         85
## 16         85    1.20     49717         86
## 17         86    0.48     49995         86
## 18         87    0.02     50007         86
## 19         88    0.40     50237         87
## 20         89    0.25     50383         87

We can see that the top twenty fatality-causing event types account for 89 percent of fatalities (FatCumPerc) and 87 percent of injuries (InjCumPerc). These twenty event types are only about 2% of all event types (remember that the total number of event types is almost a thousand). In the results section we will plot the results.

1.3.2 Economic consequences

We apply the same process as for population health, with a few small differences. We do not arrange the results hierarchically (by fatalities first and then by injuries); instead we arrange them by the sum of property and crop damage. We also look at each event type’s share of that combined damage.

First, find the sum of both property and crop damage per event type.

CROPDMGPerEve <- tapply(eventData$CROPDMG, eventData$EVTYPE, sum)
PROPDMGPerEve <- tapply(eventData$PROPDMG, eventData$EVTYPE, sum)

Next, create a data frame with one row per event type and columns for the corresponding property and crop damage. Then keep only the event types for which both property damage and crop damage are above zero.

ECDFAll <- data.frame(PROPDMGPerEve, CROPDMGPerEve, levels(eventData$EVTYPE))
ECDF <- subset(ECDFAll, PROPDMGPerEve > 0 & CROPDMGPerEve > 0)

Here, we apply the only difference in process. We arrange the rows by the sum of property and crop damage together.

ECDFord <- arrange(ECDF, desc((PROPDMGPerEve+CROPDMGPerEve)))

We repeat the same process of percentage of total, cumulative sum and percentage of cumulative sum. This time we also calculate each event type’s percentage of the combined property and crop damage. We do this because, unlike fatalities and injuries (which can’t really be compared and summed), the economic damage to property and crops can be summed.

for(i in 1:dim(ECDFord)[1]) {
    ECDFord$PROPPerc[i] <- round(100*ECDFord$PROPDMGPerEve[i]/TotPROPDMG, digits = 2)
    ECDFord$PROPCumSum[i] <- sum(head(ECDFord$PROPDMGPerEve,i))
    ECDFord$PROPCumPerc[i] <- round(100*ECDFord$PROPCumSum[i]/TotPROPDMG)
    ECDFord$CROPPerc[i] <- round(100*ECDFord$CROPDMGPerEve[i]/TotCROPDMG, digits = 2)
    ECDFord$CROPCumSum[i] <- sum(head(ECDFord$CROPDMGPerEve,i))
    ECDFord$CROPCumPerc[i] <- round(100*ECDFord$CROPCumSum[i]/TotCROPDMG)
    ECDFord$TotPerc[i] <- round(100*(ECDFord$PROPDMGPerEve[i]+ECDFord$CROPDMGPerEve[i])/(TotPROPDMG+TotCROPDMG), digits = 2)
}

Again we explore the top 20 event types.

head(ECDFord, n=20)
##    PROPDMGPerEve CROPDMGPerEve levels.eventData.EVTYPE. PROPPerc
## 1   143944833550    4974778400                    FLOOD    39.25
## 2    69305840000    2607872800        HURRICANE/TYPHOON    18.90
## 3    43193536000          5000              STORM SURGE    11.78
## 4    24616945710     283425010                  TORNADO     6.71
## 5    14595143420    2476029450                     HAIL     3.98
## 6    15222203910    1334901700              FLASH FLOOD     4.15
## 7    11812819010    2741410000                HURRICANE     3.22
## 8     1046101000   13367566000                  DROUGHT     0.29
## 9     7642475550     677711000           TROPICAL STORM     2.08
## 10    5247860360     633561300                HIGH WIND     1.43
## 11    4758667000     295472800                 WILDFIRE     1.30
## 12    4478026440     553915350                TSTM WIND     1.22
## 13    4641188000        850000         STORM SURGE/TIDE     1.27
## 14    3382654440     398331000        THUNDERSTORM WIND     0.92
## 15    3642248810      15660000                ICE STORM     0.99
## 16    3001782500     106782330         WILD/FOREST FIRE     0.82
## 17    1532743250      11944000             WINTER STORM     0.42
## 18     584864440     728169800               HEAVY RAIN     0.16
## 19      19760400    1288973000             EXTREME COLD     0.01
## 20       9480000    1094086000             FROST/FREEZE     0.00
##      PROPCumSum PROPCumPerc CROPPerc  CROPCumSum CROPCumPerc TotPerc
## 1  143944833550          39    14.31  4974778400          14   37.09
## 2  213250673550          58     7.50  7582651200          22   17.91
## 3  256444209550          70     0.00  7582656200          22   10.76
## 4  281061155260          77     0.82  7866081210          23    6.20
## 5  295656298680          81     7.12 10342110660          30    4.25
## 6  310878502590          85     3.84 11677012360          34    4.12
## 7  322691321600          88     7.89 14418422360          41    3.62
## 8  323737422600          88    38.46 27785988360          80    3.59
## 9  331379898150          90     1.95 28463699360          82    2.07
## 10 336627758510          92     1.82 29097260660          84    1.46
## 11 341386425510          93     0.85 29392733460          85    1.26
## 12 345864451950          94     1.59 29946648810          86    1.25
## 13 350505639950          96     0.00 29947498810          86    1.16
## 14 353888294390          96     1.15 30345829810          87    0.94
## 15 357530543200          97     0.05 30361489810          87    0.91
## 16 360532325700          98     0.31 30468272140          88    0.77
## 17 362065068950          99     0.03 30480216140          88    0.38
## 18 362649933390          99     2.10 31208385940          90    0.33
## 19 362669693790          99     3.71 32497358940          94    0.33
## 20 362679173790          99     3.15 33591444940          97    0.27

When we look at the top 20 weather event types causing the most economic damage, we see that only the top 13 have more than 1% impact each. These thirteen cause 96% of property damage and 86% of crop damage. In the results section there will be a barplot of the top thirteen with their percentage impact.
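
The cut-off can be reproduced from the TotPerc column printed above. The sketch below copies those twenty rounded values by hand, so the resulting share is approximate. The variable names are invented for the illustration.

```r
# TotPerc values of the top 20 rows, copied from the printed table
totPerc <- c(37.09, 17.91, 10.76, 6.20, 4.25, 4.12, 3.62, 3.59, 2.07, 1.46,
             1.26, 1.25, 1.16, 0.94, 0.91, 0.77, 0.38, 0.33, 0.33, 0.27)

nAboveOne <- sum(totPerc > 1)                  # event types with more than 1% impact
shareTop  <- sum(totPerc[seq_len(nAboveOne)])  # their combined share in percent

nAboveOne
shareTop
```

Thirteen event types pass the 1% threshold, and together they account for roughly 94.7% of the combined property and crop damage.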

2 Results

This section shows the results in a visually attractive manner. If you prefer details, see the previous section.

2.1 Plot Population health results

The next figure shows a bar plot of the 20 most deadly weather events and the percentage of fatalities and injuries from all weather event types caused by that particular event type.

par(mfrow = c(2,1),mar = c(0.5,4,1,1), oma = c(13,1,2,0))
barplot(head(PHDFord$FatPerc, n=20), col = rainbow(20), ylab = "Percentage (%)", main = "Plot 2: Percentage of all fatalities per weather event")
barplot(head(PHDFord$InjPerc, n=20), names.arg = PHDFord$levels.eventData.EVTYPE.[1:20], las =2, col = rainbow(20), main = "Percentage of all injuries per weather event", ylab = "Percentage (%)")

Excessive heat and tornadoes cause the most harm to population health, followed by different types of flooding and by lightning. You can also see event types that cause few injuries but are very deadly, such as rip currents and avalanches.

2.2 Economic Consequence results

The next figure shows a bar plot of the 13 most economically destructive weather events and the percentage of the sum of property and crop damage from all weather event types caused by that particular event type.

par(mfrow = c(1,1),mar = c(14,4,4,1))
barplot(head(ECDFord$TotPerc, n=13), col = rainbow(13), ylab = "Percentage (%)", main = "Plot 3: Percentage of combined property and crop damage per weather event", names.arg = ECDFord$levels.eventData.EVTYPE.[1:13], las = 2)

Floods and heavy storms are the most economically damaging. Excessive heat is not near the top. That can be expected, because property shouldn’t be damaged much by heat, and crops will be watered unless the heat is too extreme.