In this report we use data from the NOAA Storm Database to evaluate the health and economic consequences of different weather phenomena. For storms and other severe weather events, the data set records the location, duration, injuries and fatalities, and damage to crops and property. The record starts in 1950 and ends in November 2011. We first describe how the data are loaded and cleaned. Next, a simple analysis determines which event types had the greatest health and economic consequences between 1950 and 2011.
The data set is contained in a file compressed using the bzip2 algorithm to reduce its size. We read the file into the variable data with the fread function from the data.table package, which is faster than read.csv. fread relies on the R.utils package to decompress the file, so that package must be installed for this step to work.
library(data.table)
data<-fread("repdata-data-StormData.csv.bz2")
sdata<-dim(data)
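If R.utils is not available, base R can read bzip2-compressed CSV files directly, at the cost of speed. The sketch below illustrates this on a small temporary file rather than on the storm data; the two toy rows and the variable names tmp and small are made up for the example.

```r
# Write a tiny CSV through a bzip2 connection (toy data, not the NOAA file)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "wt")
write.csv(data.frame(EVTYPE = c("HAIL", "TORNADO"), FATALITIES = c(0, 2)),
          con, row.names = FALSE)
close(con)

# read.csv detects the compression when it opens the file for reading
small <- read.csv(tmp)
dim(small)
```

The same call pattern, read.csv on the compressed file name, should also work for the full data set, only more slowly than fread.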
By examining data we can see that there are 902297 observations with 37 variables each. The names of the variables are:
names(data)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
We will not be using all variables in this analysis, so we subset data to remove the unnecessary columns.
library(stringr)
data <- data [,-c(1,5:7,9:20,30:37)]
## Extract the four-digit year from BGN_DATE, counting back from the end of the string
data$YEAR <- str_sub(data$BGN_DATE, -12, -9)
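The year can also be obtained by parsing the date instead of counting characters. This is a minimal sketch, assuming the BGN_DATE values follow a month/day/year pattern followed by a time, as in the made-up examples below; the variable names are illustrative only.

```r
# Example strings in the assumed "m/d/yyyy h:mm:ss" layout (not taken from the file)
bgn_date <- c("4/18/1950 0:00:00", "11/28/2011 0:00:00")

# as.Date parses the leading date part; the trailing time is ignored
parsed <- as.Date(bgn_date, format = "%m/%d/%Y")
year <- format(parsed, "%Y")
year
```

Parsing has the advantage that a malformed date becomes NA rather than a silently wrong substring.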
NOAA’s documentation lists 48 valid event types. However, there are 985 distinct event types in the data set, which means that many entries contain typos or non-standard names. We use a list of the valid event types to correct as many of them as possible.
library(stringdist)
## List of valid event types
evtype <- toupper(c("Astronomical Low Tide", "Avalanche", "Blizzard","Coastal Flood","Cold/Wind Chill", "Debris Flow","Dense Fog","Dense Smoke","Drought","Dust Devil","Dust Storm", "Excessive Heat","Extreme Cold/Wind Chill","Flash Flood", "Flood", "Frost/Freeze", "Funnel Cloud", "Freezing Fog","Hail","Heat","Heavy Rain","Heavy Snow","High Surf","High Wind", "Hurricane (Typhoon)","Ice Storm", "Lake-Effect Snow","Lakeshore Flood","Lightning","Marine Hail","Marine High Wind", "Marine Strong Wind","Marine Thunderstorm Wind", "Rip Current","Seiche","Sleet","Storm Surge/Tide","Strong Wind","Thunderstorm Wind","Tornado","Tropical Depression","Tropical Storm","Tsunami","Volcanic Ash", "Waterspout","Wildfire", "Winter Storm", "Winter Weather"))
## We use amatch to correct typos
matched <- amatch(x = data$EVTYPE,table = evtype,maxDist = 9)
## Replace invalid names with valid ones
data$EVTYPE<- evtype[matched]
After correction, we end up with 49 types of events. The additional event type is NA, assigned to those entries for which no match was found. They represent only about 0.57% of the entries, so we have chosen to ignore them in this analysis.
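The idea behind the approximate matching can be sketched with base R alone. The example below swaps in adist (plain Levenshtein distance) for stringdist's amatch, so its distances may differ from those used in the analysis; the helper nearest_type and the sample labels are hypothetical.

```r
# A few valid types and some messy labels (made up for illustration)
valid <- c("FLOOD", "FLASH FLOOD", "TORNADO", "HAIL")
raw <- c("FLOODS", "TORNDAO", "FLASH FLOODING", "UNRELATED VERY LONG EVENT LABEL")

# Return the closest valid type, or NA when the closest one is farther than max_dist
nearest_type <- function(x, table, max_dist) {
  d <- adist(x, table)                          # rows: x, columns: table
  best <- unname(apply(d, 1, which.min))        # index of the closest valid type
  best_dist <- d[cbind(seq_along(x), best)]     # distance to that type
  ifelse(best_dist <= max_dist, table[best], NA_character_)
}

cleaned <- nearest_type(raw, valid, max_dist = 3)
cleaned
```

The distance cap is a trade-off: a small cap leaves more labels unmatched, while a large cap can assign a short or unusual label to an unrelated event type.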
Next, we convert the property damage exponent (PROPDMGEXP) and crop damage exponent (CROPDMGEXP) from a letter to its corresponding numeric value. We have based our conversion on the work presented here
data$PROPDMGEXP<- as.factor(data$PROPDMGEXP)
## We substitute the exponent letter for its numeric equivalent
levels(data$PROPDMGEXP) <- list(levels(data$PROPDMGEXP), "0" = c("","-","?"),"1" = c("+"), "10" = "0":"8", "100" = c("h","H"), "1000" = c("K"), "1000000" = c("M","m"), "1000000000" = c("B"))
data$PROPDMGEXP <- as.numeric(as.character(data$PROPDMGEXP))
## Calculate the actual damage
data$PROPDMGTOT <- data$PROPDMG*data$PROPDMGEXP
data$CROPDMGEXP<- as.factor(data$CROPDMGEXP)
## We substitute the exponent letter for its numeric equivalent
levels(data$CROPDMGEXP) <- list(levels(data$CROPDMGEXP), "0" = c("","-","?"),"1" = c("+"), "10" = "0":"8", "100" = c("h","H"), "1000" = c("K"), "1000000" = c("M","m"), "1000000000" = c("B"))
data$CROPDMGEXP <- as.numeric(as.character(data$CROPDMGEXP))
## Calculate the actual damage
data$CROPDMGTOT <- data$CROPDMG*data$CROPDMGEXP
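The conversion from exponent code to multiplier can also be written as a lookup in a named vector. This is a minimal sketch using the same code-to-value pairs as above; the names multiplier and to_multiplier are hypothetical, and treating any unlisted code as NA is an assumption of the sketch.

```r
# Multiplier for each exponent code, following the mapping used in this report
multiplier <- c(setNames(rep(10, 9), as.character(0:8)),
                "-" = 0, "?" = 0, "+" = 1,
                "h" = 100, "H" = 100, "K" = 1e3,
                "m" = 1e6, "M" = 1e6, "B" = 1e9)

to_multiplier <- function(code) {
  out <- unname(multiplier[code])  # codes not in the table become NA
  out[code == ""] <- 0             # an empty name cannot be looked up, so handle "" here
  out
}

# Toy damage figures and codes (not taken from the data set)
dmg <- c(25, 2.5, 1, 5, 3)
code <- c("K", "M", "B", "", "x")
dmg_total <- dmg * to_multiplier(code)
dmg_total
```

Because the table is an ordinary vector, the same function can serve both PROPDMGEXP and CROPDMGEXP without repeating the mapping.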
Once the data have been cleaned and processed, we can evaluate the health and economic consequences of storms and other severe weather events in the USA between 1950 and 2011. There are several ways to rank the impact of each event type. We have chosen to calculate the average effect per year for each event type and to rank the event types accordingly.
There are two variables related to the economic consequences: crop damage and property damage. We first estimate the yearly average damage to crops and property for each event type.
## We first estimate the total impact of every event type for every available year
crop<-tapply(data$CROPDMGTOT, list(data$EVTYPE,data$YEAR), sum, na.rm=TRUE)
property<-tapply(data$PROPDMGTOT, list(data$EVTYPE,data$YEAR), sum, na.rm=TRUE)
## Then we calculate the yearly average impact on crops and property separately
avg_crop<-data.frame(avg_crop=rowMeans(crop, na.rm = TRUE))
avg_crop$evtype<- row.names(avg_crop)
avg_property<-data.frame(avg_property=rowMeans(property, na.rm = TRUE))
avg_property$evtype<- row.names(avg_property)
## Now we calculate the total impact (crop+property)
avg_totaldmg <- data.frame(evtype=avg_crop$evtype,total=avg_crop$avg_crop+avg_property$avg_property)
## We sort the data frame in descending order
avg_crop<-avg_crop[order(avg_crop$avg_crop, decreasing = TRUE),]
avg_property<-avg_property[order(avg_property$avg_property, decreasing = TRUE),]
avg_totaldmg<-avg_totaldmg[order(avg_totaldmg$total, decreasing = TRUE),]
## We remove the row names for aesthetic purposes
rownames(avg_crop) <- NULL
rownames(avg_property) <- NULL
rownames(avg_totaldmg) <- NULL
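How tapply and rowMeans combine into a yearly average can be seen on a toy data frame. The values below are made up; the point of the sketch is that an event type with no records in a given year gets NA in the totals matrix, and rowMeans with na.rm = TRUE then averages only over the years in which that type appears.

```r
# Toy records: HAIL has no entry for the year 2000
toy <- data.frame(evtype = c("FLOOD", "FLOOD", "FLOOD", "HAIL"),
                  year   = c("2000", "2000", "2001", "2001"),
                  dmg    = c(10, 20, 60, 4))

# Total damage per event type (rows) and year (columns)
totals <- tapply(toy$dmg, list(toy$evtype, toy$year), sum, na.rm = TRUE)

# Average over the years with at least one record for that type
yearly_avg <- rowMeans(totals, na.rm = TRUE)
yearly_avg
```

Here FLOOD averages (30 + 60) / 2 = 45, while HAIL averages 4 over its single recorded year rather than 2 over both years.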
The events can be ranked based on their impact on crops, property, or a combination of both. Tables 1 and 2 show the top ten event types based on their average yearly economic impact on crops and property, respectively.
library(scales)
library(knitr)
library(kableExtra)
## We create the table for the crop damage
kable(head(avg_crop[,c(2,1)], n = 10), format="html",
caption = "Table 1. Top 10 Event types by crop damage",full_width = F, col.names = c("Event Type", "Average crop damage per year ($)"),align=rep('c', 2)) %>% kable_styling("striped", full_width = F)
| Event Type | Average crop damage per year ($) |
|---|---|
| DROUGHT | 735399000 |
| HURRICANE (TYPHOON) | 460563800 |
| FLASH FLOOD | 345417955 |
| FLOOD | 302579682 |
| ICE STORM | 264321763 |
| SEICHE | 152328378 |
| FROST/FREEZE | 91325733 |
| FUNNEL CLOUD | 68051211 |
| HAIL | 53094345 |
| HEAVY RAIN | 42310674 |
## We create the table for the property damage
kable(head(avg_property[,c(2,1)], n = 10), format = "html",
caption = "Table 2. Top 10 Event types by property damage", col.names = c("Event Type", "Average property damage per year ($)"),align=rep('c', 2)) %>% kable_styling("striped", full_width = F)
| Event Type | Average property damage per year ($) |
|---|---|
| HURRICANE (TYPHOON) | 12122964333 |
| FLOOD | 7653316232 |
| STORM SURGE/TIDE | 2664718000 |
| FLASH FLOOD | 1166560149 |
| TORNADO | 918419947 |
| SEICHE | 660198251 |
| WILDFIRE | 464068239 |
| TROPICAL STORM | 406020555 |
| WINTER STORM | 352079329 |
| THUNDERSTORM WIND | 348313689 |
We now look at the ranking based on the combined effect (crop+property), which we consider a more appropriate way of ranking the economic consequences. Table 3 shows the top ten event types based on their total yearly average economic impact. Fig. 1 shows the same information in a barplot for the top five events. The highest economic impact is produced by HURRICANE (TYPHOON), which causes on average $12,583,528,133 in losses every year.
## We create the table for the combined damage
kable(head(avg_totaldmg, n = 10), format = "html",
caption = "Table 3. Top 10 Event types by combined (crop+property) damage", col.names = c("Event Type", "Average economic damage per year ($)"),align=rep('c', 2)) %>% kable_styling("striped", full_width = F)
| Event Type | Average economic damage per year ($) |
|---|---|
| HURRICANE (TYPHOON) | 12583528133 |
| FLOOD | 7955895914 |
| STORM SURGE/TIDE | 2664765500 |
| FLASH FLOOD | 1511978104 |
| TORNADO | 925112923 |
| SEICHE | 812526628 |
| DROUGHT | 790814821 |
| WILDFIRE | 486323431 |
| ICE STORM | 484742106 |
| TROPICAL STORM | 442594029 |
barplot(height = avg_totaldmg$total[1:5],names.arg = avg_totaldmg$evtype[1:5],main = "Fig. 1 Top Five Events for Economic Consequences",xlab = "Event Type",ylab = "AVERAGE ECONOMIC CONSEQUENCES PER YEAR ($)",cex.names = 0.6)
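Large dollar amounts are easier to read with thousands separators. A minimal sketch using base R's formatC on the two largest values from Table 3; the helper name format_dollars is hypothetical and not part of the original analysis.

```r
# Format a numeric vector as whole dollars with comma separators
format_dollars <- function(x) {
  paste0("$", formatC(x, format = "f", digits = 0, big.mark = ","))
}

labels <- format_dollars(c(12583528133, 7955895914))
labels
```

The resulting strings could be used in a table column or as text labels on a plot.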
The health consequences can also be evaluated based on two variables: fatalities and injuries. We begin by calculating the average number of deaths or injuries per year for each event type.
## We estimate the total number of fatalities and injuries for each year and event type
fatalities<-tapply(data$FATALITIES, list(data$EVTYPE,data$YEAR), sum, na.rm=TRUE)
injuries<-tapply(data$INJURIES, list(data$EVTYPE,data$YEAR), sum, na.rm=TRUE)
## Next, we calculate the yearly average
avg_fat<-data.frame(avg_fat=rowMeans(fatalities, na.rm = TRUE))
avg_fat$evtype<- row.names(avg_fat)
avg_inj<-data.frame(avg_inj=rowMeans(injuries, na.rm = TRUE))
avg_inj$evtype<- row.names(avg_inj)
## Then we calculate the combined effect (fatalities+injuries)
avg_total <- data.frame(evtype=avg_fat$evtype,total=avg_fat$avg_fat+avg_inj$avg_inj)
## We order the list in decreasing order
avg_fat<-avg_fat[order(avg_fat$avg_fat, decreasing = TRUE),]
avg_inj<-avg_inj[order(avg_inj$avg_inj, decreasing = TRUE),]
avg_total<-avg_total[order(avg_total$total, decreasing = TRUE),]
## We remove the row names for aesthetic purposes
rownames(avg_fat) <- NULL
rownames(avg_inj) <- NULL
rownames(avg_total) <- NULL
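Adding the two averages column by column works because both data frames list the event types in the same order before sorting. When that order is not guaranteed, joining on the event type name is a safer alternative. The sketch below uses made-up numbers and hypothetical variable names.

```r
# Toy averages, deliberately listed in different orders
toy_fat <- data.frame(evtype = c("TORNADO", "HEAT", "FLOOD"),
                      avg_fat = c(9, 8, 3))
toy_inj <- data.frame(evtype = c("FLOOD", "TORNADO", "HEAT"),
                      avg_inj = c(40, 150, 20))

# Join on the event type, then add and sort
combined <- merge(toy_fat, toy_inj, by = "evtype")
combined$total <- combined$avg_fat + combined$avg_inj
combined <- combined[order(combined$total, decreasing = TRUE), ]
rownames(combined) <- NULL
combined
```

In this toy case TORNADO ranks first with 159, followed by FLOOD with 43 and HEAT with 28.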
We now look at the top ten events based on the average number of fatalities per year (Table 4) and the average number of injuries (Table 5). We can see that the ranking changes depending on the variable used.
## We create the table for the fatalities
kable(head(avg_fat[,c(2,1)], n = 10), format="html",
caption = "Table 4. Top 10 Event types by fatalities",full_width = F, col.names = c("Event Type", "Average fatalities per year"),align=rep('c', 2)) %>% kable_styling("striped", full_width = F)
| Event Type | Average fatalities per year |
|---|---|
| EXCESSIVE HEAT | 112.22222 |
| TORNADO | 90.85484 |
| HEAT | 79.71429 |
| FLASH FLOOD | 54.78947 |
| LIGHTNING | 43.05263 |
| FLOOD | 32.10526 |
| RIP CURRENT | 31.77778 |
| HIGH WIND | 14.15789 |
| THUNDERSTORM WIND | 13.40000 |
| HURRICANE (TYPHOON) | 12.00000 |
## We create the table for the injuries
kable(head(avg_inj[,c(2,1)], n = 10), format="html",
caption = "Table 5. Top 10 Event types by injuries",full_width = F, col.names = c("Event Type", "Average injuries per year"),align=rep('c', 2)) %>% kable_styling("striped", full_width = F)
| Event Type | Average injuries per year |
|---|---|
| TORNADO | 1473.61290 |
| FLOOD | 416.47368 |
| EXCESSIVE HEAT | 372.38889 |
| LIGHTNING | 275.42105 |
| HURRICANE (TYPHOON) | 212.83333 |
| HEAT | 175.92857 |
| THUNDERSTORM WIND | 163.46667 |
| HIGH WIND | 149.92982 |
| ICE STORM | 106.42105 |
| FLASH FLOOD | 94.84211 |
However, we consider that the health consequences should be evaluated based on the combined number of injuries and fatalities. That ranking is presented in Table 6 and Fig. 2. Based on the combined ranking, the weather event with the highest impact on human health is TORNADO, which causes on average about 1564 injuries and fatalities every year.
## We create the table for the combined effect
kable(head(avg_total, n = 10), format="html",
caption = "Table 6. Top 10 Event types by fatalities and injuries combined",full_width = F, col.names = c("Event Type", "Average fatalities and/or injuries per year"),align=rep('c', 2)) %>% kable_styling("striped", full_width = F)
| Event Type | Average fatalities and/or injuries per year |
|---|---|
| TORNADO | 1564.4677 |
| EXCESSIVE HEAT | 484.6111 |
| FLOOD | 448.5789 |
| LIGHTNING | 318.4737 |
| HEAT | 255.6429 |
| HURRICANE (TYPHOON) | 224.8333 |
| THUNDERSTORM WIND | 176.8667 |
| HIGH WIND | 164.0877 |
| FLASH FLOOD | 149.6316 |
| ICE STORM | 111.4737 |
barplot(height = avg_total$total[1:5],names.arg = avg_total$evtype[1:5],main = "Fig. 2 Top Five Events for Health Consequences",xlab = "Event Type",ylab = "AVERAGE INJURIES AND/OR FATALITIES PER YEAR",cex.names = 0.6)