library(readr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
This analysis explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) Storm Database to learn about the consequences of storms and other severe weather events for public health and the economy. It addresses two broad questions: which types of events are most harmful to population health, and which types of events have the greatest economic consequences.
For the first question, the analysis focuses on two variables that describe the number of fatalities (FATALITIES) and the number of injuries (INJURIES) resulting from a severe weather event. The most harmful events are defined as those that caused the greatest total number of fatalities and injuries during 1950-2011. To answer the second question, the analysis focuses on two sources of economic damage from severe weather events - property damage (PROPDMG) and crop damage (CROPDMG). Here the most harmful events are defined as those that caused the greatest total damage over the period 1950-2011. To facilitate decision-making, the analysis ranks the different types of events (EVTYPE) by severity.
Data are loaded using the read_csv function from the readr package. This function can read compressed files directly, without extracting them first:
mydata <- read_csv("data/StormData.csv.bz2", col_names = TRUE)
## Parsed with column specification:
## cols(
## .default = col_character(),
## STATE__ = col_double(),
## COUNTY = col_double(),
## BGN_RANGE = col_double(),
## COUNTY_END = col_double(),
## END_RANGE = col_double(),
## LENGTH = col_double(),
## WIDTH = col_double(),
## F = col_integer(),
## MAG = col_double(),
## FATALITIES = col_double(),
## INJURIES = col_double(),
## PROPDMG = col_double(),
## CROPDMG = col_double(),
## LATITUDE = col_double(),
## LONGITUDE = col_double(),
## LATITUDE_E = col_double(),
## LONGITUDE_ = col_double(),
## REFNUM = col_double()
## )
## See spec(...) for full column specifications.
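As a minimal sketch of the loading step, the snippet below writes a tiny made-up table to a temporary .csv.bz2 file and reads it back with read_csv. Passing col_types explicitly declares the column types up front instead of relying on the guessed specification printed above. The toy data, the temporary file, and the chosen column types are illustrative assumptions, not part of the Storm Database.

```r
library(readr)

# Toy data standing in for the real file (hypothetical values).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "FLOOD"),
  INJURIES   = c(3, 0, 1),
  PROPDMGEXP = c("K", NA, "M"),
  stringsAsFactors = FALSE
)

# Write a bzip2-compressed CSV through a base R connection.
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read_csv decompresses on the fly; col_types fixes the column types.
storms <- read_csv(
  path,
  col_types = cols(
    .default = col_character(),
    INJURIES = col_double()
  )
)

print(storms)
```

With col_types supplied, the types no longer depend on what the parser guesses from the first rows of the file.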
The following code summarizes the total injuries per event type over the period 1950-2011, calculates the number of severe weather events for each event type, and also calculates the average number of injuries per event across different types. The results are printed in tabular form in the order of decreasing severity (only the top 25 most severe event types are shown):
tot_inj <- mydata %>%
filter(!is.na(INJURIES)) %>%
group_by(EVTYPE) %>%
summarize(TotalInjuries = sum(INJURIES), N_Events = n(), Avg_Inj_per_Event = round(TotalInjuries/N_Events, digits=1)) %>%
arrange(desc(TotalInjuries)) %>%
filter(TotalInjuries > 0)
print(tot_inj, n=25)
## # A tibble: 158 x 4
## EVTYPE TotalInjuries N_Events Avg_Inj_per_Event
## <chr> <dbl> <int> <dbl>
## 1 TORNADO 91346 60652 1.5
## 2 TSTM WIND 6957 219944 0.0
## 3 FLOOD 6789 25326 0.3
## 4 EXCESSIVE HEAT 6525 1678 3.9
## 5 LIGHTNING 5230 15755 0.3
## 6 HEAT 2100 767 2.7
## 7 ICE STORM 1975 2006 1.0
## 8 FLASH FLOOD 1777 54278 0.0
## 9 THUNDERSTORM WIND 1488 82563 0.0
## 10 HAIL 1361 288661 0.0
## 11 WINTER STORM 1321 11433 0.1
## 12 HURRICANE/TYPHOON 1275 88 14.5
## 13 HIGH WIND 1137 20212 0.1
## 14 HEAVY SNOW 1021 15708 0.1
## 15 WILDFIRE 911 2761 0.3
## 16 THUNDERSTORM WINDS 908 20843 0.0
## 17 BLIZZARD 805 2719 0.3
## 18 FOG 734 538 1.4
## 19 WILD/FOREST FIRE 545 1457 0.4
## 20 DUST STORM 440 427 1.0
## 21 WINTER WEATHER 398 7026 0.1
## 22 DENSE FOG 342 1293 0.3
## 23 TROPICAL STORM 340 690 0.5
## 24 HEAT WAVE 309 74 4.2
## 25 HIGH WINDS 302 1533 0.2
## # ... with 133 more rows
The following code plots Total Injuries from severe weather events for the 25 most harmful event types, in decreasing order of severity with the most harmful events on the left. To order the events correctly in the plot, the EVTYPE variable is converted into a factor whose levels are explicitly ordered by decreasing Total Injuries:
tot_inj$EVTYPE <- factor(tot_inj$EVTYPE,
levels = tot_inj$EVTYPE[order(tot_inj$TotalInjuries, decreasing = TRUE)])
ggplot(tot_inj[1:25,], aes(EVTYPE, TotalInjuries)) +
theme(axis.text.x=element_text(angle=90,hjust=1)) + geom_col() +
labs(title = "Total Injuries by Event Type 1950-2011", x = "Event Type", y = "Total Injuries")
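The reordering step can be checked in isolation. The sketch below uses three rows copied from the injuries table above and shows that the explicit levels put the event types in decreasing order of TotalInjuries. The base R function reorder() is shown as an equivalent alternative; it is not what the original code uses.

```r
# Three rows taken from the injuries table above, deliberately out of order.
tot <- data.frame(
  EVTYPE        = c("HAIL", "TORNADO", "FLOOD"),
  TotalInjuries = c(1361, 91346, 6789),
  stringsAsFactors = FALSE
)

# Same approach as in the analysis: levels sorted by decreasing total.
by_levels <- factor(
  tot$EVTYPE,
  levels = tot$EVTYPE[order(tot$TotalInjuries, decreasing = TRUE)]
)

# Alternative: reorder() sorts levels by a score in increasing order,
# so negating the score gives decreasing order.
by_reorder <- reorder(tot$EVTYPE, -tot$TotalInjuries)

print(levels(by_levels))
print(levels(by_reorder))
```

Without explicit levels, ggplot2 would place the bars in alphabetical order of EVTYPE.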
Because some events are substantially more frequent than others, it is also important to rank the events by the average number of injuries per event. For example, a rare but extremely severe event can cause a relatively high number of injuries per event yet a low total (cumulative) number of injuries over the years:
tot_inj <- mydata %>%
filter(!is.na(INJURIES)) %>%
group_by(EVTYPE) %>%
summarize(Avg_Inj_per_Event = round(sum(INJURIES)/n(), digits=1),
N_Events = n(),
TotalInjuries = sum(INJURIES)) %>%
arrange(desc(Avg_Inj_per_Event)) %>%
filter(TotalInjuries > 0) %>%
print(n=25)
## # A tibble: 158 x 4
## EVTYPE Avg_Inj_per_Event N_Events TotalInjuries
## <chr> <dbl> <int> <dbl>
## 1 Heat Wave 70.0 1 70
## 2 TROPICAL STORM GORDON 43.0 1 43
## 3 WILD FIRES 37.5 4 150
## 4 THUNDERSTORMW 27.0 1 27
## 5 HIGH WIND AND SEAS 20.0 1 20
## 6 SNOW/HIGH WINDS 18.0 2 36
## 7 GLAZE/ICE STORM 15.0 1 15
## 8 HEAT WAVE DROUGHT 15.0 1 15
## 9 WINTER STORM HIGH WINDS 15.0 1 15
## 10 HURRICANE/TYPHOON 14.5 88 1275
## 11 WINTER WEATHER MIX 11.3 6 68
## 12 EXTREME HEAT 7.0 22 155
## 13 NON-SEVERE WIND DAMAGE 7.0 1 7
## 14 GLAZE 6.8 32 216
## 15 TSUNAMI 6.5 20 129
## 16 WINTER STORMS 5.7 3 17
## 17 TORNADO F2 5.3 3 16
## 18 EXCESSIVE RAINFALL 5.2 4 21
## 19 WATERSPOUT/TORNADO 5.2 8 42
## 20 HEAT WAVE 4.2 74 309
## 21 Torrential Rainfall 4.0 1 4
## 22 EXCESSIVE HEAT 3.9 1678 6525
## 23 HEAT 2.7 767 2100
## 24 MIXED PRECIP 2.6 10 26
## 25 MARINE MISHAP 2.5 2 5
## # ... with 133 more rows
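The table above lists “Heat Wave” and “HEAT WAVE” as separate event types because EVTYPE labels differ in letter case. As a hedged sketch (label cleaning is not part of the original analysis), the two rows from the table can be merged by upper-casing and trimming the label before grouping. The merged average is simple arithmetic on those two rows, not a recomputation from the raw data.

```r
library(dplyr)

# The two rows as they appear in the table above.
heat_rows <- tibble(
  EVTYPE        = c("Heat Wave", "HEAT WAVE"),
  TotalInjuries = c(70, 309),
  N_Events      = c(1L, 74L)
)

merged <- heat_rows %>%
  # Assumption: labels that differ only in case or padding are the same event type.
  mutate(EVTYPE = toupper(trimws(EVTYPE))) %>%
  group_by(EVTYPE) %>%
  summarize(TotalInjuries = sum(TotalInjuries), N_Events = sum(N_Events)) %>%
  mutate(Avg_Inj_per_Event = round(TotalInjuries / N_Events, digits = 1))

print(merged)
```

On these two rows the single-event “Heat Wave” entry no longer tops the ranking on its own: the combined type has 379 injuries over 75 events.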
A similar analysis is conducted for Total Fatalities:
tot_fat <- mydata %>%
filter(!is.na(FATALITIES)) %>%
group_by(EVTYPE) %>%
summarize(TotalFatalities = sum(FATALITIES), N = n(), Avg_F_per_Event = round(TotalFatalities/N, digits = 1)) %>%
arrange(desc(TotalFatalities)) %>%
filter(TotalFatalities > 0)
print(tot_fat, n=25)
## # A tibble: 168 x 4
## EVTYPE TotalFatalities N Avg_F_per_Event
## <chr> <dbl> <int> <dbl>
## 1 TORNADO 5633 60652 0.1
## 2 EXCESSIVE HEAT 1903 1678 1.1
## 3 FLASH FLOOD 978 54278 0.0
## 4 HEAT 937 767 1.2
## 5 LIGHTNING 816 15755 0.1
## 6 TSTM WIND 504 219944 0.0
## 7 FLOOD 470 25326 0.0
## 8 RIP CURRENT 368 470 0.8
## 9 HIGH WIND 248 20212 0.0
## 10 AVALANCHE 224 386 0.6
## 11 WINTER STORM 206 11433 0.0
## 12 RIP CURRENTS 204 304 0.7
## 13 HEAT WAVE 172 74 2.3
## 14 EXTREME COLD 160 655 0.2
## 15 THUNDERSTORM WIND 133 82563 0.0
## 16 HEAVY SNOW 127 15708 0.0
## 17 EXTREME COLD/WIND CHILL 125 1002 0.1
## 18 STRONG WIND 103 3566 0.0
## 19 BLIZZARD 101 2719 0.0
## 20 HIGH SURF 101 725 0.1
## 21 HEAVY RAIN 98 11723 0.0
## 22 EXTREME HEAT 96 22 4.4
## 23 COLD/WIND CHILL 95 539 0.2
## 24 ICE STORM 89 2006 0.0
## 25 WILDFIRE 75 2761 0.0
## # ... with 143 more rows
The results (25 deadliest) are graphically displayed in the following bar chart:
tot_fat$EVTYPE <- factor(tot_fat$EVTYPE, levels = tot_fat$EVTYPE[order(tot_fat$TotalFatalities, decreasing = TRUE)])
ggplot(tot_fat[1:25,], aes(EVTYPE, TotalFatalities)) + theme(axis.text.x=element_text(angle=90,hjust=1)) + geom_col() +
labs(title = "Total Fatalities by Event Type 1950-2011", x = "Event Type", y = "Total Fatalities")
Finally, the average number of fatalities per event across different event types:
tot_fat <- mydata %>%
filter(!is.na(FATALITIES)) %>%
group_by(EVTYPE) %>%
summarize(Avg_F_per_Event = round(sum(FATALITIES)/n(), digits = 1), N = n(), TotalFatalities = sum(FATALITIES)) %>%
arrange(desc(Avg_F_per_Event)) %>%
filter(TotalFatalities > 0) %>%
print(n=25)
## # A tibble: 168 x 4
## EVTYPE Avg_F_per_Event N TotalFatalities
## <chr> <dbl> <int> <dbl>
## 1 TORNADOES, TSTM WIND, HAIL 25.0 1 25
## 2 COLD AND SNOW 14.0 1 14
## 3 TROPICAL STORM GORDON 8.0 1 8
## 4 RECORD/EXCESSIVE HEAT 5.7 3 17
## 5 EXTREME HEAT 4.4 22 96
## 6 HEAT WAVE DROUGHT 4.0 1 4
## 7 HIGH WIND/SEAS 4.0 1 4
## 8 MARINE MISHAP 3.5 2 7
## 9 WINTER STORMS 3.3 3 10
## 10 Heavy surf and wind 3.0 1 3
## 11 HIGH WIND AND SEAS 3.0 1 3
## 12 ROUGH SEAS 2.7 3 8
## 13 HEAT WAVES 2.5 2 5
## 14 RIP CURRENTS/HEAVY SURF 2.5 2 5
## 15 HEAT WAVE 2.3 74 172
## 16 UNSEASONABLY WARM AND DRY 2.2 13 29
## 17 HURRICANE OPAL/HIGH WINDS 2.0 1 2
## 18 TSUNAMI 1.6 20 33
## 19 HEAVY SEAS 1.5 2 3
## 20 Hypothermia/Exposure 1.3 3 4
## 21 COLD WEATHER 1.2 4 5
## 22 HEAT 1.2 767 937
## 23 EXCESSIVE HEAT 1.1 1678 1903
## 24 AVALANCE 1.0 1 1
## 25 COASTALSTORM 1.0 1 1
## # ... with 143 more rows
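Several of these pipelines both assign the summary and print it. That works because print() returns its argument invisibly, so a trailing print(n = ...) in a pipe passes the table through to the assignment. A minimal sketch with made-up numbers:

```r
library(dplyr)

# Made-up data, only to illustrate the assign-and-print pattern.
toy <- tibble(EVTYPE = c("A", "B", "C"), FATALITIES = c(2, 9, 5))

ranked <- toy %>%
  arrange(desc(FATALITIES)) %>%
  print(n = 2)   # shows the first 2 rows, returns the whole tibble invisibly

# The assignment still received all three rows.
print(nrow(ranked))
```

Because the piped table is already the first argument of print(), only the extra options such as n need to be written in the call.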
In this analysis, the economic consequences of severe weather events are defined as the sum of the property damage (PROPDMG) and crop damage (CROPDMG) from an event. To add the two, they have to be expressed in comparable units - dollars, thousands of dollars, millions of dollars etc. In the original dataset this is not the case: the amount of each type of damage is described by two variables - a numeric variable (PROPDMG or CROPDMG) gives the amount, and a second, character variable (PROPDMGEXP or CROPDMGEXP) gives the measurement unit (“K” corresponds to thousands of dollars, “M” to millions, and “B” to billions). Therefore, each two-column measure is transformed into a new variable measuring the damage in dollars; the dollar amounts for property damage and crop damage are then added up and, for convenience, converted to millions of dollars. Records whose unit is present but is not “K”, “M”, or “B” (in either letter case) are treated as entry errors and removed from the analysis:
dmg <- mydata %>%
  filter(grepl("[KkMmBb]", PROPDMGEXP) | is.na(PROPDMGEXP)) %>%
  mutate(PROPDMGEXP = toupper(PROPDMGEXP)) %>%
  filter(grepl("[KkMmBb]", CROPDMGEXP) | is.na(CROPDMGEXP)) %>%
  mutate(CROPDMGEXP = toupper(CROPDMGEXP)) %>%
  mutate(Prop_Dmg_Dollars = ifelse(PROPDMG==0, 0,
                            ifelse(PROPDMG>0 & PROPDMGEXP=="K", PROPDMG*1000,
                            ifelse(PROPDMG>0 & PROPDMGEXP=="M", PROPDMG*1000000,
                            ifelse(PROPDMG>0 & PROPDMGEXP=="B", PROPDMG*1000000000, PROPDMG))))) %>%
  mutate(Crop_Dmg_Dollars = ifelse(CROPDMG==0, 0,
                            ifelse(CROPDMG>0 & CROPDMGEXP=="K", CROPDMG*1000,
                            ifelse(CROPDMG>0 & CROPDMGEXP=="M", CROPDMG*1000000,
                            ifelse(CROPDMG>0 & CROPDMGEXP=="B", CROPDMG*1000000000, CROPDMG))))) %>%
  mutate(Dmg_Dollars_M = (Prop_Dmg_Dollars+Crop_Dmg_Dollars)/1000000)
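The unit conversion can also be written as a lookup. The sketch below is an alternative to the nested ifelse() calls, not the code used to produce the tables in this report. It matters for one case in particular: in R a comparison with NA yields NA, so in the nested form a record with a positive amount but a missing unit evaluates to NA instead of reaching the final branch, and sum() without na.rm = TRUE then returns NA for the whole event type. The helper name to_dollars, the toy values, and the choice to treat a missing unit as plain dollars are assumptions made for illustration.

```r
# Hypothetical helper: convert an amount and its unit code to dollars.
to_dollars <- function(amount, unit) {
  missing_unit <- is.na(unit)
  # Look up the multiplier for K, M, B (case-insensitive); other codes give NA.
  mult <- c(1e3, 1e6, 1e9)[match(toupper(unit), c("K", "M", "B"))]
  # Assumption: a missing unit means the amount is already in dollars.
  mult[missing_unit] <- 1
  amount * mult
}

# Toy values (made up); the last two records have no unit.
amount <- c(2.5, 10, 1, 0, 7)
unit   <- c("K", "M", "B", NA, NA)

# Nested-ifelse form, as in the analysis, for comparison.
nested <- ifelse(amount == 0, 0,
          ifelse(amount > 0 & unit == "K", amount * 1000,
          ifelse(amount > 0 & unit == "M", amount * 1000000,
          ifelse(amount > 0 & unit == "B", amount * 1000000000, amount))))

print(to_dollars(amount, unit))  # no NA: the unit-less 7 stays 7 dollars
print(nested)                    # last element is NA
print(sum(nested))               # NA, because one element is NA
```

Whether unit-less amounts should count as dollars or be dropped is a judgment call; either way, making the choice explicit keeps a single record from turning a group total into NA.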
The Total Economic Damage (Total_Econ_Dmg_M) in millions of dollars over the whole period for each event type, along with the number of events (N) and the average damage per event (Avg_Dmg_per_Event_M), is as follows:
dmg_tot <- dmg %>%
group_by(EVTYPE) %>%
summarise(Total_Econ_Dmg_M = round(sum(Dmg_Dollars_M), digits = 1), N = n(), Avg_Dmg_per_Event_M = round(Total_Econ_Dmg_M/N, digits=1)) %>%
arrange(desc(Total_Econ_Dmg_M)) %>%
print(n=25)
## # A tibble: 973 x 4
## EVTYPE Total_Econ_Dmg_M N Avg_Dmg_per_Event_M
## <chr> <dbl> <int> <dbl>
## 1 HURRICANE/TYPHOON 71913.7 88 817.2
## 2 STORM SURGE 43323.5 261 166.0
## 3 DROUGHT 15018.7 2487 6.0
## 4 HURRICANE 14610.2 174 84.0
## 5 RIVER FLOOD 10148.4 173 58.7
## 6 ICE STORM 8967.0 2005 4.5
## 7 TROPICAL STORM 8382.2 690 12.1
## 8 WINTER STORM 6715.4 11432 0.6
## 9 HIGH WIND 5908.6 20210 0.3
## 10 WILDFIRE 5060.6 2761 1.8
## 11 TSTM WIND 5047.0 219943 0.0
## 12 STORM SURGE/TIDE 4642.0 148 31.4
## 13 HURRICANE OPAL 3191.8 9 354.6
## 14 WILD/FOREST FIRE 3108.6 1457 2.1
## 15 HEAVY RAIN/SEVERE WEATHER 2500.0 2 1250.0
## 16 TORNADOES, TSTM WIND, HAIL 1602.5 1 1602.5
## 17 HEAVY RAIN 1427.6 11723 0.1
## 18 EXTREME COLD 1360.7 655 2.1
## 19 SEVERE THUNDERSTORM 1205.6 13 92.7
## 20 FROST/FREEZE 1103.6 1342 0.8
## 21 HEAVY SNOW 1067.2 15705 0.1
## 22 BLIZZARD 771.3 2719 0.3
## 23 WILD FIRES 624.1 4 156.0
## 24 TYPHOON 601.1 11 54.6
## 25 EXCESSIVE HEAT 500.2 1678 0.3
## # ... with 948 more rows
The following graph shows the Total Economic Damage for the 25 most damaging event types over the period 1950-2011, in decreasing order (from the most damaging on the left to the least damaging on the right):
dmg_tot$EVTYPE <- factor(dmg_tot$EVTYPE, levels = dmg_tot$EVTYPE[order(dmg_tot$Total_Econ_Dmg_M, decreasing = TRUE)])
ggplot(dmg_tot[1:25,], aes(EVTYPE, Total_Econ_Dmg_M)) +
theme(axis.text.x=element_text(angle=90,hjust=1)) + geom_col() +
labs(title = "Total Economic Damage by Event Type 1950-2011", x = "Event Type", y = "Total Economic Damage (millions of dollars)")
Because some events are much more frequent than others, the following table ranks the event types by the average damage per single event:
dmg_event <- dmg %>% group_by(EVTYPE) %>%
summarise(Avg_Dmg_per_Event_M = round(sum(Dmg_Dollars_M)/n(), digits=1), N = n(), Total_Econ_Dmg_M = round(sum(Dmg_Dollars_M), digits=1)) %>%
arrange(desc(Avg_Dmg_per_Event_M)) %>%
print(n=25)
## # A tibble: 973 x 4
## EVTYPE Avg_Dmg_per_Event_M N Total_Econ_Dmg_M
## <chr> <dbl> <int> <dbl>
## 1 TORNADOES, TSTM WIND, HAIL 1602.5 1 1602.5
## 2 HEAVY RAIN/SEVERE WEATHER 1250.0 2 2500.0
## 3 HURRICANE/TYPHOON 817.2 88 71913.7
## 4 HURRICANE OPAL 354.6 9 3191.8
## 5 STORM SURGE 166.0 261 43323.5
## 6 WILD FIRES 156.0 4 624.1
## 7 EXCESSIVE WETNESS 142.0 1 142.0
## 8 HURRICANE OPAL/HIGH WINDS 110.0 1 110.0
## 9 SEVERE THUNDERSTORM 92.7 13 1205.6
## 10 HURRICANE 84.0 174 14610.2
## 11 HAILSTORM 80.3 3 241.0
## 12 COLD AND WET CONDITIONS 66.0 1 66.0
## 13 WINTER STORM HIGH WINDS 65.0 1 65.0
## 14 RIVER FLOOD 58.7 173 10148.4
## 15 HURRICANE ERIN 56.3 7 394.1
## 16 TYPHOON 54.6 11 601.1
## 17 HURRICANE EMILY 50.0 1 50.0
## 18 DAMAGING FREEZE 45.0 6 270.1
## 19 Early Frost 42.0 1 42.0
## 20 MAJOR FLOOD 35.0 3 105.0
## 21 STORM SURGE/TIDE 31.4 148 4642.0
## 22 River Flooding 26.8 5 134.2
## 23 HIGH WINDS/COLD 23.5 5 117.5
## 24 FLOOD/RAIN/WINDS 18.8 6 112.8
## 25 Damaging Freeze 17.1 2 34.1
## # ... with 948 more rows
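One way to reconcile the two rankings is to rank by average damage only among event types with at least some minimum number of recorded events. The sketch below applies that idea to four rows copied from the tables above. The threshold min_events = 10 is an arbitrary, hypothetical choice and not part of the original analysis.

```r
library(dplyr)

# Four rows copied from the damage tables above (millions of dollars).
dmg_rows <- tibble(
  EVTYPE = c("TORNADOES, TSTM WIND, HAIL", "HEAVY RAIN/SEVERE WEATHER",
             "HURRICANE/TYPHOON", "STORM SURGE"),
  Avg_Dmg_per_Event_M = c(1602.5, 1250.0, 817.2, 166.0),
  N = c(1L, 2L, 88L, 261L)
)

min_events <- 10  # hypothetical cut-off for "frequent enough"

frequent <- dmg_rows %>%
  filter(N >= min_events) %>%
  arrange(desc(Avg_Dmg_per_Event_M))

print(frequent)
```

With this cut-off the one-off and two-off event types drop out, and HURRICANE/TYPHOON and STORM SURGE remain.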
The analysis conducted here provides exploratory insights into which types of severe weather events cause the most injuries, which are the deadliest, and which cause the most economic damage.
Thus, in terms of harm to population health, the analysis shows that tornadoes (TORNADO) caused the greatest number of injuries (91346) between 1950 and 2011. There were 60652 such events, causing 1.5 injuries per event on average. Several other event types caused far more injuries per event, up to 70 (such as Heat Wave), but they are relatively rare.
Tornado is also the deadliest type of event in terms of Total Fatalities (5633), followed by Excessive Heat with a cumulative 1903 fatalities and Flash Flood with 978 fatalities. These are not the deadliest events in terms of average fatalities per event, but they are quite frequent. Much less frequent (only one occurrence each between 1950 and 2011) but more severe in terms of fatalities per event are the “TORNADOES, TSTM WIND, HAIL”, “COLD AND SNOW”, and “TROPICAL STORM GORDON” events.
Finally, Hurricane/Typhoon and Storm Surge lead the ranking of the most severe weather events on the measure of Total Economic Damage, accounting for 71,913.7 and 43,323.5 million dollars of total economic damage respectively. These events are also fairly frequent (88 and 261 events), causing on average 817.2 and 166.0 million dollars of damage per event. On the other hand, a few infrequent event types have even larger economic consequences per event, at 1602.5 and 1250.0 million dollars on average (“TORNADOES, TSTM WIND, HAIL” and “HEAVY RAIN/SEVERE WEATHER”, with one and two recorded events respectively).