Reza Hosseini
Originally published on Sept. 15, 2020
Republished on Feb. 03, 2022

Data repository:
Storm Data [47Mb]
Documentation:
National Weather Service Storm Data Documentation
National Climatic Data Center Storm Events FAQ


1. SYNOPSIS

I used storm data from the National Oceanic and Atmospheric Administration (NOAA) to investigate the effect of 48 different event types on population health, as well as the financial damage they cause. The data set has 902,297 rows and 37 columns. I removed the rows and columns that were unrelated to the research question to make the data more manageable. I used the number of fatalities and injuries to compare the health outcome of each event type. Similarly, I used the amount of property and crop damage (in dollars) to evaluate their economic effect. Based on these data, tornadoes were the most harmful events to public health, and floods incurred the highest economic damage.


2. DATA PROCESSING

First of all, I load my required packages:

library(tidyverse)
## Warning: replacing previous import 'vctrs::data_frame' by 'tibble::data_frame'
## when loading 'dplyr'
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.3     ✓ purrr   0.3.4
## ✓ tibble  3.0.3     ✓ dplyr   1.0.0
## ✓ tidyr   1.1.0     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.5.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()


2.1 Reading the data

I download and unzip the data repository:

URL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"

download.file(URL, destfile = "StormData.csv.bz2", method = "curl")

Then, I read all the columns of the data set as character type (to prevent errors happening as a result of automatic column type assignment):

StormData <- read_csv("StormData.csv.bz2",
                       col_types = cols(.default = "c"))
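
The idea of reading everything as text first can be sketched on a toy CSV. The values below are made up, and base R's read.csv() stands in for read_csv() so that the sketch runs without extra packages:

```r
# Toy CSV with made-up values; the column names mirror the storm data.
csv_text <- "EVTYPE,FATALITIES,PROPDMG,PROPDMGEXP
TORNADO,0,25.0,K
HAIL,1,2.5,M
FLOOD,0,0,"

# colClasses = "character" plays the role of cols(.default = "c") in readr:
# no column type is guessed while reading.
raw <- read.csv(text = csv_text, colClasses = "character", na.strings = "")

col_classes <- vapply(raw, class, character(1))
print(col_classes)

# Numeric columns are converted explicitly, once their content is known.
raw$FATALITIES <- as.numeric(raw$FATALITIES)
raw$PROPDMG    <- as.numeric(raw$PROPDMG)
str(raw)
```

Every column arrives as character, and only the columns that are known to hold numbers are converted afterwards.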

The number of rows and columns in the StormData data set:

dim(StormData)
## [1] 902297     37


2.2 Subsetting the data set

As we know from the documentation, NOAA (National Oceanic and Atmospheric Administration) records 48 different storm event types. As the goal of this report is to compare the health and economic damage of different event types, it is important to choose a time period in which all 48 were recorded. Our data set contains records from 1950 to 2011. However, according to NOAA, all 48 event types have been recorded only since 1996. As a result, I delete the records before 1996 so that the event types are compared over the same period:

library(lubridate)

StormData$BGN_DATE <- mdy_hms(StormData$BGN_DATE)

StormData <- StormData %>% 
              filter(year(BGN_DATE) >= 1996)
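
As a rough illustration of the date filter, here is the same step in base R on three made-up dates. as.POSIXct() and format() stand in for mdy_hms() and year(); the date layout is assumed to match the BGN_DATE column:

```r
# Made-up dates in a "month/day/year hour:minute:second" layout.
bgn_date <- c("4/18/1950 0:00:00", "1/6/1996 0:00:00", "11/28/2011 0:00:00")

# Base R counterpart of lubridate::mdy_hms(); UTC avoids time-zone surprises.
parsed <- as.POSIXct(bgn_date, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# Base R counterpart of lubridate::year().
yr <- as.integer(format(parsed, "%Y"))

# Keep only the records from 1996 onwards.
keep <- yr >= 1996
print(bgn_date[keep])
```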

Let's see how many rows we have now:

nrow(StormData)
## [1] 653530

So the number of rows is now reduced from 902,297 to 653,530.
Next, I select only the seven (out of 37) columns that I'm going to work with, and convert the four numeric columns back to numeric type using as.numeric():

subData <- StormData %>% 
            select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP,
                   CROPDMG, CROPDMGEXP) %>% 
              mutate(across(.cols = c(FATALITIES, INJURIES, PROPDMG, CROPDMG),
                            .fns = as.numeric))

These seven columns are:

column name   description
EVTYPE        Event type
FATALITIES    Number of fatalities
INJURIES      Number of injuries
PROPDMG       Property damage amount, before scaling by PROPDMGEXP
PROPDMGEXP    The exponent part of the amount of property damage
CROPDMG       Crop damage amount, before scaling by CROPDMGEXP
CROPDMGEXP    The exponent part of the amount of crop damage

I’m going to use FATALITIES and INJURIES to evaluate the health damage of different storm events, and PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP to evaluate their economic effects.

I delete the original data set to free up some RAM:

rm(StormData)

Now, as we are interested in the events that caused the highest health and economic damage, I want to see how many rows have zero values for FATALITIES, INJURIES, PROPDMG, and CROPDMG, all at the same time:

subData %>% 
  filter(FATALITIES == 0 &
         INJURIES == 0 &
         PROPDMG == 0 &
         CROPDMG == 0 ) %>% 
    count()
## # A tibble: 1 x 1
##        n
##    <int>
## 1 452212

As you can see, 452,212 out of 653,530 rows have zero values for all of the four mentioned variables, and so it is safe to delete them and make our data set more manageable:

subData <- subData %>% 
            filter(FATALITIES != 0 |
                   INJURIES != 0 |
                   PROPDMG != 0 |
                   CROPDMG != 0 )
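
The filter that keeps rows is the logical complement of the filter that counted the all-zero rows (De Morgan's law). A small base R sketch on made-up rows, followed by the arithmetic on the row counts reported above:

```r
# Four made-up rows: only the first one is zero in all four columns.
toy <- data.frame(
  FATALITIES = c(0, 1, 0, 0),
  INJURIES   = c(0, 0, 0, 2),
  PROPDMG    = c(0, 0, 5, 0),
  CROPDMG    = c(0, 0, 0, 0)
)

# Rows where everything is zero (the rows that were counted).
all_zero <- with(toy, FATALITIES == 0 & INJURIES == 0 &
                      PROPDMG == 0 & CROPDMG == 0)

# Rows where at least one value is non-zero (the rows that are kept).
any_nonzero <- with(toy, FATALITIES != 0 | INJURIES != 0 |
                         PROPDMG != 0 | CROPDMG != 0)

# The two conditions partition the rows.
print(identical(any_nonzero, !all_zero))

# Row counts reported in this section: total minus all-zero rows.
remaining <- 653530 - 452212
print(remaining)
```

The remaining count of 201,318 agrees with the number of rows printed for subData later in the report.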


2.3 Total prop. and crop damage

Property damage and crop damage are each represented by two columns in the data set. PROPDMG and CROPDMG hold the base amounts, and PROPDMGEXP and CROPDMGEXP hold a letter code for the magnitude. Multiplying the base amount by 10^(exponent) gives the total property or crop damage in dollars.
So, first, I check which values PROPDMGEXP and CROPDMGEXP take after the subsetting above:

union(subData$PROPDMGEXP, subData$CROPDMGEXP)
## [1] "K" NA  "M" "B"

According to the documentation, we know that:

  • NA = 10^0 = 1
  • K = 10^3 = 1,000
  • M = 10^6 = 1,000,000
  • B = 10^9 = 1,000,000,000

Number of missing values in each column of our data set:

sapply(subData, function(x){sum(is.na(x))})
##     EVTYPE FATALITIES   INJURIES    PROPDMG PROPDMGEXP    CROPDMG CROPDMGEXP 
##          0          0          0          0       8448          0     102767

So I replace the codes in the PROPDMGEXP and CROPDMGEXP columns with their numeric multipliers:

subData <- subData %>% 
              replace_na(list(PROPDMGEXP = "1", CROPDMGEXP = "1"))

subData <- subData %>% 
              mutate(PROPDMGEXP = str_replace(PROPDMGEXP, "K", "1000")) %>% 
                mutate(CROPDMGEXP = str_replace(CROPDMGEXP, "K", "1000"))

subData <- subData %>% 
              mutate(PROPDMGEXP = str_replace(PROPDMGEXP, "M", "1000000")) %>% 
                mutate(CROPDMGEXP = str_replace(CROPDMGEXP, "M", "1000000"))

subData <- subData %>% 
              mutate(PROPDMGEXP = str_replace(PROPDMGEXP, "B", "1000000000")) %>% 
                mutate(CROPDMGEXP = str_replace(CROPDMGEXP, "B", "1000000000"))

I then convert these two columns to numeric type:

subData <- subData %>% 
              mutate(PROPDMGEXP = as.numeric(PROPDMGEXP)) %>% 
                mutate(CROPDMGEXP = as.numeric(CROPDMGEXP))

Now I can calculate the true property and crop damage by a simple multiplication, and then drop the columns that are no longer needed:

subData <- subData %>% 
              mutate(propDmgMerge = PROPDMG * PROPDMGEXP) %>% 
                mutate(cropDmgMerge = CROPDMG * CROPDMGEXP) %>% 
                  select(-c(PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP))
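
The same conversion can also be sketched with a named lookup vector instead of chained string replacements. This is an alternative written in base R on made-up amounts, not the code used in this report; the names multiplier, dmg and exp_code are illustrative only:

```r
# Multiplier for each exponent code (K, M, B as listed above).
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Made-up base amounts and their exponent codes; NA means "no code".
dmg      <- c(25, 2.5, 1.2, 10)
exp_code <- c("K", "M", "B", NA)

# Look up each code by name; a missing code is treated as a multiplier of 1.
mult <- unname(multiplier[exp_code])
mult[is.na(exp_code)] <- 1

# Damage in dollars: base amount times multiplier.
total <- dmg * mult
print(total)
```

For example, a base amount of 25 with code K becomes 25 * 1,000 = 25,000 dollars, and an amount with no code is left unchanged.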

I also add two other columns, one for the sum of FATALITIES and INJURIES, and the other one for the sum of propDmgMerge and cropDmgMerge:

subData <- subData %>% 
              mutate(FatalInjSum = FATALITIES + INJURIES) %>% 
                mutate(PropCropSum = propDmgMerge + cropDmgMerge) %>% 
                  select(EVTYPE,
                         FATALITIES, INJURIES, FatalInjSum,
                         propDmgMerge, cropDmgMerge, PropCropSum)

Our data set looks like this so far:

subData
## # A tibble: 201,318 x 7
##    EVTYPE  FATALITIES INJURIES FatalInjSum propDmgMerge cropDmgMerge PropCropSum
##    <chr>        <dbl>    <dbl>       <dbl>        <dbl>        <dbl>       <dbl>
##  1 WINTER…          0        0           0       380000        38000      418000
##  2 TORNADO          0        0           0       100000            0      100000
##  3 TSTM W…          0        0           0         3000            0        3000
##  4 TSTM W…          0        0           0         5000            0        5000
##  5 TSTM W…          0        0           0         2000            0        2000
##  6 HIGH W…          0        0           0       400000            0      400000
##  7 TSTM W…          0        0           0        12000            0       12000
##  8 TSTM W…          0        0           0         8000            0        8000
##  9 TSTM W…          0        0           0        12000            0       12000
## 10 FLASH …          0        0           0        75000            0       75000
## # … with 201,308 more rows


2.4 Cleaning the events

First, I put the unique lower-cased values of EVTYPE into a new character vector named DataEventType. The length of this object is 183:

DataEventType <- sort(unique(tolower(subData$EVTYPE)))

length(DataEventType)
## [1] 183

The subsetting so far has already removed many EVTYPE values, which leaves 183 unique ones. I tried different methods for categorizing these 183 event types into the original 48 introduced by NOAA, but none of them was accurate enough. So I decided to code these 183 event types manually into their 48 original groups, and make a matching table from them:

NoaaEventType <- c(
 "1" = "Frost/Freeze",    "2" = "Astronomical Low Tide",     "3" = "Astronomical Low Tide",
 "4" = "Avalanche",       "5" = "Coastal Flood",             "6" = "Ice Storm",
 "7" = "Blizzard",        "8" = "Dust Storm",                "9" = "Heavy Snow",
 "10" = "Wildfire",       "11" = "Coastal Flood",            "12" = "Coastal Flood",
 "13" = "Coastal Flood",  "14" = "Coastal Flood",            "15" = "Coastal Flood",
 "16" = "Marine Thunderstorm Wind","17" = "Marine Thunderstorm Wind","18" = "Cold/Wind Chill",
 "19" = "Cold/Wind Chill",       "20" = "Cold/Wind Chill",      "21" = "Cold/Wind Chill",
 "22" = "Cold/Wind Chill",       "23" = "Flood",                "24" = "Frost/Freeze",
 "25" = "Dense Fog",             "26" = "Dense Smoke",          "27" = "Thunderstorm Wind",
 "28" = "Drought",               "29" = "Flood",                "30" = "Thunderstorm Wind",
 "31" = "Dust Devil",            "32" = "Dust Storm",           "33" = "Frost/Freeze",
 "34" = "Coastal Flood",         "35" = "Excessive Heat",       "36" = "Heavy Snow",
 "37" = "Cold/Wind Chill", "38" = "Extreme Cold/Wind Chill", "39" = "Extreme Cold/Wind Chill",
 "40" = "Extreme Cold/Wind Chill", "41" = "Heavy Snow",         "42" = "Flash Flood",
 "43" = "Flash Flood",           "44" = "Flood",                "45" = "Flash Flood",
 "46" = "Dense Fog",             "47" = "Frost/Freeze",         "48" = "Sleet",
 "49" = "Freezing Fog",          "50" = "Sleet",                "51" = "Sleet",
 "52" = "Frost/Freeze",          "53" = "Frost/Freeze",         "54" = "Funnel Cloud",
 "55" = "Frost/Freeze",          "56" = "Strong Wind",          "57" = "Strong Wind",
 "58" = "Strong Wind",           "59" = "Heavy Rain",           "60" = "Strong Wind",
 "61" = "Strong Wind",           "62" = "Hail",                 "63" = "Frost/Freeze",
 "64" = "High Surf",             "65" = "Heat",                 "66" = "Heat",
 "67" = "Heavy Rain",            "68" = "Heavy Rain",           "69" = "High Surf",
 "70" = "Heavy Snow",            "71" = "Heavy Snow",           "72" = "High Surf",
 "73" = "High Surf",             "74" = "High Surf",            "75" = "High Surf",
 "76" = "High Surf",             "77" = "High Surf",            "78" = "High Surf",
 "79" = "High Surf",             "80" = "High Wind",            "81" = "High Wind",
 "82" = "High Wind",             "83" = "Hurricane (Typhoon)",  "84" = "Hurricane (Typhoon)",
 "85" = "Hurricane (Typhoon)",   "86" = "Heat",                 "87" = "Cold/Wind Chill",
 "88" = "Flood",                 "89" = "Winter Weather",       "90" = "Winter Weather",
 "91" = "Ice Storm",             "92" = "Winter Weather",       "93" = "Lake-Effect Snow",
 "94" = "Lake-Effect Snow",      "95" = "Lakeshore Flood",      "96" = "Debris Flow",
 "97" = "Debris Flow",           "98" = "Debris Flow",          "99" = "Tornado",
 "100" = "Heavy Snow",           "101" = "Sleet",               "102" = "Winter Storm",
 "103" = "Winter Storm",         "104" = "Lightning",           "105" = "Marine Strong Wind",
 "106" = "Marine Hail",          "107" = "Marine High Wind",    "108" = "Marine Strong Wind",
 "109"="Marine Thunderstorm Wind","110"="Marine Thunderstorm Wind","111"="Thunderstorm Wind",
 "112" = "Heavy Rain",           "113" = "Heavy Rain",          "114" = "Debris Flow",
 "115" = "Debris Flow",          "116" = "Debris Flow",         "117" = "Strong Wind",
 "118" = "High Wind",            "119" = "Strong Wind",         "120" = "Other",
 "121" = "Heavy Rain",           "122" = "Heavy Rain",          "123" = "Excessive Heat",
 "124" = "Rip Current",          "125" = "Rip Current",         "126" = "Flood",
 "127" = "Flood",                "128" = "Debris Flow",         "129" = "High Surf",
 "130" = "High Surf",            "131" = "High Surf",           "132" = "Seiche",
 "133" = "Hail",                 "134" = "Heavy Snow",          "135" = "Heavy Snow",
 "136" = "Heavy Snow",           "137" = "Heavy Snow",          "138" = "Storm Surge/Tide",
 "139" = "Storm Surge/Tide",     "140" = "Strong Wind",         "141" = "Strong Wind",
 "142" = "Thunderstorm Wind",    "143" = "Thunderstorm Wind",   "144" = "Thunderstorm Wind",
 "145" = "Coastal Flood",        "146" = "Tornado",             "147" = "Heavy Rain",
 "148" = "Hurricane (Typhoon)",  "149" = "Tropical Storm",      "150" = "Thunderstorm Wind",
 "151" = "Thunderstorm Wind",    "152" = "Thunderstorm Wind",   "153" = "Thunderstorm Wind",
 "154" = "Thunderstorm Wind",    "155" = "Thunderstorm Wind",   "156" = "Thunderstorm Wind",
 "157" = "Thunderstorm Wind",    "158" = "Thunderstorm Wind",   "159" = "Thunderstorm Wind",
 "160" = "Thunderstorm Wind",    "161" = "Tsunami",             "162" = "Hurricane (Typhoon)",
 "163" = "Cold/Wind Chill",      "164" = "Cold/Wind Chill",     "165" = "Heat",
 "166" = "Heavy Rain",           "167" = "Flood",               "168" = "Volcanic Ash",
 "169" = "Heat",                 "170" = "Waterspout",          "171" = "Thunderstorm Wind",
 "172" = "Tornado",              "173" = "Wildfire",            "174" = "Wildfire",
 "175" = "Strong Wind",          "176" = "Marine Strong Wind",  "177" = "Strong Wind",
 "178" = "Strong Wind",          "179" = "Winter Storm",        "180" = "Winter Weather",
 "181" = "Winter Weather",       "182" = "Winter Weather",      "183" = "Winter Weather")

EventTable <- tibble(1:183, DataEventType, NoaaEventType)
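
Because the matching table pairs the two vectors by position, a few sanity checks can catch a skipped or misordered entry. The sketch below uses a made-up four-entry table, and the official vector is only a small illustrative subset of category names, not the full NOAA list:

```r
# Made-up miniature of the matching table.
data_types <- c("flood", "hail", "thunderstorm wind", "tstm wind")
noaa_types <- c("1" = "Flood", "2" = "Hail",
                "3" = "Thunderstorm Wind", "4" = "Thunderstorm Wind")

# Illustrative subset of category names (an assumption for this sketch).
official <- c("Flood", "Hail", "Thunderstorm Wind", "Tornado")

# 1. Both vectors must have the same length to be paired row by row.
stopifnot(length(data_types) == length(noaa_types))

# 2. The hand-typed names must run 1, 2, 3, ... without gaps or repeats.
stopifnot(identical(names(noaa_types), as.character(seq_along(data_types))))

# 3. Every assigned category should be a known category name.
unknown <- setdiff(noaa_types, official)
print(unknown)
```

An empty result for unknown means that every hand-coded label is spelled like one of the expected categories.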

My matching table looks like this (scroll to see all 183 rows):

library(kableExtra)
kbl(EventTable,
    col.names = c("Row", "Event types in dataset", "Event types from NOAA")) %>% 
  kable_paper() %>% 
    scroll_box(width = "500px", height = "400px") %>% 
      kable_material_dark(c("striped", "hover"))
Row Event types in dataset Event types from NOAA
1 agricultural freeze Frost/Freeze
2 astronomical high tide Astronomical Low Tide
3 astronomical low tide Astronomical Low Tide
4 avalanche Avalanche
5 beach erosion Coastal Flood
6 black ice Ice Storm
7 blizzard Blizzard
8 blowing dust Dust Storm
9 blowing snow Heavy Snow
10 brush fire Wildfire
11 coastal flooding/erosion Coastal Flood
12 coastal erosion Coastal Flood
13 coastal flood Coastal Flood
14 coastal flooding Coastal Flood
15 coastal flooding/erosion Coastal Flood
16 coastal storm Marine Thunderstorm Wind
17 coastalstorm Marine Thunderstorm Wind
18 cold Cold/Wind Chill
19 cold and snow Cold/Wind Chill
20 cold temperature Cold/Wind Chill
21 cold weather Cold/Wind Chill
22 cold/wind chill Cold/Wind Chill
23 dam break Flood
24 damaging freeze Frost/Freeze
25 dense fog Dense Fog
26 dense smoke Dense Smoke
27 downburst Thunderstorm Wind
28 drought Drought
29 drowning Flood
30 dry microburst Thunderstorm Wind
31 dust devil Dust Devil
32 dust storm Dust Storm
33 early frost Frost/Freeze
34 erosion/cstl flood Coastal Flood
35 excessive heat Excessive Heat
36 excessive snow Heavy Snow
37 extended cold Cold/Wind Chill
38 extreme cold Extreme Cold/Wind Chill
39 extreme cold/wind chill Extreme Cold/Wind Chill
40 extreme windchill Extreme Cold/Wind Chill
41 falling snow/ice Heavy Snow
42 flash flood Flash Flood
43 flash flood/flood Flash Flood
44 flood Flood
45 flood/flash/flood Flash Flood
46 fog Dense Fog
47 freeze Frost/Freeze
48 freezing drizzle Sleet
49 freezing fog Freezing Fog
50 freezing rain Sleet
51 freezing spray Sleet
52 frost Frost/Freeze
53 frost/freeze Frost/Freeze
54 funnel cloud Funnel Cloud
55 glaze Frost/Freeze
56 gradient wind Strong Wind
57 gusty wind Strong Wind
58 gusty wind/hail Strong Wind
59 gusty wind/hvy rain Heavy Rain
60 gusty wind/rain Strong Wind
61 gusty winds Strong Wind
62 hail Hail
63 hard freeze Frost/Freeze
64 hazardous surf High Surf
65 heat Heat
66 heat wave Heat
67 heavy rain Heavy Rain
68 heavy rain/high surf Heavy Rain
69 heavy seas High Surf
70 heavy snow Heavy Snow
71 heavy snow shower Heavy Snow
72 heavy surf High Surf
73 heavy surf and wind High Surf
74 heavy surf/high surf High Surf
75 high seas High Surf
76 high surf High Surf
77 high surf advisory High Surf
78 high swells High Surf
79 high water High Surf
80 high wind High Wind
81 high wind (g40) High Wind
82 high winds High Wind
83 hurricane Hurricane (Typhoon)
84 hurricane edouard Hurricane (Typhoon)
85 hurricane/typhoon Hurricane (Typhoon)
86 hyperthermia/exposure Heat
87 hypothermia/exposure Cold/Wind Chill
88 ice jam flood (minor Flood
89 ice on road Winter Weather
90 ice roads Winter Weather
91 ice storm Ice Storm
92 icy roads Winter Weather
93 lake effect snow Lake-Effect Snow
94 lake-effect snow Lake-Effect Snow
95 lakeshore flood Lakeshore Flood
96 landslide Debris Flow
97 landslides Debris Flow
98 landslump Debris Flow
99 landspout Tornado
100 late season snow Heavy Snow
101 light freezing rain Sleet
102 light snow Winter Storm
103 light snowfall Winter Storm
104 lightning Lightning
105 marine accident Marine Strong Wind
106 marine hail Marine Hail
107 marine high wind Marine High Wind
108 marine strong wind Marine Strong Wind
109 marine thunderstorm wind Marine Thunderstorm Wind
110 marine tstm wind Marine Thunderstorm Wind
111 microburst Thunderstorm Wind
112 mixed precip Heavy Rain
113 mixed precipitation Heavy Rain
114 mud slide Debris Flow
115 mudslide Debris Flow
116 mudslides Debris Flow
117 non tstm wind Strong Wind
118 non-severe wind damage High Wind
119 non-tstm wind Strong Wind
120 other Other
121 rain Heavy Rain
122 rain/snow Heavy Rain
123 record heat Excessive Heat
124 rip current Rip Current
125 rip currents Rip Current
126 river flood Flood
127 river flooding Flood
128 rock slide Debris Flow
129 rogue wave High Surf
130 rough seas High Surf
131 rough surf High Surf
132 seiche Seiche
133 small hail Hail
134 snow Heavy Snow
135 snow and ice Heavy Snow
136 snow squall Heavy Snow
137 snow squalls Heavy Snow
138 storm surge Storm Surge/Tide
139 storm surge/tide Storm Surge/Tide
140 strong wind Strong Wind
141 strong winds Strong Wind
142 thunderstorm Thunderstorm Wind
143 thunderstorm wind Thunderstorm Wind
144 thunderstorm wind (g40) Thunderstorm Wind
145 tidal flooding Coastal Flood
146 tornado Tornado
147 torrential rainfall Heavy Rain
148 tropical depression Hurricane (Typhoon)
149 tropical storm Tropical Storm
150 tstm wind Thunderstorm Wind
151 tstm wind (g45) Thunderstorm Wind
152 tstm wind (41) Thunderstorm Wind
153 tstm wind (g35) Thunderstorm Wind
154 tstm wind (g40) Thunderstorm Wind
155 tstm wind (g45) Thunderstorm Wind
156 tstm wind 40 Thunderstorm Wind
157 tstm wind 45 Thunderstorm Wind
158 tstm wind and lightning Thunderstorm Wind
159 tstm wind g45 Thunderstorm Wind
160 tstm wind/hail Thunderstorm Wind
161 tsunami Tsunami
162 typhoon Hurricane (Typhoon)
163 unseasonable cold Cold/Wind Chill
164 unseasonably cold Cold/Wind Chill
165 unseasonably warm Heat
166 unseasonal rain Heavy Rain
167 urban/sml stream fld Flood
168 volcanic ash Volcanic Ash
169 warm weather Heat
170 waterspout Waterspout
171 wet microburst Thunderstorm Wind
172 whirlwind Tornado
173 wild/forest fire Wildfire
174 wildfire Wildfire
175 wind Strong Wind
176 wind and wave Marine Strong Wind
177 wind damage Strong Wind
178 winds Strong Wind
179 winter storm Winter Storm
180 winter weather Winter Weather
181 winter weather mix Winter Weather
182 winter weather/mix Winter Weather
183 wintry mix Winter Weather

Then, I replace all EVTYPEs (201,318 rows) in my subsetted data set with their equivalent NOAA type (one of those original 48 categories), by using the matching table:

index <- match(tolower(subData$EVTYPE), DataEventType)
subData <- subData %>% 
              mutate(EVTYPE = EventTable$NoaaEventType[index])
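
The recoding relies on match(), which returns, for each value, its position in the lookup vector. A minimal base R sketch with a made-up three-entry table:

```r
# Made-up raw event labels, in mixed case as they appear in raw data.
evtype <- c("TSTM WIND", "Hail", "FLOOD", "tstm wind")

# Made-up lookup table: lower-cased raw label and its category.
key   <- c("flood", "hail", "tstm wind")
value <- c("Flood", "Hail", "Thunderstorm Wind")

# Position of each lower-cased label in the key vector.
idx <- match(tolower(evtype), key)

# Use the positions to pull the category for every row.
recoded <- value[idx]
print(recoded)
```

A label that is missing from the key would get NA from match(), so checking anyNA(idx) is a quick way to confirm that every row was recoded.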

To assess the population health harm and economic damage of each event type, I calculate the total number of fatalities, injuries, and fatalities and injuries combined, as well as the total property damage, crop damage, and property and crop damage combined, for the period from 1996 to 2011:

subData <- subData %>% 
              group_by(EVTYPE) %>% 
                summarize(FATALITIES = sum(FATALITIES),
                          INJURIES = sum(INJURIES),
                          FatalInjSum = sum(FatalInjSum),
                          propDmgMerge = sum(propDmgMerge),
                          cropDmgMerge = sum(cropDmgMerge),
                          PropCropSum = sum(PropCropSum))
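
The grouping step can be sketched in base R with aggregate() on made-up rows. This is an illustration of the same idea, not the code used above:

```r
# Made-up rows: two event types, two records each.
toy <- data.frame(
  EVTYPE     = c("Flood", "Tornado", "Flood", "Tornado"),
  FATALITIES = c(1, 2, 0, 3),
  INJURIES   = c(0, 10, 4, 5)
)

# One row per event type, with each numeric column summed within the group.
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE,
                    data = toy, FUN = sum)

# Summing the per-row sums gives the same result as adding the group totals.
totals$FatalInjSum <- totals$FATALITIES + totals$INJURIES
print(totals)
```

Flood ends up with 1 fatality and 4 injuries (5 in total), and Tornado with 5 fatalities and 15 injuries (20 in total).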

The final tidied data set looks like this:

subData
## # A tibble: 48 x 7
##    EVTYPE  FATALITIES INJURIES FatalInjSum propDmgMerge cropDmgMerge PropCropSum
##    <chr>        <dbl>    <dbl>       <dbl>        <dbl>        <dbl>       <dbl>
##  1 Astron…          0        0           0      9745000            0     9745000
##  2 Avalan…        223      156         379      3711800            0     3711800
##  3 Blizza…         70      385         455    525658950      7060000   532718950
##  4 Coasta…          6        8          14    407318560            0   407318560
##  5 Cold/W…        139       24         163      2644000     30742500    33386500
##  6 Debris…         43       55          98    326628100     20017000   346645100
##  7 Dense …         69      855         924     20464500            0    20464500
##  8 Dense …          0        0           0       100000            0      100000
##  9 Drought          0        4           4   1046101000  13367566000 14413667000
## 10 Dust D…          2       39          41       663630            0      663630
## # … with 38 more rows

3. RESULTS

Now, I investigate the health harm and economic damage of different storm events, based on their respective total casualties and property/crop damage:

3.1 Population health damage

First, I sort the data by the sum of fatalities and injuries for each event type in descending order. Then, I choose the top ten:

health <- subData %>% 
            arrange(desc(FatalInjSum)) %>% 
              slice(1:10)

And here is its plot:

ggplot(health, aes(x = reorder(EVTYPE, -FatalInjSum), y = FatalInjSum)) +
  geom_col(alpha = 0.75, color = "black") +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1)) +
  ggtitle("Most harmful storm events to population health (1996 - 2011)") +
  xlab("Top ten storm events in terms of total fatalities and injuries") +
  ylab("Number of people dead or injured (1996 - 2011)") +
  geom_line(aes(x = reorder(EVTYPE, -FatalInjSum),
                y = FATALITIES, color = "Fatalities"),
            group = 1) +
  geom_line(aes(x = reorder(EVTYPE, -FatalInjSum),
                y = INJURIES, color = "Injuries"),
            group = 1) +
  scale_color_manual(name = "Legend", values = c("red", "blue"))


Based on the plot, Tornado is by far the most harmful storm event for population health.


3.2 Economic damage

Here, I do the same as above, but this time I sort the data by the sum of property and crop damage for each event type in descending order, and again choose the top ten. Additionally, I divide the three financial columns by 10^9 to convert them into billions of dollars:

economic <- subData %>% 
              arrange(desc(PropCropSum)) %>% 
                slice(1:10) %>% 
                  mutate(propDmgMerge = propDmgMerge / 10^9,
                         cropDmgMerge = cropDmgMerge / 10^9,
                         PropCropSum = PropCropSum / 10^9)
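
The ranking and rescaling steps can be sketched in base R. The amounts below are made up for illustration and are not results from the data set:

```r
# Made-up totals in dollars for three event types.
totals <- data.frame(
  EVTYPE      = c("Hail", "Flood", "Drought"),
  PropCropSum = c(3e9, 1.5e11, 2e10)
)

# Sort in descending order of total damage and keep the top two.
ranked <- totals[order(totals$PropCropSum, decreasing = TRUE), ]
top    <- head(ranked, 2)

# Convert dollars to billions of dollars.
top$PropCropSum <- top$PropCropSum / 10^9
print(top)
```

In this toy table, Flood comes first with 150 billion dollars and Drought second with 20 billion dollars.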

And this is its plot:

library(scales)
ggplot(economic, aes(x = reorder(EVTYPE, -PropCropSum), y = PropCropSum)) +
  geom_col(alpha = 0.5, color = "black") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1)) +
  ggtitle("Storm events with highest economic damage (1996 - 2011)") +
  xlab("Top ten storm events in terms of total property and crop damage") +
  ylab("Total cost of damage in billion dollars (1996 - 2011)") +
  geom_line(aes(x = reorder(EVTYPE, -PropCropSum),
                y = propDmgMerge, color = "Property damage"),
            group = 1) +
  geom_line(aes(x = reorder(EVTYPE, -PropCropSum),
                y = cropDmgMerge, color = "Crop damage"),
            group = 1) +
  scale_color_manual(name = "Legend", values = c("purple", "lightseagreen")) +
  scale_y_continuous(labels = comma)


Based on the plot, Flood incurs the highest economic damage.



Thank you for reading my report.