Reza Hosseini
Originally published on Sept. 15, 2020
Republished on Feb. 03, 2022
Data repository:
Storm Data [47Mb]
Documentation:
National Weather Service Storm Data Documentation
National Climatic Data Center Storm Events FAQ
I used the storm data from the National Oceanic and Atmospheric Administration (NOAA) to investigate the effect of 48 different event types on population health, as well as the financial damage they cause. The data set has 902,297 rows and 37 columns. I removed the rows and columns that were unrelated to the research question to make the data more manageable. I used the number of fatalities and injuries to compare the health outcome of each event type. Similarly, I used the amount of property and crop damage (in dollars) to evaluate their economic effect. Based on these data, tornadoes were the most harmful events to public health, and floods incurred the highest economic damage.
First of all, I load my required packages:
library(tidyverse)
## Warning: replacing previous import 'vctrs::data_frame' by 'tibble::data_frame'
## when loading 'dplyr'
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.3 ✓ purrr 0.3.4
## ✓ tibble 3.0.3 ✓ dplyr 1.0.0
## ✓ tidyr 1.1.0 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.5.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
I download the compressed data file (read_csv() can read a .bz2 archive directly, so there is no need to unzip it first):
URL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(URL, destfile = "StormData.csv.bz2", method = "curl")
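Because the archive is fairly large (about 47 MB), it can be convenient to skip the download when the file is already on disk. The following is a minimal sketch and not part of the original analysis; `download_if_missing` is a hypothetical helper name introduced here for illustration.

```r
# Hypothetical helper: download `url` to `destfile` only if the file is absent.
# Returns TRUE when a download was attempted, FALSE when the file already existed.
download_if_missing <- function(url, destfile) {
  if (file.exists(destfile)) {
    return(FALSE)
  }
  # mode = "wb" keeps the compressed archive byte-for-byte intact on Windows
  download.file(url, destfile = destfile, mode = "wb")
  TRUE
}

# Usage with the same URL and file name as in the report:
# download_if_missing(URL, "StormData.csv.bz2")
```

Re-running the report then reuses the local copy instead of fetching the archive again.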
Then, I read all the columns of the data set as character type (to prevent errors happening as a result of automatic column type assignment):
StormData <- read_csv("StormData.csv.bz2",
col_types = cols(.default = "c"))
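To illustrate the idea of reading everything as text and converting later, here is a small sketch on a made-up CSV file. The file contents are invented for the example; only the `col_types = cols(.default = "c")` call mirrors the report.

```r
library(readr)

# Write a tiny, made-up CSV to a temporary file (illustrative data only)
tmp <- tempfile(fileext = ".csv")
writeLines(c("EVTYPE,FATALITIES,PROPDMG,PROPDMGEXP",
             "TORNADO,0,25,K",
             "HAIL,1,2.5,M"), tmp)

# Read every column as character, so no column type is guessed
toy <- read_csv(tmp, col_types = cols(.default = "c"))
sapply(toy, class)   # every column is "character" at this point

# Convert a column explicitly once we know it should be numeric
toy$FATALITIES <- as.numeric(toy$FATALITIES)
```

Any value that cannot be converted shows up as NA with a warning at the conversion step, which makes it easier to see which column caused the problem.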
The StormData data set has the following number of rows and columns:
dim(StormData)
## [1] 902297 37
According to the documentation, NOAA records 48 different storm event types. Because the goal of this report is to compare the health and economic damage of different events, it is important to choose a time period in which all 48 event types were recorded. Our data set contains records from 1950 to 2011. However, according to NOAA, all 48 event types have been recorded only since 1996. As a result, I delete the records before 1996, so that all event types are compared over the same period:
library(lubridate)
StormData$BGN_DATE <- mdy_hms(StormData$BGN_DATE)
StormData <- StormData %>%
filter(year(BGN_DATE) >= 1996)
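The date step can be checked on a few sample strings. This is a sketch with made-up dates, assuming BGN_DATE uses a month/day/year hour:minute:second layout, which is what `mdy_hms()` expects.

```r
library(lubridate)

# Made-up date strings in the assumed month/day/year h:m:s layout
dates <- c("4/18/1950 0:00:00", "1/6/1996 0:00:00", "11/28/2011 0:00:00")
parsed <- mdy_hms(dates)

year(parsed)                          # 1950, 1996, 2011
recent <- parsed[year(parsed) >= 1996] # keeps only the 1996 and 2011 dates
length(recent)                        # 2
```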
Let's see how many rows we have now:
nrow(StormData)
## [1] 653530
So the number of rows is now reduced from 902,297 to 653,530.
Next, I select only the seven (out of 37) columns that I’m going to work with, and convert the numeric columns back using as.numeric():
subData <- StormData %>%
select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP,
CROPDMG, CROPDMGEXP) %>%
mutate(across(.cols = c(FATALITIES, INJURIES, PROPDMG, CROPDMG),
.fns = as.numeric))
These seven columns are:
| column name | description |
|---|---|
| EVTYPE | Event type |
| FATALITIES | Number of fatalities |
| INJURIES | Number of injuries |
| PROPDMG | Property damage amount, before applying PROPDMGEXP |
| PROPDMGEXP | Magnitude code (the exponent part) of the property damage amount |
| CROPDMG | Crop damage amount, before applying CROPDMGEXP |
| CROPDMGEXP | Magnitude code (the exponent part) of the crop damage amount |
I’m going to use FATALITIES and INJURIES to evaluate the health damage of different storm events, and PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP to evaluate their economic effects.
I delete the original data set to free up some RAM:
rm(StormData)
Now, as we are interested in the events that caused the highest health and economic damage, I want to see how many rows have zero values for FATALITIES, INJURIES, PROPDMG, and CROPDMG, all at the same time:
subData %>%
filter(FATALITIES == 0 &
INJURIES == 0 &
PROPDMG == 0 &
CROPDMG == 0 ) %>%
count()
## # A tibble: 1 x 1
## n
## <int>
## 1 452212
As you can see, 452,212 out of 653,530 rows have zero values for all of the four mentioned variables, and so it is safe to delete them and make our data set more manageable:
subData <- subData %>%
filter(FATALITIES != 0 |
INJURIES != 0 |
PROPDMG != 0 |
CROPDMG != 0 )
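The two filters above are logical complements: a row is either zero in all four columns or non-zero in at least one. A quick sketch on made-up data shows that the two subsets add up to the full table, as long as the four columns contain no missing values.

```r
library(dplyr)

# Toy data (made up): rows 1 and 3 are zero in all four columns
toy <- tibble(FATALITIES = c(0, 1, 0, 0),
              INJURIES   = c(0, 0, 0, 2),
              PROPDMG    = c(0, 0, 0, 0),
              CROPDMG    = c(0, 5, 0, 0))

all_zero <- toy %>%
  filter(FATALITIES == 0 & INJURIES == 0 & PROPDMG == 0 & CROPDMG == 0)

any_nonzero <- toy %>%
  filter(FATALITIES != 0 | INJURIES != 0 | PROPDMG != 0 | CROPDMG != 0)

# Without missing values, the two subsets partition the data
nrow(all_zero) + nrow(any_nonzero) == nrow(toy)   # TRUE
```

The same bookkeeping holds for the real data: 653,530 - 452,212 = 201,318 rows remain after the filter.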
Property damage and crop damage are each represented by two columns in the data set. The first column of each pair, PROPDMG or CROPDMG, holds the dollar amount before scaling. The second, PROPDMGEXP or CROPDMGEXP, holds a magnitude code. Multiplying the amount by the multiplier that the code stands for gives the total property or crop damage in dollars.
So, first, I determine what values PROPDMGEXP and CROPDMGEXP can have after the subsetting above:
union(subData$PROPDMGEXP, subData$CROPDMGEXP)
## [1] "K" NA "M" "B"
According to the documentation, K stands for thousands, M for millions, and B for billions of dollars.
The number of missing values in each column of our data set is:
sapply(subData, function(x){sum(is.na(x))})
## EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 0 0 0 0 8448 0 102767
So I replace the codes in the PROPDMGEXP and CROPDMGEXP columns with their numeric multipliers, treating a missing code as a multiplier of 1:
subData <- subData %>%
replace_na(list(PROPDMGEXP = "1", CROPDMGEXP = "1"))
subData <- subData %>%
mutate(PROPDMGEXP = str_replace(PROPDMGEXP, "K", "1000")) %>%
mutate(CROPDMGEXP = str_replace(CROPDMGEXP, "K", "1000"))
subData <- subData %>%
mutate(PROPDMGEXP = str_replace(PROPDMGEXP, "M", "1000000")) %>%
mutate(CROPDMGEXP = str_replace(CROPDMGEXP, "M", "1000000"))
subData <- subData %>%
mutate(PROPDMGEXP = str_replace(PROPDMGEXP, "B", "1000000000")) %>%
mutate(CROPDMGEXP = str_replace(CROPDMGEXP, "B", "1000000000"))
I then convert these two columns to numeric type:
subData <- subData %>%
mutate(PROPDMGEXP = as.numeric(PROPDMGEXP)) %>%
mutate(CROPDMGEXP = as.numeric(CROPDMGEXP))
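As an alternative to chained string replacements, the same conversion can be sketched with a named lookup vector. This is not the method used in the report; `multiplier` and `exp_to_number` are hypothetical names, and the sample codes and amounts are made up.

```r
# Multipliers for the magnitude codes (K = thousand, M = million, B = billion)
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: map codes to numbers, treating NA as a multiplier of 1.
# Codes that are not K, M, or B stay NA, so unexpected values are easy to spot.
exp_to_number <- function(code) {
  value <- unname(multiplier[code])
  value[is.na(code)] <- 1
  value
}

codes   <- c("K", NA, "M", "B")
amounts <- c(25, 10, 2.5, 1)

exp_to_number(codes)             # 1000, 1, 1e6, 1e9
amounts * exp_to_number(codes)   # 25000, 10, 2500000, 1e9
```

A lookup vector keeps the code-to-number mapping in one place and yields a numeric result directly, so no separate as.numeric() step is needed.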
Now I can calculate the actual property and crop damage by a simple multiplication, and then drop the columns that are no longer needed:
subData <- subData %>%
mutate(propDmgMerge = PROPDMG * PROPDMGEXP) %>%
mutate(cropDmgMerge = CROPDMG * CROPDMGEXP) %>%
select(-c(PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP))
I also add two other columns, one for the sum of FATALITIES and INJURIES, and the other one for the sum of propDmgMerge and cropDmgMerge:
subData <- subData %>%
mutate(FatalInjSum = FATALITIES + INJURIES) %>%
mutate(PropCropSum = propDmgMerge + cropDmgMerge) %>%
select(EVTYPE,
FATALITIES, INJURIES, FatalInjSum,
propDmgMerge, cropDmgMerge, PropCropSum)
Our data set looks like this so far:
subData
## # A tibble: 201,318 x 7
## EVTYPE FATALITIES INJURIES FatalInjSum propDmgMerge cropDmgMerge PropCropSum
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 WINTER… 0 0 0 380000 38000 418000
## 2 TORNADO 0 0 0 100000 0 100000
## 3 TSTM W… 0 0 0 3000 0 3000
## 4 TSTM W… 0 0 0 5000 0 5000
## 5 TSTM W… 0 0 0 2000 0 2000
## 6 HIGH W… 0 0 0 400000 0 400000
## 7 TSTM W… 0 0 0 12000 0 12000
## 8 TSTM W… 0 0 0 8000 0 8000
## 9 TSTM W… 0 0 0 12000 0 12000
## 10 FLASH … 0 0 0 75000 0 75000
## # … with 201,308 more rows
First, I put the unique lower-cased values of EVTYPE into a new character vector named DataEventType. The length of this object is 183:
DataEventType <- sort(unique(tolower(subData$EVTYPE)))
length(DataEventType)
## [1] 183
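A small sketch with made-up labels shows what this normalization does and does not merge. Lower-casing collapses case variants, but labels that differ only in spacing remain distinct unless whitespace is also normalized; the whitespace step below is an optional extra and is not used in the report.

```r
# Made-up event labels with case and spacing variants
raw <- c("TSTM WIND", "Tstm Wind", "tstm  wind", "HAIL")

# Same approach as above: lower-case, de-duplicate, sort
lowered <- sort(unique(tolower(raw)))
length(lowered)    # 3: the double-space variant is still separate

# Optional extra step: trim and collapse repeated whitespace as well
squished <- sort(unique(gsub("\\s+", " ", trimws(tolower(raw)))))
length(squished)   # 2
```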
The subsetting so far has already removed many EVTYPE values, leaving 183 unique ones. I tried different methods for categorizing these 183 event types into the original 48 introduced by NOAA, but none of them was accurate enough. So I decided to code these 183 event types manually into their 48 original groups, and make a matching table from them:
NoaaEventType <- c(
"1" = "Frost/Freeze", "2" = "Astronomical Low Tide", "3" = "Astronomical Low Tide",
"4" = "Avalanche", "5" = "Coastal Flood", "6" = "Ice Storm",
"7" = "Blizzard", "8" = "Dust Storm", "9" = "Heavy Snow",
"10" = "Wildfire", "11" = "Coastal Flood", "12" = "Coastal Flood",
"13" = "Coastal Flood", "14" = "Coastal Flood", "15" = "Coastal Flood",
"16" = "Marine Thunderstorm Wind","17" = "Marine Thunderstorm Wind","18" = "Cold/Wind Chill",
"19" = "Cold/Wind Chill", "20" = "Cold/Wind Chill", "21" = "Cold/Wind Chill",
"22" = "Cold/Wind Chill", "23" = "Flood", "24" = "Frost/Freeze",
"25" = "Dense Fog", "26" = "Dense Smoke", "27" = "Thunderstorm Wind",
"28" = "Drought", "29" = "Flood", "30" = "Thunderstorm Wind",
"31" = "Dust Devil", "32" = "Dust Storm", "33" = "Frost/Freeze",
"34" = "Coastal Flood", "35" = "Excessive Heat", "36" = "Heavy Snow",
"37" = "Cold/Wind Chill", "38" = "Extreme Cold/Wind Chill", "39" = "Extreme Cold/Wind Chill",
"40" = "Extreme Cold/Wind Chill", "41" = "Heavy Snow", "42" = "Flash Flood",
"43" = "Flash Flood", "44" = "Flood", "45" = "Flash Flood",
"46" = "Dense Fog", "47" = "Frost/Freeze", "48" = "Sleet",
"49" = "Freezing Fog", "50" = "Sleet", "51" = "Sleet",
"52" = "Frost/Freeze", "53" = "Frost/Freeze", "54" = "Funnel Cloud",
"55" = "Frost/Freeze", "56" = "Strong Wind", "57" = "Strong Wind",
"58" = "Strong Wind", "59" = "Heavy Rain", "60" = "Strong Wind",
"61" = "Strong Wind", "62" = "Hail", "63" = "Frost/Freeze",
"64" = "High Surf", "65" = "Heat", "66" = "Heat",
"67" = "Heavy Rain", "68" = "Heavy Rain", "69" = "High Surf",
"70" = "Heavy Snow", "71" = "Heavy Snow", "72" = "High Surf",
"73" = "High Surf", "74" = "High Surf", "75" = "High Surf",
"76" = "High Surf", "77" = "High Surf", "78" = "High Surf",
"79" = "High Surf", "80" = "High Wind", "81" = "High Wind",
"82" = "High Wind", "83" = "Hurricane (Typhoon)", "84" = "Hurricane (Typhoon)",
"85" = "Hurricane (Typhoon)", "86" = "Heat", "87" = "Cold/Wind Chill",
"88" = "Flood", "89" = "Winter Weather", "90" = "Winter Weather",
"91" = "Ice Storm", "92" = "Winter Weather", "93" = "Lake-Effect Snow",
"94" = "Lake-Effect Snow", "95" = "Lakeshore Flood", "96" = "Debris Flow",
"97" = "Debris Flow", "98" = "Debris Flow", "99" = "Tornado",
"100" = "Heavy Snow", "101" = "Sleet", "102" = "Winter Storm",
"103" = "Winter Storm", "104" = "Lightning", "105" = "Marine Strong Wind",
"106" = "Marine Hail", "107" = "Marine High Wind", "108" = "Marine Strong Wind",
"109"="Marine Thunderstorm Wind","110"="Marine Thunderstorm Wind","111"="Thunderstorm Wind",
"112" = "Heavy Rain", "113" = "Heavy Rain", "114" = "Debris Flow",
"115" = "Debris Flow", "116" = "Debris Flow", "117" = "Strong Wind",
"118" = "High Wind", "119" = "Strong Wind", "120" = "Other",
"121" = "Heavy Rain", "122" = "Heavy Rain", "123" = "Excessive Heat",
"124" = "Rip Current", "125" = "Rip Current", "126" = "Flood",
"127" = "Flood", "128" = "Debris Flow", "129" = "High Surf",
"130" = "High Surf", "131" = "High Surf", "132" = "Seiche",
"133" = "Hail", "134" = "Heavy Snow", "135" = "Heavy Snow",
"136" = "Heavy Snow", "137" = "Heavy Snow", "138" = "Storm Surge/Tide",
"139" = "Storm Surge/Tide", "140" = "Strong Wind", "141" = "Strong Wind",
"142" = "Thunderstorm Wind", "143" = "Thunderstorm Wind", "144" = "Thunderstorm Wind",
"145" = "Coastal Flood", "146" = "Tornado", "147" = "Heavy Rain",
"148" = "Hurricane (Typhoon)", "149" = "Tropical Storm", "150" = "Thunderstorm Wind",
"151" = "Thunderstorm Wind", "152" = "Thunderstorm Wind", "153" = "Thunderstorm Wind",
"154" = "Thunderstorm Wind", "155" = "Thunderstorm Wind", "156" = "Thunderstorm Wind",
"157" = "Thunderstorm Wind", "158" = "Thunderstorm Wind", "159" = "Thunderstorm Wind",
"160" = "Thunderstorm Wind", "161" = "Tsunami", "162" = "Hurricane (Typhoon)",
"163" = "Cold/Wind Chill", "164" = "Cold/Wind Chill", "165" = "Heat",
"166" = "Heavy Rain", "167" = "Flood", "168" = "Volcanic Ash",
"169" = "Heat", "170" = "Waterspout", "171" = "Thunderstorm Wind",
"172" = "Tornado", "173" = "Wildfire", "174" = "Wildfire",
"175" = "Strong Wind", "176" = "Marine Strong Wind", "177" = "Strong Wind",
"178" = "Strong Wind", "179" = "Winter Storm", "180" = "Winter Weather",
"181" = "Winter Weather", "182" = "Winter Weather", "183" = "Winter Weather")
EventTable <- tibble(1:183, DataEventType, NoaaEventType)
My matching table looks like this (scroll to see all 183 rows):
library(kableExtra)
kbl(EventTable,
col.names = c("Row", "Event types in dataset", "Event types from NOAA")) %>%
kable_paper() %>%
scroll_box(width = "500px", height = "400px") %>%
kable_material_dark(c("striped", "hover"))
| Row | Event types in dataset | Event types from NOAA |
|---|---|---|
| 1 | agricultural freeze | Frost/Freeze |
| 2 | astronomical high tide | Astronomical Low Tide |
| 3 | astronomical low tide | Astronomical Low Tide |
| 4 | avalanche | Avalanche |
| 5 | beach erosion | Coastal Flood |
| 6 | black ice | Ice Storm |
| 7 | blizzard | Blizzard |
| 8 | blowing dust | Dust Storm |
| 9 | blowing snow | Heavy Snow |
| 10 | brush fire | Wildfire |
| 11 | coastal flooding/erosion | Coastal Flood |
| 12 | coastal erosion | Coastal Flood |
| 13 | coastal flood | Coastal Flood |
| 14 | coastal flooding | Coastal Flood |
| 15 | coastal flooding/erosion | Coastal Flood |
| 16 | coastal storm | Marine Thunderstorm Wind |
| 17 | coastalstorm | Marine Thunderstorm Wind |
| 18 | cold | Cold/Wind Chill |
| 19 | cold and snow | Cold/Wind Chill |
| 20 | cold temperature | Cold/Wind Chill |
| 21 | cold weather | Cold/Wind Chill |
| 22 | cold/wind chill | Cold/Wind Chill |
| 23 | dam break | Flood |
| 24 | damaging freeze | Frost/Freeze |
| 25 | dense fog | Dense Fog |
| 26 | dense smoke | Dense Smoke |
| 27 | downburst | Thunderstorm Wind |
| 28 | drought | Drought |
| 29 | drowning | Flood |
| 30 | dry microburst | Thunderstorm Wind |
| 31 | dust devil | Dust Devil |
| 32 | dust storm | Dust Storm |
| 33 | early frost | Frost/Freeze |
| 34 | erosion/cstl flood | Coastal Flood |
| 35 | excessive heat | Excessive Heat |
| 36 | excessive snow | Heavy Snow |
| 37 | extended cold | Cold/Wind Chill |
| 38 | extreme cold | Extreme Cold/Wind Chill |
| 39 | extreme cold/wind chill | Extreme Cold/Wind Chill |
| 40 | extreme windchill | Extreme Cold/Wind Chill |
| 41 | falling snow/ice | Heavy Snow |
| 42 | flash flood | Flash Flood |
| 43 | flash flood/flood | Flash Flood |
| 44 | flood | Flood |
| 45 | flood/flash/flood | Flash Flood |
| 46 | fog | Dense Fog |
| 47 | freeze | Frost/Freeze |
| 48 | freezing drizzle | Sleet |
| 49 | freezing fog | Freezing Fog |
| 50 | freezing rain | Sleet |
| 51 | freezing spray | Sleet |
| 52 | frost | Frost/Freeze |
| 53 | frost/freeze | Frost/Freeze |
| 54 | funnel cloud | Funnel Cloud |
| 55 | glaze | Frost/Freeze |
| 56 | gradient wind | Strong Wind |
| 57 | gusty wind | Strong Wind |
| 58 | gusty wind/hail | Strong Wind |
| 59 | gusty wind/hvy rain | Heavy Rain |
| 60 | gusty wind/rain | Strong Wind |
| 61 | gusty winds | Strong Wind |
| 62 | hail | Hail |
| 63 | hard freeze | Frost/Freeze |
| 64 | hazardous surf | High Surf |
| 65 | heat | Heat |
| 66 | heat wave | Heat |
| 67 | heavy rain | Heavy Rain |
| 68 | heavy rain/high surf | Heavy Rain |
| 69 | heavy seas | High Surf |
| 70 | heavy snow | Heavy Snow |
| 71 | heavy snow shower | Heavy Snow |
| 72 | heavy surf | High Surf |
| 73 | heavy surf and wind | High Surf |
| 74 | heavy surf/high surf | High Surf |
| 75 | high seas | High Surf |
| 76 | high surf | High Surf |
| 77 | high surf advisory | High Surf |
| 78 | high swells | High Surf |
| 79 | high water | High Surf |
| 80 | high wind | High Wind |
| 81 | high wind (g40) | High Wind |
| 82 | high winds | High Wind |
| 83 | hurricane | Hurricane (Typhoon) |
| 84 | hurricane edouard | Hurricane (Typhoon) |
| 85 | hurricane/typhoon | Hurricane (Typhoon) |
| 86 | hyperthermia/exposure | Heat |
| 87 | hypothermia/exposure | Cold/Wind Chill |
| 88 | ice jam flood (minor | Flood |
| 89 | ice on road | Winter Weather |
| 90 | ice roads | Winter Weather |
| 91 | ice storm | Ice Storm |
| 92 | icy roads | Winter Weather |
| 93 | lake effect snow | Lake-Effect Snow |
| 94 | lake-effect snow | Lake-Effect Snow |
| 95 | lakeshore flood | Lakeshore Flood |
| 96 | landslide | Debris Flow |
| 97 | landslides | Debris Flow |
| 98 | landslump | Debris Flow |
| 99 | landspout | Tornado |
| 100 | late season snow | Heavy Snow |
| 101 | light freezing rain | Sleet |
| 102 | light snow | Winter Storm |
| 103 | light snowfall | Winter Storm |
| 104 | lightning | Lightning |
| 105 | marine accident | Marine Strong Wind |
| 106 | marine hail | Marine Hail |
| 107 | marine high wind | Marine High Wind |
| 108 | marine strong wind | Marine Strong Wind |
| 109 | marine thunderstorm wind | Marine Thunderstorm Wind |
| 110 | marine tstm wind | Marine Thunderstorm Wind |
| 111 | microburst | Thunderstorm Wind |
| 112 | mixed precip | Heavy Rain |
| 113 | mixed precipitation | Heavy Rain |
| 114 | mud slide | Debris Flow |
| 115 | mudslide | Debris Flow |
| 116 | mudslides | Debris Flow |
| 117 | non tstm wind | Strong Wind |
| 118 | non-severe wind damage | High Wind |
| 119 | non-tstm wind | Strong Wind |
| 120 | other | Other |
| 121 | rain | Heavy Rain |
| 122 | rain/snow | Heavy Rain |
| 123 | record heat | Excessive Heat |
| 124 | rip current | Rip Current |
| 125 | rip currents | Rip Current |
| 126 | river flood | Flood |
| 127 | river flooding | Flood |
| 128 | rock slide | Debris Flow |
| 129 | rogue wave | High Surf |
| 130 | rough seas | High Surf |
| 131 | rough surf | High Surf |
| 132 | seiche | Seiche |
| 133 | small hail | Hail |
| 134 | snow | Heavy Snow |
| 135 | snow and ice | Heavy Snow |
| 136 | snow squall | Heavy Snow |
| 137 | snow squalls | Heavy Snow |
| 138 | storm surge | Storm Surge/Tide |
| 139 | storm surge/tide | Storm Surge/Tide |
| 140 | strong wind | Strong Wind |
| 141 | strong winds | Strong Wind |
| 142 | thunderstorm | Thunderstorm Wind |
| 143 | thunderstorm wind | Thunderstorm Wind |
| 144 | thunderstorm wind (g40) | Thunderstorm Wind |
| 145 | tidal flooding | Coastal Flood |
| 146 | tornado | Tornado |
| 147 | torrential rainfall | Heavy Rain |
| 148 | tropical depression | Hurricane (Typhoon) |
| 149 | tropical storm | Tropical Storm |
| 150 | tstm wind | Thunderstorm Wind |
| 151 | tstm wind (g45) | Thunderstorm Wind |
| 152 | tstm wind (41) | Thunderstorm Wind |
| 153 | tstm wind (g35) | Thunderstorm Wind |
| 154 | tstm wind (g40) | Thunderstorm Wind |
| 155 | tstm wind (g45) | Thunderstorm Wind |
| 156 | tstm wind 40 | Thunderstorm Wind |
| 157 | tstm wind 45 | Thunderstorm Wind |
| 158 | tstm wind and lightning | Thunderstorm Wind |
| 159 | tstm wind g45 | Thunderstorm Wind |
| 160 | tstm wind/hail | Thunderstorm Wind |
| 161 | tsunami | Tsunami |
| 162 | typhoon | Hurricane (Typhoon) |
| 163 | unseasonable cold | Cold/Wind Chill |
| 164 | unseasonably cold | Cold/Wind Chill |
| 165 | unseasonably warm | Heat |
| 166 | unseasonal rain | Heavy Rain |
| 167 | urban/sml stream fld | Flood |
| 168 | volcanic ash | Volcanic Ash |
| 169 | warm weather | Heat |
| 170 | waterspout | Waterspout |
| 171 | wet microburst | Thunderstorm Wind |
| 172 | whirlwind | Tornado |
| 173 | wild/forest fire | Wildfire |
| 174 | wildfire | Wildfire |
| 175 | wind | Strong Wind |
| 176 | wind and wave | Marine Strong Wind |
| 177 | wind damage | Strong Wind |
| 178 | winds | Strong Wind |
| 179 | winter storm | Winter Storm |
| 180 | winter weather | Winter Weather |
| 181 | winter weather mix | Winter Weather |
| 182 | winter weather/mix | Winter Weather |
| 183 | wintry mix | Winter Weather |
Then, I replace all EVTYPE values (201,318 rows) in my subsetted data set with their equivalent NOAA type (one of the original 48 categories), using the matching table:
index <- match(tolower(subData$EVTYPE), DataEventType)
subData <- subData %>%
mutate(EVTYPE = EventTable$NoaaEventType[index])
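The recoding relies on `match()`, which returns the position of each value in a lookup vector. Here is a minimal sketch on a made-up three-row lookup table (the real one has 183 rows), including a check that no event is left unmatched.

```r
# Toy lookup table (made up for illustration)
raw_types  <- c("tstm wind", "thunderstorm wind", "wild/forest fire")
noaa_types <- c("Thunderstorm Wind", "Thunderstorm Wind", "Wildfire")

events <- c("TSTM WIND", "Wild/Forest Fire", "TSTM WIND", "THUNDERSTORM WIND")

# match() gives, for each event, its position in raw_types (NA if absent)
idx <- match(tolower(events), raw_types)

# Every event should have an entry in the lookup table
stopifnot(!anyNA(idx))

recoded <- noaa_types[idx]
recoded
# "Thunderstorm Wind" "Wildfire" "Thunderstorm Wind" "Thunderstorm Wind"
```

The `anyNA()` check is a cheap safeguard: an event type missing from the table would otherwise silently become NA.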
To assess the population health harm and economic damage of each event type, I calculate the total fatalities, the total injuries, and their sum, as well as the total property damage, the total crop damage, and their sum, for the period 1996 to 2011:
subData <- subData %>%
group_by(EVTYPE) %>%
summarize(FATALITIES = sum(FATALITIES),
INJURIES = sum(INJURIES),
FatalInjSum = sum(FatalInjSum),
propDmgMerge = sum(propDmgMerge),
cropDmgMerge = sum(cropDmgMerge),
PropCropSum = sum(PropCropSum))
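When every column is summarized with the same function, `across()` can shorten the call. The sketch below uses made-up numbers and only two of the six columns; it is an equivalent formulation, not the code used in the report.

```r
library(dplyr)

# Toy data (made up)
toy <- tibble(EVTYPE     = c("Flood", "Tornado", "Flood"),
              FATALITIES = c(1, 3, 2),
              INJURIES   = c(0, 10, 4))

# Sum each listed column within each event type
totals <- toy %>%
  group_by(EVTYPE) %>%
  summarize(across(c(FATALITIES, INJURIES), sum))

totals   # Flood: 3 fatalities, 4 injuries; Tornado: 3 fatalities, 10 injuries
```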
The final tidied data set looks like this:
subData
## # A tibble: 48 x 7
## EVTYPE FATALITIES INJURIES FatalInjSum propDmgMerge cropDmgMerge PropCropSum
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Astron… 0 0 0 9745000 0 9745000
## 2 Avalan… 223 156 379 3711800 0 3711800
## 3 Blizza… 70 385 455 525658950 7060000 532718950
## 4 Coasta… 6 8 14 407318560 0 407318560
## 5 Cold/W… 139 24 163 2644000 30742500 33386500
## 6 Debris… 43 55 98 326628100 20017000 346645100
## 7 Dense … 69 855 924 20464500 0 20464500
## 8 Dense … 0 0 0 100000 0 100000
## 9 Drought 0 4 4 1046101000 13367566000 14413667000
## 10 Dust D… 2 39 41 663630 0 663630
## # … with 38 more rows
Now, I investigate the health harm and economic damage of different storm events, based on their respective total casualties and property/crop damage.
First, I reorder the data by the sum of fatalities and injuries for each event, in descending order. Then, I choose the top ten:
health <- subData %>%
arrange(desc(FatalInjSum)) %>%
slice(1:10)
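The arrange-then-slice pattern can also be written with `slice_max()`, available in dplyr 1.0.0 and later. This is a sketch on made-up data, shown only as an alternative; note that `slice_max()` keeps ties by default, so it may return more than n rows when values are equal.

```r
library(dplyr)

# Toy data (made up)
toy <- tibble(EVTYPE      = c("A", "B", "C", "D"),
              FatalInjSum = c(5, 40, 12, 1))

# Keep the two rows with the largest FatalInjSum
top2 <- toy %>%
  slice_max(order_by = FatalInjSum, n = 2)

top2$EVTYPE   # "B" and "C"
```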
Here is the plot of the top ten events for population health:
ggplot(health, aes(x = reorder(EVTYPE, -FatalInjSum), y = FatalInjSum)) +
geom_col(alpha = 0.75, color = "black") +
theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1)) +
ggtitle("Most harmful storm events to population health (1996 - 2011)") +
xlab("Top ten storm events in terms of total fatalities and injuries") +
ylab("Number of people dead or injured (1996 - 2011)") +
geom_line(aes(x = reorder(EVTYPE, -FatalInjSum),
y = FATALITIES, color = "Fatalities"),
group = 1) +
geom_line(aes(x = reorder(EVTYPE, -FatalInjSum),
y = INJURIES, color = "Injuries"),
group = 1) +
scale_color_manual(name = "Legend", values = c("red", "blue"))
Based on the plot, Tornado is by far the most harmful storm event for population health.
Here, I do the same as above, but this time I arrange the data by the sum of property and crop damage for each event, in descending order, and again choose the top ten. Additionally, I divide the three financial columns by 10^9 to express them in billions of dollars:
economic <- subData %>%
arrange(desc(PropCropSum)) %>%
slice(1:10) %>%
mutate(propDmgMerge = propDmgMerge / 10^9,
cropDmgMerge = cropDmgMerge / 10^9,
PropCropSum = PropCropSum / 10^9)
And this is its plot:
library(scales)
ggplot(economic, aes(x = reorder(EVTYPE, -PropCropSum), y = PropCropSum)) +
geom_col(alpha = 0.5, color = "black") +
theme_bw() +
theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1)) +
ggtitle("Storm events with highest economic damage (1996 - 2011)") +
xlab("Top ten storm events in terms of total property and crop damage") +
ylab("Total cost of damage in billions of dollars (1996 - 2011)") +
geom_line(aes(x = reorder(EVTYPE, -PropCropSum),
y = propDmgMerge, color = "Property damage"),
group = 1) +
geom_line(aes(x = reorder(EVTYPE, -PropCropSum),
y = cropDmgMerge, color = "Crop damage"),
group = 1) +
scale_color_manual(name = "Legend", values = c("purple", "lightseagreen")) +
scale_y_continuous(labels = comma)
Based on the plot, Flood incurs the highest economic damage.
Thank you for reading my report.