This is an R Markdown document. R Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. Knitr and RPubs allow for literate statistical programming and reproducible research.
This data set is interesting to analyse because it can serve as a historical baseline to control for, or compare against, current severe weather linked to climate change.
This is a Coursera assignment submission for the Reproducible Research course. The data set was provided as a downloadable compressed (bz2) file. See the National Weather Service Storm Data Documentation to find out how the variables are constructed and defined.
This report addresses the impact of past severe weather on public health and the economy of the United States of America.
Severe weather caused 15 145 deaths and 140 528 injuries during the period 1950 to 2011. Results show that Tornadoes, Floods and Flash Floods caused both high casualties (97 023, 7 422 and 2 804 respectively) and heavy damage to the economy (59, 161 and 18 Trillion USD respectively) during the same period. Other extreme events such as Excessive Heat, Lightning and Heat caused less economic damage (around 505, 946 and 419 Million USD respectively) but led to 8 748, 6 049 and 3 623 casualties respectively.
The data for this analysis originate from the National Oceanic and Atmospheric Administration (NOAA) database. The NOAA labelling protocol was followed when classifying storm events by type.
It was assumed that the harm caused to population health could be measured by summing total injuries and total fatalities for each storm event type, and that the economic consequences could be measured by summing the total damage caused to crops and property.
According to the National Climatic Data Center Storm Events FAQ, the lightning data contained within Storm Data list only those events that resulted in fatality, injury and/or property and crop damage, and Tornadoes may contain multiple segments. The raw data were loaded, cleaned and analysed with R 4.0 in a Windows 10 (64-bit) environment. R is a free, open-source programming tool; please refer to the CRAN documentation to download and install R 4.0 on 64-bit Windows 10.
Records from 1950 up to 2001 were sparse, but were retained. Data cleaning was done with the gsub() function, which performs global replacements of erroneous storm event type entries. Data manipulation was done with dplyr. The cleaned data were then queried for summary statistics, viewed as a data frame and plotted. All code and intermediate results are hosted at this GitHub repository. Feel free to fork and download this project to collaborate, and/or contact me for further information.
The code to reproduce these figures is in the data analysis section.
Storm Impacts
Initial exploration shows that there are a few events which cause most of the harm and damage. From 1950 to 2011, Tornadoes were the most deadly and caused significant economic damage.
Impact on Human Health
From this graph we can see that Tornadoes caused over 90 Thousand injuries, roughly ten times more than the next most harmful storm events (Excessive Heat, Thunderstorm Wind and Flood), which caused between roughly 6 700 and 9 600 injuries each.
Impact to the Economy
Once again, the scale of the impact of Tornadoes is much greater than that of other NOAA storm events. Flooding and Thunderstorms also cause considerable damage.
#setwd("./OneDrive/R/ReproducibleResearch/")
library(knitr)
library(R.utils)
## Warning: package 'R.utils' was built under R version 4.0.3
## Loading required package: R.oo
## Warning: package 'R.oo' was built under R version 4.0.3
## Loading required package: R.methodsS3
## Warning: package 'R.methodsS3' was built under R version 4.0.3
## R.methodsS3 v1.8.1 (2020-08-26 16:20:06 UTC) successfully loaded. See ?R.methodsS3 for help.
## R.oo v1.24.0 (2020-08-26 16:11:58 UTC) successfully loaded. See ?R.oo for help.
##
## Attaching package: 'R.oo'
## The following object is masked from 'package:R.methodsS3':
##
## throw
## The following objects are masked from 'package:methods':
##
## getClasses, getMethods
## The following objects are masked from 'package:base':
##
## attach, detach, load, save
## R.utils v2.10.1 (2020-08-26 22:50:31 UTC) successfully loaded. See ?R.utils for help.
##
## Attaching package: 'R.utils'
## The following object is masked from 'package:utils':
##
## timestamp
## The following objects are masked from 'package:base':
##
## cat, commandArgs, getOption, inherits, isOpen, nullfile, parse,
## warnings
library(xtable)
library(stringdist)
## Warning: package 'stringdist' was built under R version 4.0.3
##
## Attaching package: 'stringdist'
## The following object is masked from 'package:R.utils':
##
## extract
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.2 v purrr 0.3.4
## v tibble 3.0.4 v dplyr 1.0.2
## v tidyr 1.1.2 v stringr 1.4.0
## v readr 1.4.0.9000 v forcats 0.5.0
## Warning: package 'tibble' was built under R version 4.0.3
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x lubridate::as.difftime() masks base::as.difftime()
## x lubridate::date() masks base::date()
## x tidyr::extract() masks stringdist::extract(), R.utils::extract()
## x dplyr::filter() masks stats::filter()
## x lubridate::intersect() masks base::intersect()
## x dplyr::lag() masks stats::lag()
## x lubridate::setdiff() masks base::setdiff()
## x lubridate::union() masks base::union()
library(dplyr)
library(ggplot2)
library(gridExtra)
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
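# Download and decompress the raw data only if the CSV is not already present
if (!file.exists('stormdata.csv')) {
  data_url <- 'https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2'
  # The archive is bzip2-compressed (not a zip), so it is saved under a .bz2 name
  download.file(data_url, 'stormdata.csv.bz2', mode = 'wb')
  bunzip2('stormdata.csv.bz2', 'stormdata.csv', remove = FALSE)
}
storm_data <- read.csv('stormdata.csv')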
Tornadoes, Floods and Flash Floods caused both high casualties and heavy damage to the economy; see the data analysis below for the code that reproduces these figures.
| Event Type | Total Fatalities | Total Injuries | Total Casualties | Crop and Property Damage (USD) |
|---|---|---|---|---|
| TRILLIONS of Dollars Damages to the Economy | | | | |
| Flood | 533 | 6 889 | 7 422 | 161 Trillion |
| Hurricane (Typhoon) | 135 | 1 333 | 1 468 | 91 Trillion |
| Tornado | 5 659 | 91 364 | 97 023 | 59 Trillion |
| Storm Surge/Tide | 24 | 43 | 67 | 48 Trillion |
| Dense Fog | 80 | 1 076 | 1 156 | 23 Trillion |
| Hail | 15 | 1 371 | 1 386 | 19 Trillion |
| Flash Flood | 1 019 | 1 785 | 2 804 | 18 Trillion |
| Drought | 35 | 19 | 54 | 15 Trillion |
| Thunderstorm Wind | 715 | 9 537 | 10 252 | 12 Trillion |
| Tropical Storm | 66 | 383 | 449 | 8 Trillion |
| Ice Storm | 90 | 1 978 | 2 068 | 8 Trillion |
| Wildfire | 90 | 1 608 | 1 698 | 8 Trillion |
| High Wind | 286 | 1 451 | 1 737 | 7 Trillion |
| Winter Storm | 217 | 1 353 | 1 570 | 6 Trillion |
| Heavy Rain | 101 | 280 | 381 | 4 Trillion |
| Extreme Cold/Wind Chill | 306 | 260 | 566 | 2 Trillion |
| Frost/Freeze | 24 | 196 | 220 | 2 Trillion |
| Heavy Snow | 148 | 1 155 | 1 303 | 1 Trillion |
| BILLIONS of Dollars Damages to the Economy | | | | |
| Blizzard | 101 | 805 | 906 | 772 Billion |
| Coastal Flood | 6 | 7 | 13 | 429 Billion |
| Avalanche | 269 | 225 | 494 | 351 Billion |
| Strong Wind | 140 | 408 | 548 | 264 Billion |
| Tsunami | 33 | 129 | 162 | 144 Billion |
| High Surf | 177 | 273 | 450 | 101 Billion |
| Cold/Wind Chill | 167 | 60 | 227 | 94 Billion |
| Waterspout | 6 | 72 | 78 | 61 Billion |
| Winter Weather | 61 | 470 | 531 | 47 Billion |
| Sleet | 25 | 443 | 468 | 14 Billion |
| Dust Storm | 22 | 440 | 462 | 9 Billion |
| Marine Thunderstorm Wind | 24 | 38 | 62 | 6 Billion |
| Marine High Wind | 12 | 6 | 18 | 2 Billion |
| Freezing Fog | 1 | 1 | 2 | 2 Billion |
| OTHER | 0 | 4 | 4 | 1 Billion |
| MILLIONS of Dollars Damages to the Economy | | | | |
| Lightning | 817 | 5 232 | 6 049 | 946 Million |
| Dust Devil | 2 | 43 | 45 | 719 Million |
| Excessive Heat | 2 018 | 6 730 | 8 748 | 505 Million |
| Heat | 1 125 | 2 498 | 3 623 | 419 Million |
| Marine Strong Wind | 19 | 30 | 49 | 433 Million |
| Funnel Cloud | 0 | 3 | 3 | 195 Million |
| Rip Current | 577 | 529 | 1 106 | 163 Million |
| See exact figures below | | | | |
Data inspection - see dimensions and data types
dim_storm_data <- dim(storm_data)
dim_storm_data
## [1] 902297 37
head(storm_data)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE EVTYPE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL TORNADO
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL TORNADO
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL TORNADO
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL TORNADO
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL TORNADO
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL TORNADO
## BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1 0 0 NA
## 2 0 0 NA
## 3 0 0 NA
## 4 0 0 NA
## 5 0 0 NA
## 6 0 0 NA
## END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1 0 14.0 100 3 0 0 15 25.0
## 2 0 2.0 150 2 0 0 0 2.5
## 3 0 0.1 123 2 0 0 2 25.0
## 4 0 0.0 100 2 0 0 2 2.5
## 5 0 0.0 150 2 0 0 2 2.5
## 6 0 1.5 177 2 0 0 6 2.5
## PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1 K 0 3040 8812
## 2 K 0 3042 8755
## 3 K 0 3340 8742
## 4 K 0 3458 8626
## 5 K 0 3412 8642
## 6 K 0 3450 8748
## LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3051 8806 1
## 2 0 0 2
## 3 0 0 3
## 4 0 0 4
## 5 0 0 5
## 6 0 0 6
summary(storm_data)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE
## Min. : 1.0 Length:902297 Length:902297 Length:902297
## 1st Qu.:19.0 Class :character Class :character Class :character
## Median :30.0 Mode :character Mode :character Mode :character
## Mean :31.2
## 3rd Qu.:45.0
## Max. :95.0
##
## COUNTY COUNTYNAME STATE EVTYPE
## Min. : 0.0 Length:902297 Length:902297 Length:902297
## 1st Qu.: 31.0 Class :character Class :character Class :character
## Median : 75.0 Mode :character Mode :character Mode :character
## Mean :100.6
## 3rd Qu.:131.0
## Max. :873.0
##
## BGN_RANGE BGN_AZI BGN_LOCATI END_DATE
## Min. : 0.000 Length:902297 Length:902297 Length:902297
## 1st Qu.: 0.000 Class :character Class :character Class :character
## Median : 0.000 Mode :character Mode :character Mode :character
## Mean : 1.484
## 3rd Qu.: 1.000
## Max. :3749.000
##
## END_TIME COUNTY_END COUNTYENDN END_RANGE
## Length:902297 Min. :0 Mode:logical Min. : 0.0000
## Class :character 1st Qu.:0 NA's:902297 1st Qu.: 0.0000
## Mode :character Median :0 Median : 0.0000
## Mean :0 Mean : 0.9862
## 3rd Qu.:0 3rd Qu.: 0.0000
## Max. :0 Max. :925.0000
##
## END_AZI END_LOCATI LENGTH WIDTH
## Length:902297 Length:902297 Min. : 0.0000 Min. : 0.000
## Class :character Class :character 1st Qu.: 0.0000 1st Qu.: 0.000
## Mode :character Mode :character Median : 0.0000 Median : 0.000
## Mean : 0.2301 Mean : 7.503
## 3rd Qu.: 0.0000 3rd Qu.: 0.000
## Max. :2315.0000 Max. :4400.000
##
## F MAG FATALITIES INJURIES
## Min. :0.0 Min. : 0.0 Min. : 0.0000 Min. : 0.0000
## 1st Qu.:0.0 1st Qu.: 0.0 1st Qu.: 0.0000 1st Qu.: 0.0000
## Median :1.0 Median : 50.0 Median : 0.0000 Median : 0.0000
## Mean :0.9 Mean : 46.9 Mean : 0.0168 Mean : 0.1557
## 3rd Qu.:1.0 3rd Qu.: 75.0 3rd Qu.: 0.0000 3rd Qu.: 0.0000
## Max. :5.0 Max. :22000.0 Max. :583.0000 Max. :1700.0000
## NA's :843563
## PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## Min. : 0.00 Length:902297 Min. : 0.000 Length:902297
## 1st Qu.: 0.00 Class :character 1st Qu.: 0.000 Class :character
## Median : 0.00 Mode :character Median : 0.000 Mode :character
## Mean : 12.06 Mean : 1.527
## 3rd Qu.: 0.50 3rd Qu.: 0.000
## Max. :5000.00 Max. :990.000
##
## WFO STATEOFFIC ZONENAMES LATITUDE
## Length:902297 Length:902297 Length:902297 Min. : 0
## Class :character Class :character Class :character 1st Qu.:2802
## Mode :character Mode :character Mode :character Median :3540
## Mean :2875
## 3rd Qu.:4019
## Max. :9706
## NA's :47
## LONGITUDE LATITUDE_E LONGITUDE_ REMARKS
## Min. :-14451 Min. : 0 Min. :-14455 Length:902297
## 1st Qu.: 7247 1st Qu.: 0 1st Qu.: 0 Class :character
## Median : 8707 Median : 0 Median : 0 Mode :character
## Mean : 6940 Mean :1452 Mean : 3509
## 3rd Qu.: 9605 3rd Qu.:3549 3rd Qu.: 8735
## Max. : 17124 Max. :9706 Max. :106220
## NA's :40
## REFNUM
## Min. : 1
## 1st Qu.:225575
## Median :451149
## Mean :451149
## 3rd Qu.:676723
## Max. :902297
##
This is a long data set with 902 297 rows and 37 columns. On inspection, it appears that the storm event types were not recorded as requested by the National Weather Service Storm Data Documentation, and that the remarks are blank. Data cleaning was done with gsub() in two passes: first, quick fixes reduced the 985 unique event types; then, after manual inspection, more precise replacements brought the storm events closer to the required NOAA set of 48 Storm Events. Not all entries could be cleaned, as some did not make sense.
This quick gsub() treatment, which replaces expected input errors, reduced the number of unique storm event types from 985 to 393.
storm_data_corrected <- storm_data
storm_data_corrected$EVTYPE <- toupper(storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^(SMALL )?HAIL.*", "HAIL", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("TSTM|THUNDERSTORMS?", "THUNDERSTORM", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("STORMS?", "STORM", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("WINDS?|WINDS?/HAIL", "WIND", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("RAINS?", "RAIN", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^TH?UN?DEE?RS?TO?RO?M ?WIND.*|^(SEVERE )?THUNDERSTORM$|^WIND STORM$|^(DRY )?MI[CR][CR]OBURST.*|^THUNDERSTORMW$", "THUNDERSTORM WIND", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^COASTAL ?STORM$|^MARINE ACCIDENT$", "MARINE THUNDERSTORM WIND", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^FLOODS?.*|^URBAN/SML STREAM FLD$|^(RIVER|TIDAL|MAJOR|URBAN|MINOR|ICE JAM|RIVER AND STREAM|URBAN/SMALL STREAM)? FLOOD(ING)?S?$|^HIGH WATER$|^URBAN AND SMALL STREAM FLOODIN$|^DROWNING$|^DAM BREAK$", "FLOOD", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^FLASH FLOOD.*|^RAPIDLY RISING WATER$", "FLASH FLOOD", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("WATERSPOUTS?", "WATERSPOUT", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("WEATHER/MIX", "WEATHER", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("CURRENTS?", "CURRENT", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^WINDCHILL$|^COLD.*|^LOW TEMPERATURE$|^UNSEASONABLY COLD$", "COLD/WIND CHILL", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^EXTREME WIND ?CHILL$|^(EXTENDED|EXTREME|RECORD)? COLDS?$", "EXTREME COLD/WIND CHILL", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^WILD/FOREST FIRE$|^(WILD|BRUSH|FOREST)? ?FIRES?$", "WILDFIRE", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^RAIN/SNOW$|^(BLOWING|HEAVY|EXCESSIVE|BLOWING|ICE AND|RECORD)? ?SNOWS?.*", "HEAVY SNOW", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^FOG$", "DENSE FOG", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^(GUSTY|NON-SEVERE|NON ?-?THUNDERSTORM)? ?WIND.*|^ICE/STRONG WIND$", "STRONG WIND", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("SURGE$", "SURGE/TIDE", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("CLOUDS?", "CLOUD", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^FROST[/\\]FREEZE$|^FROST$|^(DAMAGING)? ?FREEZE$|^HYP[OE]R?THERMIA.*|^ICE$|^(ICY|ICE) ROADS$|^BLACK ICE$|^ICE ON ROAD$", "FROST/FREEZE", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^GLAZE.*|^FREEZING (RAIN|DRIZZLE|RAIN/SNOW|SPRAY$)$|^WINTRY MIX$|^MIXED PRECIP(ITATION)?$|^WINTER WEATHER MIX$|^LIGHT SNOW$|^FALLING SNOW/ICE$|^SLEET.*", "SLEET", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^HURRICANE.*", "HURRICANE/TYPHOON", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^HEAT WAVES?$|^UNSEASONABLY WARM$|^WARM WEATHER$", "HEAT", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^(EXTREME|RECORD/EXCESSIVE|RECORD) HEAT$", "EXCESSIVE HEAT", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^HEAVY SURF(/HIGH SURF)?.*$|^(ROUGH|HEAVY) SEAS?.*|^(ROUGH|ROGUE|HAZARDOUS) SURF.*|^HIGH WIND AND SEAS$|^HIGH SURF.*", "HIGH SURF", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^LAND(SLUMP|SLIDE)?S?$|^MUD ?SLIDES?$|^AVALANCH?E$", "AVALANCHE", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^UNSEASONABLY WARM AND DRY$|^DROUGHT.*|^HEAT WAVE DROUGHT$", "DROUGHT", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^TORNADO.*", "TORNADO", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^TROPICAL STORM.*", "TROPICAL STORM", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^MARINE MISHAP$|^HIGH WIND/SEAS$", "MARINE HIGH WIND", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^HIGH WIND.*", "HIGH WIND", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^HIGH SEAS$", "MARINE STRONG WIND", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^RIP CURRENT.*", "RIP CURRENT", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^WATERSPOUT.*", "WATERSPOUT", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^EXCESSIVE RAINFALL$|^RAIN.*|^TORRENTIAL RAINFALL$|^(HEAVY|HVY)? (RAIN|MIX|PRECIPITATION).*", "HEAVY RAIN", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^FOG.*", "FREEZING FOG", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^WINTER STORM.*", "WINTER STORM", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^THUNDERSNOW$|^ICE STORM.*", "ICE STORM", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("WAVES?|SWELLS?", "SURF", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^LIGHTNING.*", "LIGHTNING", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^WHIRLWIND$|^GUSTNADO$|^TORNDAO$", "TORNADO", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^COASTAL FLOOD.*", "COASTAL FLOOD", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^TYPHOON", "HURRICANE/TYPHOON", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^EROSION/CSTL FLOOD$|^COASTAL FLOOD/EROSION$|^COASTAL SURGE/TIDE$", "COASTAL FLOOD", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^ASTRONOMICAL HIGH TIDE$", "STORM SURGE/TIDE", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^(GROUND)? ?BLIZZARD.*$", "BLIZZARD", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^DUST STORM.*$", "DUST STORM", storm_data_corrected$EVTYPE)
#unique(storm_data_corrected$EVTYPE)
length(unique(storm_data_corrected$EVTYPE))
## [1] 393
length(unique(storm_data$EVTYPE))
## [1] 985
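The rule-by-rule clean-up above can also be written as a table of pattern/replacement pairs applied in order. The following is a minimal sketch on made-up labels, not the code used to produce this report: the helper `clean_labels()` and the `rules` vector are hypothetical names, and only four of the first-pass patterns are included.

```r
# Hypothetical subset of the first-pass rules: names are regex patterns,
# values are the replacement labels. Order matters, as in the code above.
rules <- c(
  "^(SMALL )?HAIL.*"    = "HAIL",
  "TSTM|THUNDERSTORMS?" = "THUNDERSTORM",
  "WINDS?"              = "WIND",
  "^FLASH FLOOD.*"      = "FLASH FLOOD"
)

# Upper-case the labels, then apply each gsub() rule in turn.
clean_labels <- function(x, rules) {
  x <- toupper(x)
  for (pattern in names(rules)) {
    x <- gsub(pattern, rules[[pattern]], x)
  }
  x
}

# Made-up raw labels, for illustration only
raw <- c("tstm wind", "THUNDERSTORM WINDS", "Hail 75", "FLASH FLOODING")
cleaned <- clean_labels(raw, rules)
print(cleaned)
print(length(unique(cleaned)))
```

Keeping the rules in one vector makes their order explicit, which matters here because later patterns operate on the output of earlier ones.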
After manual inspection the list was reduced even further, but not all listed storm events could be classified, as some entries such as “HIGH” make no sense.
storm_data_corrected2 <- storm_data_corrected
storm_data_corrected2$EVTYPE <- toupper(storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("\\sASTRONOMICAL LOW TIDE$|^ASTRONOMICAL LOW TIDE$", "Astronomical Low Tide", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("\\sAVALANCHE$|^AVALANCHE$", "Avalanche", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("\\sBLIZZARD$|^BLIZZARD$", "Blizzard", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("\\sCOASTAL FLOOD$|^BEACH EROSION$/^COASTAL FLOOD$|\\sBEACH EROSION$|^BEACH EROSIN$|^BEACH FLOOD$|^COASTAL FLOODING/EROSION$|^COASTAL/TIDAL FLOOD$|^CSTL FLOODING/EROSION$|^COASTAL EROSION$|^COASTALFLOOD$", "Coastal Flood", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^COLD/WIND CHILL$|^BITTER WIND CHILL TEMPERATURES$|^BITTER WIND CHILL$|^LOW WIND CHILL$|^EXTREME WINDCHILL TEMPERATURES$|^WAKE LOW WIND$", "Cold/Wind Chill", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("\\sDEBRIS FLOW$|^REMNANTS OF FLOYD$", "Debris Flow", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^DENSE FOG$|^PATCHY DENSE FOG$|^VOG$", "Dense Fog", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^DENSE SMOKE$|^SMOKE$", "Dense Smoke", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^DROUGHT$|^EXCESSIVE HEAT/DROUGHT$|^HEAT DROUGHT$|^HEAT/DROUGHT$|^RECORD LOW RAINFALL$|^UNSEASONABLY DRY$|^DRY CONDITIONS$|^DRY$|^VERY DRY$|^DRY SPELL$|^DRY WEATHER$|^DRIEST MONTH$|^DRY HOT WEATHER$|^DRY PATTERN$|^DRYNESS$|^EXCESSIVELY DRY$|^MILD AND DRY PATTERN$|^MILD PATTERN$|^MILD/DRY PATTERN$|^RECORD DRY MONTH$|^WARM DRY CONDITIONS$|^ABNORMALLY DRY$|^BELOW NORMAL PRECIPITATION$|^ABNORMALLY DRY$|^BELOW NORMAL PRECIPITATION$|^HOT AND DRY$|^RECORD DRYNESS$", "Drought", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^DUST DEVIL$|^DUST DEVIL WATERSPOUT$|\\sDUST DEVEL$", "Dust Devil", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^DUST STORM$|^SAHARAN DUST$|^BLOWING DUST$|^DUSTSTORM$", "Dust Storm", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^EXCESSIVE HEAT$|^HEATBURST$|^UNSEASONABLY HOT$|^UNUSUAL WARMTH$|^UNUSUAL/RECORD WARMTH$|^HIGH TEMPERATURE RECORD$|^ABNORMAL WARMTH$|^HOT PATTERN$|^HOT WEATHER$|^HOT/DRY PATTERN$|^RECORD HIGH TEMPERATURES$|^VERY WARM$|^HOT SPELL$", "Excessive Heat", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^EXTREME WIND CHILL/BLOWING SNO$|^EXTREME WIND CHILLS$|^EXTREME COLD/WIND CHILL$|^EXTREME COLD WIND CHILL$", "Extreme Cold/Wind Chill", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^FLASH FLOOD$|\\sFLASH FLOOD$|^FLASH FLOOODING$|^LOCAL FLASH FLOOD$", "Flash Flood", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^FLOOD$|^SMALL STREAM FLOODING$|^SMALL STREAM/URBAN FLOOD$|^SMALL STREAM FLOOD$|^URBAN SMALL STREAM FLOOD$|^URBAN/SMALL STREAM$|^URBAN/SMALL STREAM FLOOD$|^STREET FLOOD$|^STREET FLOODING$|\\sURBAN AND SMALL STREAM FLOOD$|^URBAN/STREET FLOODING$|^BREAKUP FLOODING$|^HIGHWAY FLOODING$|^LOCAL FLOOD$|^LANDSLIDE/URBAN FLOOD$|^MUD SLIDES URBAN FLOODING$|^SMALL STREAM AND URBAN FLOODIN$|^SMALL STREAM AND URBAN FLOODIN$|^STREAM FLOODING$|^URBAN FLOOD LANDSLIDE$|^ICE JAM FLOOD [:punct:]MINOR$|^URBAN AND SMALL STREAM$|^SMALL STREAM URBAN FLOOD$|^URBAN/SMALL FLOODING$|^URBAN/SML STREAM FLDG$|^SML STREAM FLD$|^URBAN/SMALL STRM FLDG$|^SMALL STREAM$|^SMALL STREAM AND$|^SMALL STREAM AND URBAN FLOOD$|^RURAL FLOOD$", "Flood", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^FROST FREEZE$|^FROST/FREEZE$|^FIRST FROST$|^EARLY FROST$|^RECORD COLD/FROST$|^AGRICULTURAL FREEZE$|^HARD FREEZE$|^EARLY FREEZE$|$LATE FREEZE$|^UNSEASONAL LOW TEMP$|^UNSEASONABLE COLD$|^ICE FLOES$", "Frost/Freeze", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^FUNNEL$|^FUNNEL CLOUD$|^FUNNEL CLOUD[:punct:]$|^FUNNEL CLOUD/HAIL$|^FUNNELS$|^WALL CLOUD/FUNNEL CLOUD$|^ROTATING WALL CLOUD$|^WALL CLOUD$|^LARGE WALL CLOUD$", "Funnel Cloud", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^FREEZING FOG$|^ICE FOG$", "Freezing Fog", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^HAIL$|^NON SEVERE HAIL$|^DEEP HAIL$|^LATE SEASON HAIL$|^THUNDERSTORM HAIL$", "Hail", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^HEAT$|^RECORD HEAT SURF$|^RECORD WARMTH$|^PROLONG WARMTH$|^UNUSUALLY WARM$|^UNSEASONABLY WARM YEAR$|^RECORD HIGH TEMPERATURE$|^RECORD WARM$|^RECORD WARM TEMPS[:punct:]$", "Heat", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^HEAVY RAIN$|^PROLONGED RAIN$|^EXCESSIVE RAIN$|^MONTHLY RAINFALL$|^RECORD RAINFALL$|^UNSEASONAL RAIN$|^THUNDERSTORM HEAVY RAIN$|^EARLY RAIN$|^LOCALLY HEAVY RAIN$|^RECORD/EXCESSIVE RAINFALL$|^TORRENTIAL RAIN$|^MONTHLY PRECIPITATION$|^WET MONTH$|^WET YEAR$|^WET MICROBURST$|^UNSEASONABLY WET$|^UNSEASONABLY WARM/WET$|^NORMAL PRECIPITATION$|^ABNORMALLY WET$|^EXCESSIVE PRECIPITATION$|^EXCESSIVE WETNESS$|^EXTREMELY WET$|^HEAVY PRECIPATATION$|^HEAVY SHOWERS$|^MUD/ROCK SLIDE$|^MUDSLIDE/LANDSLIDE$|^RECORD PRECIPITATION$|^UNSEASONABLY WARM [:punct:] WET$|^WET WEATHER$|^HEAVY SHOWER$|^WET MICOBURST$", "Heavy Rain", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^HEAVY SNOW$|^MODERATE SNOWFALL$|^ICE/SNOW$|^EARLY SNOWFALL$|^FIRST SNOW$|^LIGHT SNOW/FLURRIES$|^EARLY SNOW$|^ACCUMULATED SNOWFALL$|^DRIFTING SNOW$|^HEAVY WET SNOW$|^LATE-SEASON SNOWFALL$|^LATE SEASON SNOW$|^LIGHT SNOW/FREEZING PRECIP$|^LIGHT SNOWFALL$|^PROLONG COLD$|^SEASONAL SNOWFALL$", "Heavy Snow", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("\\sHIGH SURF ADVISORY$|^HIGH SURF ADVISORY$|^HIGH SURF$|^HEAVY SURF$|^HIGH SURF$|^ROGUE SURF$", "High Surf", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^HIGH WIND$|^RECORD COLD AND HIGH WIND$|^GUSTY LAKE WIND$|^STORM FORCE WIND$|^SOUTHEAST$|^WND$", "High Wind", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("\\sHURRICANE TYPHOON$|^HURRICANE TYPHOON$|^HURRICANE/TYPHOON$", "Hurricane (Typhoon)", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("\\sICE STORM$|^ICE STORM$", "Ice Storm", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^HEAVY LAKE SNOW$|\\sLAKE-EFFECT SNOW$|^LAKE EFFECT SNOW$", "Lake-Effect Snow", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^LAKESHORE FLOOD$|^LAKE FLOOD$|^RECORD WINTER SNOW$", "LAKESHORE FLOOD", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^LIGHTNING$|^LIGHTING$|\\sLIGHTNING$|^LIGNTNING$", "Lightning", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^MARINE HAIL$|\\sMARINE HAIL$", "Marine Hail", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^MARINE STRONG WIND$|\\sMARINE STRONG WIND$", "Marine Strong Wind", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^MARINE THUNDERSTORM WIND$|\\sMARINE THUNDERSTORM WIND$", "Marine Thunderstorm Wind", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^RIP CURRENT$|\\sRIP CURRENT$", "Rip Current", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^SEICHE$|\\sSEICHE$", "Seiche", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^SLEET$|^FREEZING RAIN AND SLEET$|^FREEZING RAIN/SLEET$|^LIGHT FREEZING RAIN$|^UNSEASONABLY COOL & WET$|^WET SNOW$|^FREEZING RAIN AND SNOW$|^FREEZING RAIN SLEET AND FREEZING RAIN SLEET AND LIGHT$", "Sleet", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^STORM SURGE$/^TIDE$|^BLOW[:punct:]OUT TIDE$|^BLOW[:punct:]OUT TIDES$|^HIGH TIDES$", "Storm Surge/Tide", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^STRONG WIND$|^GRADIENT WIND$|^SEVERE TURBULENCE$|^STRONG WIND GUST$|^DOWNBURST WIND$", "Strong Wind", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^THUNDERSTORM WIND$|^SEVERE THUNDERSTORM WIND$|^THUNDERSTORM WIND$|^GUSTY THUNDERSTORM WIND$|^THUNDERSTORM DAMAGE$|^THUNDESTORM WIND$|^THUNDERSTORMW WIND$|\\sTHUNDERSTORM WIND$|^THUNDERSTORM WIND [:punct:]G45[:punct:]$|^THUNDERESTORM WIND$|^THUNDERSNOW SHOWER$|^THUNDERSTORM DAMAGE TO$|^THUNDERSTORM W INDS$|^THUNDERSTORM WINS$|^THUNDERSTORM WND$|^THUNDERSTORMW 50$|^THUNDERTSORM WIND$|^THUNERSTORM WIND$", "Thunderstorm Wind", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^TORNADO$|^LANDSPOUT$", "Tornado", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^TROPICAL DEPRESSION$|\\sTROPICAL DEPRESSION$", "Tropical Depression", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^TROPICAL STORM$|\\sTROPICAL STORM$", "Tropical Storm", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^TSUNAMI$|\\sTSUNAMI$", "Tsunami", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^VOLCANIC ASH$|^VOLCANIC ERUPTION$|^VOLCANIC ASHFALL$|^VOLCANIC ASH PLUME$", "Volcanic Ash", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^WATERSPOUT$|\\sWATERSPOUT$|^WAYTERSPOUT$", "Waterspout", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^WILDFIRE$|^GRASS FIRES$|^WILD/FOREST FIRES$|^RED FLAG CRITERIA$|^RED FLAG FIRE WX$", "Wildfire", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^WINTER STORM$|\\sWINTER STORM$", "Winter Storm", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^WINTER WEATHER$|^RECORD COOL$|^UNUSUALLY COLD$|^UNSEASONABLY COOL$|^WINTERY MIX$|^WINTER MIX$|^EXTREME/RECORD COLD$|^ICE JAM$|^COOL AND WET$|^COOL SPELL$|^FREEZING DRIZZLE AND FREEZING$|^ICE PELLETS$|^ICESTORM/BLIZZARD$|^LOW TEMPERATURE RECORD$|^MODERATE SNOW$|^MOUNTAIN SNOWS$|^NEAR RECORD SNOW$|^NORTHERN LIGHTS$|^PATCHY ICE$|^PROLONG COLD/SNOW$|^RECORD COLD$|^RECORD MAY SNOW$|^SEVERE COLD$|^EXCESSIVE COLD$|^LATE SEASON SNOWFALL$|^LATE SNOW$|^LIGHT SNOW AND SLEET$|^MONTHLY SNOWFALL$", "Winter Weather", storm_data_corrected2$EVTYPE)
#storm_data_corrected2$EVTYPE <- gsub("|^TEMPERATURE RECORD$|^OTHER $|^RECORD TEMPERATURE$|^RECORD HIGH$|^RECORD TEMPERATURES$|^RECORD LOW$|^MONTHLY TEMPERATURE$|^URBAN AND SMALL$||^URBAN/SMALL$|^DOWNBURST$|^ROCK SLIDE$|^SUMMARY AUGUST 10$|^SUMMARY AUGUST 11$|^SUMMARY OF APRIL 12$|^NO SEVERE WEATHER$|^METRO STORM MAY 26$|^HIGH$|^LACK OF SNOW$|^SUMMARY AUGUST 17$|^SUMMARY AUGUST 2[:punct:]3$|^SUMMARY AUGUST 21$|^SUMMARY AUGUST 28$|^$|^SUMMARY AUGUST 4$|^SUMMARY AUGUST 7$|^SUMMARY AUGUST 9$|^SUMMARY JAN 17$|^SUMMARY JULY 23[:punct:]24$|^SUMMARY JUNE 18[:punct:]19$|^SUMMARY JUNE 5[:punct:]6$|^SUMMARY JUNE 6$|^SUMMARY OF APRIL 13$|^SUMMARY OF APRIL 27$|^SUMMARY OF APRIL 3RD$|^SUMMARY OF AUGUST 1$|^SUMMARY OF JULY 11$|^SUMMARY OF JULY 2$|^SUMMARY OF JULY 22$|^SUMMARY OF JULY 26$|^SUMMARY OF JULY 29$|^SUMMARY OF JULY 3$|^SUMMARY OF JUNE 10$|^SUMMARY OF JUNE 11$|^SUMMARY OF JUNE 12$|^SUMMARY OF JUNE 15$|^SUMMARY OF JUNE 16$|^SUMMARY OF JUNE 18$|^SUMMARY OF JUNE 23$|^SUMMARY OF JUNE 24$|^SUMMARY OF JUNE 30$|^SUMMARY OF JUNE 4$|^SUMMARY OF JUNE 6$|^SUMMARY OF MARCH 14$|^SUMMARY OF MARCH 24$|^SUMMARY OF MARCH 24[:punct:]25$|^SUMMARY OF MARCH 27$|^SUMMARY OF MARCH 29$|^SUMMARY OF MAY 10$|^SUMMARY OF MAY 13$|^SUMMARY OF MAY 14$|^SUMMARY OF MAY 22$|^SUMMARY OF MAY 22 AM$|^SUMMARY OF MAY 22 PM$|^SUMMARY OF MAY 26 AM$|^SUMMARY OF MAY 31 AM$|^SUMMARY OF MAY 9[:punct:]10$|^SUMMARY SEPT[:punct:] 25[:punct:]26$|^SUMMARY SEPTEMBER 20$|^SUMMARY SEPTEMBER 3$|^SUMMARY SEPTEMBER 4$|^SUMMARY[:punct:] NOV[:punct:] 6[:punct:]7$|^SUMMARY[:punct:] OCT[:punct:] 20[:punct:]21$|^SUMMARY[:punct:] OCTOBER 31$|^SUMMARY[:punct:] SEPT[:punct:] 18$|[:punct:]$", "error", storm_data_corrected2$EVTYPE) #makes a big mess :(
length(unique(storm_data_corrected2$EVTYPE))
## [1] 157
This reduced the original 985 storm event entries to 157. Entries that could not be matched to the NOAA list of 48 event types were treated as errors and ignored.
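Several of the second-pass patterns use `[:punct:]` on its own. In R's default regular expressions a POSIX class is only recognised inside a bracket expression, written `[[:punct:]]`, which may be one reason some labels survive the second pass. The sketch below illustrates the difference on made-up labels; it is an illustration only and does not rerun the analysis.

```r
# Made-up labels, for illustration only
labels <- c("FUNNEL CLOUD.", "FUNNEL CLOUD:", "FUNNEL CLOUD")

# [:punct:] on its own is a bracket expression listing the characters
# ':', 'p', 'u', 'n', 'c' and 't', so only the ':' label matches here.
single <- grepl("^FUNNEL CLOUD[:punct:]$", labels)

# [[:punct:]] is the POSIX punctuation class, so '.' and ':' both match.
double <- grepl("^FUNNEL CLOUD[[:punct:]]$", labels)

print(data.frame(labels, single, double))
```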
As stated earlier, harm to population health was measured by summing total injuries and total fatalities for each storm event type, and economic consequences were measured by summing the total damage caused to crops and property.
#head(storm_data_corrected2, 15)
#tail(storm_data_corrected2, 15)
names(storm_data_corrected2)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
#NamesOfColumnsSDC2[23:24]
#[1] "FATALITIES" "INJURIES"
#NamesOfColumnsSDC2[25:28]
#[1] "PROPDMG" "PROPDMGEXP" "CROPDMG" "CROPDMGEXP"
These values are found in the following columns:
- "FATALITIES" and "INJURIES", to estimate the public health impact
- "PROPDMG", "PROPDMGEXP", "CROPDMG" and "CROPDMGEXP", to estimate the economic impact of storm events
#length(unique(storm_data_corrected2$PROPDMG))
#length(unique(storm_data_corrected2$PROPDMGEXP))
PDamageExpUnits<- unique(storm_data_corrected2$PROPDMGEXP)
#PDamageExpUnits
# [1] "K" "M" "" "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7"
#[16] "H" "-" "1" "8"
CDamageExpUnits<- unique(storm_data_corrected2$CROPDMGEXP)
#CDamageExpUnits
#[1] "" "M" "K" "m" "B" "?" "0" "k" "2"
The unit codes for property damage are not clear: K, M, (blank), B, m, +, 0, 5, 6, ?, 4, 2, 3, h, 7, H, -, 1, 8. Nor are those for crop damage: (blank), M, K, m, B, ?, 0, k, 2.
To obtain usable monetary estimates of crop and property damage, it was assumed that [K,k] = 1 000, [M,m] = 1 000 000 and [B,b] = 1 000 000 000. All other codes, including [H,h], the digits and the symbols, receive a multiplier of 1 in the code that follows, so those damage figures are taken at face value.
library(dplyr)
# used mutate()
stormSumByType <- storm_data_corrected2%>%
mutate(propMult = ifelse(is.na(PROPDMGEXP),
1,
ifelse((PROPDMGEXP =="K" | PROPDMGEXP == "k"),
1000,
ifelse((PROPDMGEXP == "M" | PROPDMGEXP =="m"),
1000000,
ifelse((PROPDMGEXP =="B" | PROPDMGEXP =="b"),
1000000000,
1)))),
cropMult = ifelse(is.na(CROPDMGEXP),
1,
ifelse((CROPDMGEXP =="K"| CROPDMGEXP =="k"),
1000,
ifelse((CROPDMGEXP =="M" | CROPDMGEXP =="m"),
1000000,
ifelse((CROPDMGEXP =="B"| CROPDMGEXP =="b"),
1000000000,
1)))),
propDamage = PROPDMG * propMult,
cropDamage = CROPDMG * cropMult,
totalDamage = propDamage + cropDamage)%>%
group_by(EVTYPE) %>%
summarise(totalFatalities = sum(FATALITIES),
totalInjuries = sum(INJURIES),
totalCasualties = totalFatalities + totalInjuries,
economicDamage = sum(totalDamage))
## `summarise()` ungrouping output (override with `.groups` argument)
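The nested ifelse() calls above map each exponent code to a multiplier. The same mapping can be written as a named lookup vector, which is easier to read and extend. This is only a sketch of an alternative, not the code that produced the results in this report; `exp_multiplier()` is a hypothetical helper name.

```r
# Hypothetical helper: translate exponent codes into numeric multipliers.
# Assumes the K/M/B convention stated in this report; every other code gets 1.
exp_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[toupper(code)])   # unknown or empty codes give NA
  mult[is.na(mult)] <- 1                  # fall back to a multiplier of 1
  mult
}

exp_multiplier(c("K", "k", "m", "B", "", "+", "5"))
# [1] 1e+03 1e+03 1e+06 1e+09 1e+00 1e+00 1e+00
```

Inside mutate() this could be used as `propMult = exp_multiplier(PROPDMGEXP)`, which gives the same multipliers as the nested version for the codes listed earlier.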
# create new "helper" data sets
stormTypesWithCasualties <- stormSumByType%>%
filter(totalCasualties >=1) %>%
arrange(desc(totalCasualties))
stormTypesWithDamage <- stormSumByType%>%
filter(economicDamage > 1) %>% arrange(desc(economicDamage))
head(stormTypesWithCasualties)
## # A tibble: 6 x 5
## EVTYPE totalFatalities totalInjuries totalCasualties economicDamage
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Tornado 5659 91364 97023 58959516549.
## 2 Thunderstorm Wind 715 9537 10252 12249634666.
## 3 Excessive Heat 2018 6730 8748 505270700
## 4 Flood 533 6889 7422 161339246394.
## 5 Lightning 817 5232 6049 945834537.
## 6 Heat 1125 2498 3623 419278550
Results show that tornadoes, floods and flash floods rank among the events with both high casualties and high damage to the economy during the period 1950 - 2011. Economic damage is rounded and grouped by order of magnitude in the tables below.
Here are the events which caused billions of dollars of damage to the economy:
# Event Types (damage in billions of USD, e.g. Flood = 161 339 246 394 USD):
TrillionDAMECO.CSV <- "Event Type, Total Fatalities, Total Injuries, Total Casualties, Crop Property
,,,, B I L L I O N S of Dollars Damages to the Economy
Flood , 533 , 6 889 , 7 422 , 161 Billion
Hurricane (Typhoon) , 135 , 1 333 , 1 468 , 91 Billion
Tornado , 5 659 , 91 364 , 97 023 , 59 Billion
Storm Surge/Tide , 24 , 43 , 67 , 48 Billion
Dense Fog , 80 , 1 076 , 1 156 , 23 Billion
Hail , 15 , 1 371 , 1 386 , 19 Billion
Flash Flood, 1 019 , 1 785 , 2 804 , 18 Billion
Drought , 35 , 19 , 54 , 15 Billion
Thunderstorm Wind , 715 , 9 537 , 10 252, 12 Billion
Tropical Storm , 66 , 383 , 449 , 8 Billion
Ice Storm , 90 , 1 978 , 2 068 , 8 Billion
Wildfire , 90 , 1 608 , 1 698 , 8 Billion
High Wind , 286 , 1 451 , 1 737 , 7 Billion
Winter Storm , 217 , 1 353 , 1 570 , 6 Billion
Heavy Rain , 101 , 280 , 381 , 4 Billion
Extreme Cold/Wind Chill , 306 , 260 , 566 , 2 Billion
Frost/Freeze , 24 , 196 , 220 , 2 Billion
Heavy Snow , 148 , 1 155 , 1 303 , 1 Billion"
TrillionDAMECO <- read.csv(textConnection(TrillionDAMECO.CSV),header=TRUE)
kable(TrillionDAMECO,format="markdown")
| Event.Type | Total.Fatalities | Total.Injuries | Total.Casualties | Crop.Property |
|---|---|---|---|---|
| | | | | B I L L I O N S of Dollars Damages to the Economy |
| Flood | 533 | 6 889 | 7 422 | 161 Billion |
| Hurricane (Typhoon) | 135 | 1 333 | 1 468 | 91 Billion |
| Tornado | 5 659 | 91 364 | 97 023 | 59 Billion |
| Storm Surge/Tide | 24 | 43 | 67 | 48 Billion |
| Dense Fog | 80 | 1 076 | 1 156 | 23 Billion |
| Hail | 15 | 1 371 | 1 386 | 19 Billion |
| Flash Flood | 1 019 | 1 785 | 2 804 | 18 Billion |
| Drought | 35 | 19 | 54 | 15 Billion |
| Thunderstorm Wind | 715 | 9 537 | 10 252 | 12 Billion |
| Tropical Storm | 66 | 383 | 449 | 8 Billion |
| Ice Storm | 90 | 1 978 | 2 068 | 8 Billion |
| Wildfire | 90 | 1 608 | 1 698 | 8 Billion |
| High Wind | 286 | 1 451 | 1 737 | 7 Billion |
| Winter Storm | 217 | 1 353 | 1 570 | 6 Billion |
| Heavy Rain | 101 | 280 | 381 | 4 Billion |
| Extreme Cold/Wind Chill | 306 | 260 | 566 | 2 Billion |
| Frost/Freeze | 24 | 196 | 220 | 2 Billion |
| Heavy Snow | 148 | 1 155 | 1 303 | 1 Billion |
Here are the events which caused millions of dollars of damage to the economy:
# Event Types (damage in millions of USD; the largest single total in the data is Flood at 161 339 246 394 USD):
BillionDAMECO.CSV <- "Event Type, Total Fatalities, Total Injuries, Total Casualties, Crop Property
,,,, M I L L I O N S of Dollars Damages to the Economy
Blizzard , 101 , 805 , 906 , 772 Million
Coastal Flood , 6 , 7 , 13 , 429 Million
Avalanche , 269 , 225 , 494 , 351 Million
Strong Wind , 140 , 408 , 548 , 264 Million
Tsunami , 33 , 129 , 162 , 144 Million
High Surf , 177 , 273 , 450 , 101 Million
Cold/Wind Chill , 167 , 60 , 227 , 94 Million
Waterspout , 6 , 72 , 78 , 61 Million
Winter Weather , 61 , 470 , 531 , 47 Million
Sleet , 25 , 443 , 468 , 14 Million
Dust Storm , 22 , 440 , 462 , 9 Million
Marine Thunderstorm Wind, 24 , 38 , 62 , 6 Million
Marine High Wind , 12 , 6 , 18 , 2 Million
Freezing Fog , 1 , 1 , 2 , 2 Million
OTHER, 0, 4, 4, 1 Million "
BillionDAMECO <- read.csv(textConnection(BillionDAMECO.CSV),header=TRUE)
kable(BillionDAMECO,format="markdown")
| Event.Type | Total.Fatalities | Total.Injuries | Total.Casualties | Crop.Property |
|---|---|---|---|---|
| | | | | M I L L I O N S of Dollars Damages to the Economy |
| Blizzard | 101 | 805 | 906 | 772 Million |
| Coastal Flood | 6 | 7 | 13 | 429 Million |
| Avalanche | 269 | 225 | 494 | 351 Million |
| Strong Wind | 140 | 408 | 548 | 264 Million |
| Tsunami | 33 | 129 | 162 | 144 Million |
| High Surf | 177 | 273 | 450 | 101 Million |
| Cold/Wind Chill | 167 | 60 | 227 | 94 Million |
| Waterspout | 6 | 72 | 78 | 61 Million |
| Winter Weather | 61 | 470 | 531 | 47 Million |
| Sleet | 25 | 443 | 468 | 14 Million |
| Dust Storm | 22 | 440 | 462 | 9 Million |
| Marine Thunderstorm Wind | 24 | 38 | 62 | 6 Million |
| Marine High Wind | 12 | 6 | 18 | 2 Million |
| Freezing Fog | 1 | 1 | 2 | 2 Million |
| OTHER | 0 | 4 | 4 | 1 Million |
Here are further events which caused millions of dollars of damage to the economy:
# Event Types:
MillionDAMECO.CSV <- "Event Type, Total Fatalities, Total Injuries, Total Casualties, Crop Property
,,,, M I L L I O N S of Dollars Damages to the Economy
Lightning , 817 , 5 232 , 6 049 , 946 Million
Dust Devil , 2 , 43 , 45 , 719 Million
Excessive Heat, 2 018 , 6 730 , 8 748 , 505 Million
Heat , 1 125 , 2 498 , 3 623 , 419 Million
Marine Strong Wind , 19 , 30 , 49 , 433 Million
Funnel Cloud , 0 , 3 , 3 , 195 Million
Rip Current , 577 , 529 , 1 106 , 163 Million"
MillionDAMECO <- read.csv(textConnection(MillionDAMECO.CSV),header=TRUE)
kable(MillionDAMECO,format="markdown")
| Event.Type | Total.Fatalities | Total.Injuries | Total.Casualties | Crop.Property |
|---|---|---|---|---|
| | | | | M I L L I O N S of Dollars Damages to the Economy |
| Lightning | 817 | 5 232 | 6 049 | 946 Million |
| Dust Devil | 2 | 43 | 45 | 719 Million |
| Excessive Heat | 2 018 | 6 730 | 8 748 | 505 Million |
| Heat | 1 125 | 2 498 | 3 623 | 419 Million |
| Marine Strong Wind | 19 | 30 | 49 | 433 Million |
| Funnel Cloud | 0 | 3 | 3 | 195 Million |
| Rip Current | 577 | 529 | 1 106 | 163 Million |
Some storm events, such as Excessive Heat, Lightning and Heat, caused only around (505, 946, 419) million USD of damage to the economy but led to (8 748, 6 049, 3 623) casualties respectively.
mostCasualties <- stormSumByType[[which.max(stormSumByType$totalCasualties),1]]
mostCasualties
## [1] "Tornado"
#[1] "Tornado"
#max(stormSumByType$totalCasualties)
#[1] 97023
mostFatalities <- stormSumByType[[which.max(stormSumByType$totalFatalities),1]]
mostFatalities
## [1] "Tornado"
#[1] "Tornado"
#max(stormSumByType$totalFatalities)
#[1] 5659
mostInjuries <- stormSumByType[[which.max(stormSumByType$totalInjuries),1]]
mostInjuries
## [1] "Tornado"
#[1] "Tornado"
#max(stormSumByType$totalInjuries)
#[1] 91364
mostEconomicDamage <- stormSumByType[[which.max(stormSumByType$economicDamage),1]]
mostEconomicDamage
## [1] "Flood"
#[1] "Flood"
max(stormSumByType$economicDamage)
## [1] 161339246394
#[1] 161339246394
Cost to human life:
- Maximum fatalities: Tornado, 5 659
- Maximum injuries: Tornado, 91 364

Most economic damage:
- Flood, 161 339 246 394 USD (about 161 billion USD)
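Raw totals such as the flood figure are easier to read when expressed in a named unit. The sketch below shows one way to do that rounding programmatically instead of by hand. `format_usd_scale()` is a hypothetical helper, not part of the original analysis, and the three inputs are totals printed earlier in this report.

```r
# Hypothetical helper: express a USD amount in the largest unit that fits
format_usd_scale <- function(x) {
  units <- c(Trillion = 1e12, Billion = 1e9, Million = 1e6, Thousand = 1e3)
  vapply(x, function(v) {
    fit <- units[v >= units]                    # units not larger than the value
    if (length(fit) == 0) return(paste(round(v), "USD"))
    paste(round(v / fit[[1]]), names(fit)[1])   # first fitting unit is the largest
  }, character(1))
}

# Flood, Tornado and Lightning totals from the summary table
format_usd_scale(c(161339246394, 58959516549, 945834537))
# [1] "161 Billion" "59 Billion"  "946 Million"
```

Generating the unit label from the number itself avoids mismatches between a value and a hand-typed unit.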
plot(stormTypesWithDamage)
Initial exploration shows that a few event types cause most of the harm and damage. From the earlier tables we know, for example, that from 1950 to 2011 tornadoes were the deadliest event type and also caused significant economic damage.
The cleaned data set storm_data_corrected2 and the derived summary stormTypesWithDamage are available for the visualizations:
#summary(storm_data_corrected2)
names(storm_data_corrected2)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
#keep to map, L8R
#storm_data_corrected2$STATE[1:10]
#storm_data_corrected2$LATITUDE[1:10]
#storm_data_corrected2$LATITUDE[1:10]
#storm_data_corrected2$REMARKS there were no remarks
names(stormTypesWithDamage)
## [1] "EVTYPE" "totalFatalities" "totalInjuries" "totalCasualties"
## [5] "economicDamage"
human_cost <- storm_data %>% group_by(EVTYPE) %>%
summarize(deaths = sum(FATALITIES),
injury = sum(INJURIES),
total = sum(FATALITIES) + sum(INJURIES))
## `summarise()` ungrouping output (override with `.groups` argument)
# Top six categories of events that are most harmful to human populations (head() returns six rows by default)
head(arrange(human_cost, desc(total)))
## # A tibble: 6 x 4
## EVTYPE deaths injury total
## <chr> <dbl> <dbl> <dbl>
## 1 TORNADO 5633 91346 96979
## 2 EXCESSIVE HEAT 1903 6525 8428
## 3 TSTM WIND 504 6957 7461
## 4 FLOOD 470 6789 7259
## 5 LIGHTNING 816 5230 6046
## 6 HEAT 937 2100 3037
#Gathering the data into a tidy format for plotting
phuman_cost <- tidyr::gather(human_cost[,-4], key = casualty, value=total,2:3) %>% arrange(desc(total))
# Subsetting for the top 15 in each casualty category
split_list <- lapply(split(phuman_cost, phuman_cost$casualty), head, 15)
phcost <- do.call(rbind.data.frame, split_list)
row.names(phcost) <-NULL
POPstormDAM_viz <- ggplot(phcost, aes(x=reorder(EVTYPE, total), y=total)) +
geom_col() + facet_wrap(casualty ~.) +
coord_flip() + theme_light() +
ggtitle("Dangerous Storm Events by Total Number of Casualties")
ggsave("plot1.png", plot = POPstormDAM_viz) # name the plot explicitly rather than relying on last_plot()
## Saving 7 x 5 in image
POPstormDAM_viz
From this graph we can see that tornadoes caused by far the most injuries: 91 364 in the cleaned data, roughly 10 to 14 times more than the next most dangerous event types, Thunderstorm Wind (9 537), Flood (6 889) and Excessive Heat (6 730). See the data frame 'stormTypesWithCasualties' for exact figures. The smaller categories are hard to read on the plot because the tornado bar dominates the scale.
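The size of the gap between tornadoes and the other event types can be checked with a short calculation. The injury totals below are copied from the stormTypesWithCasualties table in this report; the variable names are illustrative only.

```r
# Injury totals taken from the stormTypesWithCasualties table
injuries <- c(Tornado = 91364, `Thunderstorm Wind` = 9537,
              Flood = 6889, `Excessive Heat` = 6730)

# How many times larger the tornado total is than each other event type
ratio_to_tornado <- round(injuries[["Tornado"]] / injuries[-1], 1)
ratio_to_tornado
# Thunderstorm Wind             Flood    Excessive Heat
#               9.6              13.3              13.6
```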
# Total and mean fatalities across all event types (one summary row, not grouped by EVTYPE)
fatalities_by_event <- storm_data_corrected2 %>%
add_count(EVTYPE) %>%
summarize(fatal_sum = sum(FATALITIES), fatal_mean = mean(FATALITIES), n = mean(n))
fatalities_by_event$fatal_sum
## [1] 15145
#[1] 15145
fatalities_by_event$n
## [1] 219812
#[1] 219,812
head(fatalities_by_event)
## fatal_sum fatal_mean n
## 1 15145 0.01678494 219812
Severe weather caused a total of 15 145 deaths from 1950 to 2011.
# Total and mean injuries across all event types (one summary row, not grouped by EVTYPE)
injuries_by_event <- storm_data_corrected2 %>%
add_count(EVTYPE) %>%
summarize(injuries_sum = sum(INJURIES), injuries_mean = mean(INJURIES), n = mean(n))
head(injuries_by_event)
## injuries_sum injuries_mean n
## 1 140528 0.1557447 219812
injuries_by_event$injuries_sum
## [1] 140528
#[1] 140,528
Severe weather caused 140 528 injuries from 1950 to 2011.
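The two summaries above are overall totals, returned as a single row each. If totals per event type are wanted instead, a grouping step is needed. A minimal base R sketch on made-up rows follows; the data frame `toy` and its values are hypothetical and serve only to show the mechanics.

```r
# Made-up rows for illustration only
toy <- data.frame(
  EVTYPE     = c("Tornado", "Tornado", "Flood", "Heat"),
  FATALITIES = c(3, 1, 2, 5),
  INJURIES   = c(10, 4, 0, 7)
)

# Sum both casualty columns within each event type
by_type <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
by_type
#    EVTYPE FATALITIES INJURIES
# 1   Flood          2        0
# 2    Heat          5        7
# 3 Tornado          4       14
```

The per-type sums still add up to the overall totals, so the two views are consistent with each other.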
crop_property <- storm_data %>% group_by(EVTYPE)%>%
summarise(property = round(sum(PROPDMG)),
crop = round(sum(CROPDMG)),
total = round(sum(PROPDMG) + sum(CROPDMG)))
## `summarise()` ungrouping output (override with `.groups` argument)
# Top six event types by summed PROPDMG + CROPDMG (raw figures, exponent multipliers not applied)
head(arrange(crop_property, desc(total)))
## # A tibble: 6 x 4
## EVTYPE property crop total
## <chr> <dbl> <dbl> <dbl>
## 1 TORNADO 3212258 100019 3312277
## 2 FLASH FLOOD 1420125 179200 1599325
## 3 TSTM WIND 1335966 109203 1445168
## 4 HAIL 688693 579596 1268290
## 5 FLOOD 899938 168038 1067976
## 6 THUNDERSTORM WIND 876844 66791 943636
# Subsetting for the top 15 severe weather forms for property/crop damage
cp_cost <- tidyr::gather(crop_property[,-4], key = crop_prop, value=total,2:3) %>% arrange(desc(total))
split_list2 <- lapply(split(cp_cost, cp_cost$crop_prop), head,15)
cp_cost <- do.call(rbind.data.frame, split_list2)
row.names(cp_cost) <-NULL
p_cost <- cp_cost[cp_cost$crop_prop =="property", ]
c_cost <- cp_cost[cp_cost$crop_prop =="crop", ]
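# Visualizing with bar charts. The heights are the summed PROPDMG and CROPDMG
# figures as recorded, without the PROPDMGEXP/CROPDMGEXP multipliers, so they
# are not dollar amounts.
par(mfrow = c(1,2), mar = c(8,4,3,2), mgp = c(3,1,0), cex =0.4)
barplot(p_cost$total, las =3, names.arg = p_cost$EVTYPE,
main ="Worst Storm Events: Property",
ylab ="Summed PROPDMG (unscaled)",
col ="grey85")
barplot(c_cost$total, las =3, names.arg = c_cost$EVTYPE,
main ="Worst Storm Events: Crops",
ylab ="Summed CROPDMG (unscaled)",
col ="grey85")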
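The bar heights above come from the raw PROPDMG and CROPDMG columns. To plot dollar amounts, the exponent codes would have to be applied before summing. A minimal sketch on made-up rows, assuming the K/M/B convention used earlier in this report; `toy`, `mult` and `damage_billions` are hypothetical names.

```r
# Made-up rows for illustration only
toy <- data.frame(
  EVTYPE     = c("Flood", "Flood", "Tornado"),
  PROPDMG    = c(2.5, 500, 1.2),
  PROPDMGEXP = c("B", "M", "B")
)

# Scale each figure by its exponent code, then total per event type in billions
mult <- c(K = 1e3, M = 1e6, B = 1e9)
toy$propDamage <- toy$PROPDMG * unname(mult[toupper(toy$PROPDMGEXP)])
damage_billions <- tapply(toy$propDamage, toy$EVTYPE, sum) / 1e9
damage_billions
#   Flood Tornado
#     3.0     1.2
```

A vector like `damage_billions` could be passed to barplot() with a y-axis label in billions of USD.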
All code and intermediate results are hosted at this GitHub repository. Feel free to fork and download this project to collaborate, and/or contact me for further information.