Published on RPubs, October 2020

This is an R Markdown document. R Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. Knitr and RPubs allow for literate statistical programming and reproducible research.

Contents:

  • Synopsis, Reproducible Research Checklist
  • Summary of Results, Past Significant Weather Events in the USA, 1950-2011
  • Raw Data: NOAA StormData.csv.bz2
  • Data Transformation, storm_data_corrected2
  • Data Processing, Storm Event Type Damage
  • Data Analysis, Tables and Plots

Synopsis:

This data set is worth analysing because it can serve as a baseline for comparing current severe weather, attributed to climate change, with past events.

Reproducible Research Checklist:

This is a Coursera assignment submission for the Reproducible Research course. The data set was provided as a downloadable bzip2-compressed file (StormData.csv.bz2). See the National Weather Service Storm Data Documentation to find out how the variables are constructed and defined.

This report addresses the impact of past severe weather on public health and the economy of the United States of America.

Severe weather caused a total of 15 145 deaths and 140 528 injuries during the period 1950 - 2011. Results show that Tornadoes, Floods and Flash Floods caused both high casualties (97 023, 7 422 and 2 804) and high economic damage (59, 161 and 18 Trillion USD) respectively during the same period. Other extreme events such as Excessive Heat, Lightning and Heat caused less economic damage, around 505, 946 and 419 Million USD, but led to 8 748, 6 049 and 3 623 casualties respectively.

The data for this analysis originates from the National Oceanic and Atmospheric Administration (NOAA) database. The NOAA labelling protocol was followed when classifying storm events by type.

It was assumed that the harm severe weather caused to population health could be calculated by summing total injuries and total fatalities per storm event type, and that the economic consequences could be calculated by summing the total damage caused to crops and property per storm event type.
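
The two assumptions above can be sketched on a toy data frame. This is a minimal illustration, not the code used for this report: the rows are invented, base R `aggregate()` stands in for the dplyr pipeline, the helper `exp_to_multiplier()` is hypothetical, and the exponent mapping (K = thousand, M = million, B = billion) is the usual reading of the PROPDMGEXP and CROPDMGEXP columns, which should be checked against the NOAA documentation.

```r
# Toy rows (invented values) using the column names of the NOAA file
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD"),
  FATALITIES = c(1, 0, 2),
  INJURIES   = c(15, 3, 0),
  PROPDMG    = c(25, 2.5, 1),
  PROPDMGEXP = c("K", "K", "M"),
  CROPDMG    = c(0, 10, 5),
  CROPDMGEXP = c("", "K", "K"),
  stringsAsFactors = FALSE
)

# Hypothetical helper with assumed multipliers;
# any other code (blank, digits, symbols) is treated as a multiplier of 1
exp_to_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(code)]
  mult[is.na(mult)] <- 1
  unname(mult)
}

# Health impact: fatalities plus injuries
toy$casualties <- toy$FATALITIES + toy$INJURIES

# Economic impact: property plus crop damage, converted to dollars
toy$damage_usd <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP) +
  toy$CROPDMG * exp_to_multiplier(toy$CROPDMGEXP)

# Sum both measures per event type, largest damage first
impact <- aggregate(cbind(casualties, damage_usd) ~ EVTYPE, data = toy, FUN = sum)
impact <- impact[order(-impact$damage_usd), ]
print(impact)
```

How unknown exponent codes are treated changes the damage totals considerably, so that choice deserves an explicit decision in any real analysis.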

According to the National Climatic Data Center Storm Events FAQ, the lightning data contained within Storm Data list only those events that result in fatality, injury and/or property and crop damage, and Tornadoes may contain multiple segments. The raw data was loaded, cleaned and analyzed with R 4.0 in a Windows 10 (64-bit) environment. R is a free, open-source programming tool; please refer to the CRAN documentation to download and install R 4.0 on 64-bit Windows 10.

The records from 1950 up to 2001 are sparse, but were retained. Data cleaning used the gsub() function to globally replace erroneous entries of the storm event type. Data manipulation was done with dplyr. The cleaned data was then queried for summary statistics, viewed as a data frame and plotted. All code and intermediate results are hosted at this GitHub repository. Feel free to fork and download this project to collaborate, and/or contact me for further information.

To reproduce these results:

  • Please download the raw data to your working directory.
  • Install these packages (knitr, R.utils, xtable, stringdist, lubridate, tidyverse, dplyr, ggplot2 and gridExtra).
  • Run the given code with R4.0 in the order it is presented.
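
The package step above can be checked before running anything else. The sketch below is only a convenience under stated assumptions: `missing_packages()` is a hypothetical helper, not part of this report's code, and the install call is left commented out so that nothing is installed by accident.

```r
# Packages listed in the reproduction steps of this report
required <- c("knitr", "R.utils", "xtable", "stringdist", "lubridate",
              "tidyverse", "dplyr", "ggplot2", "gridExtra")

# Hypothetical helper: return the packages from `wanted` that are not installed
missing_packages <- function(wanted, installed = rownames(installed.packages())) {
  setdiff(wanted, installed)
}

to_install <- missing_packages(required)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install from CRAN
}
```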

Summary of Results:

The code to reproduce these figures is in the data analysis section.

Past Significant Weather Events in the USA, 1950-2011

Storm Impacts


Initial exploration shows that there are a few events which cause most of the harm and damage. From 1950 to 2011, Tornadoes were the most deadly and caused significant economic damage.
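
How concentrated the harm is can be checked with simple arithmetic. The sketch below uses only totals quoted in this report (15 145 deaths, 140 528 injuries, 97 023 tornado casualties); the variable names are illustrative.

```r
# Totals quoted in this report for 1950 - 2011
total_fatalities   <- 15145
total_injuries     <- 140528
tornado_casualties <- 97023   # fatalities plus injuries attributed to tornadoes

# Casualties are defined here as fatalities plus injuries
total_casualties <- total_fatalities + total_injuries
tornado_share    <- tornado_casualties / total_casualties

cat(sprintf("Tornado share of all casualties: %.1f%%\n", 100 * tornado_share))
```

On these figures, tornadoes alone account for roughly 62 percent of all recorded casualties.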

Impact on Human Health


From this graph we can see that Tornadoes caused over 90 thousand injuries, about 10 times more than the next most dangerous storm events (Thunderstorm Wind, Flood and Excessive Heat), which caused between about 6 700 and 9 500 injuries each.

Impact to the Economy


Once again, the scale of the impact of Tornadoes is much greater than that of most other NOAA storm events. Flooding and Thunderstorms also cause considerable damage.

Raw Data: NOAA StormData.csv.bz2

Set up the Environment & Upload the Data

  • Please download the raw data to your working directory.
  • Install these packages
  • Run the following code
#setwd("./OneDrive/R/ReproducibleResearch/")

library(knitr)
library(R.utils)
## Warning: package 'R.utils' was built under R version 4.0.3
## Loading required package: R.oo
## Warning: package 'R.oo' was built under R version 4.0.3
## Loading required package: R.methodsS3
## Warning: package 'R.methodsS3' was built under R version 4.0.3
## R.methodsS3 v1.8.1 (2020-08-26 16:20:06 UTC) successfully loaded. See ?R.methodsS3 for help.
## R.oo v1.24.0 (2020-08-26 16:11:58 UTC) successfully loaded. See ?R.oo for help.
## 
## Attaching package: 'R.oo'
## The following object is masked from 'package:R.methodsS3':
## 
##     throw
## The following objects are masked from 'package:methods':
## 
##     getClasses, getMethods
## The following objects are masked from 'package:base':
## 
##     attach, detach, load, save
## R.utils v2.10.1 (2020-08-26 22:50:31 UTC) successfully loaded. See ?R.utils for help.
## 
## Attaching package: 'R.utils'
## The following object is masked from 'package:utils':
## 
##     timestamp
## The following objects are masked from 'package:base':
## 
##     cat, commandArgs, getOption, inherits, isOpen, nullfile, parse,
##     warnings
library(xtable)
library(stringdist)
## Warning: package 'stringdist' was built under R version 4.0.3
## 
## Attaching package: 'stringdist'
## The following object is masked from 'package:R.utils':
## 
##     extract
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.2          v purrr   0.3.4     
## v tibble  3.0.4          v dplyr   1.0.2     
## v tidyr   1.1.2          v stringr 1.4.0     
## v readr   1.4.0.9000     v forcats 0.5.0
## Warning: package 'tibble' was built under R version 4.0.3
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x lubridate::as.difftime() masks base::as.difftime()
## x lubridate::date()        masks base::date()
## x tidyr::extract()         masks stringdist::extract(), R.utils::extract()
## x dplyr::filter()          masks stats::filter()
## x lubridate::intersect()   masks base::intersect()
## x dplyr::lag()             masks stats::lag()
## x lubridate::setdiff()     masks base::setdiff()
## x lubridate::union()       masks base::union()
library(dplyr)
library(ggplot2)
library(gridExtra)
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine
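# Download the bzip2-compressed file once, then decompress it to a plain CSV
if(!file.exists('stormdata.csv')) {
        bz2_url <- 'https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2'
        # mode = 'wb' keeps the binary archive intact on Windows
        download.file(bz2_url, 'stormdata.csv.bz2', mode = 'wb')
        bunzip2('stormdata.csv.bz2', 'stormdata.csv', remove=FALSE)
}

storm_data <- read.csv('stormdata.csv')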
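
As a possible alternative to decompressing first, `read.csv()` can read a bzip2-compressed file directly, because R's file connections detect the compression when reading. The sketch below shows this on a tiny temporary file, which is a stand-in for the real data set; its file name and contents are made up for illustration.

```r
# Write a tiny bzip2-compressed CSV to a temporary file (stand-in for StormData.csv.bz2)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(1, 0)),
          con, row.names = FALSE)
close(con)

# read.csv() opens the path through file(), which detects the compression
small <- read.csv(tmp, stringsAsFactors = FALSE)
print(small)

# Remove the temporary file
unlink(tmp)
```

Reading the compressed file directly saves disk space, while decompressing once, as this report does, avoids repeating the decompression on every run.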

Summary of impacts to the economy and human health:

The code to reproduce these figures is in the data analysis section.

Tornadoes, Floods and Flash Floods:

Tornadoes, Floods and Flash Floods rank among the leading causes of both casualties and economic damage; see the data analysis below.

Trillions of dollars of damage to the economy

| Event.Type | Total.Fatalities | Total.Injuries | Total.Casualties | Crop.Property |
|---|---|---|---|---|
| Flood | 533 | 6 889 | 7 422 | 161 Trillion |
| Hurricane (Typhoon) | 135 | 1 333 | 1 468 | 91 Trillion |
| Tornado | 5 659 | 91 364 | 97 023 | 59 Trillion |
| Storm Surge/Tide | 24 | 43 | 67 | 48 Trillion |
| Dense Fog | 80 | 1 076 | 1 156 | 23 Trillion |
| Hail | 15 | 1 371 | 1 386 | 19 Trillion |
| Flash Flood | 1 019 | 1 785 | 2 804 | 18 Trillion |
| Drought | 35 | 19 | 54 | 15 Trillion |
| Thunderstorm Wind | 715 | 9 537 | 10 252 | 12 Trillion |
| Tropical Storm | 66 | 383 | 449 | 8 Trillion |
| Ice Storm | 90 | 1 978 | 2 068 | 8 Trillion |
| Wildfire | 90 | 1 608 | 1 698 | 8 Trillion |
| High Wind | 286 | 1 451 | 1 737 | 7 Trillion |
| Winter Storm | 217 | 1 353 | 1 570 | 6 Trillion |
| Heavy Rain | 101 | 280 | 381 | 4 Trillion |
| Extreme Cold/Wind Chill | 306 | 260 | 566 | 2 Trillion |
| Frost/Freeze | 24 | 196 | 220 | 2 Trillion |
| Heavy Snow | 148 | 1 155 | 1 303 | 1 Trillion |

Billions of dollars of damage to the economy

| Event.Type | Total.Fatalities | Total.Injuries | Total.Casualties | Crop.Property |
|---|---|---|---|---|
| Blizzard | 101 | 805 | 906 | 772 Billion |
| Coastal Flood | 6 | 7 | 13 | 429 Billion |
| Avalanche | 269 | 225 | 494 | 351 Billion |
| Strong Wind | 140 | 408 | 548 | 264 Billion |
| Tsunami | 33 | 129 | 162 | 144 Billion |
| High Surf | 177 | 273 | 450 | 101 Billion |
| Cold/Wind Chill | 167 | 60 | 227 | 94 Billion |
| Waterspout | 6 | 72 | 78 | 61 Billion |
| Winter Weather | 61 | 470 | 531 | 47 Billion |
| Sleet | 25 | 443 | 468 | 14 Billion |
| Dust Storm | 22 | 440 | 462 | 9 Billion |
| Marine Thunderstorm Wind | 24 | 38 | 62 | 6 Billion |
| Marine High Wind | 12 | 6 | 18 | 2 Billion |
| Freezing Fog | 1 | 1 | 2 | 2 Billion |
| OTHER | 0 | 4 | 4 | 1 Billion |

Millions of dollars of damage to the economy

| Event.Type | Total.Fatalities | Total.Injuries | Total.Casualties | Crop.Property |
|---|---|---|---|---|
| Lightning | 817 | 5 232 | 6 049 | 946 Million |
| Dust Devil | 2 | 43 | 45 | 719 Million |
| Excessive Heat | 2 018 | 6 730 | 8 748 | 505 Million |
| Heat | 1 125 | 2 498 | 3 623 | 419 Million |
| Marine Strong Wind | 19 | 30 | 49 | 433 Million |
| Funnel Cloud | 0 | 3 | 3 | 195 Million |
| Rip Current | 577 | 529 | 1 106 | 163 Million |

See exact figures below.

Data Analysis:

It is assumed that you have:

  • installed and loaded the required R packages,
  • downloaded the compressed dataset and read it into your project environment, and
  • stored the data in the variable “storm_data”.
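
These assumptions can be checked programmatically before the analysis starts. The sketch below is illustrative only: `missing_columns()` is a hypothetical helper, and `storm_data_demo` is a one-row stand-in so that the example runs without the full file. The column names are taken from the head() output of the raw data.

```r
# Columns the impact analysis relies on
needed_cols <- c("EVTYPE", "FATALITIES", "INJURIES",
                 "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")

# Hypothetical helper: names from `cols` that are absent from the data frame
missing_columns <- function(df, cols = needed_cols) setdiff(cols, names(df))

# One-row stand-in for storm_data
storm_data_demo <- data.frame(EVTYPE = "TORNADO", FATALITIES = 0, INJURIES = 15,
                              PROPDMG = 25, PROPDMGEXP = "K",
                              CROPDMG = 0, CROPDMGEXP = "")

# Stop early, with a clear error, if the data is not in the expected shape
stopifnot(is.data.frame(storm_data_demo),
          length(missing_columns(storm_data_demo)) == 0)
```

With the real data, the same check would be applied to `storm_data` instead of the stand-in.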

Data inspection - see dimensions and data types

dim_storm_data <- dim(storm_data)
dim_storm_data
## [1] 902297     37
head(storm_data)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE  EVTYPE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL TORNADO
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL TORNADO
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL TORNADO
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL TORNADO
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL TORNADO
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL TORNADO
##   BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1         0                                               0         NA
## 2         0                                               0         NA
## 3         0                                               0         NA
## 4         0                                               0         NA
## 5         0                                               0         NA
## 6         0                                               0         NA
##   END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1         0                      14.0   100 3   0          0       15    25.0
## 2         0                       2.0   150 2   0          0        0     2.5
## 3         0                       0.1   123 2   0          0        2    25.0
## 4         0                       0.0   100 2   0          0        2     2.5
## 5         0                       0.0   150 2   0          0        2     2.5
## 6         0                       1.5   177 2   0          0        6     2.5
##   PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1          K       0                                         3040      8812
## 2          K       0                                         3042      8755
## 3          K       0                                         3340      8742
## 4          K       0                                         3458      8626
## 5          K       0                                         3412      8642
## 6          K       0                                         3450      8748
##   LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1       3051       8806              1
## 2          0          0              2
## 3          0          0              3
## 4          0          0              4
## 5          0          0              5
## 6          0          0              6
summary(storm_data)
##     STATE__       BGN_DATE           BGN_TIME          TIME_ZONE        
##  Min.   : 1.0   Length:902297      Length:902297      Length:902297     
##  1st Qu.:19.0   Class :character   Class :character   Class :character  
##  Median :30.0   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :31.2                                                           
##  3rd Qu.:45.0                                                           
##  Max.   :95.0                                                           
##                                                                         
##      COUNTY       COUNTYNAME           STATE              EVTYPE         
##  Min.   :  0.0   Length:902297      Length:902297      Length:902297     
##  1st Qu.: 31.0   Class :character   Class :character   Class :character  
##  Median : 75.0   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :100.6                                                           
##  3rd Qu.:131.0                                                           
##  Max.   :873.0                                                           
##                                                                          
##    BGN_RANGE          BGN_AZI           BGN_LOCATI          END_DATE        
##  Min.   :   0.000   Length:902297      Length:902297      Length:902297     
##  1st Qu.:   0.000   Class :character   Class :character   Class :character  
##  Median :   0.000   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :   1.484                                                           
##  3rd Qu.:   1.000                                                           
##  Max.   :3749.000                                                           
##                                                                             
##    END_TIME           COUNTY_END COUNTYENDN       END_RANGE       
##  Length:902297      Min.   :0    Mode:logical   Min.   :  0.0000  
##  Class :character   1st Qu.:0    NA's:902297    1st Qu.:  0.0000  
##  Mode  :character   Median :0                   Median :  0.0000  
##                     Mean   :0                   Mean   :  0.9862  
##                     3rd Qu.:0                   3rd Qu.:  0.0000  
##                     Max.   :0                   Max.   :925.0000  
##                                                                   
##    END_AZI           END_LOCATI            LENGTH              WIDTH         
##  Length:902297      Length:902297      Min.   :   0.0000   Min.   :   0.000  
##  Class :character   Class :character   1st Qu.:   0.0000   1st Qu.:   0.000  
##  Mode  :character   Mode  :character   Median :   0.0000   Median :   0.000  
##                                        Mean   :   0.2301   Mean   :   7.503  
##                                        3rd Qu.:   0.0000   3rd Qu.:   0.000  
##                                        Max.   :2315.0000   Max.   :4400.000  
##                                                                              
##        F               MAG            FATALITIES          INJURIES        
##  Min.   :0.0      Min.   :    0.0   Min.   :  0.0000   Min.   :   0.0000  
##  1st Qu.:0.0      1st Qu.:    0.0   1st Qu.:  0.0000   1st Qu.:   0.0000  
##  Median :1.0      Median :   50.0   Median :  0.0000   Median :   0.0000  
##  Mean   :0.9      Mean   :   46.9   Mean   :  0.0168   Mean   :   0.1557  
##  3rd Qu.:1.0      3rd Qu.:   75.0   3rd Qu.:  0.0000   3rd Qu.:   0.0000  
##  Max.   :5.0      Max.   :22000.0   Max.   :583.0000   Max.   :1700.0000  
##  NA's   :843563                                                           
##     PROPDMG         PROPDMGEXP           CROPDMG         CROPDMGEXP       
##  Min.   :   0.00   Length:902297      Min.   :  0.000   Length:902297     
##  1st Qu.:   0.00   Class :character   1st Qu.:  0.000   Class :character  
##  Median :   0.00   Mode  :character   Median :  0.000   Mode  :character  
##  Mean   :  12.06                      Mean   :  1.527                     
##  3rd Qu.:   0.50                      3rd Qu.:  0.000                     
##  Max.   :5000.00                      Max.   :990.000                     
##                                                                           
##      WFO             STATEOFFIC         ZONENAMES            LATITUDE   
##  Length:902297      Length:902297      Length:902297      Min.   :   0  
##  Class :character   Class :character   Class :character   1st Qu.:2802  
##  Mode  :character   Mode  :character   Mode  :character   Median :3540  
##                                                           Mean   :2875  
##                                                           3rd Qu.:4019  
##                                                           Max.   :9706  
##                                                           NA's   :47    
##    LONGITUDE        LATITUDE_E     LONGITUDE_       REMARKS         
##  Min.   :-14451   Min.   :   0   Min.   :-14455   Length:902297     
##  1st Qu.:  7247   1st Qu.:   0   1st Qu.:     0   Class :character  
##  Median :  8707   Median :   0   Median :     0   Mode  :character  
##  Mean   :  6940   Mean   :1452   Mean   :  3509                     
##  3rd Qu.:  9605   3rd Qu.:3549   3rd Qu.:  8735                     
##  Max.   : 17124   Max.   :9706   Max.   :106220                     
##                   NA's   :40                                        
##      REFNUM      
##  Min.   :     1  
##  1st Qu.:225575  
##  Median :451149  
##  Mean   :451149  
##  3rd Qu.:676723  
##  Max.   :902297  
## 

This is a long dataset with 902 297 rows and 37 columns. On inspection, it seems that the storm event types were not entered as requested by the National Weather Service Storm Data Documentation, and the remarks are blank. Data cleaning was done with gsub() in two parts: first, quick fixes reduced the 985 unique event types; then, after manual inspection, more precise calls brought the storm events closer to the required NOAA set of 48 Storm Events. Not all entries could be cleaned, as some did not make sense.
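
Before cleaning, it helps to see how many distinct labels exist and how often each occurs. The sketch below uses a small invented vector in place of `storm_data$EVTYPE`; the labels are examples, not counts from the real data.

```r
# Invented labels standing in for storm_data$EVTYPE
evtype_demo <- c("TORNADO", "Tornado", "TSTM WIND", "THUNDERSTORM WINDS",
                 "HAIL", "HAIL", "HIGH")

# Number of distinct labels before any cleaning
n_raw <- length(unique(evtype_demo))

# Frequency table, most common first; rare labels are often typos or free text
freq <- sort(table(evtype_demo), decreasing = TRUE)
print(freq)

# Upper-casing alone already merges labels that differ only by case
n_upper <- length(unique(toupper(evtype_demo)))
cat("Distinct labels:", n_raw, "raw,", n_upper, "after toupper()\n")
```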

Data Transformation: storm_data_corrected2

Data Cleaning, Part 1:

  • The Storm Data Instructions were followed to describe the Event Types. The Storm Data Event Table (page 6) restricts events to 48 types only, plus “OTHER”. The Designators are C for County/Parish, Z for Zone and M for Marine.

First storm data event cleaning, with gsub()

This quick gsub() treatment, which replaces expected input errors, reduced the event types from 985 to 393 unique storm events.

storm_data_corrected <- storm_data
storm_data_corrected$EVTYPE <- toupper(storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^(SMALL )?HAIL.*", "HAIL", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("TSTM|THUNDERSTORMS?", "THUNDERSTORM", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("STORMS?", "STORM", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("WINDS?|WINDS?/HAIL", "WIND", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("RAINS?", "RAIN", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^TH?UN?DEE?RS?TO?RO?M ?WIND.*|^(SEVERE )?THUNDERSTORM$|^WIND STORM$|^(DRY )?MI[CR][CR]OBURST.*|^THUNDERSTORMW$", "THUNDERSTORM WIND", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^COASTAL ?STORM$|^MARINE ACCIDENT$", "MARINE THUNDERSTORM WIND", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^FLOODS?.*|^URBAN/SML STREAM FLD$|^(RIVER|TIDAL|MAJOR|URBAN|MINOR|ICE JAM|RIVER AND STREAM|URBAN/SMALL STREAM)? FLOOD(ING)?S?$|^HIGH WATER$|^URBAN AND SMALL STREAM FLOODIN$|^DROWNING$|^DAM BREAK$", "FLOOD", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^FLASH FLOOD.*|^RAPIDLY RISING WATER$", "FLASH FLOOD", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("WATERSPOUTS?", "WATERSPOUT", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("WEATHER/MIX", "WEATHER", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("CURRENTS?", "CURRENT", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^WINDCHILL$|^COLD.*|^LOW TEMPERATURE$|^UNSEASONABLY COLD$", "COLD/WIND CHILL", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^EXTREME WIND ?CHILL$|^(EXTENDED|EXTREME|RECORD)? COLDS?$", "EXTREME COLD/WIND CHILL", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^WILD/FOREST FIRE$|^(WILD|BRUSH|FOREST)? ?FIRES?$", "WILDFIRE", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^RAIN/SNOW$|^(BLOWING|HEAVY|EXCESSIVE|BLOWING|ICE AND|RECORD)? ?SNOWS?.*", "HEAVY SNOW", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^FOG$", "DENSE FOG", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^(GUSTY|NON-SEVERE|NON ?-?THUNDERSTORM)? ?WIND.*|^ICE/STRONG WIND$", "STRONG WIND", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("SURGE$", "SURGE/TIDE", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("CLOUDS?", "CLOUD", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^FROST[/\\]FREEZE$|^FROST$|^(DAMAGING)? ?FREEZE$|^HYP[OE]R?THERMIA.*|^ICE$|^(ICY|ICE) ROADS$|^BLACK ICE$|^ICE ON ROAD$", "FROST/FREEZE", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^GLAZE.*|^FREEZING (RAIN|DRIZZLE|RAIN/SNOW|SPRAY$)$|^WINTRY MIX$|^MIXED PRECIP(ITATION)?$|^WINTER WEATHER MIX$|^LIGHT SNOW$|^FALLING SNOW/ICE$|^SLEET.*", "SLEET", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^HURRICANE.*", "HURRICANE/TYPHOON", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^HEAT WAVES?$|^UNSEASONABLY WARM$|^WARM WEATHER$", "HEAT", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^(EXTREME|RECORD/EXCESSIVE|RECORD) HEAT$", "EXCESSIVE HEAT", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^HEAVY SURF(/HIGH SURF)?.*$|^(ROUGH|HEAVY) SEAS?.*|^(ROUGH|ROGUE|HAZARDOUS) SURF.*|^HIGH WIND AND SEAS$|^HIGH SURF.*", "HIGH SURF", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^LAND(SLUMP|SLIDE)?S?$|^MUD ?SLIDES?$|^AVALANCH?E$", "AVALANCHE", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^UNSEASONABLY WARM AND DRY$|^DROUGHT.*|^HEAT WAVE DROUGHT$", "DROUGHT", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^TORNADO.*", "TORNADO", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^TROPICAL STORM.*", "TROPICAL STORM", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^MARINE MISHAP$|^HIGH WIND/SEAS$", "MARINE HIGH WIND", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^HIGH WIND.*", "HIGH WIND", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^HIGH SEAS$", "MARINE STRONG WIND", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^RIP CURRENT.*", "RIP CURRENT", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^WATERSPOUT.*", "WATERSPOUT", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^EXCESSIVE RAINFALL$|^RAIN.*|^TORRENTIAL RAINFALL$|^(HEAVY|HVY)? (RAIN|MIX|PRECIPITATION).*", "HEAVY RAIN", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^FOG.*", "FREEZING FOG", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^WINTER STORM.*", "WINTER STORM", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^THUNDERSNOW$|^ICE STORM.*", "ICE STORM", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("WAVES?|SWELLS?", "SURF", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^LIGHTNING.*", "LIGHTNING", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^WHIRLWIND$|^GUSTNADO$|^TORNDAO$", "TORNADO", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^COASTAL FLOOD.*", "COASTAL FLOOD", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^TYPHOON", "HURRICANE/TYPHOON", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^EROSION/CSTL FLOOD$|^COASTAL FLOOD/EROSION$|^COASTAL SURGE/TIDE$", "COASTAL FLOOD", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^ASTRONOMICAL HIGH TIDE$", "STORM SURGE/TIDE", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^(GROUND)? ?BLIZZARD.*$", "BLIZZARD", storm_data_corrected$EVTYPE)
storm_data_corrected$EVTYPE <- gsub("^DUST STORM.*$", "DUST STORM", storm_data_corrected$EVTYPE)
#unique(storm_data_corrected$EVTYPE)
length(unique(storm_data_corrected$EVTYPE))
## [1] 393
length(unique(storm_data$EVTYPE))
## [1] 985
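
How the patterns in this first pass behave can be seen on a handful of labels. The sketch below reuses four of the patterns from the code above on an invented vector; it is a demonstration of the regular-expression mechanics, not a replacement for the full cleaning.

```r
# Invented labels standing in for raw EVTYPE entries
evtype_demo <- c("tstm wind", "THUNDERSTORM WINDS", "HAIL 75", "SMALL HAIL",
                 "MARINE HAIL", "TORNADO F1")
cleaned <- toupper(evtype_demo)

# "^" anchors at the start, "?" makes "SMALL " optional, ".*" swallows the rest,
# so "HAIL 75" and "SMALL HAIL" collapse while "MARINE HAIL" is left alone
cleaned <- gsub("^(SMALL )?HAIL.*", "HAIL", cleaned)

# Unanchored patterns rewrite the matched text wherever it appears in the label
cleaned <- gsub("TSTM|THUNDERSTORMS?", "THUNDERSTORM", cleaned)
cleaned <- gsub("WINDS?", "WIND", cleaned)

# Anchored prefix match: any label starting with TORNADO becomes TORNADO
cleaned <- gsub("^TORNADO.*", "TORNADO", cleaned)

print(data.frame(raw = evtype_demo, cleaned = cleaned))
cat(length(unique(evtype_demo)), "labels reduced to", length(unique(cleaned)), "\n")
```

Because each call rewrites the vector in place, the order of the calls matters: a later pattern sees the output of the earlier ones, not the raw labels.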

After inspection, the list was reduced even further, but not all listed storm events could be sorted, as some data input like “HIGH” makes no sense.
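
One possible way to support the manual inspection is to list the labels that still differ from the official names and suggest the closest official name for each. The sketch below is an assumption-laden illustration: it uses base R `adist()` (edit distance) rather than the stringdist package loaded earlier, the `official` vector is only a small subset of the NOAA list, and the leftover labels are examples.

```r
# A few official event names (subset only, for illustration)
official <- c("TORNADO", "LIGHTNING", "FLASH FLOOD", "HIGH WIND", "HAIL")

# Labels as they might look after the first cleaning pass
leftover <- c("TORNDAO", "LIGNTNING", "HAIL", "HIGH")

# Labels that still do not match an official name exactly
unmatched <- setdiff(leftover, official)

# Edit distance from each unmatched label (rows) to each official name (columns)
d <- adist(unmatched, official)

# Closest official name and its distance for every unmatched label
suggestions <- data.frame(label    = unmatched,
                          nearest  = official[apply(d, 1, which.min)],
                          distance = apply(d, 1, min))
print(suggestions)
```

Small distances point to likely typos, while a label such as “HIGH” has no close match and still needs a manual decision, so the suggestions should be reviewed rather than applied automatically.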

Second input correction:

  • gsub() was used again, with the NOAA list in mind.
storm_data_corrected2 <- storm_data_corrected
storm_data_corrected2$EVTYPE <- toupper(storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("\\sASTRONOMICAL LOW TIDE$|^ASTRONOMICAL LOW TIDE$", "Astronomical Low Tide", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("\\sAVALANCHE$|^AVALANCHE$", "Avalanche", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("\\sBLIZZARD$|^BLIZZARD$", "Blizzard", storm_data_corrected2$EVTYPE)
# Alternatives are separated with "|" (the original "$/^" sequence could never match)
storm_data_corrected2$EVTYPE <- gsub("\\sCOASTAL FLOOD$|^BEACH EROSION$|^COASTAL FLOOD$|\\sBEACH EROSION$|^BEACH EROSIN$|^BEACH FLOOD$|^COASTAL  FLOODING/EROSION$|^COASTAL/TIDAL FLOOD$|^CSTL FLOODING/EROSION$|^COASTAL EROSION$|^COASTALFLOOD$", "Coastal Flood", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^COLD/WIND CHILL$|^BITTER WIND CHILL TEMPERATURES$|^BITTER WIND CHILL$|^LOW WIND CHILL$|^EXTREME WINDCHILL TEMPERATURES$|^WAKE LOW WIND$", "Cold/Wind Chill", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("\\sDEBRIS FLOW$|^REMNANTS OF FLOYD$", "Debris Flow", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^DENSE FOG$|^PATCHY DENSE FOG$|^VOG$", "Dense Fog", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^DENSE SMOKE$|^SMOKE$", "Dense Smoke", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^DROUGHT$|^EXCESSIVE HEAT/DROUGHT$|^HEAT DROUGHT$|^HEAT/DROUGHT$|^RECORD LOW RAINFALL$|^UNSEASONABLY DRY$|^DRY CONDITIONS$|^DRY$|^VERY DRY$|^DRY SPELL$|^DRY WEATHER$|^DRIEST MONTH$|^DRY HOT WEATHER$|^DRY PATTERN$|^DRYNESS$|^EXCESSIVELY DRY$|^MILD AND DRY PATTERN$|^MILD PATTERN$|^MILD/DRY PATTERN$|^RECORD DRY MONTH$|^WARM DRY CONDITIONS$|^ABNORMALLY DRY$|^BELOW NORMAL PRECIPITATION$|^HOT AND DRY$|^RECORD DRYNESS$", "Drought", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^DUST DEVIL$|^DUST DEVIL WATERSPOUT$|\\sDUST DEVEL$", "Dust Devil", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^DUST STORM$|^SAHARAN DUST$|^BLOWING DUST$|^DUSTSTORM$", "Dust Storm", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^EXCESSIVE HEAT$|^HEATBURST$|^UNSEASONABLY HOT$|^UNUSUAL WARMTH$|^UNUSUAL/RECORD WARMTH$|^HIGH TEMPERATURE RECORD$|^ABNORMAL WARMTH$|^HOT PATTERN$|^HOT WEATHER$|^HOT/DRY PATTERN$|^RECORD HIGH TEMPERATURES$|^VERY WARM$|^HOT SPELL$", "Excessive Heat", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^EXTREME WIND CHILL/BLOWING SNO$|^EXTREME WIND CHILLS$|^EXTREME COLD/WIND CHILL$|^EXTREME COLD WIND CHILL$", "Extreme Cold/Wind Chill", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^FLASH FLOOD$|\\sFLASH FLOOD$|^FLASH FLOOODING$|^LOCAL FLASH FLOOD$", "Flash Flood", storm_data_corrected2$EVTYPE)
# POSIX classes must sit inside a bracket expression: [[:punct:]]
storm_data_corrected2$EVTYPE <- gsub("^FLOOD$|^SMALL STREAM FLOODING$|^SMALL STREAM/URBAN FLOOD$|^SMALL STREAM FLOOD$|^URBAN SMALL STREAM FLOOD$|^URBAN/SMALL STREAM$|^URBAN/SMALL STREAM  FLOOD$|^STREET FLOOD$|^STREET FLOODING$|\\sURBAN AND SMALL STREAM FLOOD$|^URBAN/STREET FLOODING$|^BREAKUP FLOODING$|^HIGHWAY FLOODING$|^LOCAL FLOOD$|^LANDSLIDE/URBAN FLOOD$|^MUD SLIDES URBAN FLOODING$|^SMALL STREAM AND URBAN FLOODIN$|^STREAM FLOODING$|^URBAN FLOOD LANDSLIDE$|^ICE JAM FLOOD [[:punct:]]MINOR$|^URBAN AND SMALL STREAM$|^SMALL STREAM URBAN FLOOD$|^URBAN/SMALL FLOODING$|^URBAN/SML STREAM FLDG$|^SML STREAM FLD$|^URBAN/SMALL STRM FLDG$|^SMALL STREAM$|^SMALL STREAM AND$|^SMALL STREAM AND URBAN FLOOD$|^RURAL FLOOD$", "Flood", storm_data_corrected2$EVTYPE)
# "^LATE FREEZE$" needs a start anchor (the original began with "$", which cannot match)
storm_data_corrected2$EVTYPE <- gsub("^FROST FREEZE$|^FROST/FREEZE$|^FIRST FROST$|^EARLY FROST$|^RECORD COLD/FROST$|^AGRICULTURAL FREEZE$|^HARD FREEZE$|^EARLY FREEZE$|^LATE FREEZE$|^UNSEASONAL LOW TEMP$|^UNSEASONABLE COLD$|^ICE FLOES$", "Frost/Freeze", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^FUNNEL$|^FUNNEL CLOUD$|^FUNNEL CLOUD[[:punct:]]$|^FUNNEL CLOUD/HAIL$|^FUNNELS$|^WALL CLOUD/FUNNEL CLOUD$|^ROTATING WALL CLOUD$|^WALL CLOUD$|^LARGE WALL CLOUD$", "Funnel Cloud", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^FREEZING FOG$|^ICE FOG$", "Freezing Fog", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^HAIL$|^NON SEVERE HAIL$|^DEEP HAIL$|^LATE SEASON HAIL$|^THUNDERSTORM HAIL$", "Hail", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^HEAT$|^RECORD HEAT SURF$|^RECORD WARMTH$|^PROLONG WARMTH$|^UNUSUALLY WARM$|^UNSEASONABLY WARM YEAR$|^RECORD HIGH TEMPERATURE$|^RECORD WARM$|^RECORD WARM TEMPS[[:punct:]]$", "Heat", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^HEAVY RAIN$|^PROLONGED RAIN$|^EXCESSIVE RAIN$|^MONTHLY RAINFALL$|^RECORD RAINFALL$|^UNSEASONAL RAIN$|^THUNDERSTORM HEAVY RAIN$|^EARLY RAIN$|^LOCALLY HEAVY RAIN$|^RECORD/EXCESSIVE RAINFALL$|^TORRENTIAL RAIN$|^MONTHLY PRECIPITATION$|^WET MONTH$|^WET YEAR$|^WET MICROBURST$|^UNSEASONABLY WET$|^UNSEASONABLY WARM/WET$|^NORMAL PRECIPITATION$|^ABNORMALLY WET$|^EXCESSIVE PRECIPITATION$|^EXCESSIVE WETNESS$|^EXTREMELY WET$|^HEAVY PRECIPATATION$|^HEAVY SHOWERS$|^MUD/ROCK SLIDE$|^MUDSLIDE/LANDSLIDE$|^RECORD PRECIPITATION$|^UNSEASONABLY WARM [[:punct:]] WET$|^WET WEATHER$|^HEAVY SHOWER$|^WET MICOBURST$", "Heavy Rain", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^HEAVY SNOW$|^MODERATE SNOWFALL$|^ICE/SNOW$|^EARLY SNOWFALL$|^FIRST SNOW$|^LIGHT SNOW/FLURRIES$|^EARLY SNOW$|^ACCUMULATED SNOWFALL$|^DRIFTING SNOW$|^HEAVY WET SNOW$|^LATE-SEASON SNOWFALL$|^LATE SEASON SNOW$|^LIGHT SNOW/FREEZING PRECIP$|^LIGHT SNOWFALL$|^PROLONG COLD$|^SEASONAL SNOWFALL$", "Heavy Snow", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("\\sHIGH SURF ADVISORY$|^HIGH SURF ADVISORY$|^HIGH SURF$|^HEAVY SURF$|^HIGH  SURF$|^ROGUE SURF$", "High Surf", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^HIGH WIND$|^RECORD COLD AND HIGH WIND$|^GUSTY LAKE WIND$|^STORM FORCE WIND$|^SOUTHEAST$|^WND$", "High Wind", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("\\sHURRICANE TYPHOON$|^HURRICANE TYPHOON$|^HURRICANE/TYPHOON$", "Hurricane (Typhoon)", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("\\sICE STORM$|^ICE STORM$", "Ice Storm", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^HEAVY LAKE SNOW$|\\sLAKE-EFFECT SNOW$|^LAKE EFFECT SNOW$", "Lake-Effect Snow", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^LAKESHORE FLOOD$|^LAKE FLOOD$|^RECORD WINTER SNOW$", "Lakeshore Flood", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^LIGHTNING$|^LIGHTING$|\\sLIGHTNING$|^LIGNTNING$", "Lightning", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^MARINE HAIL$|\\sMARINE HAIL$", "Marine Hail", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^MARINE STRONG WIND$|\\sMARINE STRONG WIND$", "Marine Strong Wind", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^MARINE THUNDERSTORM WIND$|\\sMARINE THUNDERSTORM WIND$", "Marine Thunderstorm Wind", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^RIP CURRENT$|\\sRIP CURRENT$", "Rip Current", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^SEICHE$|\\sSEICHE$", "Seiche", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^SLEET$|^FREEZING RAIN AND SLEET$|^FREEZING RAIN/SLEET$|^LIGHT FREEZING RAIN$|^UNSEASONABLY COOL & WET$|^WET SNOW$|^FREEZING RAIN AND SNOW$|^FREEZING RAIN SLEET AND  FREEZING RAIN SLEET AND LIGHT$", "Sleet", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^STORM SURGE$|^STORM SURGE/TIDE$|^BLOW[[:punct:]]OUT TIDE$|^BLOW[[:punct:]]OUT TIDES$|^HIGH TIDES$", "Storm Surge/Tide", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^STRONG WIND$|^GRADIENT WIND$|^SEVERE TURBULENCE$|^STRONG WIND GUST$|^DOWNBURST WIND$", "Strong Wind", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^THUNDERSTORM WIND$|^SEVERE THUNDERSTORM WIND$|^THUNDERSTORM  WIND$|^GUSTY THUNDERSTORM WIND$|^THUNDERSTORM DAMAGE$|^THUNDESTORM WIND$|^THUNDERSTORMW WIND$|\\sTHUNDERSTORM WIND$|^THUNDERSTORM WIND [[:punct:]]G45[[:punct:]]$|^THUNDERESTORM WIND$|^THUNDERSNOW SHOWER$|^THUNDERSTORM DAMAGE TO$|^THUNDERSTORM W INDS$|^THUNDERSTORM WINS$|^THUNDERSTORM WND$|^THUNDERSTORMW 50$|^THUNDERTSORM WIND$|^THUNERSTORM WIND$", "Thunderstorm Wind", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^TORNADO$|^LANDSPOUT$", "Tornado", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^TROPICAL DEPRESSION$|\\sTROPICAL DEPRESSION$", "Tropical Depression", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^TROPICAL STORM$|\\sTROPICAL STORM$", "Tropical Storm", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^TSUNAMI$|\\sTSUNAMI$", "Tsunami", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^VOLCANIC ASH$|^VOLCANIC ERUPTION$|^VOLCANIC ASHFALL$|^VOLCANIC ASH PLUME$", "Volcanic Ash", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^WATERSPOUT$|\\sWATERSPOUT$|^WAYTERSPOUT$", "Waterspout", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^WILDFIRE$|^GRASS FIRES$|^WILD/FOREST FIRES$|^RED FLAG CRITERIA$|^RED FLAG FIRE WX$", "Wildfire", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^WINTER STORM$|\\sWINTER STORM$", "Winter Storm", storm_data_corrected2$EVTYPE)
storm_data_corrected2$EVTYPE <- gsub("^WINTER WEATHER$|^RECORD COOL$|^UNUSUALLY COLD$|^UNSEASONABLY COOL$|^WINTERY MIX$|^WINTER MIX$|^EXTREME/RECORD COLD$|^ICE JAM$|^COOL AND WET$|^COOL SPELL$|^FREEZING DRIZZLE AND FREEZING$|^ICE PELLETS$|^ICESTORM/BLIZZARD$|^LOW TEMPERATURE RECORD$|^MODERATE SNOW$|^MOUNTAIN SNOWS$|^NEAR RECORD SNOW$|^NORTHERN LIGHTS$|^PATCHY ICE$|^PROLONG COLD/SNOW$|^RECORD  COLD$|^RECORD MAY SNOW$|^SEVERE COLD$|^EXCESSIVE COLD$|^LATE SEASON SNOWFALL$|^LATE SNOW$|^LIGHT SNOW AND SLEET$|^MONTHLY SNOWFALL$", "Winter Weather", storm_data_corrected2$EVTYPE)
#storm_data_corrected2$EVTYPE <- gsub("|^TEMPERATURE RECORD$|^OTHER $|^RECORD TEMPERATURE$|^RECORD HIGH$|^RECORD TEMPERATURES$|^RECORD LOW$|^MONTHLY TEMPERATURE$|^URBAN AND SMALL$||^URBAN/SMALL$|^DOWNBURST$|^ROCK SLIDE$|^SUMMARY AUGUST 10$|^SUMMARY AUGUST 11$|^SUMMARY OF APRIL 12$|^NO SEVERE WEATHER$|^METRO STORM MAY 26$|^HIGH$|^LACK OF SNOW$|^SUMMARY AUGUST 17$|^SUMMARY AUGUST 2[:punct:]3$|^SUMMARY AUGUST 21$|^SUMMARY AUGUST 28$|^$|^SUMMARY AUGUST 4$|^SUMMARY AUGUST 7$|^SUMMARY AUGUST 9$|^SUMMARY JAN 17$|^SUMMARY JULY 23[:punct:]24$|^SUMMARY JUNE 18[:punct:]19$|^SUMMARY JUNE 5[:punct:]6$|^SUMMARY JUNE 6$|^SUMMARY OF APRIL 13$|^SUMMARY OF APRIL 27$|^SUMMARY OF APRIL 3RD$|^SUMMARY OF AUGUST 1$|^SUMMARY OF JULY 11$|^SUMMARY OF JULY 2$|^SUMMARY OF JULY 22$|^SUMMARY OF JULY 26$|^SUMMARY OF JULY 29$|^SUMMARY OF JULY 3$|^SUMMARY OF JUNE 10$|^SUMMARY OF JUNE 11$|^SUMMARY OF JUNE 12$|^SUMMARY OF JUNE 15$|^SUMMARY OF JUNE 16$|^SUMMARY OF JUNE 18$|^SUMMARY OF JUNE 23$|^SUMMARY OF JUNE 24$|^SUMMARY OF JUNE 30$|^SUMMARY OF JUNE 4$|^SUMMARY OF JUNE 6$|^SUMMARY OF MARCH 14$|^SUMMARY OF MARCH 24$|^SUMMARY OF MARCH 24[:punct:]25$|^SUMMARY OF MARCH 27$|^SUMMARY OF MARCH 29$|^SUMMARY OF MAY 10$|^SUMMARY OF MAY 13$|^SUMMARY OF MAY 14$|^SUMMARY OF MAY 22$|^SUMMARY OF MAY 22 AM$|^SUMMARY OF MAY 22 PM$|^SUMMARY OF MAY 26 AM$|^SUMMARY OF MAY 31 AM$|^SUMMARY OF MAY 9[:punct:]10$|^SUMMARY SEPT[:punct:] 25[:punct:]26$|^SUMMARY SEPTEMBER 20$|^SUMMARY SEPTEMBER 3$|^SUMMARY SEPTEMBER 4$|^SUMMARY[:punct:] NOV[:punct:] 6[:punct:]7$|^SUMMARY[:punct:] OCT[:punct:] 20[:punct:]21$|^SUMMARY[:punct:] OCTOBER 31$|^SUMMARY[:punct:] SEPT[:punct:] 18$|[:punct:]$", "error", storm_data_corrected2$EVTYPE) #makes a big mess :(
# Likely cause of the mess: the pattern above starts with "|" and contains "||", i.e. empty
# alternatives that can match at every position of every string, so "error" gets inserted
# throughout. In addition, [:punct:] must be written [[:punct:]] to act as a POSIX class.

length(unique(storm_data_corrected2$EVTYPE))
## [1] 157

The list has now been narrowed down to 157 distinct event types, from the original 985 distinct storm event labels. Entries that could not be matched to the NOAA list of 48 event types were set to NULL, that is, ignored.
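The consolidation step above can also be written as a small table of rules, which keeps each pattern short and makes the POSIX class syntax easier to get right. The following is a minimal sketch on made-up values, not the code used for this report; the names `evtype_rules` and `clean_evtype` are hypothetical. Note that a POSIX class has to sit inside a bracket expression, i.e. `[[:punct:]]`; a bare `[:punct:]` is read as the set of characters `:`, `p`, `u`, `n`, `c`, `t`.

```r
# Hypothetical rule table: official NOAA label -> regular expression of raw variants.
evtype_rules <- c(
  "Lightning"        = "^(LIGHTNING|LIGHTING|LIGNTNING)$",
  "Storm Surge/Tide" = "^(STORM SURGE|STORM SURGE/TIDE|BLOW[[:punct:]]OUT TIDES?|HIGH TIDES)$",
  "Tornado"          = "^(TORNADO|LANDSPOUT)$"
)

# Apply every rule in turn; values that match no rule are left unchanged.
clean_evtype <- function(x, rules) {
  for (label in names(rules)) {
    x[grepl(rules[[label]], x)] <- label
  }
  x
}

# Made-up raw labels, for illustration only.
raw <- c("LIGHTING", "BLOW-OUT TIDE", "LANDSPOUT", "STORM SURGE", "HAIL")
cleaned <- clean_evtype(raw, evtype_rules)
print(cleaned)
```

Because unmatched values pass through untouched, `unique(cleaned)` can be compared against the official list to see which labels still need a rule.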

Harm and Damage by Severe Weather:

It was assumed that the harm severe weather caused to population health could be calculated by summing total injuries and total fatalities per storm event type, and that the economic consequences could be calculated by summing the total damage caused to crops and property.

  • Data Inspection
#head(storm_data_corrected2, 15)
#tail(storm_data_corrected2, 15)
names(storm_data_corrected2)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"
#NamesOfColumnsSDC2[23:24]
#[1] "FATALITIES" "INJURIES" 
#NamesOfColumnsSDC2[25:28]
#[1] "PROPDMG"    "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP"

Data Processing: Storm Event Type Damage

Storm event impacts:

These values are found in the following columns:

  • “FATALITIES” and “INJURIES”, to estimate the public health impact
  • “PROPDMG”, “PROPDMGEXP”, “CROPDMG” and “CROPDMGEXP”, to estimate the economic impact of storm events

#length(unique(storm_data_corrected2$PROPDMG))
#length(unique(storm_data_corrected2$PROPDMGEXP))
PDamageExpUnits<- unique(storm_data_corrected2$PROPDMGEXP)
#PDamageExpUnits
# [1] "K" "M" ""  "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7"
#[16] "H" "-" "1" "8"
CDamageExpUnits<- unique(storm_data_corrected2$CROPDMGEXP)
#CDamageExpUnits
#[1] ""  "M" "K" "m" "B" "?" "0" "k" "2"

The units for Property Damage (PROPDMGEXP) are not clear: K, M, (empty), B, m, +, 0, 5, 6, ?, 4, 2, 3, h, 7, H, -, 1, 8. Nor are those for Crop Damage (CROPDMGEXP): (empty), M, K, m, B, ?, 0, k, 2.

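I had to create useful monetary units that provide reasonable estimates of the crop and property damage. It was assumed that [K,k] = 1 000; [M,m] = 1 000 000 and [B,b] = 1 000 000 000. [h,H] would correspond to 100, but the code below does not apply it: every other symbol, including h/H and the empty string, gets a multiplier of 1, so those damage figures are counted at face value.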
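The multiplier assumption above can be sketched as a lookup vector. This is an illustrative alternative, not the code that produced the results in this report; `exp_multiplier` and `damage_usd` are hypothetical names, and returning NA for unrecognised codes is a choice made here for the example.

```r
# Assumed multipliers, taken from the text: H = 1e2, K = 1e3, M = 1e6, B = 1e9.
exp_multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Convert a damage figure and its exponent code to US dollars.
# Codes outside the table ("", "+", "?", digits, ...) yield NA so they can be
# inspected or dropped explicitly instead of silently counting at face value.
damage_usd <- function(value, exp_code) {
  mult <- unname(exp_multiplier[toupper(exp_code)])
  value * mult
}

# Made-up figures, for illustration only.
damage_usd(c(25, 2.5, 1.2, 3, 10), c("K", "m", "B", "h", "?"))
```

A lookup keeps upper and lower case codes in one place, and `sum(..., na.rm = TRUE)` can then be used if the unrecognised codes really are to be ignored.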

  • Data processing was done with the dplyr library:
library(dplyr)
# used mutate()

stormSumByType <- storm_data_corrected2%>% 
    mutate(propMult = ifelse(is.na(PROPDMGEXP),
                            1,
                            ifelse((PROPDMGEXP =="K" | PROPDMGEXP == "k"),
                                   1000, 
                                   ifelse((PROPDMGEXP == "M" | PROPDMGEXP =="m"),
                                          1000000, 
                                          ifelse((PROPDMGEXP =="B" | PROPDMGEXP =="b"),
                                                 1000000000,
                                                 1)))), 
           cropMult = ifelse(is.na(CROPDMGEXP), 
                             1,
                             ifelse((CROPDMGEXP =="K"| CROPDMGEXP =="k"),
                                    1000, 
                                    ifelse((CROPDMGEXP =="M" | CROPDMGEXP =="m"),
                                           1000000, 
                                           ifelse((CROPDMGEXP =="B"| CROPDMGEXP =="b"),
                                                  1000000000,
                                                  1)))), 
           propDamage = PROPDMG * propMult, 
           cropDamage = CROPDMG * cropMult, 
           totalDamage = propDamage + cropDamage)%>% 
group_by(EVTYPE) %>% 
summarise(totalFatalities = sum(FATALITIES), 
          totalInjuries = sum(INJURIES), 
          totalCasualties = totalFatalities + totalInjuries, 
          economicDamage = sum(totalDamage))
## `summarise()` ungrouping output (override with `.groups` argument)
# create new "helper" data sets
stormTypesWithCasualties <- stormSumByType%>% 
    filter(totalCasualties >=1) %>% 
    arrange(desc(totalCasualties)) 

stormTypesWithDamage <- stormSumByType%>% 
    filter(economicDamage > 1) %>% arrange(desc(economicDamage))

head(stormTypesWithCasualties)
## # A tibble: 6 x 5
##   EVTYPE            totalFatalities totalInjuries totalCasualties economicDamage
##   <chr>                       <dbl>         <dbl>           <dbl>          <dbl>
## 1 Tornado                      5659         91364           97023   58959516549.
## 2 Thunderstorm Wind             715          9537           10252   12249634666.
## 3 Excessive Heat               2018          6730            8748     505270700 
## 4 Flood                         533          6889            7422  161339246394.
## 5 Lightning                     817          5232            6049     945834537.
## 6 Heat                         1125          2498            3623     419278550

This table was used to write the summary report in markdown.

Results show that Tornadoes, Floods and Flash Floods caused both high casualties and heavy damage to the economy during the period 1950 - 2011. Economic damage is rounded off to Billions and Millions of USD; the largest total, for Flood, is 161 339 246 394 USD, that is about 161 Billion.

Here are those events which caused Billions of dollars of damage to the economy:

# Event Types: 

TrillionDAMECO.CSV <- "Event Type, Total Fatalities, Total Injuries, Total Casualties, Crop Property
,,, , B I L L I O N S of Dollars Damages to the Economy 
Flood , 533 , 6 889 , 7 422 , 161 Billion
Hurricane (Typhoon) , 135 , 1 333 , 1 468 , 91 Billion
Tornado , 5 659 , 91 364 , 97 023 , 59 Billion
Storm Surge/Tide , 24 , 43 , 67 , 48 Billion
Dense Fog , 80 , 1 076 , 1 156 , 23 Billion
Hail , 15 , 1 371 , 1 386 , 19 Billion
Flash Flood, 1 019 , 1 785 , 2 804 , 18 Billion
Drought , 35 , 19 , 54 , 15 Billion
Thunderstorm Wind , 715 , 9 537 , 10 252, 12 Billion
Tropical Storm , 66 , 383 , 449 , 8 Billion
Ice Storm , 90 , 1 978 , 2 068 , 8 Billion
Wildfire , 90 , 1 608 , 1 698 , 8 Billion
High Wind , 286 , 1 451 , 1 737 , 7 Billion
Winter Storm , 217 , 1 353 , 1 570 , 6 Billion
Heavy Rain , 101 , 280 , 381 , 4 Billion
Extreme Cold/Wind Chill , 306 , 260 , 566 , 2 Billion
Frost/Freeze , 24 , 196 , 220 , 2 Billion
Heavy Snow , 148 , 1 155 , 1 303 , 1 Billion"


TrillionDAMECO <- read.csv(textConnection(TrillionDAMECO.CSV),header=TRUE)

kable(TrillionDAMECO,format="markdown")
| Event.Type | Total.Fatalities | Total.Injuries | Total.Casualties | Crop.Property |
|:--|:--|:--|:--|:--|
| | | | | B I L L I O N S of Dollars Damages to the Economy |
| Flood | 533 | 6 889 | 7 422 | 161 Billion |
| Hurricane (Typhoon) | 135 | 1 333 | 1 468 | 91 Billion |
| Tornado | 5 659 | 91 364 | 97 023 | 59 Billion |
| Storm Surge/Tide | 24 | 43 | 67 | 48 Billion |
| Dense Fog | 80 | 1 076 | 1 156 | 23 Billion |
| Hail | 15 | 1 371 | 1 386 | 19 Billion |
| Flash Flood | 1 019 | 1 785 | 2 804 | 18 Billion |
| Drought | 35 | 19 | 54 | 15 Billion |
| Thunderstorm Wind | 715 | 9 537 | 10 252 | 12 Billion |
| Tropical Storm | 66 | 383 | 449 | 8 Billion |
| Ice Storm | 90 | 1 978 | 2 068 | 8 Billion |
| Wildfire | 90 | 1 608 | 1 698 | 8 Billion |
| High Wind | 286 | 1 451 | 1 737 | 7 Billion |
| Winter Storm | 217 | 1 353 | 1 570 | 6 Billion |
| Heavy Rain | 101 | 280 | 381 | 4 Billion |
| Extreme Cold/Wind Chill | 306 | 260 | 566 | 2 Billion |
| Frost/Freeze | 24 | 196 | 220 | 2 Billion |
| Heavy Snow | 148 | 1 155 | 1 303 | 1 Billion |

Here are those events which caused Millions of dollars of damage to the economy (no event type exceeds the Flood total of about 161 Billion USD, so these figures are Millions, not Billions):

# Event Types: 

BillionDAMECO.CSV <- "Event Type, Total Fatalities, Total Injuries, Total Casualties, Crop Property
,,,, M I L L I O N S of Dollars Damages to the Economy 
Blizzard , 101 , 805 , 906 , 772 Million
Coastal Flood , 6 , 7 , 13 , 429 Million
Avalanche , 269 , 225 , 494 , 351 Million
Strong Wind , 140 , 408 , 548 , 264 Million
Tsunami , 33 , 129 , 162 , 144 Million
High Surf , 177 , 273 , 450 , 101 Million
Cold/Wind Chill , 167 , 60 , 227 , 94 Million
Waterspout , 6 , 72 , 78 , 61 Million
Winter Weather , 61 , 470 , 531 , 47 Million
Sleet , 25 , 443 , 468 , 14 Million
Dust Storm , 22 , 440 , 462 , 9 Million
Marine Thunderstorm Wind, 24 , 38 , 62 , 6 Million
Marine High Wind , 12 , 6 , 18 , 2 Million
Freezing Fog , 1 , 1 , 2 , 2 Million
OTHER, 0, 4, 4, 1 Million "



BillionDAMECO <- read.csv(textConnection(BillionDAMECO.CSV),header=TRUE)

kable(BillionDAMECO,format="markdown")
| Event.Type | Total.Fatalities | Total.Injuries | Total.Casualties | Crop.Property |
|:--|--:|--:|--:|:--|
| | NA | NA | NA | M I L L I O N S of Dollars Damages to the Economy |
| Blizzard | 101 | 805 | 906 | 772 Million |
| Coastal Flood | 6 | 7 | 13 | 429 Million |
| Avalanche | 269 | 225 | 494 | 351 Million |
| Strong Wind | 140 | 408 | 548 | 264 Million |
| Tsunami | 33 | 129 | 162 | 144 Million |
| High Surf | 177 | 273 | 450 | 101 Million |
| Cold/Wind Chill | 167 | 60 | 227 | 94 Million |
| Waterspout | 6 | 72 | 78 | 61 Million |
| Winter Weather | 61 | 470 | 531 | 47 Million |
| Sleet | 25 | 443 | 468 | 14 Million |
| Dust Storm | 22 | 440 | 462 | 9 Million |
| Marine Thunderstorm Wind | 24 | 38 | 62 | 6 Million |
| Marine High Wind | 12 | 6 | 18 | 2 Million |
| Freezing Fog | 1 | 1 | 2 | 2 Million |
| OTHER | 0 | 4 | 4 | 1 Million |

Here are further events whose damage to the economy was recorded in Millions of dollars, several of them with high casualty counts:

# Event Types: 
MillionDAMECO.CSV <- "Event Type, Total Fatalities, Total Injuries, Total Casualties, Crop Property
,,,, M I L L I O N S of Dollars Damages to the Economy 
Lightning , 817 , 5 232 , 6 049 , 946 Million
Dust Devil , 2 , 43 , 45 , 719 Million
Excessive Heat, 2 018 , 6 730 , 8 748 , 505 Million
Heat , 1 125 , 2 498 , 3 623 , 419 Million
Marine Strong Wind , 19 , 30 , 49 , 433 Million
Funnel Cloud , 0 , 3 , 3 , 195 Million
Rip Current , 577 , 529 , 1 106 , 163 Million"


MillionDAMECO <- read.csv(textConnection(MillionDAMECO.CSV),header=TRUE)

kable(MillionDAMECO,format="markdown")
| Event.Type | Total.Fatalities | Total.Injuries | Total.Casualties | Crop.Property |
|:--|:--|:--|:--|:--|
| | | | | M I L L I O N S of Dollars Damages to the Economy |
| Lightning | 817 | 5 232 | 6 049 | 946 Million |
| Dust Devil | 2 | 43 | 45 | 719 Million |
| Excessive Heat | 2 018 | 6 730 | 8 748 | 505 Million |
| Heat | 1 125 | 2 498 | 3 623 | 419 Million |
| Marine Strong Wind | 19 | 30 | 49 | 433 Million |
| Funnel Cloud | 0 | 3 | 3 | 195 Million |
| Rip Current | 577 | 529 | 1 106 | 163 Million |

Some storm events, such as Excessive Heat, Lightning and Heat, caused only around (505, 946, 419) Million USD of damage to the economy but led to (8 748, 6 049, 3 623) casualties respectively.
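Rounded labels such as "946 Million" are easy to get wrong when typed by hand, so a small helper can derive the unit from the raw figure instead. This is a hedged sketch, not part of the original analysis; `format_usd` is a hypothetical name, and the input values are copied from the stormTypesWithCasualties output shown earlier.

```r
# Hypothetical helper: express a USD amount in the largest unit it reaches.
format_usd <- function(x) {
  unit  <- ifelse(x >= 1e12, "Trillion",
           ifelse(x >= 1e9,  "Billion",
           ifelse(x >= 1e6,  "Million", "")))
  scale <- ifelse(x >= 1e12, 1e12,
           ifelse(x >= 1e9,  1e9,
           ifelse(x >= 1e6,  1e6, 1)))
  # trimws() removes the trailing space left when no unit applies.
  trimws(paste(round(x / scale), unit))
}

# Values copied from the stormTypesWithCasualties table.
damage <- c(Flood = 161339246394, Tornado = 58959516549,
            Lightning = 945834537, Heat = 419278550)
labels <- format_usd(damage)
print(labels)
```

Applied to the economicDamage column, such a helper would let the tables be built directly from stormTypesWithDamage rather than from a hand-written CSV string.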

Data Analysis, Tables & Plots

mostCasualties <- stormSumByType[[which.max(stormSumByType$totalCasualties),1]]
mostCasualties
## [1] "Tornado"
#[1] "Tornado"
#max(stormSumByType$totalCasualties)
#[1] 97023
mostFatalities <- stormSumByType[[which.max(stormSumByType$totalFatalities),1]] 
mostFatalities
## [1] "Tornado"
#[1] "Tornado"
#max(stormSumByType$totalFatalities)
#[1] 5659
mostInjuries <- stormSumByType[[which.max(stormSumByType$totalInjuries),1]] 
mostInjuries
## [1] "Tornado"
#[1] "Tornado"
#max(stormSumByType$totalInjuries)
#[1] 91364
mostEconomicDamage <- stormSumByType[[which.max(stormSumByType$economicDamage),1]] 
mostEconomicDamage
## [1] "Flood"
#[1] "Flood"
max(stormSumByType$economicDamage)
## [1] 161339246394
#[1] 161339246394

Storm Event Type Damage:

Cost to human life:

  • Maximum fatalities: Tornado, 5659
  • Maximum injuries: Tornado, 91364

Most economic damage:

  • Flood: 161339246394 USD

Plot1: stormTypesWithDamage Pair

plot(stormTypesWithDamage)

Initial exploration shows that a few event types cause most of the harm and damage. From the previous examination of the data table we know, for example, that from 1950 to 2011 Tornadoes were the most deadly event type and also caused significant economic damage.

More informative plots would be the following:

  • Event Type to Total Fatalities
  • Event Type to Injuries
  • Event Type to Crop Damage
  • Event Type to Property Damage

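The cleaned data set, storm_data_corrected2, and the derived table stormTypesWithDamage are inspected below. Note that the key assignment plots that follow are built from the original storm_data, which is why their event names appear in upper case and are not consolidated: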

#summary(storm_data_corrected2)
names(storm_data_corrected2)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"
#keep to map, L8R
#storm_data_corrected2$STATE[1:10]
#storm_data_corrected2$LATITUDE[1:10]
#storm_data_corrected2$LONGITUDE[1:10]
#storm_data_corrected2$REMARKS there were no remarks
names(stormTypesWithDamage)
## [1] "EVTYPE"          "totalFatalities" "totalInjuries"   "totalCasualties"
## [5] "economicDamage"

Key Assignment Plots:

human_cost <- storm_data %>% group_by(EVTYPE) %>% 
    summarize(deaths = sum(FATALITIES), 
    injury = sum(INJURIES), 
    total = sum(FATALITIES) + sum(INJURIES))
## `summarise()` ungrouping output (override with `.groups` argument)
# Top six categories of events that are most harmful to human populations (head() returns 6 rows)
head(arrange(human_cost, desc(total)))
## # A tibble: 6 x 4
##   EVTYPE         deaths injury total
##   <chr>           <dbl>  <dbl> <dbl>
## 1 TORNADO          5633  91346 96979
## 2 EXCESSIVE HEAT   1903   6525  8428
## 3 TSTM WIND         504   6957  7461
## 4 FLOOD             470   6789  7259
## 5 LIGHTNING         816   5230  6046
## 6 HEAT              937   2100  3037
#Gathering the data into a tidy format for plotting
phuman_cost <- tidyr::gather(human_cost[,-4], key = casualty, value=total,2:3) %>% arrange(desc(total))
# Subsetting for the top 15 in each casualty category
split_list <- lapply(split(phuman_cost, phuman_cost$casualty), head, 15) 
phcost <- do.call(rbind.data.frame, split_list) 
row.names(phcost) <-NULL

POPstormDAM_viz <- ggplot(phcost, aes(x=reorder(EVTYPE, total), y=total)) + 
  geom_col() + facet_wrap(casualty ~.) + 
  coord_flip() + theme_light() + 
  ggtitle("Dangerous Storm Events by Total Number of Casualties") 

ggsave("plot1.png", plot = POPstormDAM_viz)  # name the plot explicitly instead of relying on last_plot()
## Saving 7 x 5 in image
POPstormDAM_viz

From this graph we can see that Tornadoes caused by far the most injuries: 91364 in the cleaned data, more than nine times as many as the next most dangerous storm events, Thunderstorm Wind, Flood and Excessive Heat, which range from about 6700 to 9500 injuries each (see the data frame ‘stormTypesWithCasualties’ for exact figures). The remaining event types are difficult to read off the plot because the Tornado bar dominates the scale.
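One way to make the smaller bars readable is a logarithmic axis. The sketch below uses base graphics and injury totals copied from the stormTypesWithCasualties table; it is an illustration of the idea, not the plot used in this report, and the object names are hypothetical.

```r
# Injury totals copied from the stormTypesWithCasualties table.
injuries <- c(Tornado = 91364, `Thunderstorm Wind` = 9537, Flood = 6889,
              `Excessive Heat` = 6730, Lightning = 5232, Heat = 2498)

# Ratio of tornado injuries to the next-largest event type.
ratio <- injuries[["Tornado"]] / max(injuries[names(injuries) != "Tornado"])
round(ratio, 1)

# A log10 axis keeps the smaller bars visible next to the tornado bar.
old_par <- par(mar = c(4, 9, 1, 1))   # widen the left margin for the labels
barplot(sort(injuries), horiz = TRUE, las = 1, log = "x",
        xlab = "Total injuries (log scale)")
par(old_par)
```

On a log axis bar lengths no longer compare linearly, so the exact figures should still be read from the table.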

Summary Statistics:

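# Overall fatalities across all event types: summarize() is called without group_by(),
# so a single row is returned. n is the mean of the per-type record counts added by add_count().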

fatalities_by_event <- storm_data_corrected2 %>%
        add_count(EVTYPE) %>%
        summarize(fatal_sum = sum(FATALITIES), fatal_mean = mean(FATALITIES), n = mean(n))

fatalities_by_event$fatal_sum
## [1] 15145
#[1] 15145
fatalities_by_event$n
## [1] 219812
#[1] 219,812
head(fatalities_by_event)
##   fatal_sum fatal_mean      n
## 1     15145 0.01678494 219812

The total death toll stood at 15 145 for deaths caused by severe weather from 1950-2011.

# Overall injuries across all event types (single row, no group_by())
injuries_by_event <- storm_data_corrected2 %>%
        add_count(EVTYPE) %>%
        summarize(injuries_sum = sum(INJURIES), injuries_mean = mean(INJURIES), n = mean(n))
head(injuries_by_event)
##   injuries_sum injuries_mean      n
## 1       140528     0.1557447 219812
injuries_by_event$injuries_sum
## [1] 140528
#[1] 140,528

There were 140528 injuries caused by severe weather from 1950-2011.
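The two chunks above return overall totals, because `summarize()` is called without `group_by()`. If per-event-type figures are wanted, a grouped summary is needed. Here is a minimal base-R sketch on a made-up data frame; `toy` and `by_type` are hypothetical names and the numbers are invented for illustration.

```r
# Made-up records, for illustration only.
toy <- data.frame(
  EVTYPE     = c("Tornado", "Tornado", "Flood", "Heat", "Flood"),
  FATALITIES = c(3, 0, 1, 2, 0),
  INJURIES   = c(10, 4, 0, 5, 2)
)

# Sum fatalities and injuries within each event type.
by_type <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Number of records per event type, matched by name.
by_type$n <- as.vector(table(toy$EVTYPE)[as.character(by_type$EVTYPE)])
by_type
```

The column sums of `by_type` reproduce the overall totals, which gives a quick consistency check against the single-row summaries.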

crop_property <- storm_data %>% group_by(EVTYPE)%>% 
    summarise(property = round(sum(PROPDMG)), 
              crop = round(sum(CROPDMG)), 
              total = round(sum(PROPDMG) + sum(CROPDMG)))
## `summarise()` ungrouping output (override with `.groups` argument)
# Top six categories of storm events by summed PROPDMG + CROPDMG (raw values, exponents not applied)
head(arrange(crop_property, desc(total)))
## # A tibble: 6 x 4
##   EVTYPE            property   crop   total
##   <chr>                <dbl>  <dbl>   <dbl>
## 1 TORNADO            3212258 100019 3312277
## 2 FLASH FLOOD        1420125 179200 1599325
## 3 TSTM WIND          1335966 109203 1445168
## 4 HAIL                688693 579596 1268290
## 5 FLOOD               899938 168038 1067976
## 6 THUNDERSTORM WIND   876844  66791  943636
# Subsetting for the top 15 severe weather forms for property/crop damage
cp_cost <- tidyr::gather(crop_property[,-4], key = crop_prop, value=total,2:3) %>% arrange(desc(total)) 

split_list2 <- lapply(split(cp_cost, cp_cost$crop_prop), head,15) 

cp_cost <- do.call(rbind.data.frame, split_list2) 
row.names(cp_cost) <-NULL
p_cost <- cp_cost[cp_cost$crop_prop =="property", ] 
c_cost <- cp_cost[cp_cost$crop_prop =="crop", ]

# Visualizing with Bar Charts
par(mfrow = c(1,2), mar = c(8,4,3,2), mgp = c(3,1,0), cex =0.4)

barplot(p_cost$total, las =3, names.arg = p_cost$EVTYPE, 
        main ="Worst Storm Events: Property", 
        ylab ="Sum of PROPDMG (raw values, exponent not applied)", 
        col ="grey85") 

barplot(c_cost$total, las =3, names.arg = c_cost$EVTYPE, 
        main ="Worst Storm Events: Crops", 
        ylab ="Sum of CROPDMG (raw values, exponent not applied)", 
        col ="grey85")

All code and intermediate results are hosted at this Github repository. Feel free to fork and download this project to collaborate, and/or contact me for further information.