Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.0.3
library("gridExtra")
library(dplyr)
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:gridExtra':
##
## combine
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(knitr)
library(lubridate)
## Warning: package 'lubridate' was built under R version 4.0.3
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
The source data file is downloaded from https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2. Comprehensive documentation for the dataset is available: https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2FNCDC%20Storm%20Events-FAQ%20Page.pd
dataset_url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(dataset_url, "StormData.csv.bz2")
storm <- read.csv("StormData.csv.bz2")
setwd("C:/Users/Inspiron 5537pro/Desktop/Project/Reproducible_Research_P")
storm <- read.csv("repdata_data_StormData.csv")
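Because the archive is large, it can be convenient to download it only when no local copy exists. The following is a minimal sketch of that idea, not part of the original analysis: `fetch_if_missing` is a hypothetical helper name, and its `downloader` argument is an assumption added so the function can be exercised without network access.

```r
# Download a file only when it is not already present locally.
# `fetch_if_missing` is a hypothetical helper, not part of the original analysis.
# `downloader` defaults to base R's download.file and can be swapped for testing.
fetch_if_missing <- function(url, dest, downloader = utils::download.file) {
  if (!file.exists(dest)) {
    downloader(url, dest, mode = "wb")  # binary mode keeps the .bz2 archive intact
  }
  invisible(dest)
}

# Usage (not run here); read.csv() can read the .bz2 archive directly.
# dataset_url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# fetch_if_missing(dataset_url, "StormData.csv.bz2")
# storm <- read.csv("StormData.csv.bz2")
```

Re-knitting the report then skips the download once the file is in the working directory.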
There are 902,297 observations with 37 variables in the raw file. Only a subset is required for the analysis:
1. The relevant variables are the starting date (BGN_DATE), event type (EVTYPE), counters for the health impact (FATALITIES and INJURIES), monetary impact on property and crops (PROPDMG and CROPDMG) as well as their corresponding units/exponents (PROPDMGEXP and CROPDMGEXP).
2. According to NOAA ([https://www.ncdc.noaa.gov/stormevents/details.jsp]), the full set of weather events has been recorded only since 1996. Between 1950 and 1995 only a subset (Tornado, Thunderstorm Wind and Hail) of these events is available in the storm database. To have a comparable basis for the analysis, the dataset is limited to the observations recorded between 1996 and 2011.
# select the required fields only
stormsub <- select(storm, BGN_DATE, EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP, FATALITIES, INJURIES)
# Format the BGN_DATE variable as a date
stormsub$BGN_DATE <- as.Date(stormsub$BGN_DATE, "%m/%d/%Y")
stormsub$YEAR <- year(stormsub$BGN_DATE)
# Only use events since 1996
stormsub2 <- filter(stormsub, YEAR >= 1996)
# start looking at what the data look like
dim(stormsub2)
## [1] 653530 9
summary(stormsub2)
## BGN_DATE EVTYPE PROPDMG PROPDMGEXP
## Min. :1996-01-01 Length:653530 Min. : 0.00 Length:653530
## 1st Qu.:2000-11-21 Class :character 1st Qu.: 0.00 Class :character
## Median :2005-05-14 Mode :character Median : 0.00 Mode :character
## Mean :2004-10-25 Mean : 11.69
## 3rd Qu.:2008-08-22 3rd Qu.: 1.00
## Max. :2011-11-30 Max. :5000.00
## CROPDMG CROPDMGEXP FATALITIES INJURIES
## Min. : 0.000 Length:653530 Min. : 0.00000 Min. :0.00e+00
## 1st Qu.: 0.000 Class :character 1st Qu.: 0.00000 1st Qu.:0.00e+00
## Median : 0.000 Mode :character Median : 0.00000 Median :0.00e+00
## Mean : 1.839 Mean : 0.01336 Mean :8.87e-02
## 3rd Qu.: 0.000 3rd Qu.: 0.00000 3rd Qu.:0.00e+00
## Max. :990.000 Max. :158.00000 Max. :1.15e+03
## YEAR
## Min. :1996
## 1st Qu.:2000
## Median :2005
## Mean :2004
## 3rd Qu.:2008
## Max. :2011
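The date handling above can be checked on a few toy values. This sketch assumes the raw BGN_DATE strings look like "month/day/year hour:minute:second"; the values below are made up for illustration, and base R is used instead of lubridate so the snippet has no dependencies.

```r
# Toy values in the assumed raw format "month/day/year hour:minute:second".
raw_dates <- c("1/6/1996 0:00:00", "12/31/1995 0:00:00", "11/30/2011 0:00:00")

# as.Date() stops parsing once the format is satisfied, so the time part is ignored.
parsed <- as.Date(raw_dates, format = "%m/%d/%Y")

# Extract the year with base R (the analysis itself uses lubridate::year()).
years <- as.integer(format(parsed, "%Y"))

# Keep only events from 1996 onwards.
kept <- parsed[years >= 1996]
print(kept)
```

Only the 1995 value is dropped, which mirrors the `YEAR >= 1996` filter applied to the full dataset.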
This subset file contains 653,530 observations with 9 variables.
# Cleaning data
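We first keep only the rows where the fatalities, injuries and damage amounts (FATALITIES, INJURIES, PROPDMG and CROPDMG) are greater than or equal to zero; in other words, any row with a negative value is dropped.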
stormsub2 <- filter(stormsub2, PROPDMG>=0 & CROPDMG >=0 & FATALITIES >=0 & INJURIES>=0)
length(unique(stormsub2$EVTYPE))
## [1] 516
unique(stormsub2$EVTYPE)
## [1] "WINTER STORM" "TORNADO"
## [3] "TSTM WIND" "HAIL"
## [5] "HIGH WIND" "HEAVY RAIN"
## [7] "FLASH FLOOD" "FREEZING RAIN"
## [9] "EXTREME COLD" "EXCESSIVE HEAT"
## [11] "LIGHTNING" "FUNNEL CLOUD"
## [13] "EXTREME WINDCHILL" "BLIZZARD"
## [15] "URBAN/SML STREAM FLD" "FLOOD"
## [17] "TSTM WIND/HAIL" "WATERSPOUT"
## [19] "RIP CURRENTS" "HEAVY SNOW"
## [21] "Other" "Record dry month"
## [23] "Temperature record" "WILD/FOREST FIRE"
## [25] "Minor Flooding" "ICE STORM"
## [27] "STORM SURGE" "Ice jam flood (minor"
## [29] "High Wind" "DUST STORM"
## [31] "STRONG WIND" "DUST DEVIL"
## [33] "Tstm Wind" "DROUGHT"
## [35] "DRY MICROBURST" "FOG"
## [37] "ROUGH SURF" "Wind"
## [39] "THUNDERSTORMS" "Heavy Surf"
## [41] "HEAVY SURF" "Dust Devil"
## [43] "Wind Damage" "Marine Accident"
## [45] "Snow" "AVALANCHE"
## [47] "Freeze" "TROPICAL STORM"
## [49] "Snow Squalls" "Coastal Flooding"
## [51] "Heavy Rain" "Strong Wind"
## [53] "WINDS" "WIND"
## [55] "COASTAL FLOOD" "COASTAL STORM"
## [57] "COASTALFLOOD" "Erosion/Cstl Flood"
## [59] "Heavy Rain and Wind" "Light Snow/Flurries"
## [61] "Wet Month" "Wet Year"
## [63] "Tidal Flooding" "River Flooding"
## [65] "SNOW" "DAMAGING FREEZE"
## [67] "Damaging Freeze" "HURRICANE"
## [69] "Beach Erosion" "Hot and Dry"
## [71] "Flood/Flash Flood" "Icy Roads"
## [73] "High Surf" "Heavy Rain/High Surf"
## [75] "HIGH SURF" "Thunderstorm Wind"
## [77] "Rain Damage" "ICE JAM"
## [79] "Unseasonable Cold" "Early Frost"
## [81] "Wintry Mix" "blowing snow"
## [83] "STREET FLOODING" "Record Cold"
## [85] "Extreme Cold" "Ice Fog"
## [87] "Excessive Cold" "Torrential Rainfall"
## [89] "Freezing Rain" "Landslump"
## [91] "Late-season Snowfall" "Hurricane Edouard"
## [93] "Coastal Storm" "Flood"
## [95] "HEAVY RAIN/WIND" "TIDAL FLOODING"
## [97] "Winter Weather" "Snow squalls"
## [99] "Strong Winds" "Strong winds"
## [101] "RECORD WARM TEMPS." "Ice/Snow"
## [103] "Mudslide" "Glaze"
## [105] "Extended Cold" "Snow Accumulation"
## [107] "Freezing Fog" "Drifting Snow"
## [109] "Whirlwind" "Heavy snow shower"
## [111] "Heavy rain" "COASTAL FLOODING"
## [113] "LATE SNOW" "Record May Snow"
## [115] "Record Winter Snow" "Heavy Precipitation"
## [117] " COASTAL FLOOD" "Record temperature"
## [119] "Light snow" "Late Season Snowfall"
## [121] "Gusty Wind" "small hail"
## [123] "Light Snow" "MIXED PRECIP"
## [125] "Black Ice" "Mudslides"
## [127] "Gradient wind" "Snow and Ice"
## [129] "COLD" "Freezing Spray"
## [131] "DOWNBURST" "Summary Jan 17"
## [133] "Summary of March 14" "Summary of March 23"
## [135] "Summary of March 24" "Summary of April 3rd"
## [137] "Summary of April 12" "Summary of April 13"
## [139] "Summary of April 21" "Summary August 11"
## [141] "Summary of April 27" "Summary of May 9-10"
## [143] "Summary of May 10" "Summary of May 13"
## [145] "Summary of May 14" "Summary of May 22 am"
## [147] "Summary of May 22 pm" "Heatburst"
## [149] "Summary of May 26 am" "Summary of May 26 pm"
## [151] "Metro Storm, May 26" "Summary of May 31 am"
## [153] "Summary of May 31 pm" "Summary of June 3"
## [155] "Summary of June 4" "Summary June 5-6"
## [157] "Summary June 6" "Summary of June 11"
## [159] "Summary of June 12" "Summary of June 13"
## [161] "Summary of June 15" "Summary of June 16"
## [163] "Summary June 18-19" "Summary of June 23"
## [165] "Summary of June 24" "Summary of June 30"
## [167] "Summary of July 2" "Summary of July 3"
## [169] "Summary of July 11" "Summary of July 22"
## [171] "Summary July 23-24" "Summary of July 26"
## [173] "Summary of July 29" "Summary of August 1"
## [175] "Summary August 2-3" "Summary August 7"
## [177] "Summary August 9" "Summary August 10"
## [179] "Summary August 17" "Summary August 21"
## [181] "Summary August 28" "Summary September 4"
## [183] "Summary September 20" "Summary September 23"
## [185] "Summary Sept. 25-26" "Summary: Oct. 20-21"
## [187] "Summary: October 31" "Summary: Nov. 6-7"
## [189] "Summary: Nov. 16" "Microburst"
## [191] "wet micoburst" "HAIL/WIND"
## [193] "Hail(0.75)" "Funnel Cloud"
## [195] "Urban Flooding" "No Severe Weather"
## [197] "Urban flood" "Urban Flood"
## [199] "Cold" "WINTER WEATHER"
## [201] "Summary of May 22" "Summary of June 6"
## [203] "Summary August 4" "Summary of June 10"
## [205] "Summary of June 18" "Summary September 3"
## [207] "Summary: Sept. 18" "Coastal Flood"
## [209] "coastal flooding" "Small Hail"
## [211] "Record Temperatures" "Light Snowfall"
## [213] "Freezing Drizzle" "Gusty wind/rain"
## [215] "GUSTY WIND/HVY RAIN" "Blowing Snow"
## [217] "Early snowfall" "Monthly Snowfall"
## [219] "Record Heat" "Seasonal Snowfall"
## [221] "Monthly Rainfall" "Cold Temperature"
## [223] "Sml Stream Fld" "Heat Wave"
## [225] "MUDSLIDE/LANDSLIDE" "Saharan Dust"
## [227] "Volcanic Ash" "Volcanic Ash Plume"
## [229] "Thundersnow shower" "NONE"
## [231] "COLD AND SNOW" "DAM BREAK"
## [233] "RAIN" "RAIN/SNOW"
## [235] "OTHER" "FREEZE"
## [237] "TSTM WIND (G45)" "RECORD WARMTH"
## [239] "STRONG WINDS" "FREEZING DRIZZLE"
## [241] "UNSEASONABLY WARM" "SLEET/FREEZING RAIN"
## [243] "BLACK ICE" "WINTRY MIX"
## [245] "BLOW-OUT TIDES" "UNSEASONABLY COLD"
## [247] "UNSEASONABLY COOL" "TSTM HEAVY RAIN"
## [249] "UNSEASONABLY DRY" "Gusty Winds"
## [251] "GUSTY WIND" "TSTM WIND 40"
## [253] "TSTM WIND 45" "HARD FREEZE"
## [255] "TSTM WIND (41)" "HEAT"
## [257] "RIVER FLOOD" "TSTM WIND (G40)"
## [259] "RIP CURRENT" "TSTM WND"
## [261] "DENSE FOG" "Wintry mix"
## [263] " TSTM WIND" "MUD SLIDE"
## [265] "MUDSLIDES" "MUDSLIDE"
## [267] "Frost" "Frost/Freeze"
## [269] "SNOW AND ICE" "WIND DAMAGE"
## [271] "RAIN (HEAVY)" "Record Warmth"
## [273] "Prolong Cold" "Cold and Frost"
## [275] "RECORD COLD" "PROLONG COLD"
## [277] "AGRICULTURAL FREEZE" "URBAN/SML STREAM FLDG"
## [279] "SNOW SQUALL" "HEAVY SNOW SQUALLS"
## [281] "SNOW/ICE" "GUSTY WINDS"
## [283] "SMALL HAIL" "SNOW SQUALLS"
## [285] "LAKE EFFECT SNOW" "STRONG WIND GUST"
## [287] "LATE FREEZE" "RECORD TEMPERATURES"
## [289] "ICY ROADS" "RECORD SNOWFALL"
## [291] "BLOW-OUT TIDE" "THUNDERSTORM"
## [293] "Hypothermia/Exposure" "HYPOTHERMIA/EXPOSURE"
## [295] "Lake Effect Snow" "Mixed Precipitation"
## [297] "Record High" "COASTALSTORM"
## [299] "LIGHT SNOW" "Snow and sleet"
## [301] "Freezing rain" "Gusty winds"
## [303] "FUNNEL CLOUDS" "WATERSPOUTS"
## [305] "Blizzard Summary" "FROST"
## [307] "ICE" "SUMMARY OF MARCH 24-25"
## [309] "SUMMARY OF MARCH 27" "SUMMARY OF MARCH 29"
## [311] "GRADIENT WIND" "Icestorm/Blizzard"
## [313] "Flood/Strong Wind" "TSTM WIND AND LIGHTNING"
## [315] "gradient wind" "SEVERE THUNDERSTORMS"
## [317] "EXCESSIVE RAIN" "Freezing drizzle"
## [319] "Mountain Snows" "URBAN/SMALL STRM FLDG"
## [321] "WET MICROBURST" "Heavy surf and wind"
## [323] "Mild and Dry Pattern" "COLD AND FROST"
## [325] "RECORD HEAT" "TYPHOON"
## [327] "LANDSLIDES" "HIGH SWELLS"
## [329] "HIGH SWELLS" "VOLCANIC ASH"
## [331] "HIGH WINDS" "DRY SPELL"
## [333] " LIGHTNING" "BEACH EROSION"
## [335] "UNSEASONAL RAIN" "EARLY RAIN"
## [337] "PROLONGED RAIN" "WINTERY MIX"
## [339] "COASTAL FLOODING/EROSION" "UNSEASONABLY WET"
## [341] "HOT SPELL" "HEAT WAVE"
## [343] "UNSEASONABLY HOT" "UNSEASONABLY WARM AND DRY"
## [345] " TSTM WIND (G45)" "TSTM WIND (G45)"
## [347] "HIGH WIND (G40)" "TSTM WIND (G35)"
## [349] "DRY WEATHER" "TSTM WINDS"
## [351] "FREEZING RAIN/SLEET" "ABNORMAL WARMTH"
## [353] "UNUSUAL WARMTH" "GLAZE"
## [355] "WAKE LOW WIND" "MONTHLY RAINFALL"
## [357] "COLD TEMPERATURES" "COLD WIND CHILL TEMPERATURES"
## [359] "MODERATE SNOW" "MODERATE SNOWFALL"
## [361] "URBAN/STREET FLOODING" "COASTAL EROSION"
## [363] "UNUSUAL/RECORD WARMTH" "BITTER WIND CHILL"
## [365] "BITTER WIND CHILL TEMPERATURES" "SEICHE"
## [367] "TSTM" "COASTAL FLOODING/EROSION"
## [369] "SNOW DROUGHT" "UNSEASONABLY WARM YEAR"
## [371] "HYPERTHERMIA/EXPOSURE" "SNOW/SLEET"
## [373] "ROCK SLIDE" "ICE PELLETS"
## [375] "URBAN FLOOD" "PATCHY DENSE FOG"
## [377] "RECORD COOL" "RECORD WARM"
## [379] "HOT WEATHER" "RIVER FLOODING"
## [381] "RECORD TEMPERATURE" "SAHARAN DUST"
## [383] "TROPICAL DEPRESSION" "VOLCANIC ERUPTION"
## [385] "COOL SPELL" "WIND ADVISORY"
## [387] "GUSTY WIND/HAIL" "RED FLAG FIRE WX"
## [389] "FIRST FROST" "EXCESSIVELY DRY"
## [391] "HEAVY SEAS" "FLASH FLOOD/FLOOD"
## [393] "SNOW AND SLEET" "LIGHT SNOW/FREEZING PRECIP"
## [395] "VOG" "EXCESSIVE RAINFALL"
## [397] "FLASH FLOODING" "MONTHLY PRECIPITATION"
## [399] "MONTHLY TEMPERATURE" "RECORD DRYNESS"
## [401] "EXTREME WINDCHILL TEMPERATURES" "MIXED PRECIPITATION"
## [403] "EXTREME WIND CHILL" "DRY CONDITIONS"
## [405] "HEAVY RAINFALL" "REMNANTS OF FLOYD"
## [407] "EARLY SNOWFALL" "FREEZING FOG"
## [409] "LANDSPOUT" "DRIEST MONTH"
## [411] "RECORD COLD" "LATE SEASON HAIL"
## [413] "EXCESSIVE SNOW" "WINTER MIX"
## [415] "DRYNESS" "FLOOD/FLASH/FLOOD"
## [417] "WIND AND WAVE" "SEVERE THUNDERSTORM"
## [419] "LIGHT FREEZING RAIN" " WIND"
## [421] "MONTHLY SNOWFALL" "DRY"
## [423] "RECORD RAINFALL" "RECORD PRECIPITATION"
## [425] "ICE ROADS" "HIGH SEAS"
## [427] "SLEET" "ROUGH SEAS"
## [429] "UNSEASONABLY WARM/WET" "UNSEASONABLY COOL & WET"
## [431] "UNUSUALLY WARM" "TSTM WIND G45"
## [433] "NON SEVERE HAIL" "RECORD SNOW"
## [435] "SNOW/FREEZING RAIN" "SNOW/BLOWING SNOW"
## [437] "NON-SEVERE WIND DAMAGE" "UNUSUALLY COLD"
## [439] "WARM WEATHER" "LANDSLUMP"
## [441] "THUNDERSTORM WIND (G40)" "LANDSLIDE"
## [443] "WALL CLOUD" "HIGH WATER"
## [445] "UNSEASONABLY WARM & WET" " FLASH FLOOD"
## [447] "LOCALLY HEAVY RAIN" "WIND GUSTS"
## [449] "UNSEASONAL LOW TEMP" "HIGH SURF ADVISORY"
## [451] "LATE SEASON SNOW" "GUSTY LAKE WIND"
## [453] "ABNORMALLY DRY" "WINTER WEATHER MIX"
## [455] "RED FLAG CRITERIA" "WND"
## [457] "CSTL FLOODING/EROSION" "SMOKE"
## [459] " WATERSPOUT" "SNOW ADVISORY"
## [461] "EXTREMELY WET" "UNUSUALLY LATE SNOW"
## [463] "VERY DRY" "RECORD LOW RAINFALL"
## [465] "ROGUE WAVE" "SNOWMELT FLOODING"
## [467] "PROLONG WARMTH" "ACCUMULATED SNOWFALL"
## [469] "FALLING SNOW/ICE" "DUST DEVEL"
## [471] "NON-TSTM WIND" "NON TSTM WIND"
## [473] "BRUSH FIRE" "GUSTY THUNDERSTORM WINDS"
## [475] "PATCHY ICE" "SNOW SHOWERS"
## [477] "HEAVY RAIN EFFECTS" "BLOWING DUST"
## [479] "EXCESSIVE HEAT/DROUGHT" "NORTHERN LIGHTS"
## [481] "MARINE TSTM WIND" " HIGH SURF ADVISORY"
## [483] "WIND CHILL" "HAZARDOUS SURF"
## [485] "WILDFIRE" "FROST/FREEZE"
## [487] "WINTER WEATHER/MIX" "ASTRONOMICAL HIGH TIDE"
## [489] "COLD WEATHER" "WHIRLWIND"
## [491] "VERY WARM" "ABNORMALLY WET"
## [493] "TORNADO DEBRIS" "EXTREME COLD/WIND CHILL"
## [495] "ICE ON ROAD" "FIRST SNOW"
## [497] "ICE/SNOW" "DROWNING"
## [499] "GUSTY THUNDERSTORM WIND" "MARINE HAIL"
## [501] "HIGH SURF ADVISORIES" "HURRICANE/TYPHOON"
## [503] "HEAVY SURF/HIGH SURF" "SLEET STORM"
## [505] "STORM SURGE/TIDE" "COLD/WIND CHILL"
## [507] "LAKE-EFFECT SNOW" "MARINE HIGH WIND"
## [509] "THUNDERSTORM WIND" "TSUNAMI"
## [511] "DENSE SMOKE" "LAKESHORE FLOOD"
## [513] "MARINE THUNDERSTORM WIND" "MARINE STRONG WIND"
## [515] "ASTRONOMICAL LOW TIDE" "VOLCANIC ASHFALL"
Some differences are caused by upper and lower case, leading or trailing whitespace, slashes or hyphens between two words, etc. We scanned through the most frequent event types as well as the most obvious near-duplicates (such as Coastal flood, Coastal flood / erosion, coastal flooding…) and abbreviations (TSTM for Thunderstorm). We then regroup several keywords under broader categories. The replacements are applied in sequence, so an event type that matches several keywords ends up in the category of the first rule it matches. We could regroup further, for example by putting ICE / SNOW / WINTER under a broad WINTER category, but we prefer to keep a few more, and therefore more specific, types.
stormsub3 <- mutate(stormsub2, EVTYPE = toupper(trimws(EVTYPE, which = "both", whitespace = "[ \t\r\n]")))
stormsub3$EVTYPE <- gsub(".*WINTER.*", "WINTER", stormsub3$EVTYPE)
stormsub3$EVTYPE <- gsub(".*(COLD|COOL|HYPOTHERM).*", "COLD", stormsub3$EVTYPE)
stormsub3$EVTYPE <- gsub(".*(HAIL|SLEET).*", "HAIL", stormsub3$EVTYPE)
stormsub3$EVTYPE <- gsub(".*(ICE|ICY|FRO*ST|FREEZ).*", "ICE", stormsub3$EVTYPE)
stormsub3$EVTYPE <- gsub(".*(SNOW|AVALANCH|BLIZZARD|WINTE*R).*", "SNOW", stormsub3$EVTYPE)
stormsub3$EVTYPE <- gsub(".*(FIRE|SMOKE).*", "FIRE", stormsub3$EVTYPE)
stormsub3$EVTYPE <- gsub(".*(SLIDE|MUD|LANDSL).*", "LANDSLIDE", stormsub3$EVTYPE)
stormsub3$EVTYPE <- gsub(".*(FLO*D|FLOYD|DAM).*", "FLOOD", stormsub3$EVTYPE)
stormsub3$EVTYPE <- gsub(".*(HEAT|HOT|TEMPERATUR|WARM|HYPERTHERM|RECORD HIGH).*", "HEAT", stormsub3$EVTYPE)
stormsub3$EVTYPE <- gsub(".*(DRY|DRIEST|DROUGHT|SEICHE).*", "DROUGHT", stormsub3$EVTYPE)
stormsub3$EVTYPE <- gsub(".*(STO*RM|TORNADO|DEPRESSION|CYCLON|TYPHOON|HURRICAN|BURST).*", "TORNADO", stormsub3$EVTYPE)
stormsub3$EVTYPE <- gsub(".*(WI*ND).*", "WIND", stormsub3$EVTYPE)
stormsub3$EVTYPE <- gsub(".*(RAIN|PRECIP|WET|THUNDERST|TSTM).*", "RAIN", stormsub3$EVTYPE)
stormsub3$EVTYPE <- gsub(".*(SEA|WAVE|TSUNAMI|SWELL|SURF|TIDE|CURRENT|BEACH|COAST|DROWN|MARIN).*", "SEA", stormsub3$EVTYPE)
stormsub3$EVTYPE <- gsub(".*(FOG|VOG|CLOUD).*", "FOG", stormsub3$EVTYPE)
stormsub3$EVTYPE <- gsub(".*VOLCA.*", "VOLCANO", stormsub3$EVTYPE)
stormsub3$EVTYPE <- gsub(".*DUST.*", "DUST", stormsub3$EVTYPE)
stormsub3$EVTYPE <- gsub(".*(SUMMARY|MONTHLY|NONE).*", "OTHER", stormsub3$EVTYPE)
length(unique(stormsub3$EVTYPE))
## [1] 26
unique(stormsub3$EVTYPE)
## [1] "SNOW" "TORNADO" "WIND"
## [4] "HAIL" "RAIN" "FLOOD"
## [7] "ICE" "COLD" "HEAT"
## [10] "LIGHTNING" "FOG" "WATERSPOUT"
## [13] "SEA" "OTHER" "DROUGHT"
## [16] "FIRE" "DUST" "LANDSLIDE"
## [19] "GLAZE" "NO SEVERE WEATHER" "VOLCANO"
## [22] "WATERSPOUTS" "LANDSPOUT" "HIGH WATER"
## [25] "RED FLAG CRITERIA" "NORTHERN LIGHTS"
We have reduced the number of event types to 26.
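Because the replacements run one after another, the order of the rules decides where an ambiguous label ends up. The sketch below illustrates this on four toy labels using six of the patterns from the code above; `apply_rules` is a hypothetical helper written for this illustration only.

```r
# A subset of the regrouping rules, in the same order as in the analysis.
rules <- list(
  c(".*WINTER.*", "WINTER"),
  c(".*(SNOW|AVALANCH|BLIZZARD|WINTE*R).*", "SNOW"),
  c(".*(STO*RM|TORNADO|DEPRESSION|CYCLON|TYPHOON|HURRICAN|BURST).*", "TORNADO"),
  c(".*(WI*ND).*", "WIND"),
  c(".*(RAIN|PRECIP|WET|THUNDERST|TSTM).*", "RAIN"),
  c(".*(SEA|WAVE|TSUNAMI|SWELL|SURF|TIDE|CURRENT|BEACH|COAST|DROWN|MARIN).*", "SEA")
)

# Hypothetical helper: normalise case and whitespace, then apply each rule in turn.
apply_rules <- function(x, rules) {
  x <- toupper(trimws(x))
  for (rule in rules) x <- gsub(rule[1], rule[2], x)
  x
}

toy <- c(" Tstm Wind", "WINTER STORM", "Heavy Rain/High Surf", "MARINE TSTM WIND")

in_order <- apply_rules(toy, rules)       # rules in the order used in the analysis
reversed <- apply_rules(toy, rev(rules))  # same rules, opposite order
print(in_order)
print(reversed)
```

With the original order, "WINTER STORM" is caught by the WINTER and SNOW rules before the STORM pattern can claim it, and "TSTM WIND" becomes WIND rather than RAIN. Reversing the rules sends the same labels to different categories, so the per-category totals reported later depend on this ordering.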
We now want to check damage amounts. There are two columns, one containing a figure and the second containing the unit. Let’s check the unit:
unique(stormsub3$CROPDMGEXP)
## [1] "K" "" "M" "B"
unique(stormsub3$PROPDMGEXP)
## [1] "K" "" "M" "B" "0"
We have empty values and some “0” entries. This does not really matter, as a quick exploration of the data reveals that they go together with nil amounts in the damage columns.
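One way to run such a check is to cross-tabulate the exponent against whether the amount is positive. The sketch below uses a small made-up data frame rather than the real data; on the actual dataset the same two expressions would be applied to `stormsub3`.

```r
# Toy data standing in for stormsub3 (values are illustrative only).
toy <- data.frame(
  PROPDMG    = c(380, 0, 0, 2.5, 0),
  PROPDMGEXP = c("K", "", "0", "M", "K"),
  stringsAsFactors = FALSE
)

# Cross-tabulate the exponent against whether the amount is positive.
check <- table(exponent = toy$PROPDMGEXP, positive = toy$PROPDMG > 0)
print(check)

# Rows with a blank or "0" exponent but a positive amount would need special handling.
suspicious <- subset(toy, PROPDMGEXP %in% c("", "0") & PROPDMG > 0)
print(nrow(suspicious))
```

If `suspicious` has zero rows, the blank and "0" exponents can safely be given any multiplier, since the resulting damage is zero either way.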
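The meaning of the units/exponents is the following:
* K or k: thousand dollars (10^3)
* M or m: million dollars (10^6)
* B or b: billion dollars (10^9)
Only upper-case letters occur in this subset, so we multiply each amount by the corresponding power of ten and create new columns holding the property damage, the crop damage and their total in dollars. Rows with an empty or “0” exponent fall into the 10^3 branch, which is harmless because their amounts are nil: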
stormsub3 <- mutate(stormsub3,
PROPDAMAGE = ifelse(PROPDMGEXP == "B", PROPDMG * 10^9, ifelse(
PROPDMGEXP == "M", PROPDMG * 10^6, PROPDMG * 10^3
)
),
CROPDAMAGE = ifelse(CROPDMGEXP == "B", CROPDMG * 10^9, ifelse(
CROPDMGEXP == "M", CROPDMG * 10^6, CROPDMG * 10^3
)
),
TOTDAMAGE = PROPDAMAGE + CROPDAMAGE
)
head(stormsub3)
## BGN_DATE EVTYPE PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP FATALITIES INJURIES
## 1 1996-01-06 SNOW 380 K 38 K 0 0
## 2 1996-01-11 TORNADO 100 K 0 0 0
## 3 1996-01-11 WIND 3 K 0 0 0
## 4 1996-01-11 WIND 5 K 0 0 0
## 5 1996-01-11 WIND 2 K 0 0 0
## 6 1996-01-18 HAIL 0 0 0 0
## YEAR PROPDAMAGE CROPDAMAGE TOTDAMAGE
## 1 1996 380000 38000 418000
## 2 1996 100000 0 100000
## 3 1996 3000 0 3000
## 4 1996 5000 0 5000
## 5 1996 2000 0 2000
## 6 1996 0 0 0
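The nested ifelse() calls could also be written as a lookup in a named vector, which makes the exponent table explicit. This is an alternative sketch, not the code used above: `to_dollars` is a hypothetical helper, and treating unknown or empty codes as a multiplier of 1 is an assumption (the analysis above uses 10^3 for them, which gives the same result when those amounts are nil).

```r
# Multipliers for the documented exponent codes.
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: convert an amount and its exponent code to dollars.
# Unknown or empty codes get a multiplier of 1 (an assumption of this sketch).
to_dollars <- function(amount, exponent) {
  m <- unname(multiplier[toupper(exponent)])
  m[is.na(m)] <- 1
  amount * m
}

# Worked example: 380 K, 100 k, 0 with no exponent, 2.5 B.
dollars <- to_dollars(c(380, 100, 0, 2.5), c("K", "k", "", "B"))
print(dollars)
```

The first value matches row 1 of the output above, where PROPDMG = 380 with exponent K becomes a PROPDAMAGE of 380000.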
We now have a clean dataset.
To figure out which events cause (1) the most fatalities and injuries and (2) the largest economic impact, we need to extract two new datasets from our clean dataset. This means aggregating the health impact and the economic impact per event type, then sorting the data in descending order.
Impact on the population health in terms of fatalities and injuries:
fatalities <- aggregate(FATALITIES ~ EVTYPE, stormsub3, sum)
fatalities10 <- fatalities[order(-fatalities$FATALITIES), ][1:10, ]
injuries<-aggregate(INJURIES ~ EVTYPE, stormsub3, sum)
injuries10 <- injuries[order(-injuries$INJURIES), ][1:10, ]
Economic impact in terms of damage on properties and crops:
ecocost <- aggregate(TOTDAMAGE ~ EVTYPE, stormsub3, sum)
ecocost <- transform(ecocost, TOTDAMAGE=TOTDAMAGE/10^9)
ecocost <- transform(ecocost, TOTDAMAGE=round(TOTDAMAGE,0))
ecocost10 <- ecocost[order(-ecocost$TOTDAMAGE), ][1:10, ]
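The three aggregations above follow the same pattern: sum a column per event type, sort in descending order, keep the top rows. A minimal sketch of a reusable helper is shown below; `top_events` is a hypothetical name and the toy data are made up for illustration.

```r
# Hypothetical helper: total of `column` per EVTYPE, sorted descending, top n rows.
top_events <- function(data, column, n = 10) {
  totals <- aggregate(data[[column]], by = list(EVTYPE = data$EVTYPE), FUN = sum)
  names(totals)[2] <- column
  totals <- totals[order(-totals[[column]]), ]
  head(totals, n)  # head() returns fewer rows if there are fewer than n event types
}

# Toy data (illustrative values only).
toy <- data.frame(
  EVTYPE     = c("HEAT", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(2, 5, 4, 1),
  stringsAsFactors = FALSE
)

res <- top_events(toy, "FATALITIES", n = 2)
print(res)
```

Using head() instead of indexing with `1:10` avoids rows of NA when fewer than ten event types are present.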
Hail, wind and tornadoes were the most frequent severe weather events in the US between 1996 and 2011.
sort(table(stormsub3$EVTYPE), decreasing = TRUE)[1:10]
##
## HAIL WIND TORNADO FLOOD SNOW LIGHTNING RAIN FOG
## 209335 159429 112271 79591 37985 13204 11697 7798
## FIRE ICE
## 4199 3700
fatalities10
## EVTYPE FATALITIES
## 9 HEAT 2037
## 22 TORNADO 1863
## 5 FLOOD 1337
## 20 SEA 732
## 21 SNOW 664
## 26 WIND 655
## 14 LIGHTNING 651
## 1 COLD 379
## 11 ICE 96
## 18 RAIN 96
injuries10
## EVTYPE INJURIES
## 22 TORNADO 24180
## 5 FLOOD 8527
## 9 HEAT 7702
## 26 WIND 5148
## 14 LIGHTNING 4141
## 21 SNOW 3147
## 4 FIRE 1458
## 20 SEA 888
## 6 FOG 856
## 8 HAIL 818
G1<-ggplot(data=fatalities10, aes(x=reorder(EVTYPE, FATALITIES),y =FATALITIES))+ coord_flip() +geom_bar(fill="violet",stat="identity")+labs(title = "Top 10 Fatality causing Events in US",x = "Weather Event", y ="Number of Fatalities")
G2 = ggplot(data=injuries10,aes(x=reorder(EVTYPE, INJURIES),y =INJURIES))+coord_flip()+geom_bar(fill = "green",stat = "identity")+labs(title = "Top 10 Injury causing Events in US",x = "Weather Event", y=" Number of Injuries")
# Draw the two plots generated above, stacked in two rows
grid.arrange(G1, G2, nrow = 2)
Heat caused the most fatalities (2,037), ahead of tornadoes (1,863) and floods (1,337), while tornadoes caused by far the most injuries (24,180), ahead of floods (8,527) and heat (7,702), in the US between 1996 and 2011.
Let’s visualize the economic impact as a plot:
# fill is set outside aes() so that the bars are actually drawn in green
G3 <- ggplot(ecocost10, aes(x = reorder(EVTYPE, -TOTDAMAGE), y = TOTDAMAGE)) + coord_flip() + geom_bar(stat = "identity", fill = "green", show.legend = FALSE) + theme(axis.text.x = element_text(angle = 30, hjust = 1)) + labs(x = "Weather Events", y = "Economic impact (USDbn)", title = "Top 10 weather events causing economic impact")
print(G3)