Author : Ralston Fonseca Date : 09 Sep 2018 Version : 1.0
Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This document analyses the US NOAA Storm database to show the effect of these events.
Kindly scroll down for the steps and details of the analysis. It will provide crucial data for taking preventive actions against these events.
The raw data mentioned in later sections was tidied to get the results.
The top 10 most harmful events (in descending order) were: Tornado, Heat, Floods, Thunderstorm, Lightning, Winds, Ice Storm, Tropical Cyclone, Wildfire and Winter Storm.
The top 10 events with most economic consequences (in descending order) were: Floods(180B), Tropical Cyclone(99B), Tornado(59B), Storm Surge/ Tide(48B), Hail(19B), Drought(15B), Thunderstorm(12B), Ice Storm(9B), Wildfire(9B) and Winds(7B).
Note: Values may not match exactly due to rounding
sessionInfo()
## R version 3.5.1 (2018-07-02)
## Platform: i386-w64-mingw32/i386 (32-bit)
## Running under: Windows 7 (build 7601) Service Pack 1
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_India.1252 LC_CTYPE=English_India.1252
## [3] LC_MONETARY=English_India.1252 LC_NUMERIC=C
## [5] LC_TIME=English_India.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] compiler_3.5.1 backports_1.1.2 magrittr_1.5 rprojroot_1.3-2
## [5] tools_3.5.1 htmltools_0.3.6 yaml_2.2.0 Rcpp_0.12.18
## [9] stringi_1.1.7 rmarkdown_1.10 knitr_1.20 stringr_1.3.1
## [13] digest_0.6.15 evaluate_0.11
library(ggplot2)
The data for this assignment comes in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. It can be downloaded from the web site:
Storm Data [47Mb] There is also some documentation of the database available. Here you will find how some of the variables are constructed/defined.
National Weather Service Storm Data Documentation
National Climatic Data Center Storm Events FAQ
Following code was used to download and read the data from local working directory. It was then transformed. The comments in the code provide further details.
dataURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
fileName <- "StormData.csv.bz2"
# if file does not exist download the file
if(!file.exists(fileName)) {
download.file(url=dataURL,destfile=fileName,mode = "wb")
}
bzFile <- bzfile(fileName)
# read the comma seperated file
stormDF <- read.table(bzFile, header = TRUE, sep = ",", na.strings="NA", stringsAsFactors = FALSE)
# dim 902297 37
str(stormDF)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr "" "" "" "" ...
## $ BGN_LOCATI: chr "" "" "" "" ...
## $ END_DATE : chr "" "" "" "" ...
## $ END_TIME : chr "" "" "" "" ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr "" "" "" "" ...
## $ END_LOCATI: chr "" "" "" "" ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## $ WFO : chr "" "" "" "" ...
## $ STATEOFFIC: chr "" "" "" "" ...
## $ ZONENAMES : chr "" "" "" "" ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr "" "" "" "" ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
Selecting variables neeed for analysis and tidying the column names. Discarding data which cannot be used.
# Select only variables needed for the analysis
dataSubset <- stormDF[,c(1,8,23,24,25,26,27,28)]
# Column names are in RAW form
names(dataSubset)
## [1] "STATE__" "EVTYPE" "FATALITIES" "INJURIES" "PROPDMG"
## [6] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP"
# Give descriptive column names
colnames(dataSubset) <- c("State","EventType","Fatalities","Injuries","PropertyDamage","PropertyDamageExp","CropDamage","CropDamageExp")
# Eliminate rows having label - Summary, in EVentType variable.
dataSubset <- subset(dataSubset,!grepl("summary",dataSubset$EventType,ignore.case = TRUE))
The labels in EventType variable are in RAW form. Transforming and grouping labels into different groups. There were many spelling mistakes which are taken into consideration during the transformation.
# DO NOT CHANGE the order or CASE SENSITIVITY in the transformation.
dataSubset$EventType[grep("^(Astronomical High Tide)|BLOW-OUT TIDE|HIGH TIDES",
dataSubset$EventType, ignore.case = TRUE)] <- "astronomical.high.tide"
dataSubset$EventType[grep("^(Astronomical Low Tide)",
dataSubset$EventType, ignore.case = TRUE)] <- "astronomical.low.tide"
dataSubset$EventType[grep("Avalan",dataSubset$EventType, ignore.case = TRUE)] <- "avalanche"
dataSubset$EventType[grep("^(Blizzard)",dataSubset$EventType, ignore.case = TRUE)] <- "blizzard"
dataSubset$EventType[grep("^(Debris Flow)",dataSubset$EventType, ignore.case = TRUE)] <- "debris.flow"
dataSubset$EventType[grep("^(Dense Fog)",dataSubset$EventType, ignore.case = TRUE)] <- "dense.fog"
dataSubset$EventType[grep("^(Dense Smoke)|smoke",dataSubset$EventType, ignore.case = TRUE)] <- "dense.smoke"
dataSubset$EventType[grep("^(Drought)|LOW RAINFALL|dry|BELOW NORMAL PRECIPITATION|DRIEST MONTH",
dataSubset$EventType, ignore.case = TRUE)] <- "drought"
dataSubset$EventType[grep("Dust",dataSubset$EventType, ignore.case = TRUE)] <- "dust.storm"
dataSubset$EventType[grep("Funnel",dataSubset$EventType, ignore.case = TRUE)] <- "funnel.cloud"
dataSubset$EventType[grep("fog|VOG",dataSubset$EventType,ignore.case = TRUE)] <- "fog"
dataSubset$EventType[grep("^(Heavy Rain)|rain|wet|PRECIP|shower|DOWNBURST",
dataSubset$EventType, ignore.case = TRUE)] <- "heavy.rain"
dataSubset$EventType[grep("^(High Surf)|surf|wave",dataSubset$EventType, ignore.case = TRUE)] <- "high.surf"
dataSubset$EventType[grep("^(Hail)",dataSubset$EventType, ignore.case = TRUE)] <- "hail"
dataSubset$EventType[grep("^(Ice Storm)|ICE/STRONG WINDS|ICE STORM",
dataSubset$EventType, ignore.case = TRUE)] <- "ice.storm"
dataSubset$EventType[grep("ICE|Ice|Icy|ICY",dataSubset$EventType)] <- "ice.others" # Case sensitive
# make is snw to eliminate from snow search
dataSubset$EventType[grep("^(Lake-Effect Snow)|LAKE SNOW|Lake Effect Snow",
dataSubset$EventType, ignore.case = TRUE)] <- "lake.effect.snw"
dataSubset$EventType[grep("^(Heavy Snow)|snow",dataSubset$EventType, ignore.case = TRUE)] <- "snow.related"
dataSubset$EventType[grep("lake.effect.snw",
dataSubset$EventType, ignore.case = TRUE)] <- "lake.effect.snow" # rename it again
dataSubset$EventType[grep("^(Lakeshore Flood)",dataSubset$EventType, ignore.case = TRUE)] <- "lakeshore.flood"
dataSubset$EventType[grep("^(Lightning)|lightn|lighti|lign",
dataSubset$EventType, ignore.case = TRUE)] <- "lightning"
dataSubset$EventType[grep("^(Marine Hail)",dataSubset$EventType, ignore.case = TRUE)] <- "marine.hail"
dataSubset$EventType[grep("^(Marine High Wind)|^(Marine Strong Wind)|HIGH WIND AND HIGH TIDES|WIND/SEAS",dataSubset$EventType, ignore.case = TRUE)] <- "marine.wnd"
dataSubset$EventType[grep(" SEA|marine ",dataSubset$EventType, ignore.case = TRUE)] <- "marine.others"
dataSubset$EventType[grep("^(Marine Thunderstorm Wind)|MARINE TSTM WIND|Coastal storm|coastalstorm",
dataSubset$EventType, ignore.case = TRUE)] <- "marine.thunderstorm"
dataSubset$EventType[grep("^(Rip Current)",dataSubset$EventType, ignore.case = TRUE)] <- "rip.current"
dataSubset$EventType[grep("^(Seiche)",dataSubset$EventType, ignore.case = TRUE)] <- "seiche"
dataSubset$EventType[grep("^(Storm Surge/Tide)|surge",
dataSubset$EventType, ignore.case = TRUE)] <- "storm.surge.tide"
dataSubset$EventType[grep("Torn",dataSubset$EventType, ignore.case = TRUE)] <- "tornado"
dataSubset$EventType[grep("wall cloud",dataSubset$EventType, ignore.case = TRUE)] <- "wall.cloud"
dataSubset$EventType[grep("^(Tropical Depression)",
dataSubset$EventType, ignore.case = TRUE)] <- "tropical.depression"
dataSubset$EventType[grep("^(Tropical Storm)|hurricane|typhoon",
dataSubset$EventType, ignore.case = TRUE)] <- "tropical.cyclone"
dataSubset$EventType[grep("^(Tsunami)",dataSubset$EventType, ignore.case = TRUE)] <- "tsunami"
dataSubset$EventType[grep("^(Volcanic Ash)|volcanic",
dataSubset$EventType, ignore.case = TRUE)] <- "volcanic.ash"
dataSubset$EventType[grep("^(Waterspout)|spout",dataSubset$EventType, ignore.case = TRUE)] <- "waterspout"
dataSubset$EventType[grep("^(Winter Storm)|RECORD COLD AND HIGH WIND",
dataSubset$EventType, ignore.case = TRUE)] <- "winter.storm"
dataSubset$EventType[grep("^(Winter Weather)|wintry|LOW TEMP|COLD TEMPERATURE| cold|cool|record low|MIX",
dataSubset$EventType, ignore.case = TRUE)] <- "winter.weather"
dataSubset$EventType[grep("COLD|Cold",dataSubset$EventType)] <-"winter.weather" # case sensitive
dataSubset$EventType[grep("NON[ -]TSTM Wind",dataSubset$EventType, ignore.case = TRUE)] <- "winds"
dataSubset$EventType[grep("^Thunderstorm|TSTM|THUNDEERSTORM| THUNDERSTORM|THUNERSTORM|THUDERSTORM|THUNDERESTORM|THUNDESTORM|THUNDERTSORM|THUNDERSTROM|THUNDERTORM|TUNDERSTORM|microburst|metro storm|GUSTNADO",
dataSubset$EventType, ignore.case = TRUE)] <- "thunderstorm"
dataSubset$EventType[grep("blizzard|SNOW AND WIND|WIND AND HEAVY SNOW|WIND/HEAVY SNOW|BLOWING SNOW|SNOWSTORM|WINDS/SNOW|Thundersnow",
dataSubset$EventType, ignore.case = TRUE)] <- "blizzard"
dataSubset$EventType[grep("^hail| hail",dataSubset$EventType, ignore.case = TRUE)] <- "hail"
dataSubset$EventType[grep("coastal flood|BEACH FLOOD|Tidal flood| CSTL flood|COASTAL FLOOD|CSTL FLOODING|FLOYD",
dataSubset$EventType, ignore.case = TRUE)] <- "coastal.flood"
dataSubset$EventType[grep("COASTALFLOOD",dataSubset$EventType)] <- "coastal.flood" # case sensitive
dataSubset$EventType[grep("fire",dataSubset$EventType, ignore.case = TRUE)] <- "wildfire"
dataSubset$EventType[grep("Chill",dataSubset$EventType, ignore.case = TRUE)] <- "cold.wnd.chill"
dataSubset$EventType[grep("Heat|Hot|record high|warm",dataSubset$EventType, ignore.case = TRUE)] <- "heat"
dataSubset$EventType[grep("^Flood|Flash Flood|Rising water|swells|HIGH WATER",
dataSubset$EventType, ignore.case = TRUE)] <- "flood"
dataSubset$EventType[grep(" flood|stream|urban|floood",dataSubset$EventType, ignore.case = TRUE)] <- "flood"
dataSubset$EventType[grep("sleet|Freezing rain|freez|glaze",
dataSubset$EventType, ignore.case = TRUE)] <- "sleet.freezing.rain"
dataSubset$EventType[grep("mud|rock|lands",dataSubset$EventType,ignore.case = TRUE)] <- "land.slide"
dataSubset$EventType[grep("wind",dataSubset$EventType, ignore.case = TRUE)] <- "winds"
dataSubset$EventType[grep("WND",dataSubset$EventType)] <- "winds" # Case sensitive
dataSubset$EventType[grep("record|MONTHLY TEMPERATURE",
dataSubset$EventType,ignore.case = TRUE)] <- "record.temperature"
dataSubset$EventType[grep("erosion|EROSIN",dataSubset$EventType,ignore.case = TRUE)] <- "coastal.erosion"
dataSubset$EventType[grep("dam",dataSubset$EventType,ignore.case = TRUE)] <- "dam.failure"
dataSubset$EventType[grep("RED FLAG",dataSubset$EventType,ignore.case = TRUE)] <- "red.flag"
dataSubset$EventType[grep("hyp",dataSubset$EventType,ignore.case = TRUE)] <- "hypothermia"
dataSubset$EventType[grep("frost",dataSubset$EventType,ignore.case = TRUE)] <- "frost"
dataSubset$EventType[grep("OTHER|Other|HIGH",dataSubset$EventType)] <- "others" # case sensitive
dataSubset$EventType[grep("APACHE|NORTHERN LIGHTS|NONE|SOUTHEAST|No severe weather|SEVERE TURBULENCE|MILD PATTERN|EXCESSIVE|DROWNING",
dataSubset$EventType,ignore.case = TRUE)] <- "others"
dataSubset$EventType[grep("\\?",dataSubset$EventType)] <- "others"
Processing data for Events Vs Public Health.
# Select relevant variables
dataHealth <- dataSubset[,c(1,2,3,4)]
# Add a column Casualties summing Fatalities and Injuries
dataHealth$Casualties <- dataHealth$Fatalities + dataHealth$Injuries
# Variable names in this subset
colnames(dataHealth)
## [1] "State" "EventType" "Fatalities" "Injuries" "Casualties"
harmfulHealthEvents <- tapply(dataHealth$Casualties,dataHealth$EventType,sum, na.rm = TRUE)
harmfulHealthEvents <- subset(harmfulHealthEvents, harmfulHealthEvents > 0)
harmfulHealthEvents <- harmfulHealthEvents[order(harmfulHealthEvents,decreasing = TRUE)]
# Select the Top 10 most harmful Events
harmfulTop10 <- harmfulHealthEvents[1:10]
# Convert it to a data frame
harmfulTop10 <- data.frame(harmfulTop10,names(harmfulTop10),row.names = NULL)
colnames(harmfulTop10) <- c("Casualties","EventType")
Processing data for Events Vs Economic Consequences
# Select relevant variables
dataEconomic <- dataSubset[,c(1,2,5,6,7,8)]
# Curent Columns selected
colnames(dataEconomic)
## [1] "State" "EventType" "PropertyDamage"
## [4] "PropertyDamageExp" "CropDamage" "CropDamageExp"
# Unique Exp Values in PropertyDamage
unique(dataEconomic$PropertyDamageExp)
## [1] "K" "M" "" "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-"
## [18] "1" "8"
# Unique Exp Values in CropDamage
unique(dataEconomic$CropDamageExp)
## [1] "" "M" "K" "m" "B" "?" "0" "k" "2"
# Function to get the Mutiplying factor based on the Exp value
decodeMultiplier <- function(x = "1"){
switch(tolower(x),
"1" = {return(1)},
"2" = {return(10^2)},
"3" = {return(10^3)},
"4" = {return(10^4)},
"5" = {return(10^5)},
"6" = {return(10^6)},
"7" = {return(10^7)},
"8" = {return(10^8)},
"h" = {return(10^2)},
"k" = {return(10^3)},
"m" = {return(10^6)},
"b" = {return(10^9)},
"-" = {return(1)},
"?" = {return(1)},
"+" = {return(1)},
"Otherwise" = 1)
return(1)
}
# Add CropDamageMultiple variable to the subset.
dataEconomic$CropDamageMultiple <- sapply(dataEconomic$CropDamageExp, decodeMultiplier)
# Add PropertyDamageMultiple variable to the subset.
dataEconomic$PropertyDamageMultiple <- sapply(dataEconomic$PropertyDamageExp, decodeMultiplier)
# Calculate Total Crop Damage and Total Property Damage by multiplying with the factor.
dataEconomic$TotCropDamage <- (dataEconomic$CropDamageMultiple * dataEconomic$CropDamage)
dataEconomic$TotPropertyDamage <- (dataEconomic$PropertyDamageMultiple * dataEconomic$PropertyDamage)
# Sum Total Crop Damage and Total Property Damage to get TotalEconomicDamage
dataEconomic$TotalEconomicDamage <- dataEconomic$TotPropertyDamage + dataEconomic$TotCropDamage
economicEvents <- tapply(dataEconomic$TotalEconomicDamage,dataEconomic$EventType,sum, na.rm = TRUE)
economicEvents <- subset(economicEvents, economicEvents > 0)
economicEvents <- economicEvents[order(economicEvents,decreasing = TRUE)]
# Select top 10 events with economic consequences
economicTop10 <- economicEvents[1:10]
# Convert it into dataframe
economicTop10 <- data.frame(economicTop10,names(economicTop10),row.names = NULL)
colnames(economicTop10) <- c("TotalEconomicDamage","EventType")
# Plot a bar graph
g <- ggplot(data=harmfulTop10, aes(x = reorder(EventType,-Casualties,sum), y = Casualties, fill = Casualties))
g + geom_col() + theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
ggtitle("Top 10 U.S. NOAA most harmful Events") +
labs(x="Event Type", y="Casualties")
Inferences from this data:
# Sum all the Casualties
totalCasualties <- sum(harmfulHealthEvents)
# Calculate proportionate Casualties per event
proportionCasualties <- harmfulHealthEvents/totalCasualties
# Proportionate Casualties for Top 10 events
totPropCasualtiesTop10 <- sum(proportionCasualties[1:10])
totPropCasualtiesTop10
## [1] 0.9308872
# Plot a bar graph
g <- ggplot(data=economicTop10, aes(x = reorder(EventType,-TotalEconomicDamage,sum), y = TotalEconomicDamage/10^9, fill = TotalEconomicDamage))
g + geom_col() + theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
ggtitle("Top 10 U.S. NOAA Events with most economic consequences") +
labs(x="Event Type", y="Total Economic Damage (in billions-USD)")
Inferences from this data:
# Economic Damage for top 10 events (in billions USD)
economicEvents[1:10]/10^9
## flood tropical.cyclone tornado storm.surge.tide
## 180.073416 99.281814 59.020780 47.966079
## hail drought thunderstorm ice.storm
## 19.024428 15.025425 12.451101 8.971641
## wildfire winds
## 8.899910 6.819224
# Sum the economic damage
totalEconomicDamage <- sum(economicEvents)
# Calculate the proportionate economic damage
proportionEconomicDamage <- economicEvents/totalEconomicDamage
# Sum the top 10 economic damages
totEconomicDamageTop10 <- sum(proportionEconomicDamage[1:10])
totEconomicDamageTop10
## [1] 0.9585291
# Sum Crop Damage per event and order it.
cropDamageEvents <- tapply(dataEconomic$TotCropDamage,dataEconomic$EventType,sum, na.rm = TRUE)
cropDamageEvents <- cropDamageEvents[order(cropDamageEvents,decreasing = TRUE)]
# Convert it to data frame
cropDamageDF <- data.frame(cropDamageEvents,names(cropDamageEvents),row.names = NULL)
colnames(cropDamageDF) <- c("TotCropDamage","EventType")
# Sum Property Damage per event and order it.
propDamageEvents <- tapply(dataEconomic$TotPropertyDamage,dataEconomic$EventType,sum, na.rm = TRUE)
propDamageEvents <- propDamageEvents[order(propDamageEvents,decreasing = TRUE)]
# Convert it to data frame
propDamageDF <- data.frame(propDamageEvents,names(propDamageEvents),row.names = NULL)
colnames(propDamageDF) <- c("TotPropertyDamage","EventType")
# Merge Property Damage & Crop Damage data frame with top10 economic damage data frame
mergedDF <- merge(x=propDamageDF,y=cropDamageDF,by = "EventType")
top10mergedDF <- merge(x=mergedDF,y=economicTop10,by = "EventType")
#Order the results
top10mergedDF <- top10mergedDF[order(top10mergedDF$TotalEconomicDamage
,decreasing = TRUE),]
# Set graph parameters for 2 by 1 panel plot
par(mar = c(7, 4, 2, 2), mfrow = c(2,1))
barplot(top10mergedDF$TotPropertyDamage/10^9,main = "Event Type vs Property Damage" ,
ylab = "Property Damage (in billion USD)", names.arg = top10mergedDF$EventType,
las = 2, col = "darkred", cex.names=0.8, cex.axis = 0.7, cex.lab = 0.7)
barplot(top10mergedDF$TotCropDamage/10^9,main = "Event Type vs Crop Damage" ,
ylab = "Crop Damage (in billion USD)", names.arg = top10mergedDF$EventType,
las = 2, col = "aquamarine4", cex.names=0.8, cex.axis = 0.7, cex.lab = 0.7)
# Top 5 Property Damage Events (in billions USD)
propDamageEvents[1:5]/10^9
## flood tropical.cyclone tornado storm.surge.tide
## 167.80308 93.07080 58.60332 47.96522
## hail
## 15.97754
# Top 5 Crop Damage Events (in billions USD)
cropDamageEvents[1:5]/10^9
## drought flood tropical.cyclone ice.storm
## 13.972587 12.270338 6.211014 5.022113
## hail
## 3.046888
From above data we infer that: