Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
We explored the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which tracks characteristics of major storms and weather events in the United States from 1950 through the end of November 2011.
Our analysis found that, across the United States, the weather events most harmful to population health were (1) TORNADO, (2) HEAT, and (3) FLOOD by fatalities, and (1) TORNADO, (2) THUNDERSTORM, and (3) HEAT by injuries.
We also found that the greatest economic damage was caused by (1) TORNADO, (2) THUNDERSTORM, and (3) FLOOD events, ranked by total damage (the sum of crop and property damage).
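# setInternet2(TRUE) was a Windows-only setting and is not needed in recent R versions
f <- tempfile(fileext = ".csv.bz2")
# mode = "wb" keeps the compressed file intact on Windows
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", f, mode = "wb")
# bzfile() lets read.csv() decompress the bzip2 file while reading
data <- read.csv(bzfile(f), stringsAsFactors = FALSE)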
First, we check how clean the EVTYPE variable is, since the rest of the analysis relies on it. EVTYPE records the type of weather event that caused the health or economic damage.
event <- data$EVTYPE
event <- as.data.frame(table(event))
library(plyr)
arr <- arrange(event, -Freq)
head(arr, 15)
## event Freq
## 1 HAIL 288661
## 2 TSTM WIND 219940
## 3 THUNDERSTORM WIND 82563
## 4 TORNADO 60652
## 5 FLASH FLOOD 54277
## 6 FLOOD 25326
## 7 THUNDERSTORM WINDS 20843
## 8 HIGH WIND 20212
## 9 LIGHTNING 15754
## 10 HEAVY SNOW 15708
## 11 HEAVY RAIN 11723
## 12 WINTER STORM 11433
## 13 WINTER WEATHER 7026
## 14 FUNNEL CLOUD 6839
## 15 MARINE TSTM WIND 6175
Looking at the EVTYPE frequencies, we can see that most values are in upper case and a few are in lower case, so we convert all values to upper case. We also trim leading and trailing whitespace from the names.
trim <- function(x) gsub("^\\s+|\\s+$", "", x)
data$EVTYPE <- toupper(data$EVTYPE)
data$EVTYPE <- trim(data$EVTYPE)
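As a small illustration of this normalization step, the sketch below applies the same upper-casing and trimming to a made-up vector. The sample labels are hypothetical and not taken from the database, and base R's `trimws()` stands in for the `trim` helper, since it does the same job for leading and trailing spaces.

```r
# Hypothetical sample labels with mixed case and stray whitespace
raw_labels <- c(" Hail", "tstm wind ", "TSTM WIND", "Flash Flood  ")

# Same normalization as above: upper-case, then strip outer whitespace
clean_labels <- trimws(toupper(raw_labels))

# Count how many distinct labels exist before and after cleaning
n_before <- length(unique(raw_labels))
n_after  <- length(unique(clean_labels))
print(clean_labels)
```

Here the two spellings of TSTM WIND collapse into one label, which is the effect the cleaning is meant to have on the frequency table.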
Looking at the data reveals another issue: in many cases the same event is coded in EVTYPE in several different ways.
For example, all of the events below should be reported as one event because they all represent THUNDERSTORM:
## EVENT FREQUENCY
## 1 TSTM WIND 219940
## 2 THUNDERSTORM WIND 82563
## 3 THUNDERSTORM WINDS 20843
## 4 MARINE TSTM WIND 6175
## 5 MARINE THUNDERSTORM WIND 5812
## 6 TSTM WIND/HAIL 1028
Similarly, all of the events below should be reported as one event because they all represent FLOOD:
## EVENT FREQUENCY
## 1 FLASH FLOOD 54277
## 2 FLOOD 25326
## 3 FLASH FLOODING 682
## 4 COASTAL FLOOD 650
## 5 FLOOD/FLASH FLOOD 624
The same applies to other values of EVTYPE. Similar aggregation is required for HEAT, HAIL, WILDFIRE, EXTREMECOLD, and HIGH WIND, which we perform with the code below:
new <- data
# Collapse spelling variants into one label per event family.
# The labels are exactly those that appear in the outputs below, including the misspelled "WILDFIFRE".
# Note: "MARINE THUNDERSTORM WIND" is not merged here, while "TROPICAL STORM" is counted as THUNDERSTORM.
new$EVTYPE[new$EVTYPE %in% c("TSTM WIND", "THUNDERSTORM WIND", "THUNDERSTORM WINDS", "MARINE TSTM WIND", "TSTM WIND/HAIL", "TROPICAL STORM", "THUNDERSTORM")] <- "THUNDERSTORM"
new$EVTYPE[new$EVTYPE %in% c("FLASH FLOOD", "FLOOD", "FLASH FLOODING", "COASTAL FLOOD", "FLOOD/FLASH FLOOD")] <- "FLOOD"
new$EVTYPE[new$EVTYPE %in% c("MARINE HAIL", "HAIL")] <- "HAIL"
new$EVTYPE[new$EVTYPE %in% c("WILDFIRE", "WILD/FOREST FIRE")] <- "WILDFIFRE"
new$EVTYPE[new$EVTYPE %in% c("EXTREME COLD/WIND CHILL", "EXTREME COLD")] <- "EXTREMECOLD"
# EVTYPE was trimmed earlier, so a separate "WIND " entry is not needed
new$EVTYPE[new$EVTYPE %in% c("HIGH WIND", "HIGH WINDS", "WIND")] <- "HIGH WIND"
new$EVTYPE[new$EVTYPE %in% c("EXCESSIVE HEAT", "HEAT")] <- "HEAT"
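Listing every variant by hand is easy to get wrong, so one alternative is to group labels by pattern. The sketch below is only an illustration of that idea: `group_event` is a hypothetical helper, its patterns are assumptions rather than the rules used in this report, and the sample labels are chosen for demonstration.

```r
# Hypothetical helper: map a cleaned EVTYPE label to a broad event family.
# The patterns are illustrative assumptions, not the rules used in this report.
group_event <- function(x) {
  is_tstm <- grepl("TSTM|THUNDERSTORM", x)
  out <- x
  out[is_tstm]                      <- "THUNDERSTORM"
  out[grepl("FLOOD", x)]            <- "FLOOD"
  # Hail counts as HAIL only when the label is not a thunderstorm variant
  out[grepl("HAIL", x) & !is_tstm]  <- "HAIL"
  out
}

sample_events <- c("TSTM WIND", "MARINE THUNDERSTORM WIND", "TSTM WIND/HAIL",
                   "FLASH FLOODING", "MARINE HAIL", "TORNADO")
grouped <- group_event(sample_events)
print(table(grouped))
```

Pattern matching catches variants that were not anticipated, but it can also merge labels that should stay separate, so the resulting groups would still need a manual check against the frequency table.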
After cleaning and preparing the data, we are ready to answer the requested questions.
First, let’s address which weather events are most harmful to public health.
The following two variables report population casualties (each gives the number of people affected):
FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
We load the required R packages and sum the injuries and fatalities for each EVTYPE:
library(plyr)
## Warning: package 'plyr' was built under R version 3.2.4
library(reshape2)
## Warning: package 'reshape2' was built under R version 3.2.4
sum_population <- ddply(new, .(EVTYPE), summarise, fatalities = sum(FATALITIES),
injuries = sum(INJURIES))
Let’s sort the fatality and injury totals to find the most harmful weather events:
# Top 10 event types by fatalities, then by injuries
major_fatalities <- head(sum_population[order(sum_population$fatalities, decreasing = TRUE), c("EVTYPE", "fatalities")], n = 10)
major_injuries <- head(sum_population[order(sum_population$injuries, decreasing = TRUE), c("EVTYPE", "injuries")], n = 10)
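The same sum-then-rank pattern can be written in base R without plyr. The sketch below shows it on a toy data frame; the values are made up for illustration, and `toy`, `toy_sum`, and `top_fatal` are hypothetical names.

```r
# Toy data standing in for the cleaned storm table (values are made up)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT", "HAIL"),
  FATALITIES = c(3, 5, 2, 1, 4, 0),
  INJURIES   = c(10, 0, 7, 2, 1, 0),
  stringsAsFactors = FALSE
)

# Sum both casualty columns per event type (base R counterpart of ddply + summarise)
toy_sum <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Keep the two event types with the most fatalities
top_fatal <- head(toy_sum[order(toy_sum$FATALITIES, decreasing = TRUE),
                          c("EVTYPE", "FATALITIES")], n = 2)
print(top_fatal)
```

Selecting columns by name rather than by position keeps the ranking code readable if the summary table later gains more columns.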
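Now let’s address which weather events are most harmful to the economy.
The following two variables report economic damage. The database stores the magnitude of each value (for example, thousands or millions of dollars) in the companion columns PROPDMGEXP and CROPDMGEXP; the sums in this analysis use the raw PROPDMG and CROPDMG values without that scaling, so the totals rank events but are not dollar amounts.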
PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 …
CROPDMG : num 0 0 0 0 0 0 0 0 0 0 …
We sum property damage and crop damage for each EVTYPE:
sum_econdamage <- ddply(new, .(EVTYPE), summarise, cropdmg = sum(CROPDMG), propdmg = sum(PROPDMG))
sum_econdamage$total_damage <- sum_econdamage$cropdmg + sum_econdamage$propdmg
Let’s sort by total damage (property plus crop) to find the most damaging weather events:
major_econdamage <- head(sum_econdamage[order(sum_econdamage$total_damage, decreasing = TRUE), ], n = 10)
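The storm database also carries PROPDMGEXP and CROPDMGEXP columns that hold a magnitude code for each damage value. The sketch below shows one way such codes could be turned into dollar amounts before summing. It is an illustration only: `exp_to_multiplier` is a hypothetical helper, the code-to-multiplier mapping (H, K, M, B, everything else treated as 1) is an assumption, and the rows are made up.

```r
# Assumed mapping from magnitude code to multiplier; any other code is treated as 1
exp_to_multiplier <- function(code) {
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(code)]
  mult[is.na(mult)] <- 1   # unknown or empty codes leave the value unscaled
  unname(mult)
}

# Made-up rows mimicking the PROPDMG / PROPDMGEXP column pair
toy_damage <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "FLOOD", "HAIL"),
  PROPDMG    = c(25, 2.5, 1.2, 500),
  PROPDMGEXP = c("K", "M", "B", ""),
  stringsAsFactors = FALSE
)

# Scale each value by its multiplier, then total per event type
toy_damage$prop_usd <- toy_damage$PROPDMG * exp_to_multiplier(toy_damage$PROPDMGEXP)
usd_by_event <- tapply(toy_damage$prop_usd, toy_damage$EVTYPE, sum)
print(sort(usd_by_event, decreasing = TRUE))
```

In this toy example a single billion-scale flood outweighs all other rows, which shows how scaling can change a ranking relative to sums of the raw values.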
The list of top fatalities below shows that, across the United States, the three leading causes of weather-related fatalities were (1) TORNADO, (2) HEAT, and (3) FLOOD:
major_fatalities
## EVTYPE fatalities
## 739 TORNADO 5633
## 229 HEAT 2840
## 141 FLOOD 1487
## 403 LIGHTNING 816
## 662 THUNDERSTORM 774
## 507 RIP CURRENT 368
## 306 HIGH WIND 306
## 122 EXTREMECOLD 287
## 11 AVALANCHE 224
## 864 WINTER STORM 206
The same data plotted as a bar chart:
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.2.4
barchart1 <- melt(major_fatalities)
## Using EVTYPE as id variables
# Order the bars from most to fewest fatalities; columns are referenced by name inside aes()
ggplot(barchart1, aes(x = reorder(EVTYPE, -value), y = value)) +
geom_bar(fill = "green", stat = "identity") + theme(axis.text.x = element_text(angle = 45)) +
labs(x = "WEATHER EVENT", y = "# OF FATALITIES") +
theme(legend.position = "none") + ggtitle("HUMAN FATALITIES DUE TO WEATHER EVENTS")
The list of top injuries below shows that, across the United States, the three leading causes of weather-related injuries were (1) TORNADO, (2) THUNDERSTORM, and (3) HEAT:
major_injuries
## EVTYPE injuries
## 739 TORNADO 91346
## 662 THUNDERSTORM 9808
## 229 HEAT 8625
## 141 FLOOD 8591
## 403 LIGHTNING 5230
## 372 ICE STORM 1975
## 306 HIGH WIND 1525
## 852 WILDFIFRE 1456
## 198 HAIL 1361
## 864 WINTER STORM 1321
The same data plotted as a bar chart:
barchart2 <- melt(major_injuries)
## Using EVTYPE as id variables
# Order the bars from most to fewest injuries; columns are referenced by name inside aes()
ggplot(barchart2, aes(x = reorder(EVTYPE, -value), y = value)) +
geom_bar(fill = "blue", stat = "identity") + theme(axis.text.x = element_text(angle = 45)) +
labs(x = "WEATHER EVENT", y = "# OF INJURIES") +
theme(legend.position = "none") + ggtitle("HUMAN INJURIES DUE TO WEATHER EVENTS")
The list of top total damages (the sum of crop and property damage) below shows that the greatest economic damage in the US was caused by (1) TORNADO, (2) THUNDERSTORM, and (3) FLOOD events:
major_econdamage
## EVTYPE cropdmg propdmg total_damage
## 739 TORNADO 100018.52 3212258.16 3312276.68
## 662 THUNDERSTORM 204935.75 2719380.69 2924316.44
## 141 FLOOD 353792.09 2381300.31 2735092.40
## 198 HAIL 579596.28 688697.38 1268293.66
## 403 LIGHTNING 3580.61 603351.78 606932.39
## 306 HIGH WIND 19342.81 383017.10 402359.91
## 864 WINTER STORM 1978.99 132720.59 134699.58
## 852 WILDFIFRE 8553.74 123804.29 132358.03
## 260 HEAVY SNOW 2165.72 122251.99 124417.71
## 372 ICE STORM 1688.95 66000.67 67689.62
The same data plotted as a bar chart:
barchart3 <- melt(major_econdamage)
## Using EVTYPE as id variables
barchart3 <- barchart3[barchart3$variable != "total_damage", ]
# Each event has two rows (crop and property), so order the bars by their combined value
ggplot(na.omit(barchart3), aes(x = reorder(EVTYPE, -value, FUN = sum), y = value, fill = variable)) +
geom_bar(stat = "identity") + theme(axis.text.x = element_text(angle = 45)) + labs(x = "WEATHER EVENT", y = "DAMAGE TO ECONOMY $") +
scale_fill_discrete(name = "Type of damage", labels = c("CROP", "PROPERTY")) +
theme(legend.position = "top") + ggtitle("ECONOMIC DAMAGE DUE TO WEATHER EVENTS")