In this report, we identify the ten storm and severe weather event types that cause the greatest public health harm (fatalities and injuries) and economic damage (property and crop losses) in the United States. For this purpose, we analyzed data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database covering 1950 to November 2011. Our analysis shows that, among the top ten, TORNADO and FLOOD are the events that have caused the most public health and economic harm. TORNADO accounted for 96,979 fatalities and injuries and $57 billion in property and crop damage over the entire period, while FLOOD accounted for 7,259 fatalities and injuries and $150 billion in property and crop damage.
From the Storm Data link, we downloaded data on characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
# Downloading data from URL into exclusive "storm_data" folder in working directory
fileURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
if (!dir.exists("storm_data")) { dir.create("storm_data") }
# mode = "wb" keeps the compressed file intact on platforms that distinguish binary downloads
download.file(fileURL, "./storm_data/StormData.csv.bz2", mode = "wb")
# Reading data into "stormdata" data frame
stormdata <- read.csv("./storm_data/StormData.csv.bz2", stringsAsFactors = FALSE)
We note the number of rows and fields, as well as view a sample of content to better understand the dataset.
# Noting the size of the dataset
dim(stormdata)
## [1] 902297 37
# Viewing sample content
head(stormdata)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO 0 0
## 2 TORNADO 0 0
## 3 TORNADO 0 0
## 4 TORNADO 0 0
## 5 TORNADO 0 0
## 6 TORNADO 0 0
## COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1 NA 0 14.0 100 3 0 0
## 2 NA 0 2.0 150 2 0 0
## 3 NA 0 0.1 123 2 0 0
## 4 NA 0 0.0 100 2 0 0
## 5 NA 0 0.0 150 2 0 0
## 6 NA 0 1.5 177 2 0 0
## INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1 15 25.0 K 0
## 2 0 2.5 K 0
## 3 2 25.0 K 0
## 4 2 2.5 K 0
## 5 2 2.5 K 0
## 6 6 2.5 K 0
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3040 8812 3051 8806 1
## 2 3042 8755 0 0 2
## 3 3340 8742 0 0 3
## 4 3458 8626 0 0 4
## 5 3412 8642 0 0 5
## 6 3450 8748 0 0 6
From this exploration, we see that the fields that interest us are:
* EVTYPE - the type of weather event
* FATALITIES - the number of fatalities caused
* INJURIES - the number of injuries caused
* PROPDMG - estimated property damage, to be scaled by PROPDMGEXP
* PROPDMGEXP - exponent (power of 10) for the property damage estimate
* CROPDMG - estimated crop damage, to be scaled by CROPDMGEXP
* CROPDMGEXP - exponent (power of 10) for the crop damage estimate
PROPDMGEXP and CROPDMGEXP appear to hold alphabetic and other non-numeric codes, which require further investigation since they cannot be used directly in calculations.
# Look at unique values of PROPDMGEXP
unique(stormdata$PROPDMGEXP)
## [1] "K" "M" "" "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-"
## [18] "1" "8"
# Look at unique values of CROPDMGEXP
unique(stormdata$CROPDMGEXP)
## [1] "" "M" "K" "m" "B" "?" "0" "k" "2"
We see that both PROPDMGEXP and CROPDMGEXP contain special characters and blanks in addition to letters and digits. While the letters can be interpreted as k/K for thousand, m/M for million, and so on, we need to check whether the special characters and blanks can be ignored outright, which would be the case if they were associated only with zero damage estimates in the PROPDMG and CROPDMG fields.
# Look at unique monetary estimates associated with special character/blank PROPDMGEXP
unique(stormdata[stormdata$PROPDMGEXP %in% c("","+","-","?"),"PROPDMG"])
## [1] 0.00 20.00 2.00 0.41 3.00 4.00 10.00 5.00 35.00 75.00 1.00
## [12] 15.00 60.00 6.00 7.00 9.00 8.00
# Look at unique monetary estimates associated with special character/blank CROPDMGEXP
unique(stormdata[stormdata$CROPDMGEXP %in% c("","?"),"CROPDMG"])
## [1] 0 3 4
Thus, we see that the blank and special-character entries cannot be ignored: some of them are associated with non-zero monetary values.
Hence, we need to process the PROPDMGEXP and CROPDMGEXP fields before using them.
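To gauge how widespread these non-zero cases are, one option is to count, for each exponent code, how many records carry a non-zero damage figure. The sketch below does this on a small made-up data frame; `toy` and its values are illustrative assumptions, not rows from the NOAA file, and the same call could be pointed at `stormdata` instead.

```r
# Toy stand-in for stormdata; values are invented for illustration only
toy <- data.frame(
  PROPDMG    = c(0, 20, 2.5, 0, 5, 0.41),
  PROPDMGEXP = c("", "", "K", "?", "+", ""),
  stringsAsFactors = FALSE
)

# For each exponent code, count the records whose damage figure is non-zero
nonzero_by_code <- tapply(toy$PROPDMG > 0, toy$PROPDMGEXP, sum)
print(nonzero_by_code)
```

A code whose count is zero could safely be ignored; any code with a positive count needs an explicit multiplier.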
Since PROPDMGEXP and CROPDMGEXP cannot be used directly in calculations, we add new numeric multiplier variables of the form 10^x, where x is derived from PROPDMGEXP and CROPDMGEXP. Blank and special-character codes are mapped to 10^0, so the recorded figure is taken at face value.
# For PROPDMGEXP, add PROPDMG_multiplier
stormdata$PROPDMG_multiplier[stormdata$PROPDMGEXP %in% c("0","","+","-","?")] <- 10^0
stormdata$PROPDMG_multiplier[stormdata$PROPDMGEXP == "1"] <- 10^1
stormdata$PROPDMG_multiplier[stormdata$PROPDMGEXP %in% c("2","h","H")] <- 10^2
stormdata$PROPDMG_multiplier[stormdata$PROPDMGEXP %in% c("3","k","K")] <- 10^3
stormdata$PROPDMG_multiplier[stormdata$PROPDMGEXP == "4"] <- 10^4
stormdata$PROPDMG_multiplier[stormdata$PROPDMGEXP == "5"] <- 10^5
stormdata$PROPDMG_multiplier[stormdata$PROPDMGEXP %in% c("6","m","M")] <- 10^6
stormdata$PROPDMG_multiplier[stormdata$PROPDMGEXP == "7"] <- 10^7
stormdata$PROPDMG_multiplier[stormdata$PROPDMGEXP == "8"] <- 10^8
stormdata$PROPDMG_multiplier[stormdata$PROPDMGEXP %in% c("b","B")] <- 10^9
# For CROPDMGEXP, add CROPDMG_multiplier
stormdata$CROPDMG_multiplier[stormdata$CROPDMGEXP %in% c("0","","?")] <- 10^0
stormdata$CROPDMG_multiplier[stormdata$CROPDMGEXP == "2"] <- 10^2
stormdata$CROPDMG_multiplier[stormdata$CROPDMGEXP %in% c("k","K")] <- 10^3
stormdata$CROPDMG_multiplier[stormdata$CROPDMGEXP %in% c("m","M")] <- 10^6
stormdata$CROPDMG_multiplier[stormdata$CROPDMGEXP %in% c("b","B")] <- 10^9
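The same mapping can also be written as a single named lookup vector, which keeps the code-to-exponent table in one place. This is only a sketch of an alternative, not the method used in this report: `exp_lookup` and `exp_to_multiplier` are hypothetical names, and unlike the assignments above this version also sends any unrecognised code to 10^0 rather than leaving it as NA.

```r
# Named lookup: exponent code -> power of ten (mirrors the mapping in the report)
exp_lookup <- c("0" = 0, "1" = 1, "2" = 2, "3" = 3, "4" = 4, "5" = 5,
                "6" = 6, "7" = 7, "8" = 8,
                "H" = 2, "K" = 3, "M" = 6, "B" = 9)

# Hypothetical helper: convert a vector of exponent codes to multipliers
exp_to_multiplier <- function(codes) {
  power <- exp_lookup[toupper(codes)]  # letters are matched case-insensitively
  power[is.na(power)] <- 0             # "", "+", "-", "?" and unknown codes -> 10^0
  unname(10^power)
}

multipliers <- exp_to_multiplier(c("K", "m", "", "?", "5", "B"))
print(multipliers)

# Worked example: a PROPDMG of 25.0 with exponent "K" is 25,000 dollars
print(25.0 * exp_to_multiplier("K"))
```

With such a helper, each multiplier column could be filled in one assignment, for example `stormdata$PROPDMG_multiplier <- exp_to_multiplier(stormdata$PROPDMGEXP)`.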
Now, we can easily calculate actual damage values as a product of PROPDMG/CROPDMG with the respective multiplier. We add two more variables to the dataset with calculated damage values.
# Value of property damage
stormdata$PROPDMGVAL <- stormdata$PROPDMG * stormdata$PROPDMG_multiplier
# Value of crop damage
stormdata$CROPDMGVAL <- stormdata$CROPDMG * stormdata$CROPDMG_multiplier
We now analyze the “stormdata” dataset to understand how much health and economic damage is caused by each type of weather event. For this, we create a smaller dataset, “stormimpact”, that holds the sums of fatalities, injuries, property damage, and crop damage grouped by event type.
# Sum of fatalities, injuries, property damage, and crop damage grouped by event type
stormimpact <- aggregate(stormdata[,c("FATALITIES","INJURIES", "PROPDMGVAL", "CROPDMGVAL")], by = list(stormdata$EVTYPE), FUN = sum)
names(stormimpact)[1] <- "EVTYPE"
# Including variable to calculate total health (fatalities + injuries) impact
stormimpact$TOTAL_POPHEALTH <- stormimpact$FATALITIES + stormimpact$INJURIES
# Including variable to calculate total economic (property damage + crop damage) impact
stormimpact$TOTAL_ECONDMG <- stormimpact$PROPDMGVAL + stormimpact$CROPDMGVAL
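To make the grouping step concrete, here is a minimal sketch of the same aggregation on a made-up five-row data frame. The `toy` data and its values are assumptions for illustration only, and the formula interface of `aggregate` is used here as an alternative spelling of the `by =` form above.

```r
# Toy stand-in for stormdata; values are invented for illustration only
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(3, 0, 1, 2, 0),
  INJURIES   = c(15, 1, 4, 0, 2),
  stringsAsFactors = FALSE
)

# Sum fatalities and injuries within each event type
toy_impact <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Combined health impact per event type
toy_impact$TOTAL_POPHEALTH <- toy_impact$FATALITIES + toy_impact$INJURIES
print(toy_impact)
```

Each event type collapses to a single row, so the two TORNADO records become one row with 4 fatalities and 19 injuries.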
Now, to graphically view the TOP-10 most destructive weather events by impact on population health:
par(mfrow=c(1,3), mar=c(12,4,2,2))
temphold <- stormimpact[order(stormimpact$FATALITIES, decreasing = TRUE),c("EVTYPE","FATALITIES")]
with(temphold[1:10,], barplot(FATALITIES/1000, names.arg = EVTYPE, las = 3, main = "Fatalities ('000s) by Cause", col = "lightgrey"))
temphold <- stormimpact[order(stormimpact$INJURIES, decreasing = TRUE),c("EVTYPE","INJURIES")]
with(temphold[1:10,], barplot(INJURIES/1000, names.arg = EVTYPE, las = 3, main = "Injuries ('000s) by Cause", col = "darkgrey"))
temphold <- stormimpact[order(stormimpact$TOTAL_POPHEALTH, decreasing = TRUE),c("EVTYPE", "TOTAL_POPHEALTH")]
with(temphold[1:10,], barplot(TOTAL_POPHEALTH/1000, names.arg = EVTYPE, las = 3, main = "Fatalities + Injuries ('000s) by Cause", col = "black"))
So, we see that TORNADO is the most destructive event for population health, being the number-1 cause for both fatalities and injuries.
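The order-then-subset pattern used for each panel could be wrapped in a small helper to avoid repeating it. The sketch below is one possible way to do that; `top_n_events` is a hypothetical name, and the `toy_impact` values are invented for illustration.

```r
# Hypothetical helper: return the n rows with the largest values in `column`
top_n_events <- function(impact, column, n = 10) {
  ranked <- impact[order(impact[[column]], decreasing = TRUE), c("EVTYPE", column)]
  head(ranked, n)  # head() returns fewer rows if fewer than n event types exist
}

# Toy stand-in for stormimpact; values are invented for illustration only
toy_impact <- data.frame(
  EVTYPE     = c("FLOOD", "HAIL", "TORNADO", "HEAT"),
  FATALITIES = c(2, 0, 4, 3),
  stringsAsFactors = FALSE
)

top2 <- top_n_events(toy_impact, "FATALITIES", n = 2)
print(top2)
```

The returned data frame can be passed to `barplot` in the same way as `temphold[1:10,]` above. Using `head` instead of `[1:10,]` also avoids rows of NA when the input has fewer than ten event types.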
And now, to graphically view the TOP-10 most destructive weather events by impact on economy:
par(mfrow=c(1,3), mar=c(12,4,2,2))
temphold <- stormimpact[order(stormimpact$PROPDMGVAL, decreasing = TRUE),c("EVTYPE","PROPDMGVAL")]
# Dividing by 1e9 so that the plotted values match the "Billion $" labels
with(temphold[1:10,], barplot(PROPDMGVAL/1e9, names.arg = EVTYPE, las = 3, main = "Property Damage (Billion $) by Cause", col = "lightgrey"))
temphold <- stormimpact[order(stormimpact$CROPDMGVAL, decreasing = TRUE),c("EVTYPE","CROPDMGVAL")]
with(temphold[1:10,], barplot(CROPDMGVAL/1e9, names.arg = EVTYPE, las = 3, main = "Crop Damage (Billion $) by Cause", col = "darkgrey"))
temphold <- stormimpact[order(stormimpact$TOTAL_ECONDMG, decreasing = TRUE), c("EVTYPE", "TOTAL_ECONDMG")]
with(temphold[1:10,], barplot(TOTAL_ECONDMG/1e9, names.arg = EVTYPE, las = 3, main = "Property + Crop Damage (Billion $) by Cause", col = "black"))
Thus, we see that in terms of economic consequences, FLOOD is the most destructive event, being the number-1 cause of property damage and the number-2 cause of crop damage.
Viewing health and economic impacts together:
par(mfrow=c(1,2), mar=c(12,4,2,2))
temphold <- stormimpact[order(stormimpact$TOTAL_POPHEALTH, decreasing = TRUE),c("EVTYPE","TOTAL_POPHEALTH")]
with(temphold[1:10,], barplot(TOTAL_POPHEALTH/1000, names.arg = EVTYPE, las = 3, main = "Public Health Impact (Fatalities + Injuries) '000s", col = "red", cex.names = 0.7))
temphold <- stormimpact[order(stormimpact$TOTAL_ECONDMG, decreasing = TRUE),c("EVTYPE","TOTAL_ECONDMG")]
# Dividing by 1e9 so that the plotted values match the "Billion $" label
with(temphold[1:10,], barplot(TOTAL_ECONDMG/1e9, names.arg = EVTYPE, las = 3, main = "Economic Impact (Property + Crop) Billion $", col = "darkblue", cex.names = 0.7))
The plots show that TORNADO and FLOOD are the most destructive events overall, with both featuring among the top-5 causes of adverse impact on public health as well as on the economy.
Our analysis has shown that TORNADO and FLOOD cause the greatest damage to population health and the economy in the United States.
print(paste("From 1950 to Nov-2011, TORNADO has resulted in", stormimpact$TOTAL_POPHEALTH[stormimpact$EVTYPE == "TORNADO"],"fatalities & injuries, as well as",round(stormimpact$TOTAL_ECONDMG[stormimpact$EVTYPE == "TORNADO"]/1000000000,0) ,"billion $ of property and crop damage.", sep = " "))
## [1] "From 1950 to Nov-2011, TORNADO has resulted in 96979 fatalities & injuries, as well as 57 billion $ of property and crop damage."
print(paste("From 1950 to Nov-2011, FLOOD has resulted in", stormimpact$TOTAL_POPHEALTH[stormimpact$EVTYPE == "FLOOD"],"fatalities & injuries, as well as",round(stormimpact$TOTAL_ECONDMG[stormimpact$EVTYPE == "FLOOD"]/1000000000,0) ,"billion $ of property and crop damage.", sep = " "))
## [1] "From 1950 to Nov-2011, FLOOD has resulted in 7259 fatalities & injuries, as well as 150 billion $ of property and crop damage."
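For readers who prefer the counts printed with thousands separators, the summary sentence could also be built with `sprintf` and `format`. This is a minimal sketch: `summarise_event` is a hypothetical helper, and the inputs below are the rounded figures quoted in this report rather than values recomputed from `stormimpact`.

```r
# Hypothetical helper: build a summary sentence with thousands separators
summarise_event <- function(event, casualties, damage_usd) {
  sprintf(
    "From 1950 to Nov-2011, %s resulted in %s fatalities & injuries, as well as $%.0f billion of property and crop damage.",
    event,
    format(casualties, big.mark = ","),  # e.g. 96979 -> "96,979"
    damage_usd / 1e9                     # dollars -> billions of dollars
  )
}

# Rounded figures as quoted in the report (assumed inputs for illustration)
out <- summarise_event("TORNADO", 96979, 57e9)
print(out)
```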