In this report we aim to identify the weather events that are most harmful to population health and those that have the greatest economic consequences. To investigate this, we use the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm dataset, which records the impacts of major weather events from 1950 to November 2011. We calculated the total casualties (fatalities plus injuries) caused by each weather event type over the recorded period. We then calculated the total economic loss (property damage plus crop damage) caused by each weather event type. Based on these calculations, we conclude that the five weather events with the most casualties are TORNADO, EXCESSIVE HEAT, TSTM WIND, FLOOD, and LIGHTNING, and the five weather events with the greatest economic loss are FLOOD, HURRICANE/TYPHOON, TORNADO, STORM SURGE, and FLASH FLOOD.
We first read the data from “StormData.csv”.
stormdata <- read.csv("StormData.csv")
After reading the data from the file, we check the dimensions and first few rows of the data.
dim(stormdata)
## [1] 902297 37
head(stormdata)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO 0 0
## 2 TORNADO 0 0
## 3 TORNADO 0 0
## 4 TORNADO 0 0
## 5 TORNADO 0 0
## 6 TORNADO 0 0
## COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1 NA 0 14.0 100 3 0 0
## 2 NA 0 2.0 150 2 0 0
## 3 NA 0 0.1 123 2 0 0
## 4 NA 0 0.0 100 2 0 0
## 5 NA 0 0.0 150 2 0 0
## 6 NA 0 1.5 177 2 0 0
## INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1 15 25.0 K 0
## 2 0 2.5 K 0
## 3 2 25.0 K 0
## 4 2 2.5 K 0
## 5 2 2.5 K 0
## 6 6 2.5 K 0
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3040 8812 3051 8806 1
## 2 3042 8755 0 0 2
## 3 3340 8742 0 0 3
## 4 3458 8626 0 0 4
## 5 3412 8642 0 0 5
## 6 3450 8748 0 0 6
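Only a handful of the 37 columns are used in the rest of the analysis. As a minimal sketch, assuming a small toy data frame in place of the full dataset (the names `toy`, `keep`, and `toy_small` are illustrative and not part of the original analysis), the relevant columns could be selected like this:

```r
# Toy stand-in for a few rows of the storm data; values follow the head() output above.
toy <- data.frame(
  STATE      = c("AL", "AL", "AL"),
  EVTYPE     = c("TORNADO", "TORNADO", "TORNADO"),
  FATALITIES = c(0, 0, 0),
  INJURIES   = c(15, 0, 2),
  PROPDMG    = c(25.0, 2.5, 25.0),
  PROPDMGEXP = c("K", "K", "K"),
  CROPDMG    = c(0, 0, 0),
  CROPDMGEXP = c("", "", ""),
  REFNUM     = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Columns needed for the casualty and economic-loss calculations.
keep <- c("EVTYPE", "FATALITIES", "INJURIES",
          "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")

# Subset by column name; rows are left untouched.
toy_small <- toy[, keep]
dim(toy_small)
```

Working on the reduced data frame is optional, but it keeps later printouts shorter and lowers memory use on the full 902,297-row dataset.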
Here we add the FATALITIES and INJURIES data to get the total CASUALTIES data.
# Total casualties per record: fatalities plus injuries.
stormdata$CASUALTIES <- stormdata$FATALITIES + stormdata$INJURIES
Property damage is described by “PROPDMG” and “PROPDMGEXP”, and crop damage by “CROPDMG” and “CROPDMGEXP”, so we need to combine each pair to obtain the actual property damage and crop damage values. First let’s take a look at the levels in “PROPDMGEXP” and “CROPDMGEXP”.
table(stormdata$PROPDMGEXP)
##
## - ? + 0 1 2 3 4 5
## 465934 1 8 5 216 25 13 4 4 28
## 6 7 8 B h H K m M
## 4 5 1 40 1 6 424665 7 11330
table(stormdata$CROPDMGEXP)
##
## ? 0 2 B k K m M
## 618413 7 19 1 9 21 281832 1 1994
From the tables we can observe that “PROPDMGEXP” contains the levels “”, “-”, “?”, “+”, 0-8, “B”, “h”, “H”, “K”, “m”, and “M”. Similarly, “CROPDMGEXP” contains the levels “”, “?”, 0, 2, “B”, “k”, “K”, “m”, and “M”. From the documentation of the dataset we know that “B” represents billion (10^9), “m” and “M” represent million (10^6), “k” and “K” represent thousand (10^3), and “h” and “H” represent hundred (10^2). The documentation doesn’t mention the meaning of the other levels appearing in the dataset. Here we adopt the idea in this article, where it is argued that “+” = 1, “-” = 0, “?” = 0, “” = 0, and numeric values (0-8) = 10. The damage values are then calculated as PROPDMG = PROPDMG * PROPDMGEXP and CROPDMG = CROPDMG * CROPDMGEXP.
stormdata$PROPDMGEXP <- as.character(stormdata$PROPDMGEXP)
# Map the digits 0-8 first, so that the "1" assigned to "+" below is not later turned into "10".
stormdata$PROPDMGEXP[stormdata$PROPDMGEXP %in% as.character(0:8)] <- "10"
stormdata$PROPDMGEXP[stormdata$PROPDMGEXP == "B"] <- "1000000000"
stormdata$PROPDMGEXP[stormdata$PROPDMGEXP == "m" | stormdata$PROPDMGEXP == "M"] <- "1000000"
stormdata$PROPDMGEXP[stormdata$PROPDMGEXP == "K" | stormdata$PROPDMGEXP == "k"] <- "1000"
stormdata$PROPDMGEXP[stormdata$PROPDMGEXP == "h" | stormdata$PROPDMGEXP == "H"] <- "100"
stormdata$PROPDMGEXP[stormdata$PROPDMGEXP == "+"] <- "1"
stormdata$PROPDMGEXP[stormdata$PROPDMGEXP %in% c("-", "?", "")] <- "0"
stormdata$PROPDMGEXP <- as.numeric(stormdata$PROPDMGEXP)
table(stormdata$PROPDMGEXP)
##
## 0 1 10 100 1000 1e+06 1e+09
## 465943 5 300 7 424665 11337 40
stormdata$PROPDMG <- stormdata$PROPDMG * stormdata$PROPDMGEXP
stormdata$CROPDMGEXP <- as.character(stormdata$CROPDMGEXP)
# Map the digits 0-8 first, so that the "1" assigned to "+" below is not later turned into "10".
stormdata$CROPDMGEXP[stormdata$CROPDMGEXP %in% as.character(0:8)] <- "10"
stormdata$CROPDMGEXP[stormdata$CROPDMGEXP == "B"] <- "1000000000"
stormdata$CROPDMGEXP[stormdata$CROPDMGEXP == "m" | stormdata$CROPDMGEXP == "M"] <- "1000000"
stormdata$CROPDMGEXP[stormdata$CROPDMGEXP == "K" | stormdata$CROPDMGEXP == "k"] <- "1000"
stormdata$CROPDMGEXP[stormdata$CROPDMGEXP == "h" | stormdata$CROPDMGEXP == "H"] <- "100"
stormdata$CROPDMGEXP[stormdata$CROPDMGEXP == "+"] <- "1"
stormdata$CROPDMGEXP[stormdata$CROPDMGEXP %in% c("-", "?", "")] <- "0"
stormdata$CROPDMGEXP <- as.numeric(stormdata$CROPDMGEXP)
table(stormdata$CROPDMGEXP)
##
## 0 10 1000 1e+06 1e+09
## 618420 20 281853 1995 9
# Scale the crop damage by its multiplier (CROPDMGEXP), mirroring the PROPDMG step.
stormdata$CROPDMG <- stormdata$CROPDMG * stormdata$CROPDMGEXP
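The exponent rules described above can also be expressed as a single lookup, which avoids repeating the recoding for each column. The following is only a sketch under the stated mapping; the helper name `exp_multiplier` is hypothetical and not part of the original analysis:

```r
# Hypothetical helper: convert exponent codes to numeric multipliers.
# Assumed mapping: B = 1e9, M/m = 1e6, K/k = 1e3, H/h = 1e2, "+" = 1,
# digits 0-8 = 10, and everything else ("", "-", "?") = 0.
exp_multiplier <- function(code) {
  code <- toupper(as.character(code))        # treat "k"/"K", "m"/"M", "h"/"H" alike
  lookup <- c(B = 1e9, M = 1e6, K = 1e3, H = 1e2, "+" = 1)
  mult <- unname(lookup[code])               # NA for codes not in the lookup
  mult[code %in% as.character(0:8)] <- 10    # single digits count as a factor of 10
  mult[is.na(mult)] <- 0                     # "", "-", "?" and unrecognised codes
  mult
}

# Worked example: the first record has PROPDMG = 25.0 with PROPDMGEXP = "K".
first_record_damage <- 25.0 * exp_multiplier("K")
first_record_damage
```

With such a helper, each damage column could be scaled in one line, for example `PROPDMG * exp_multiplier(PROPDMGEXP)`, applied to the original (unconverted) exponent codes.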
The PROPDMG and CROPDMG data are then added to get the total economic loss caused by the severe weather events.
# Total economic loss per record: property damage plus crop damage (in dollars).
stormdata$ECOLOSS <- stormdata$PROPDMG + stormdata$CROPDMG
Now we take a look at the total economic loss caused by each weather event over the time period recorded in the dataset.
ecoimpact <- aggregate(list(Economic_Loss = stormdata$ECOLOSS), list(EVTYPE = stormdata$EVTYPE), sum)
ecoimpact$Economic_Loss <- ecoimpact$Economic_Loss/1000000000
topecoimpact <- ecoimpact[order(-ecoimpact$Economic_Loss)[1:5],]
library(ggplot2)  # provides ggplot(), geom_bar(), and labs()
ggplot(data = topecoimpact, aes(EVTYPE, Economic_Loss)) +
  geom_bar(stat = "identity") +
  labs(title = "Top 5 weather events with most severe economic loss",
       x = "Event Type", y = "Economic Loss (billion dollars)")
We can tell from the graph that the top five events with the most severe economic losses are FLOOD, HURRICANE/TYPHOON, TORNADO, STORM SURGE, and FLASH FLOOD.
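To make the aggregate-then-rank step concrete, here is a small illustration on made-up numbers. The data frame `toy` and its values are invented for demonstration only and do not come from the storm dataset:

```r
# Invented example data: one row per recorded event, with its loss in dollars.
toy <- data.frame(
  EVTYPE  = c("FLOOD", "TORNADO", "FLOOD", "HAIL", "TORNADO", "LIGHTNING"),
  ECOLOSS = c(5e9, 2e9, 1e9, 5e8, 1e9, 1e7),
  stringsAsFactors = FALSE
)

# Sum the loss within each event type (same aggregate() form as used above).
totals <- aggregate(list(Economic_Loss = toy$ECOLOSS),
                    list(EVTYPE = toy$EVTYPE), sum)

# order() on the negated totals sorts in descending order; keep the first three rows.
top3 <- totals[order(-totals$Economic_Loss)[1:3], ]
top3
```

In this toy case FLOOD (6e9) ranks first, followed by TORNADO (3e9) and HAIL (5e8); the report applies the same pattern with `[1:5]` to obtain the top five event types.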
Next let’s take a look at the total casualties caused by each weather event over the time period recorded in the dataset.
totalcasualty <- aggregate(list(Casualties = stormdata$CASUALTIES), list(EVTYPE = stormdata$EVTYPE), sum)
toptotalcasualty <- totalcasualty[order(-totalcasualty$Casualties)[1:5],]
ggplot(data = toptotalcasualty, aes(EVTYPE, Casualties)) + geom_bar(stat = "identity") + labs(title = "Top 5 weather events with most casualties", x = "Event Type", y = "Casualties")
We can tell from the graph that the top five events with the most casualties are TORNADO, EXCESSIVE HEAT, TSTM WIND, FLOOD, and LIGHTNING.