Storms and severe weather have a major impact on both the health of a country's population and its economy. The US experiences a wide variety of extreme weather events, so it is vital that government knows which of them pose the largest threat. In this analysis I will look at the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. After cleaning the data, I will use tables and plots to show which storm types and weather conditions do the most harm to public health and the economy in the US.
First, I will load the packages that will be required for the analysis. Next, I will download and store the data. I will look at the structure of the data.
library(tidyverse)
library(RColorBrewer)
URL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(URL, "./repdata%2Fdata%2FStormData.csv.bz2")
datedownloaded <- date()
data <- read.csv("./repdata%2Fdata%2FStormData.csv.bz2")
str(data)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
## $ BGN_TIME : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
## $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
## $ STATE : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : Factor w/ 35 levels ""," N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_DATE : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_TIME : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ WFO : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ ZONENAMES : Factor w/ 25112 levels ""," "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : Factor w/ 436781 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
As can be seen in the structure, the exponent variables CROPDMGEXP and PROPDMGEXP are factors, so I will convert them to character and examine their values.
data <- data %>% mutate(CROPDMGEXP = as.character(CROPDMGEXP), PROPDMGEXP = as.character(PROPDMGEXP))
table(data$CROPDMGEXP)
##
##             ?      0      2      B      k      K      m      M
## 618413      7     19      1      9     21 281832      1   1994
table(data$PROPDMGEXP)
##
##             -      ?      +      0      1      2      3      4      5      6
## 465934      1      8      5    216     25     13      4      4     28      4
##      7      8      B      h      H      K      m      M
##      5      1     40      1      6 424665      7  11330
These indicate the units of CROPDMG and PROPDMG as powers of 10. According to the source, the National Weather Service Storm Data Documentation, “Alphabetical characters used to signify magnitude include ‘K’ for thousands, ‘M’ for millions, and ‘B’ for billions. If additional precision is available, it may be provided in the narrative part of the entry.” For these and the remaining symbols, I deduce the following multipliers.
data$CROPDMGEXP[data$CROPDMGEXP=="?"] <- 10^0
data$CROPDMGEXP[data$CROPDMGEXP=="0"] <- 10^0
data$CROPDMGEXP[data$CROPDMGEXP=="2"] <- 10^0
data$CROPDMGEXP[data$CROPDMGEXP=="B"] <- 10^9
data$CROPDMGEXP[data$CROPDMGEXP=="k"] <- 10^3
data$CROPDMGEXP[data$CROPDMGEXP=="K"] <- 10^3
data$CROPDMGEXP[data$CROPDMGEXP=="m"] <- 10^6
data$CROPDMGEXP[data$CROPDMGEXP=="M"] <- 10^6
data$PROPDMGEXP[data$PROPDMGEXP=="-"] <- 10^0
data$PROPDMGEXP[data$PROPDMGEXP=="?"] <- 10^0
data$PROPDMGEXP[data$PROPDMGEXP=="+"] <- 10^0
data$PROPDMGEXP[data$PROPDMGEXP=="0"] <- 10^0
data$PROPDMGEXP[data$PROPDMGEXP=="1"] <- 10^0
data$PROPDMGEXP[data$PROPDMGEXP=="2"] <- 10^2
data$PROPDMGEXP[data$PROPDMGEXP=="3"] <- 10^3
data$PROPDMGEXP[data$PROPDMGEXP=="4"] <- 10^4
data$PROPDMGEXP[data$PROPDMGEXP=="5"] <- 10^5
data$PROPDMGEXP[data$PROPDMGEXP=="6"] <- 10^6
data$PROPDMGEXP[data$PROPDMGEXP=="7"] <- 10^7
data$PROPDMGEXP[data$PROPDMGEXP=="8"] <- 10^8
data$PROPDMGEXP[data$PROPDMGEXP=="B"] <- 10^9
data$PROPDMGEXP[data$PROPDMGEXP=="h"] <- 10^2
data$PROPDMGEXP[data$PROPDMGEXP=="H"] <- 10^2
data$PROPDMGEXP[data$PROPDMGEXP=="K"] <- 10^3
data$PROPDMGEXP[data$PROPDMGEXP=="m"] <- 10^6
data$PROPDMGEXP[data$PROPDMGEXP=="M"] <- 10^6
data <- data %>% mutate(CROPDMGEXP = as.numeric(CROPDMGEXP), PROPDMGEXP = as.numeric(PROPDMGEXP))
data$CROPDMGEXP[is.na(data$CROPDMGEXP)] <- 10^0
data$PROPDMGEXP[is.na(data$PROPDMGEXP)] <- 10^0
table(data$CROPDMGEXP)
##
##      1   1000  1e+06  1e+09
## 618440 281853   1995      9
table(data$PROPDMGEXP)
##
##      1    100   1000  10000  1e+05  1e+06  1e+07  1e+08  1e+09
## 466189     20 424669      4     28  11341      5      1     40
The damage units are now numeric multipliers, so I will create new columns holding the actual damage estimates.
data <- data %>% mutate(CROPDMGCOST = CROPDMG*CROPDMGEXP, PROPDMGCOST = PROPDMG*PROPDMGEXP)
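The symbol-by-symbol assignments above can also be written as a single named lookup vector. The following is a minimal sketch on a handful of made-up rows, not the code used for the results in this report. The names `exp_lookup`, `toy`, and `multiplier` are hypothetical, and the mapping simply mirrors the property-damage choices made above (with lowercase "k" added from the crop column), including the decision to treat punctuation, "0", "1", and blanks as a multiplier of 1.

```r
# Hypothetical lookup: exponent symbol -> multiplier.
# Mirrors the PROPDMGEXP assignments above; "k" is taken from the CROPDMGEXP ones.
exp_lookup <- c(
  "-" = 1, "?" = 1, "+" = 1, "0" = 1, "1" = 1,
  "2" = 1e2, "3" = 1e3, "4" = 1e4, "5" = 1e5,
  "6" = 1e6, "7" = 1e7, "8" = 1e8,
  "h" = 1e2, "H" = 1e2, "k" = 1e3, "K" = 1e3,
  "m" = 1e6, "M" = 1e6, "B" = 1e9
)

# Made-up rows for illustration only (not taken from the NOAA file).
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1.2, 3, 7),
  PROPDMGEXP = c("K", "M", "B", "", "?"),
  stringsAsFactors = FALSE
)

# Index the lookup by symbol; blank or unknown symbols come back as NA,
# which is then replaced by a multiplier of 1.
multiplier <- unname(exp_lookup[toy$PROPDMGEXP])
multiplier[is.na(multiplier)] <- 1

# Damage estimate = reported value times its multiplier.
toy$PROPDMGCOST <- toy$PROPDMG * multiplier
print(toy)
```

Keeping the mapping in one vector makes each assumption (for example, how "?" or a blank is treated) visible in a single place and easy to change.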
First, I will look at how storms and severe weather affect public health. I will create two tables that show the total number of fatalities and injuries, respectively, for each storm type.
fatalities <- data %>%
  group_by(EVTYPE) %>%
  summarise(TOTAL_FATALITIES = sum(FATALITIES)) %>%
  arrange(desc(TOTAL_FATALITIES))
fatalities
## # A tibble: 985 x 2
## EVTYPE TOTAL_FATALITIES
## <fct> <dbl>
## 1 TORNADO 5633
## 2 EXCESSIVE HEAT 1903
## 3 FLASH FLOOD 978
## 4 HEAT 937
## 5 LIGHTNING 816
## 6 TSTM WIND 504
## 7 FLOOD 470
## 8 RIP CURRENT 368
## 9 HIGH WIND 248
## 10 AVALANCHE 224
## # ... with 975 more rows
injuries <- data %>%
  group_by(EVTYPE) %>%
  summarise(TOTAL_INJURIES = sum(INJURIES)) %>%
  arrange(desc(TOTAL_INJURIES))
injuries
## # A tibble: 985 x 2
## EVTYPE TOTAL_INJURIES
## <fct> <dbl>
## 1 TORNADO 91346
## 2 TSTM WIND 6957
## 3 FLOOD 6789
## 4 EXCESSIVE HEAT 6525
## 5 LIGHTNING 5230
## 6 HEAT 2100
## 7 ICE STORM 1975
## 8 FLASH FLOOD 1777
## 9 THUNDERSTORM WIND 1488
## 10 HAIL 1361
## # ... with 975 more rows
Next, I will plot the data for fatalities.
# Name the ranking column explicitly rather than relying on top_n()'s default
fatalities %>% top_n(10, TOTAL_FATALITIES) %>%
  ggplot(aes(reorder(EVTYPE, -TOTAL_FATALITIES), TOTAL_FATALITIES, fill = EVTYPE)) +
  geom_bar(stat = "identity", show.legend = FALSE) +
  scale_fill_brewer(palette = "Spectral") +
  theme(axis.text.x = element_text(angle = 90)) +
  ggtitle("Top 10 Storms to Cause Fatalities") +
  xlab("Storm Type") +
  ylab("Total Fatalities")
Tornadoes are clearly the biggest concern in terms of fatalities. Excessive heat is second, with almost double the fatalities of the third most deadly event type (flash floods), yet still fewer than half as many as tornadoes.
Next, I will plot the data for injuries.
# Name the ranking column explicitly rather than relying on top_n()'s default
injuries %>% top_n(10, TOTAL_INJURIES) %>%
  ggplot(aes(reorder(EVTYPE, -TOTAL_INJURIES), TOTAL_INJURIES, fill = EVTYPE)) +
  geom_bar(stat = "identity", show.legend = FALSE) +
  scale_fill_brewer(palette = "Spectral") +
  theme(axis.text.x = element_text(angle = 90)) +
  ggtitle("Top 10 Storms to Cause Injuries") +
  xlab("Storm Type") +
  ylab("Total Injuries")
Again, tornadoes are the most dangerous, and for injuries the gap to second place is even wider: tornadoes account for roughly thirteen times as many injuries as the next event type. Notice that excessive heat drops to fourth place here and is proportionally much further behind tornadoes than it was for fatalities. This suggests that excessive heat deserves particular concern because, when it causes harm, that harm is more often fatal.
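The point about excessive heat can be checked with simple arithmetic on the totals already reported in the two tables above. The sketch below computes the share of recorded casualties (fatalities plus injuries) that were fatal; `casualties` and `FATAL_SHARE` are names introduced here for illustration only. It gives roughly 6% for tornadoes and 23% for excessive heat.

```r
# Totals copied from the fatalities and injuries tables above.
casualties <- data.frame(
  EVTYPE     = c("TORNADO", "EXCESSIVE HEAT"),
  FATALITIES = c(5633, 1903),
  INJURIES   = c(91346, 6525),
  stringsAsFactors = FALSE
)

# Share of recorded casualties (fatalities + injuries) that were fatal.
casualties$FATAL_SHARE <- with(casualties, FATALITIES / (FATALITIES + INJURIES))
print(casualties)
```

This is only a rough indicator: it depends on how consistently injuries and fatalities were recorded for each event type.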
Here, I will look at how storms and severe weather affect the economy through the cost of the damage each storm type causes. The columns CROPDMGCOST and PROPDMGCOST hold the crop and property estimates; the table below shows the property damage totals.
propdmg <- data %>%
  group_by(EVTYPE) %>%
  summarise(TOTAL_PROPDMGCOST = sum(PROPDMGCOST)) %>%
  arrange(desc(TOTAL_PROPDMGCOST))
propdmg
## # A tibble: 985 x 2
## EVTYPE TOTAL_PROPDMGCOST
## <fct> <dbl>
## 1 FLOOD 144657709807
## 2 HURRICANE/TYPHOON 69305840000
## 3 TORNADO 56947380676.
## 4 STORM SURGE 43323536000
## 5 FLASH FLOOD 16822673978.
## 6 HAIL 15735267513.
## 7 HURRICANE 11868319010
## 8 TROPICAL STORM 7703890550
## 9 WINTER STORM 6688497251
## 10 HIGH WIND 5270046295
## # ... with 975 more rows
Finally, I will plot the estimated cost of damages caused to properties.
# Name the ranking column explicitly rather than relying on top_n()'s default
propdmg %>% top_n(10, TOTAL_PROPDMGCOST) %>%
  ggplot(aes(reorder(EVTYPE, -TOTAL_PROPDMGCOST), TOTAL_PROPDMGCOST, fill = EVTYPE)) +
  geom_bar(stat = "identity", show.legend = FALSE) +
  scale_fill_brewer(palette = "Spectral") +
  theme(axis.text.x = element_text(angle = 90)) +
  ggtitle("Top 10 Storms that Damage Properties") +
  xlab("Storm Type") +
  ylab("Estimate of Damages to Properties")
Floods are the most costly in terms of property damage, at more than double the total for hurricanes/typhoons in second place, with tornadoes a close third.
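The same per-event aggregation could be applied to the CROPDMGCOST column created earlier. The sketch below shows the idea on made-up rows only, so it reports no results for the real data; `toy`, `cropdmg`, and `TOTAL_CROPDMGCOST` are hypothetical names, and base R is used so that it runs without extra packages.

```r
# Made-up rows for illustration only; the values are not from the NOAA file.
toy <- data.frame(
  EVTYPE      = c("FLOOD", "FLOOD", "HAIL", "DROUGHT", "HAIL"),
  CROPDMGCOST = c(1000, 500, 2000, 4000, 250),
  stringsAsFactors = FALSE
)

# Sum crop damage per event type, then sort from most to least costly.
cropdmg <- aggregate(CROPDMGCOST ~ EVTYPE, data = toy, FUN = sum)
names(cropdmg)[2] <- "TOTAL_CROPDMGCOST"
cropdmg <- cropdmg[order(-cropdmg$TOTAL_CROPDMGCOST), ]
print(cropdmg)
```

With the real data, the equivalent step would be the `group_by(EVTYPE) %>% summarise(...)` pipeline used for property damage, with CROPDMGCOST in place of PROPDMGCOST.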
From the analysis, I deduce that tornadoes are the biggest concern when it comes to public health, with excessive heat also a problem (though not nearly as much so as tornadoes). I also deduce that floods pose the biggest threat to the US economy in terms of the property damage they cause, with hurricanes and typhoons also a problem.