Storm Data is an official publication of the National Oceanic and Atmospheric Administration (NOAA) which documents:
The occurrence of storms and other significant weather phenomena having sufficient intensity to cause loss of life, injuries, significant property damage, and/or disruption to commerce;
Rare, unusual, weather phenomena that generate media attention, such as snow flurries in South Florida or the San Diego coastal area; and
Other significant meteorological events, such as record maximum or minimum temperatures or precipitation that occur in connection with another event.
The data used throughout this analysis can be located HERE; it covers events tracked from 1950 through November 2011.
The data is used to answer the following questions:
Across the United States, which types of events are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?
Based on the analysis of this data, it is determined that:
Tornados are the most harmful events in terms of injuries and fatalities
Floods have the most economic impact
Once the data is downloaded and placed in the proper directory, it is loaded into the program. Packages “dplyr”, “ggplot2”, “knitr”, “lemon”, “kableExtra”, and “tidyr” were used to organize and present the data.
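# Load the packages that the code shown below relies on
library(dplyr)
library(ggplot2)
library(knitr)
library(kableExtra)
rawData <- read.csv("StormData.csv", sep=",", header = TRUE)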
names(rawData)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
Once the data is loaded and the variable names are known, we can prepare the data for analysis. The only fields that are necessary for our analysis are Event Type (EVTYPE), Fatalities (FATALITIES), Injuries (INJURIES), Property Damage (PROPDMG), Property Damage Exponent (PROPDMGEXP), Crop Damage (CROPDMG), and Crop Damage Exponent (CROPDMGEXP).
The data is then filtered to keep only events with at least one non-zero value for fatalities, injuries, property damage, or crop damage. Side calculations were done to ensure that the totals of the filtered and unfiltered data sets were equal. After that, the property damage and crop damage exponents are used to calculate the total dollar values: B/b = Billions, M/m = Millions, K/k = Thousands, H/h = Hundreds. Any other exponent value was ignored, leaving the recorded figure unscaled.
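As a minimal sketch of the two steps just described, the base R snippet below uses a small made-up data frame (`toy`, not taken from the Storm Data file) to show the zero-row filter, the totals check, and a lookup-table alternative to nested `ifelse()` calls for the exponents. The helper name `scale_damage` is hypothetical, and unrecognized exponents are left unscaled, matching the rule above.

```r
# Toy data (illustrative values only, not from StormData.csv)
toy <- data.frame(
  FATALITIES = c(0, 2, 0, 0),
  INJURIES   = c(0, 5, 0, 1),
  PROPDMG    = c(0, 2.5, 10, 3),
  PROPDMGEXP = c("", "M", "k", "?"),
  stringsAsFactors = FALSE
)

# Keep rows with at least one non-zero outcome
kept <- toy[toy$FATALITIES > 0 | toy$INJURIES > 0 | toy$PROPDMG > 0, ]

# Side check: dropping all-zero rows must not change any total
stopifnot(sum(kept$FATALITIES) == sum(toy$FATALITIES),
          sum(kept$INJURIES)   == sum(toy$INJURIES),
          sum(kept$PROPDMG)    == sum(toy$PROPDMG))

# Hypothetical helper: scale a damage value by its exponent code
scale_damage <- function(value, exponent) {
  multipliers <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  m <- multipliers[toupper(exponent)]  # NA for unrecognized codes
  m[is.na(m)] <- 1                     # leave those values unscaled
  value * unname(m)
}

# Yields 2.5 million, 10 thousand, and an unscaled 3
kept$PROPDMG <- scale_damage(kept$PROPDMG, kept$PROPDMGEXP)
print(kept$PROPDMG)
```

A named vector keeps the code-to-multiplier mapping in one place, so the same helper could serve both the property and crop damage columns.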
Lastly, the data is separated into top 10 Events by Fatalities, Injuries, and Damage. Damage is calculated as the sum of Property Damage and Crop Damage.
# Print large dollar amounts in full rather than in scientific notation
options(scipen=999)
Data <- rawData %>%
  select(EVTYPE,FATALITIES,INJURIES,PROPDMG,PROPDMGEXP,CROPDMG,CROPDMGEXP) %>%
  # Keep only events with at least one non-zero outcome
  filter(FATALITIES > 0 | INJURIES > 0 | PROPDMG > 0 | CROPDMG > 0) %>%
  # Scale damage by its exponent code; unrecognized codes leave the value unchanged
  mutate(PROPDMG = ifelse(PROPDMGEXP == "B" | PROPDMGEXP == "b", PROPDMG*1000000000,
                   ifelse(PROPDMGEXP == "M" | PROPDMGEXP == "m", PROPDMG*1000000,
                   ifelse(PROPDMGEXP == "K" | PROPDMGEXP == "k", PROPDMG*1000,
                   ifelse(PROPDMGEXP == "H" | PROPDMGEXP == "h", PROPDMG*100,PROPDMG)))),
         CROPDMG = ifelse(CROPDMGEXP == "B" | CROPDMGEXP == "b", CROPDMG*1000000000,
                   ifelse(CROPDMGEXP == "M" | CROPDMGEXP == "m", CROPDMG*1000000,
                   ifelse(CROPDMGEXP == "K" | CROPDMGEXP == "k", CROPDMG*1000,
                   ifelse(CROPDMGEXP == "H" | CROPDMGEXP == "h", CROPDMG*100,CROPDMG)))),
         Total_Damage = PROPDMG + CROPDMG) %>%
  select(-PROPDMGEXP,-CROPDMGEXP,-PROPDMG,-CROPDMG)
# Change names to something more readable
names(Data) <- c("Event","Fatalities","Injuries","TotalDamage")
# Separate the data into subsets and aggregate each one by Event.
# Event factor levels are ordered by descending total so the bars in each graph appear from largest to smallest
TopFatalities <- aggregate(Fatalities ~ Event, data = Data, FUN = sum)
TopFatalities <- arrange(TopFatalities,desc(Fatalities))
TopFatalities <- head(TopFatalities,10)
TopFatalities$Event <- factor(TopFatalities$Event, levels = TopFatalities$Event[order(-TopFatalities$Fatalities)])
TopInjuries <- aggregate(Injuries ~ Event, data = Data, FUN = sum)
TopInjuries <- arrange(TopInjuries,desc(Injuries))
TopInjuries <- head(TopInjuries,10)
TopInjuries$Event <- factor(TopInjuries$Event, levels = TopInjuries$Event[order(-TopInjuries$Injuries)])
TopDamage <- aggregate(TotalDamage ~ Event, data = Data, FUN = sum)
TopDamage <- arrange(TopDamage, desc(TotalDamage))
TopDamage <- head(TopDamage,10)
TopDamage$Event <- factor(TopDamage$Event, levels = TopDamage$Event[order(-TopDamage$TotalDamage)])
# Remove unused data
rm(Data,rawData)
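The three aggregate/arrange/head sequences above follow one pattern, which could be factored into a small helper. The sketch below is base R only and runs on made-up data; the function name `top_events` and the toy values are assumptions for illustration, not part of the original analysis.

```r
# Hypothetical helper: total a measure by event and keep the n largest totals
top_events <- function(data, measure, n = 10) {
  totals <- aggregate(data[[measure]], by = list(Event = data$Event), FUN = sum)
  names(totals)[2] <- measure
  totals <- totals[order(-totals[[measure]]), ]
  totals <- head(totals, n)
  # Fix factor levels in descending order so bar charts keep this ordering
  totals$Event <- factor(totals$Event, levels = totals$Event)
  rownames(totals) <- NULL
  totals
}

# Toy data (illustrative values only)
toy <- data.frame(
  Event      = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
  Fatalities = c(3, 1, 4, 0, 2),
  stringsAsFactors = FALSE
)

# TORNADO totals 7 and FLOOD totals 3, so those two rows are returned in that order
top2 <- top_events(toy, "Fatalities", n = 2)
print(top2)
```

Setting the factor levels explicitly matters because ggplot2 otherwise orders a discrete axis alphabetically rather than by total.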
Now that the data is separated into proper subsets, the above questions can be answered. First, the most fatal events are determined, followed by the events causing the most injuries, and lastly, the events that are the most economically damaging.
The graph and table below clearly depict tornados as the most fatal event, with 5,633 fatalities. Directly after that comes excessive heat with 1,903, roughly a third of the tornado total. So stay away from the hottest tornado areas!
Fplot <- ggplot(TopFatalities, aes(x=Event, y=Fatalities))
Fplot <- Fplot + geom_bar(fill = "red", stat = "identity") + theme(axis.text.x = element_text(angle = 45, hjust = 1), plot.title = element_text(hjust = .5)) +
xlab("Event Type") + ylab("Number of Fatalities") + ggtitle("Top 10 Fatal Weather Events")
print(Fplot)
kable(TopFatalities, format.args = list(big.mark = ","),format = "html") %>%
kable_styling(bootstrap_options = "striped", full_width = FALSE)
| Event | Fatalities |
|---|---|
| TORNADO | 5,633 |
| EXCESSIVE HEAT | 1,903 |
| FLASH FLOOD | 978 |
| HEAT | 937 |
| LIGHTNING | 816 |
| TSTM WIND | 504 |
| FLOOD | 470 |
| RIP CURRENT | 368 |
| HIGH WIND | 248 |
| AVALANCHE | 224 |
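Not surprisingly, tornados also cause the most injuries by a large margin: 91,346, roughly 13 times the total for the next event type. Thunderstorm winds (TSTM WIND), floods, and excessive heat all have relatively similar values, between about 6,500 and 7,000 each.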
Iplot <- ggplot(TopInjuries, aes(x=Event, y=Injuries))
Iplot <- Iplot + geom_bar(fill = "orange", stat = "identity") + theme(axis.text.x = element_text(angle = 45, hjust = 1), plot.title = element_text(hjust = .5)) +
xlab("Event Type") + ylab("Number of Injuries") + ggtitle("Top 10 Weather Events Causing Injury")
print(Iplot)
kable(TopInjuries, format.args = list(big.mark = ","),format = "html") %>%
kable_styling(bootstrap_options = "striped", full_width = FALSE)
| Event | Injuries |
|---|---|
| TORNADO | 91,346 |
| TSTM WIND | 6,957 |
| FLOOD | 6,789 |
| EXCESSIVE HEAT | 6,525 |
| LIGHTNING | 5,230 |
| HEAT | 2,100 |
| ICE STORM | 1,975 |
| FLASH FLOOD | 1,777 |
| THUNDERSTORM WIND | 1,488 |
| HAIL | 1,361 |
Floods cause the most combined property and crop damage, about 150 billion U.S. dollars, roughly twice the total for the next event type. Anyone who has dealt with catastrophic insurance can relate to this. Behind them come hurricanes/typhoons and tornados.
Dplot <- ggplot(TopDamage, aes(x=Event, y=TotalDamage/10^9))
Dplot <- Dplot + geom_bar(fill = "blue", stat = "identity") + theme(axis.text.x = element_text(angle = 45, hjust = 1), plot.title = element_text(hjust = .5)) +
xlab("Event Type") + ylab("Damage (Billions of U.S. Dollars)") + ggtitle("Top 10 Most Damaging Weather Events")
print(Dplot)
kable(TopDamage, format.args = list(big.mark = ","),format = "html") %>%
kable_styling(bootstrap_options = "striped", full_width = FALSE)
| Event | TotalDamage |
|---|---|
| FLOOD | 150,319,678,257 |
| HURRICANE/TYPHOON | 71,913,712,800 |
| TORNADO | 57,352,114,049 |
| STORM SURGE | 43,323,541,000 |
| HAIL | 18,758,222,016 |
| FLASH FLOOD | 17,562,129,167 |
| DROUGHT | 15,018,672,000 |
| HURRICANE | 14,610,229,010 |
| RIVER FLOOD | 10,148,404,500 |
| ICE STORM | 8,967,041,360 |
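The plot expresses damage in billions by dividing each total by 10^9. As a small illustration, the snippet below applies the same conversion to the top three totals copied from the table; the variable names are used only for this sketch.

```r
# Top three totals copied from the table above (U.S. dollars)
top_damage <- c(FLOOD = 150319678257,
                `HURRICANE/TYPHOON` = 71913712800,
                TORNADO = 57352114049)

# Convert to billions, as done for the plot's y-axis
billions <- round(top_damage / 10^9, 1)
print(billions)

# How many times larger flood damage is than the runner-up
flood_ratio <- round(top_damage[["FLOOD"]] / top_damage[["HURRICANE/TYPHOON"]], 2)
print(flood_ratio)
```

This gives 150.3, 71.9, and 57.4 billion, with flood damage about 2.09 times the hurricane/typhoon total.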
Though tornados do not cause as much economic damage as floods, they pose by far the greatest risk of injury and death. It’s strongly advised to have plans in place if you are in an area with frequent tornados. If you are in an area at risk of flooding, make sure you get flood insurance!