Synopsis

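Storm Data is an official publication of the National Oceanic and Atmospheric Administration (NOAA) that documents severe weather events in the United States, including the fatalities, injuries, and property and crop damage they cause.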

The data set used throughout this analysis covers events tracked from 1950 through November 2011.

The data is used to answer the following questions:

  1. Across the United States, which types of events are most harmful with respect to population health?

  2. Across the United States, which types of events have the greatest economic consequences?

Based on our analysis of the data, we determined that:

  1. Tornados are the most harmful events in terms of injuries and fatalities

  2. Floods have the most economic impact

Data Processing

Once the data is downloaded and placed in the working directory, it is loaded into R. The packages “dplyr”, “ggplot2”, “knitr”, “lemon”, “kableExtra”, and “tidyr” were used to organize and present the data.

rawData <- read.csv("StormData.csv", sep=",", header = TRUE)

names(rawData)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

Once the data is loaded and the variable names are known, we can set the data up for analysis. The only fields necessary for our analysis are Event Type (EVTYPE), Fatalities (FATALITIES), Injuries (INJURIES), Property Damage (PROPDMG), Property Damage Exponent (PROPDMGEXP), Crop Damage (CROPDMG), and Crop Damage Exponent (CROPDMGEXP).

Then the data is additionally filtered to keep only the records with at least one nonzero value for fatalities, injuries, property damage, or crop damage; records where all four are zero are removed. Side calculations were done to ensure the totals of the filtered and unfiltered data sets were equal. After that, the property damage and crop damage values are scaled by their exponents to obtain dollar totals: B/b = Billions, M/m = Millions, K/k = Thousands, H/h = Hundreds. Any other exponent value was ignored, leaving the damage figure unscaled.
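The filtering, the side check on totals, and the exponent rules described above can be illustrated on a small made-up data frame. This is a minimal sketch in base R, not the code used for the report: the `toy` records, the `multipliers` lookup vector, and the `scale_damage()` helper are hypothetical names introduced here for illustration.

```r
# Toy records standing in for the storm data (all values are made up).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "HAIL", "HEAT", "FOG"),
  FATALITIES = c(2, 0, 0, 1, 0),
  INJURIES   = c(10, 0, 0, 0, 0),
  PROPDMG    = c(2.5, 1, 50, 0, 0),
  PROPDMGEXP = c("M", "B", "k", "", ""),
  CROPDMG    = c(0, 3, 0, 0, 0),
  CROPDMGEXP = c("", "h", "", "", "?"),
  stringsAsFactors = FALSE
)

# Keep rows where at least one outcome is nonzero (drops the all-zero FOG row).
kept <- toy[toy$FATALITIES > 0 | toy$INJURIES > 0 |
            toy$PROPDMG > 0 | toy$CROPDMG > 0, ]

# Side check: dropping all-zero rows must not change any column total.
stopifnot(
  sum(kept$FATALITIES) == sum(toy$FATALITIES),
  sum(kept$INJURIES)   == sum(toy$INJURIES),
  sum(kept$PROPDMG)    == sum(toy$PROPDMG),
  sum(kept$CROPDMG)    == sum(toy$CROPDMG)
)

# Hypothetical lookup table: exponent letter -> multiplier.
multipliers <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Scale a damage value by its exponent letter (case-insensitive).
# Unrecognized or empty exponents leave the value unscaled.
scale_damage <- function(value, exponent) {
  m <- multipliers[toupper(exponent)]
  m[is.na(m)] <- 1
  value * unname(m)
}

kept$PROPDMG <- scale_damage(kept$PROPDMG, kept$PROPDMGEXP)
kept$CROPDMG <- scale_damage(kept$CROPDMG, kept$CROPDMGEXP)
kept$TotalDamage <- kept$PROPDMG + kept$CROPDMG

print(kept[, c("EVTYPE", "TotalDamage")])
```

A named lookup vector keeps the letter-to-multiplier mapping in one place, which is easier to audit than a chain of nested conditions, though both approaches give the same result for the four recognized letters.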

Lastly, the data is separated into top 10 Events by Fatalities, Injuries, and Damage. Damage is calculated as the sum of Property Damage and Crop Damage.

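# Load the packages listed above
library(dplyr)
library(ggplot2)
library(knitr)
library(lemon)
library(kableExtra)
library(tidyr)

# Print large damage totals in full rather than in scientific notation
options(scipen=999)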
Data <- rawData %>% 
    select(EVTYPE,FATALITIES,INJURIES,PROPDMG,PROPDMGEXP,CROPDMG,CROPDMGEXP) %>% 
    filter(FATALITIES >0 | INJURIES > 0 | PROPDMG > 0 | CROPDMG > 0) %>% 
    mutate(PROPDMG = ifelse(PROPDMGEXP == "B" | PROPDMGEXP == "b", PROPDMG*1000000000,  
                     ifelse(PROPDMGEXP == "M" | PROPDMGEXP == "m", PROPDMG*1000000,    
                     ifelse(PROPDMGEXP == "K" | PROPDMGEXP == "k", PROPDMG*1000,
                     ifelse(PROPDMGEXP == "H" | PROPDMGEXP == "h", PROPDMG*100,PROPDMG)))),
           CROPDMG = ifelse(CROPDMGEXP == "B" | CROPDMGEXP == "b", CROPDMG*1000000000,
                     ifelse(CROPDMGEXP == "M" | CROPDMGEXP == "m", CROPDMG*1000000,
                     ifelse(CROPDMGEXP == "K" | CROPDMGEXP == "k", CROPDMG*1000,
                     ifelse(CROPDMGEXP == "H" | CROPDMGEXP == "h", CROPDMG*100,CROPDMG)))),
           Total_Damage = PROPDMG + CROPDMG) %>% 
    select(-PROPDMGEXP,-CROPDMGEXP,-PROPDMG,-CROPDMG)

# Change names to something more readable
names(Data) <- c("Event","Fatalities","Injuries","TotalDamage")


# Separate the data into subsets and aggregate them by Event.
# Each subset is sorted in descending order of its total, and the Event factor
# levels follow that order so the bars in the graphs run from largest to smallest

TopFatalities <- aggregate(Fatalities ~ Event, data = Data, FUN = sum)
TopFatalities <- arrange(TopFatalities,desc(Fatalities))
TopFatalities <- head(TopFatalities,10)
TopFatalities$Event <- factor(TopFatalities$Event, levels = TopFatalities$Event[order(-TopFatalities$Fatalities)])

TopInjuries <- aggregate(Injuries ~ Event, data = Data, FUN = sum)
TopInjuries <- arrange(TopInjuries,desc(Injuries))
TopInjuries <- head(TopInjuries,10)
TopInjuries$Event <- factor(TopInjuries$Event, levels = TopInjuries$Event[order(-TopInjuries$Injuries)])

TopDamage <- aggregate(TotalDamage ~ Event, data = Data, FUN = sum)
TopDamage <- arrange(TopDamage, desc(TotalDamage))
TopDamage <- head(TopDamage,10)
TopDamage$Event <- factor(TopDamage$Event, levels = TopDamage$Event[order(-TopDamage$TotalDamage)])

# Remove unused data
rm(Data,rawData)
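The three aggregate, sort, truncate, and re-level sequences above follow the same pattern, so they could be folded into a single helper. The following is a sketch in base R under that assumption, checked here only on made-up data; `top_events()` and the `toy` data frame are hypothetical names, not part of the original analysis.

```r
# Hypothetical helper: total one numeric column per Event, keep the n largest,
# and set the Event factor levels in ranked order so plots preserve the ranking.
top_events <- function(data, column, n = 10) {
  totals <- aggregate(data[[column]], by = list(Event = data$Event), FUN = sum)
  names(totals)[2] <- column
  totals <- totals[order(-totals[[column]]), ]
  totals <- head(totals, n)
  totals$Event <- factor(as.character(totals$Event),
                         levels = as.character(totals$Event))
  rownames(totals) <- NULL
  totals
}

# Made-up records to exercise the helper.
toy <- data.frame(
  Event      = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  Fatalities = c(5, 1, 3, 4, 0),
  stringsAsFactors = FALSE
)

top2 <- top_events(toy, "Fatalities", n = 2)
print(top2)
```

With a helper like this, each subset would reduce to one call, for example `top_events(Data, "Injuries")`, issued before `Data` is removed.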

Results

Now that the data is separated into proper subsets, the above questions can be answered. First, the most fatal events are determined, followed by the events causing the most injuries, and lastly, the events that are the most economically damaging.

Fatalities

The graph and table below show tornados to be the deadliest event type, followed by excessive heat. So stay away from the hottest tornado areas!

Fplot <- ggplot(TopFatalities, aes(x=Event, y=Fatalities))

Fplot <- Fplot + geom_bar(fill = "red", stat = "identity") + theme(axis.text.x = element_text(angle = 45, hjust = 1), plot.title = element_text(hjust = .5)) + 
         xlab("Event Type") + ylab("Number of Fatalities") + ggtitle("Top 10 Fatal Weather Events")

print(Fplot)

kable(TopFatalities, format.args = list(big.mark = ","),format = "html") %>% 
    kable_styling(bootstrap_options = "striped", full_width = FALSE)
Event           Fatalities
TORNADO              5,633
EXCESSIVE HEAT       1,903
FLASH FLOOD            978
HEAT                   937
LIGHTNING              816
TSTM WIND              504
FLOOD                  470
RIP CURRENT            368
HIGH WIND              248
AVALANCHE              224

Injuries

Not surprisingly, tornados also cause the most injuries by a large margin. Thunderstorm winds, floods, and excessive heat all have relatively similar values.

Iplot <- ggplot(TopInjuries, aes(x=Event, y=Injuries))

Iplot <- Iplot + geom_bar(fill = "orange", stat = "identity") + theme(axis.text.x = element_text(angle = 45, hjust = 1), plot.title = element_text(hjust = .5)) + 
         xlab("Event Type") + ylab("Number of Injuries") + ggtitle("Top 10 Weather Events Causing Injury")

print(Iplot)

kable(TopInjuries, format.args = list(big.mark = ","),format = "html") %>% 
    kable_styling(bootstrap_options = "striped", full_width = FALSE)
Event              Injuries
TORNADO              91,346
TSTM WIND             6,957
FLOOD                 6,789
EXCESSIVE HEAT        6,525
LIGHTNING             5,230
HEAT                  2,100
ICE STORM             1,975
FLASH FLOOD           1,777
THUNDERSTORM WIND     1,488
HAIL                  1,361
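Note that TSTM WIND and THUNDERSTORM WIND appear as separate rows in the table above, which suggests the raw EVTYPE labels contain spelling variants of the same event. A minimal sketch of how such variants could be merged before aggregation follows. It is not part of the analysis above, and `normalize_event()` with its single substitution rule is a hypothetical illustration rather than a complete cleaning scheme.

```r
# Hypothetical normalizer: trim whitespace, upper-case the label, and expand
# the "TSTM" abbreviation so both spellings map to the same event name.
normalize_event <- function(event) {
  event <- toupper(trimws(event))
  sub("^TSTM", "THUNDERSTORM", event)
}

# Made-up labels showing the variants collapsing to one spelling.
labels <- c("TSTM WIND", " thunderstorm wind", "HAIL")
cleaned <- normalize_event(labels)
print(cleaned)
```

Applying a rule like this to the event column before aggregating would combine the two thunderstorm wind rows, so the totals and possibly the rankings below tornados would differ from those reported here.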

Economic Damage

Floods cause the most combined damage to crops and property. Anyone who has dealt with catastrophic insurance can relate to this. Behind them come hurricanes/typhoons and tornados.

Dplot <- ggplot(TopDamage, aes(x=Event, y=TotalDamage/10^9))

Dplot <- Dplot + geom_bar(fill = "blue", stat = "identity") + theme(axis.text.x = element_text(angle = 45, hjust = 1), plot.title = element_text(hjust = .5)) + 
         xlab("Event Type") + ylab("Damage (Billions of U.S. Dollars)") + ggtitle("Top 10 Most Damaging Weather Events")

print(Dplot)

kable(TopDamage, format.args = list(big.mark = ","),format = "html") %>% 
    kable_styling(bootstrap_options = "striped", full_width = FALSE)
Event                  TotalDamage
FLOOD              150,319,678,257
HURRICANE/TYPHOON   71,913,712,800
TORNADO             57,352,114,049
STORM SURGE         43,323,541,000
HAIL                18,758,222,016
FLASH FLOOD         17,562,129,167
DROUGHT             15,018,672,000
HURRICANE           14,610,229,010
RIVER FLOOD         10,148,404,500
ICE STORM            8,967,041,360

Conclusion

Though tornados do not cause as much economic damage as floods, they pose the greatest risk of injury and death. We strongly advise having a plan in place if you live in an area with frequent tornados. If you live in an area at risk of flooding, make sure you get flood insurance!