The goal of this analysis was to examine the human and economic costs of storms and determine which event types are the most costly. Data from the National Weather Service was cleaned to standardize the event types, mapping typos and variant spellings to the closest entry in the official list of event names, and then analyzed. The analysis concludes that tornado events have both the highest human cost, in terms of injuries and fatalities, and the highest economic cost.
To see the effect of each event type, the data frame is loaded and grouped by the event type factor. The human cost (injuries and fatalities) and the financial cost (property and crop damage) are then summed within each group.
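As a minimal sketch of this group-and-sum step, the example below applies base R's `aggregate()` to a small made-up data frame. The rows and the names `toy` and `toy_sums` are hypothetical and are not taken from the storm data; only the column names mirror the real data set.

```r
# Hypothetical toy data; the values are invented for illustration only.
toy <- data.frame(
  EVTYPE     = c("Tornado", "Flood", "Tornado", "Hail"),
  FATALITIES = c(2, 1, 3, 0),
  INJURIES   = c(10, 4, 5, 1)
)

# Sum both columns within each event type.
toy_sums <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Combined human cost per event type.
toy_sums$COST <- toy_sums$FATALITIES + toy_sums$INJURIES
print(toy_sums)
```

Each event type ends up as one row, so the most costly type can be picked out with `which.max()` on the `COST` column.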
Note that the data set must first be cleaned to account for spelling errors and differences in capitalization. After the data is read into a data frame, each value in the EVTYPE column is replaced with the closest match from the list of event categories in the Storm Data Documentation (p. 6), using the stringdist package.
library(data.table)
library(stringdist)
df <- fread("repdata_data_StormData.csv.bz2")
correct_events <- c(
  "Astronomical Low Tide", "Avalanche", "Blizzard", "Coastal Flood", "Cold/Wind Chill", "Debris Flow", "Dense Fog", "Dense Smoke", "Drought",
  "Dust Devil", "Dust Storm", "Excessive Heat", "Extreme Cold/Wind Chill", "Flash Flood", "Flood", "Frost/Freeze", "Funnel Cloud", "Freezing Fog", "Hail", "Heat",
  "Heavy Rain", "Heavy Snow", "High Surf", "High Wind", "Hurricane (Typhoon)", "Ice Storm", "Lake-Effect Snow", "Lakeshore Flood", "Lightning", "Marine Hail", "Marine High Wind",
  "Marine Strong Wind", "Marine Thunderstorm Wind", "Rip Current", "Seiche", "Sleet", "Storm Surge/Tide", "Strong Wind", "Thunderstorm Wind", "Tornado", "Tropical Depression",
  "Tropical Storm", "Tsunami", "Volcanic Ash", "Waterspout", "Wildfire", "Winter Storm", "Winter Weather"
)
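# Return the entry of correct_events closest to `event`, compared
# case-insensitively with the longest-common-subsequence ("lcs") distance.
find_match <- function(event, correct_events) {
  distances <- stringdist::stringdist(tolower(event), tolower(correct_events), method = "lcs")
  correct_events[which.min(distances)]
}
# Replace every recorded event type with its closest official category.
df$EVTYPE <- sapply(df$EVTYPE, find_match, correct_events = correct_events)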
print(unique(df$EVTYPE))
## [1] "Tornado" "Strong Wind"
## [3] "Hail" "Freezing Fog"
## [5] "Heavy Snow" "Flash Flood"
## [7] "Seiche" "Winter Storm"
## [9] "Marine High Wind" "Thunderstorm Wind"
## [11] "Flood" "Hurricane (Typhoon)"
## [13] "Heavy Rain" "Lightning"
## [15] "Dense Fog" "Rip Current"
## [17] "High Wind" "Funnel Cloud"
## [19] "Heat" "Waterspout"
## [21] "Extreme Cold/Wind Chill" "Blizzard"
## [23] "Cold/Wind Chill" "Frost/Freeze"
## [25] "Coastal Flood" "High Surf"
## [27] "Drought" "Ice Storm"
## [29] "Debris Flow" "Avalanche"
## [31] "Marine Hail" "Sleet"
## [33] "Excessive Heat" "Wildfire"
## [35] "Dust Storm" "Dust Devil"
## [37] "Winter Weather" "Storm Surge/Tide"
## [39] "Tropical Storm" "Tsunami"
## [41] "Lake-Effect Snow" "Volcanic Ash"
## [43] "Tropical Depression" "Astronomical Low Tide"
## [45] "Dense Smoke" "Marine Strong Wind"
## [47] "Lakeshore Flood" "Marine Thunderstorm Wind"
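Because the raw data repeats the same spellings many times, the matching could also be done once per distinct spelling instead of once per row. The sketch below illustrates that idea on a small made-up vector. The names `toy_categories`, `raw_events`, `closest`, and `lookup` are hypothetical, the misspellings are invented, and the stringdist package is assumed to be installed.

```r
library(stringdist)

# Hypothetical inputs: a shortened category list and a few made-up spellings.
toy_categories <- c("Tornado", "Hail", "Flood")
raw_events <- c("TORNADOS", "hial", "TORNADOS", "Flood", "hial")

# Same rule as find_match: closest category by "lcs" distance, ignoring case.
closest <- function(event, categories) {
  d <- stringdist(tolower(event), tolower(categories), method = "lcs")
  categories[which.min(d)]
}

# Match each distinct spelling once; vapply names the result by spelling.
distinct <- unique(raw_events)
lookup <- vapply(distinct, closest, character(1), categories = toy_categories)

# Map every row through the lookup table.
cleaned <- unname(lookup[raw_events])
print(cleaned)
```

Applied to the full data, the lookup would be built from `unique(df$EVTYPE)`. Since the match depends only on the spelling, the cleaned column should be the same as with the row-wise call, with far fewer distance computations.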
# Total fatalities and injuries for each event type.
sum_human_cost <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = df, sum)
# Drop event types with no recorded fatalities or injuries.
sum_human_cost <- sum_human_cost[!(sum_human_cost$FATALITIES == 0 & sum_human_cost$INJURIES == 0), ]
sum_human_cost$COST <- sum_human_cost$FATALITIES + sum_human_cost$INJURIES
# Total property and crop damage for each event type.
sum_financial_cost <- aggregate(cbind(PROPDMG, CROPDMG) ~ EVTYPE, data = df, sum)
# Drop event types with no recorded property or crop damage.
sum_financial_cost <- sum_financial_cost[!(sum_financial_cost$PROPDMG == 0 & sum_financial_cost$CROPDMG == 0), ]
sum_financial_cost$COST <- sum_financial_cost$PROPDMG + sum_financial_cost$CROPDMG
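The sums above add the PROPDMG and CROPDMG figures as recorded. In the NOAA storm data these figures are accompanied by magnitude columns, PROPDMGEXP and CROPDMGEXP, whose codes include K, M, and B for thousands, millions, and billions. The sketch below shows one way such codes could be turned into dollar amounts, using a made-up data frame. The helper `exp_multiplier` is hypothetical, the rows are invented, and treating every other code as a multiplier of 1 is an assumption, so this is an illustration and not the method used in this analysis.

```r
# Hypothetical helper: map a magnitude code to a numeric multiplier.
# Assumption: only K, M, and B (any case) are scaled; other codes count as 1.
exp_multiplier <- function(code) {
  multipliers <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(multipliers[toupper(code)])
  mult[is.na(mult)] <- 1
  mult
}

# Hypothetical toy data; the values are invented for illustration only.
toy_dmg <- data.frame(
  EVTYPE     = c("Tornado", "Flood", "Tornado", "Hail"),
  PROPDMG    = c(25, 1.5, 2.5, 10),
  PROPDMGEXP = c("K", "M", "m", ""),
  stringsAsFactors = FALSE
)

# Convert each figure to dollars, then sum within each event type.
toy_dmg$PROP_DOLLARS <- toy_dmg$PROPDMG * exp_multiplier(toy_dmg$PROPDMGEXP)
toy_totals <- aggregate(PROP_DOLLARS ~ EVTYPE, data = toy_dmg, FUN = sum)
print(toy_totals)
```

The same helper could be applied to CROPDMG and CROPDMGEXP before the two damage columns are added together.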
library(ggplot2)
ggplot(sum_human_cost, aes(x = EVTYPE, y = COST)) +
  geom_bar(stat = "identity", fill = "blue") +
  labs(title = "Injuries and Fatalities by Event", x = "Event Type", y = "Sum of Injuries and Fatalities") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
ggplot(sum_human_cost, aes(x = EVTYPE, y = FATALITIES)) +
  geom_bar(stat = "identity", fill = "red") +
  labs(title = "Fatalities by Event", x = "Event Type", y = "Fatalities") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
# Event type with the most fatalities, and the one with the highest combined human cost.
max_fatalities <- sum_human_cost[which.max(sum_human_cost$FATALITIES), ]
max_human <- sum_human_cost[which.max(sum_human_cost$COST), ]
print(max_human)
## EVTYPE FATALITIES INJURIES COST
## 40 Tornado 5633 91364 96997
ggplot(sum_financial_cost, aes(x = EVTYPE, y = COST)) +
  geom_bar(stat = "identity", fill = "green") +
  labs(title = "Economic Cost by Event", x = "Event Type", y = "Cost in Thousands of $") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
max_economic <- sum_financial_cost[which.max(sum_financial_cost$COST), ]
print(max_economic)
## EVTYPE PROPDMG CROPDMG COST
## 40 Tornado 3214639 100028.3 3314668
In this analysis, tornado events impose by far the greatest cost on both human health and the economy.