Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This analysis aims to answer questions that help communities prepare for severe weather events and prioritize resources across different types of events. For both population health and economic impact, data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database indicate that tornadoes are the most harmful weather events.
This project uses the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which tracks characteristics of major storms and weather events in the United States.
The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. The file can be obtained here.
Documentation on the database can be found at these locations:
This analysis was conducted in R, using the data.table, dplyr, and ggplot2 packages.
Data were loaded with fread() from the data.table package. For ease of use, variable names were converted to lowercase and made unique where needed.
library(data.table)
myData <- fread(
  "stormdata.csv",
  sep = ",",
  na.strings = "",
  stringsAsFactors = TRUE
)
## Read 902297 rows and 37 (of 37) columns from 0.523 GB file in 00:00:06
names(myData) <- make.names(tolower(names(myData)), unique = TRUE)
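The call above reads a file named stormdata.csv. If only the bzip2-compressed download is at hand, base R can read it through a bzfile() connection. The following is a minimal sketch on a made-up stand-in file (the path, header, and values are invented for illustration); it also shows what the name-normalization step does to a duplicated column name.

```r
# Write a tiny bzip2-compressed CSV; header and values are made up.
path <- file.path(tempdir(), "toy_storms.csv.bz2")
con <- bzfile(path, open = "w")
writeLines(c("EVTYPE,FATALITIES,STATE,STATE",
             "TORNADO,2,AL,AL",
             "FLOOD,0,TX,TX"), con)
close(con)

# read.csv() accepts a connection, so the file is decompressed on the fly.
# check.names = FALSE keeps the duplicated header so make.names() handles it.
toy <- read.csv(bzfile(path), check.names = FALSE, stringsAsFactors = FALSE)

# Same normalization as in the report: lowercase, then make unique.
names(toy) <- make.names(tolower(names(toy)), unique = TRUE)
names(toy)
# "evtype" "fatalities" "state" "state.1"
```

The duplicated name receives a numeric suffix, which is what "made unique" refers to.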
Key questions addressed are:
1. Across the United States, which types of events are most harmful with respect to population health?
2. Across the United States, which types of events have the greatest economic consequences?
Health impact was assessed as the total number of fatalities and injuries caused by each type of event. To compute this, the data were first aggregated into a new data frame with aggregate(). For ease of use, the variable names of the new data frame were replaced. The data were then sorted by health impact in descending order using arrange() from the dplyr package. Finally, the data were plotted as a bar plot using the ggplot2 package. For readability, the plot displays only the 10 most impactful events.
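As a minimal sketch of the aggregation and sorting steps described here, the following applies the same kind of formula to a made-up data frame (the values in toy are invented for illustration). Base R's order() stands in for dplyr's arrange() so that the sketch needs no packages.

```r
# Toy data: invented values, same column names as the lowercased storm data.
toy <- data.frame(
  evtype     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  fatalities = c(2, 0, 1, 3, 1),
  injuries   = c(10, 4, 5, 0, 2)
)

# Sum fatalities + injuries within each event type.
toy_health <- aggregate(fatalities + injuries ~ evtype, data = toy, FUN = sum)

# The summed column is named after the formula's left-hand side; rename it.
names(toy_health) <- c("event_type", "health_impact")

# Sort by health impact, largest first.
toy_health <- toy_health[order(-toy_health$health_impact), ]
toy_health
# TORNADO 18, FLOOD 7, HEAT 3
```

The summed column initially carries the name "fatalities + injuries", which is why the column names are replaced before sorting and plotting.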
library(dplyr)
library(ggplot2)
# Sum fatalities and injuries per event type
health <- aggregate(fatalities + injuries ~ evtype, data = myData, FUN = sum)
names(health) <- c("event_type", "health_impact")
# Sort by health impact, largest first
healthSort <- arrange(health, desc(health_impact))
# Horizontal bar plot of the 10 most harmful event types
plot1 <- ggplot(data = healthSort[1:10, 1:2],
                aes(x = reorder(event_type, health_impact),
                    y = health_impact)) +
  geom_bar(stat = "identity", na.rm = TRUE, fill = "steelblue") +
  coord_flip() +
  theme_minimal() +
  theme(plot.margin = unit(c(0.5, 2, 0.5, 1), "cm")) +
  labs(title = "Top 10 health hazards",
       x = "Event type",
       y = "Health impact")
head(healthSort, 10)
## event_type health_impact
## 1 TORNADO 96979
## 2 EXCESSIVE HEAT 8428
## 3 TSTM WIND 7461
## 4 FLOOD 7259
## 5 LIGHTNING 6046
## 6 HEAT 3037
## 7 FLASH FLOOD 2755
## 8 ICE STORM 2064
## 9 THUNDERSTORM WIND 1621
## 10 WINTER STORM 1527
print(plot1)
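The table above lists TSTM WIND and THUNDERSTORM WIND as separate event types, although both labels appear to describe the same kind of event. The following is a minimal sketch of merging such variants before ranking. The helper normalize_evtype() is hypothetical and covers only the variants visible in this report; the four rows are copied from the table above.

```r
# Hypothetical helper: maps a few spelling variants onto one label.
normalize_evtype <- function(x) {
  x <- toupper(trimws(as.character(x)))
  x <- sub("^TSTM", "THUNDERSTORM", x)                      # expand abbreviation
  x <- sub("^THUNDERSTORM WINDS$", "THUNDERSTORM WIND", x)  # drop the plural
  x
}

# Four rows taken from the health impact table.
top <- data.frame(
  event_type    = c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND", "THUNDERSTORM WIND"),
  health_impact = c(96979, 8428, 7461, 1621)
)

# Normalize the labels, then re-aggregate and re-sort.
top$event_type <- normalize_evtype(top$event_type)
merged <- aggregate(health_impact ~ event_type, data = top, FUN = sum)
merged <- merged[order(-merged$health_impact), ]
merged
```

With just these two variants merged, thunderstorm wind (9082) moves ahead of excessive heat (8428), while tornadoes remain first by a wide margin.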
Economic impact was assessed as the sum of the recorded property damage (propdmg) and crop damage (cropdmg) values for each type of event. Data aggregation, sorting, and visualization were done in the same manner as for the health impact analysis. For ease of reading, the plot expresses economic impact in thousands.
# Sum recorded property and crop damage values per event type
economics <- aggregate(propdmg + cropdmg ~ evtype, data = myData, FUN = sum)
names(economics) <- c("event_type", "economic_impact")
economics$economic_impact <- round(economics$economic_impact)
# Sort by economic impact, largest first
ecoSort <- arrange(economics, desc(economic_impact))
# Horizontal bar plot of the 10 costliest event types, axis labels in thousands
plot2 <- ggplot(data = ecoSort[1:10, 1:2],
                aes(x = reorder(event_type, economic_impact),
                    y = economic_impact)) +
  scale_y_continuous(labels = function(x) x / 1000) +
  geom_bar(stat = "identity", na.rm = TRUE, fill = "indianred3") +
  coord_flip() +
  theme_minimal() +
  theme(plot.margin = unit(c(0.5, 2, 0.5, 1), "cm")) +
  labs(title = "Top 10 economic hazards",
       x = "Event type",
       y = "Economic impact (1000)")
head(ecoSort, 10)
## event_type economic_impact
## 1 TORNADO 3312277
## 2 FLASH FLOOD 1599325
## 3 TSTM WIND 1445168
## 4 HAIL 1268290
## 5 FLOOD 1067976
## 6 THUNDERSTORM WIND 943636
## 7 LIGHTNING 606932
## 8 THUNDERSTORM WINDS 464978
## 9 HIGH WIND 342015
## 10 WINTER STORM 134700
print(plot2)
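The sums above add the propdmg and cropdmg values as recorded. NOAA's storm data also carry a magnitude code for each damage figure. The sketch below assumes these codes arrive in columns named propdmgexp and cropdmgexp after lowercasing, and that K, M, and B stand for thousands, millions, and billions. The column names, the mapping, the helper to_dollars(), and the choice to treat any other code as a multiplier of 1 are assumptions for illustration, not part of the analysis above. The toy values are invented.

```r
# Toy data with invented values; the *exp columns are assumed names.
toy <- data.frame(
  evtype     = c("TORNADO", "FLOOD", "HAIL"),
  propdmg    = c(25, 2.5, 500),
  propdmgexp = c("K", "M", NA),
  cropdmg    = c(0, 1, 10),
  cropdmgexp = c(NA, "B", "K"),
  stringsAsFactors = FALSE
)

# Assumed mapping from magnitude code to multiplier.
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: scale a damage value by its magnitude code.
# Missing or unrecognized codes fall back to a multiplier of 1 (a simplification).
to_dollars <- function(value, code) {
  m <- unname(multiplier[toupper(as.character(code))])
  m[is.na(m)] <- 1
  value * m
}

toy$damage <- to_dollars(toy$propdmg, toy$propdmgexp) +
  to_dollars(toy$cropdmg, toy$cropdmgexp)
toy[, c("evtype", "damage")]
```

Applied to the full data set, a conversion of this kind could change the ordering shown in the table above, so the two approaches are best compared side by side.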