This report explores the impact of severe weather events in the United States on public health and the economy between 1950 and 2011, using data from the U.S. National Oceanic and Atmospheric Administration (NOAA). The analysis identifies the weather events that caused the most fatalities and injuries during this period, as well as those that caused the highest economic damage. The findings can help highlight which types of events should be prioritized in preparedness, prevention, and mitigation planning for future weather disasters.
The data used in this analysis comes from the U.S. National Oceanic
and Atmospheric Administration’s (NOAA) storm database. It includes
details of major storms and weather events in the United States from
1950 to November 2011. The raw data is provided in a
comma-separated-value (CSV) format compressed via the bzip2
algorithm.
We first load the necessary R packages and read the compressed CSV file directly into R.
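# Load the packages used for data manipulation and plotting
library(dplyr)
library(ggplot2)
# Define the file path
data_path <- "C:/Users/nater/OneDrive/Documents/coursera/reprodR/assignment2/repdata_data_StormData.csv.bz2"
# Load the dataset; read.csv() decompresses the .bz2 file automatically
storm_data <- read.csv(data_path, stringsAsFactors = FALSE)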
#look at data structure
str(storm_data)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr "" "" "" "" ...
## $ BGN_LOCATI: chr "" "" "" "" ...
## $ END_DATE : chr "" "" "" "" ...
## $ END_TIME : chr "" "" "" "" ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr "" "" "" "" ...
## $ END_LOCATI: chr "" "" "" "" ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## $ WFO : chr "" "" "" "" ...
## $ STATEOFFIC: chr "" "" "" "" ...
## $ ZONENAMES : chr "" "" "" "" ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr "" "" "" "" ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
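As a side note on the loading step, `read.csv()` can read a bzip2-compressed file straight from its path, with no separate decompression step. The following is a minimal sketch of that behavior, using a temporary file and made-up rows (the `toy` data frame and `tmp` path are illustrative assumptions, not part of the NOAA data):

```r
# Write a tiny CSV through a bzip2 connection, then read it back.
# The file name and its contents are made up for illustration.
tmp <- tempfile(fileext = ".csv.bz2")
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(1, 0))

con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() opens the path with file(), which detects the compression
round_trip <- read.csv(tmp, stringsAsFactors = FALSE)
str(round_trip)

# Clean up the temporary file
unlink(tmp)
```

Because the compression is detected when the file is opened, the same `read.csv()` call works for both plain and compressed CSV files.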
Next, we select the columns relevant to our analysis: EVTYPE (event type), FATALITIES, INJURIES, PROPDMG (property damage), PROPDMGEXP (property damage exponent), CROPDMG (crop damage), and CROPDMGEXP (crop damage exponent).
# refine data: keep only the columns needed for the analysis
storm_subset <- storm_data %>%
  select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
Economic damage values are split into two parts: a base number and an exponent code that defines the multiplier. We convert these codes into numeric multipliers (any code other than K, M, or B is treated as 1) and then calculate the total property, crop, and combined damage.
# convert exponent codes to numeric multipliers
convert_exp <- function(exp) {
  exp <- toupper(trimws(exp))
  ifelse(exp == "K", 1e3,
         ifelse(exp == "M", 1e6,
                ifelse(exp == "B", 1e9, 1)))
}
storm_subset <- storm_subset %>%
  mutate(
    PROPDMGEXP = convert_exp(PROPDMGEXP),
    CROPDMGEXP = convert_exp(CROPDMGEXP),
    PROP_DAMAGE = PROPDMG * PROPDMGEXP,
    CROP_DAMAGE = CROPDMG * CROPDMGEXP,
    TOTAL_DAMAGE = PROP_DAMAGE + CROP_DAMAGE
  )
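To make the conversion concrete, here is a small worked example that applies the same `convert_exp()` helper to a few made-up rows. The `toy` data frame and its values are assumptions chosen for illustration and do not come from the NOAA file:

```r
# Same helper as in the report: K/M/B map to multipliers, anything else to 1
convert_exp <- function(exp) {
  exp <- toupper(trimws(exp))
  ifelse(exp == "K", 1e3,
         ifelse(exp == "M", 1e6,
                ifelse(exp == "B", 1e9, 1)))
}

# Made-up rows: 25 K, 2.5 m (lower case), 1.2 B, and 10 with a blank code
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1.2, 10),
  PROPDMGEXP = c("K", "m", "B", ""),
  stringsAsFactors = FALSE
)

# Base number times multiplier gives the damage in dollars
toy$PROP_DAMAGE <- toy$PROPDMG * convert_exp(toy$PROPDMGEXP)
toy
```

The four rows come out as 25,000, 2,500,000, 1,200,000,000 and 10 dollars. The last row shows the fallback: a blank or unrecognized code leaves the base number unchanged, so such rows contribute only their raw value to the totals.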
The data is now cleaned and ready for analysis.
## Plotting the data
health_impact <- storm_subset %>%
  group_by(EVTYPE) %>%
  summarise(
    Fatalities = sum(FATALITIES, na.rm = TRUE),
    Injuries = sum(INJURIES, na.rm = TRUE),
    TotalHealthImpact = Fatalities + Injuries
  ) %>%
  arrange(desc(TotalHealthImpact)) %>%
  head(10)
Figure 1: This plot shows the impact of each weather event type in terms of total injuries and deaths, as recorded by NOAA.
ggplot(health_impact, aes(x = reorder(EVTYPE, -TotalHealthImpact), y = TotalHealthImpact)) +
  geom_col(fill = "steelblue") +
  labs(title = "Top 10 Weather Events by Health Impact",
       x = "Event Type", y = "Fatalities + Injuries") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
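The group, sum, and rank steps behind the health totals can be illustrated on a handful of made-up rows. This sketch uses base R (`aggregate()` and `order()`) instead of the dplyr pipeline used in the report, and the `toy` records are invented for illustration:

```r
# Made-up event records, not taken from the NOAA data
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(2, 1, 0, 4, 0),
  INJURIES   = c(10, 3, 5, 0, 1),
  stringsAsFactors = FALSE
)

# Sum fatalities and injuries per event type
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
totals$TotalHealthImpact <- totals$FATALITIES + totals$INJURIES

# Sort in descending order of combined impact and keep the top two
top2 <- head(totals[order(-totals$TotalHealthImpact), ], 2)
top2
```

In this toy data, TORNADO ranks first with a combined total of 17 and FLOOD second with 5, while HEAT (4) is dropped by `head()`.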
# events with economic impact
economic_impact <- storm_subset %>%
  group_by(EVTYPE) %>%
  summarise(TotalDamage = sum(TOTAL_DAMAGE, na.rm = TRUE)) %>%
  arrange(desc(TotalDamage)) %>%
  head(10)
# plot
ggplot(economic_impact, aes(x = reorder(EVTYPE, -TotalDamage), y = TotalDamage / 1e9)) +
  geom_col(fill = "darkred") +
  labs(title = "Top 10 Weather Events by Economic Damage",
       x = "Event Type", y = "Total Damage (in Billions USD)") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
### Figure 2: Economic Impact of Weather Events
This plot shows the impact each weather event has in terms of total property and crop damage, as recorded by NOAA.
Plotting event type against total fatalities and injuries shows that tornadoes are overwhelmingly the most harmful to population health, towering above every other event type. A gap this large might look like a data error, but we can verify it by computing the totals for tornadoes and, for comparison, floods directly:
# Sum the fatalities and injuries for tornadoes
tornado_health_impact <- storm_subset %>%
  filter(EVTYPE == "TORNADO") %>%
  summarise(
    TotalFatalities = sum(FATALITIES, na.rm = TRUE),
    TotalInjuries = sum(INJURIES, na.rm = TRUE),
    TotalHealthImpact = TotalFatalities + TotalInjuries
  )
# View the calculated totals
tornado_health_impact
## TotalFatalities TotalInjuries TotalHealthImpact
## 1 5633 91346 96979
# Sum the fatalities and injuries for floods
flood_health_impact <- storm_subset %>%
  filter(EVTYPE == "FLOOD") %>%
  summarise(
    TotalFatalities = sum(FATALITIES, na.rm = TRUE),
    TotalInjuries = sum(INJURIES, na.rm = TRUE),
    TotalHealthImpact = TotalFatalities + TotalInjuries
  )
# View the calculated totals
flood_health_impact
## TotalFatalities TotalInjuries TotalHealthImpact
## 1 470 6789 7259
This verification step confirms that tornadoes are in fact far more dangerous to public health: their combined total of 96,979 fatalities and injuries is more than 13 times the 7,259 recorded for floods.
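The size of the gap can be checked by hand from the two sets of totals printed above. A short sketch of that arithmetic, with the figures copied from the output (the variable names are illustrative):

```r
# Totals copied from the printed output above
tornado_total <- 5633 + 91346  # tornado fatalities + injuries
flood_total   <- 470 + 6789    # flood fatalities + injuries

# How many times larger is the tornado total?
ratio <- tornado_total / flood_total
round(ratio, 2)
```

The two sums reproduce the `TotalHealthImpact` values of 96979 and 7259, and their ratio falls between 13 and 14.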
Plotting event type against total damage (in billions of USD) shows that floods cause the most economic damage, with hurricanes coming in second at about half that value.