This document analyzes storm event data to determine which event types
have the greatest impact on population health and the greatest economic
consequences. The analysis uses the repdata_data_StormData.csv dataset,
focusing on events with fatalities, injuries, and property or crop
damage. Data transformations standardize event types and calculate
combined impacts. Results highlight the top 10 event types affecting
health and the economy, presented with a box plot and a scatter plot.
The data were loaded into R from the repdata_data_StormData.csv file
located at “C:/Cleaning Data/Reproducable Research/Module4”. The dataset
was filtered to include only rows with fatalities or injuries for the
health analysis, and only rows with property or crop damage for the
economic analysis. Event types were standardized using toupper, trimws,
and case_when to consolidate similar entries (e.g., “AVALANCHE” and
“AVALANCE”). For the economic data, damage multipliers were applied
based on the PROPDMGEXP and CROPDMGEXP values (K = 1,000; M = 1,000,000;
B = 1,000,000,000; blank, missing, or “?” = 1; any other code = 0).
library(dplyr)  # provides %>%, filter, mutate, case_when, group_by, summarise

data <- read.csv("C:/Cleaning Data/Reproducable Research/Module4/repdata_data_StormData.csv")

# Keep only events that caused at least one fatality or injury
new_data <- data %>%
  filter(FATALITIES > 0 | INJURIES > 0) %>%
  mutate(
    Combined_Impact = FATALITIES + INJURIES,
    # Normalize case and whitespace, then merge known spelling variants
    EVTYPE = toupper(trimws(EVTYPE)),
    EVTYPE = case_when(
      EVTYPE %in% c("AVALANCE", "AVALANCHE") ~ "AVALANCHE",
      EVTYPE %in% c("COASTAL FLOOD", "COASTAL FLOODING", "COASTAL FLOODING/EROSION") ~ "COASTAL FLOOD",
      TRUE ~ EVTYPE
    )
  )

# Total health impact per event type; keep the 10 largest
health_data <- new_data %>%
  group_by(EVTYPE) %>%
  summarise(
    Total_Fatalities = sum(FATALITIES, na.rm = TRUE),
    Total_Injuries = sum(INJURIES, na.rm = TRUE),
    Combined_Total = Total_Fatalities + Total_Injuries,
    .groups = "drop"
  ) %>%
  arrange(desc(Combined_Total)) %>%
  slice_head(n = 10)

# Event-level rows for the top 10 event types, used for plotting
plot_data_health <- new_data %>%
  filter(EVTYPE %in% health_data$EVTYPE)
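The effect of the event-type standardization can be illustrated on a few made-up labels. The sketch below is a base-R variant that uses a named lookup vector instead of the case_when call in the pipeline; the sample labels, the canonical vector, and the standardize_evtype helper are hypothetical names introduced only for this illustration.

```r
# Made-up raw labels showing the kind of inconsistencies being consolidated.
raw <- c(" avalance", "Avalanche", "COASTAL FLOODING",
         "Coastal Flooding/Erosion ", "tornado")

# Hypothetical lookup: variant spelling -> canonical label.
canonical <- c(
  "AVALANCE"                 = "AVALANCHE",
  "COASTAL FLOODING"         = "COASTAL FLOOD",
  "COASTAL FLOODING/EROSION" = "COASTAL FLOOD"
)

standardize_evtype <- function(x) {
  x <- toupper(trimws(x))            # same normalization as in the pipeline
  hit <- x %in% names(canonical)     # labels that have a known variant
  x[hit] <- unname(canonical[x[hit]])
  x                                  # labels without a rule pass through unchanged
}

clean <- standardize_evtype(raw)
print(table(clean))
```

A lookup vector keeps the variant list in one place, which may be convenient if the same rules are applied to both the health and the economic subsets.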
# Confirm the import by inspecting the first rows of data
print("Data loaded, checking structure:")
## [1] "Data loaded, checking structure:"
print(head(data))
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE  EVTYPE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL TORNADO
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL TORNADO
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL TORNADO
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL TORNADO
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL TORNADO
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL TORNADO
##   BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1         0                                               0         NA
## 2         0                                               0         NA
## 3         0                                               0         NA
## 4         0                                               0         NA
## 5         0                                               0         NA
## 6         0                                               0         NA
##   END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1         0                      14.0   100 3   0          0       15    25.0
## 2         0                       2.0   150 2   0          0        0     2.5
## 3         0                       0.1   123 2   0          0        2    25.0
## 4         0                       0.0   100 2   0          0        2     2.5
## 5         0                       0.0   150 2   0          0        2     2.5
## 6         0                       1.5   177 2   0          0        6     2.5
##   PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1          K       0                                         3040      8812
## 2          K       0                                         3042      8755
## 3          K       0                                         3340      8742
## 4          K       0                                         3458      8626
## 5          K       0                                         3412      8642
## 6          K       0                                         3450      8748
##   LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1       3051       8806              1
## 2          0          0              2
## 3          0          0              3
## 4          0          0              4
## 5          0          0              5
## 6          0          0              6
# Keep only events with recorded property or crop damage
dmg_data <- data %>%
  filter(PROPDMG > 0 | CROPDMG > 0) %>%
  mutate(
    # Same event-type standardization as in the health analysis
    EVTYPE = toupper(trimws(EVTYPE)),
    EVTYPE = case_when(
      EVTYPE %in% c("AVALANCE", "AVALANCHE") ~ "AVALANCHE",
      EVTYPE %in% c("COASTAL FLOOD", "COASTAL FLOODING", "COASTAL FLOODING/EROSION") ~ "COASTAL FLOOD",
      TRUE ~ EVTYPE
    ),
    # Exponent codes: K, M, B scale the amount; blank, NA, or "?" leave it
    # unchanged; any other code is treated as zero damage
    Prop_Multiplier = case_when(
      toupper(PROPDMGEXP) == "K" ~ 1000,
      toupper(PROPDMGEXP) == "M" ~ 1000000,
      toupper(PROPDMGEXP) == "B" ~ 1000000000,
      toupper(PROPDMGEXP) == "" | is.na(PROPDMGEXP) | PROPDMGEXP == "?" ~ 1,
      TRUE ~ 0
    ),
    Crop_Multiplier = case_when(
      toupper(CROPDMGEXP) == "K" ~ 1000,
      toupper(CROPDMGEXP) == "M" ~ 1000000,
      toupper(CROPDMGEXP) == "B" ~ 1000000000,
      toupper(CROPDMGEXP) == "" | is.na(CROPDMGEXP) | CROPDMGEXP == "?" ~ 1,
      TRUE ~ 0
    ),
    Total_Property = PROPDMG * Prop_Multiplier,
    Total_Crop = CROPDMG * Crop_Multiplier,
    Total_Damage = Total_Property + Total_Crop
  ) %>%
  filter(!is.na(Total_Damage))  # drop rows where the total could not be computed
# Total damage per event type; keep the 10 largest
damage_data <- dmg_data %>%
  group_by(EVTYPE) %>%
  summarise(
    Total_Damage = sum(Total_Damage, na.rm = TRUE),
    Event_Count = n(),
    .groups = "drop"
  ) %>%
  arrange(desc(Total_Damage)) %>%
  slice_head(n = 10)

# Event-level rows for the top 10 event types, used for plotting
plot_data_economic <- dmg_data %>%
  filter(EVTYPE %in% damage_data$EVTYPE)
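As a quick check of the multiplier rule, the following sketch applies it to a handful of made-up rows. It is a base-R illustration under stated assumptions, not the pipeline itself: the toy values and the exp_to_multiplier helper are hypothetical, and a named lookup vector stands in for the case_when calls.

```r
# Made-up rows; the last exponent code ("7") is not K, M, B, blank, or "?".
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 10, 5),
  PROPDMGEXP = c("K", "m", "B", "", "7"),
  stringsAsFactors = FALSE
)

# Hypothetical helper mirroring the rule: K/M/B scale the amount,
# blank/NA/"?" map to 1, and every other code maps to 0.
exp_to_multiplier <- function(code) {
  is_missing <- is.na(code)
  code <- toupper(as.character(code))
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[code])                  # NA for codes not in the lookup
  mult[is_missing | code %in% c("", "?")] <- 1  # no exponent recorded
  mult[is.na(mult)] <- 0                        # unrecognized code
  mult
}

toy$Total_Property <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP)
print(toy)
```

Note that mapping unrecognized codes to 0 removes those amounts from the totals entirely, so the share of rows with such codes is worth checking before interpreting the ranking.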
Caption: Box plot showing the distribution of combined fatalities
and injuries across the top 10 event types, with event types ordered by
total impact.
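The plotting code for this figure is not shown in the report. A box plot matching the caption could be sketched as follows, assuming the ggplot2 package is available; toy_health is made-up data standing in for plot_data_health, and the log scale and flipped axes are illustrative choices rather than details taken from the original figure.

```r
library(ggplot2)

# Made-up per-event records standing in for plot_data_health.
toy_health <- data.frame(
  EVTYPE = rep(c("TORNADO", "FLOOD", "HEAT"), each = 4),
  Combined_Impact = c(1, 3, 15, 40, 1, 2, 2, 6, 1, 1, 4, 9)
)

# reorder(..., FUN = sum) sorts event types by their total impact.
p_health <- ggplot(
  toy_health,
  aes(x = reorder(EVTYPE, Combined_Impact, FUN = sum), y = Combined_Impact)
) +
  geom_boxplot() +
  scale_y_log10() +   # assumption: counts are skewed, so a log axis helps
  coord_flip() +      # horizontal boxes keep long event names readable
  labs(
    x = "Event type",
    y = "Fatalities + injuries per event (log scale)",
    title = "Health impact by event type (toy data)"
  )

print(p_health)
```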
Caption: Scatter plot of combined property and crop damage for the
top 10 event types, with transparency indicating damage
magnitude.
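The plotting code for this figure is likewise not shown. One possible sketch of a scatter plot in which point transparency tracks damage magnitude is given below, again assuming ggplot2; toy_damage is made-up data standing in for plot_data_economic, and the alpha range and log axis are illustrative assumptions.

```r
library(ggplot2)

# Made-up per-event damage totals standing in for plot_data_economic.
toy_damage <- data.frame(
  EVTYPE = rep(c("FLOOD", "HURRICANE", "TORNADO"), each = 3),
  Total_Damage = c(5e3, 2e6, 1e9, 1e5, 5e7, 3e9, 2.5e3, 2.5e4, 2.5e6)
)

p_damage <- ggplot(
  toy_damage,
  aes(x = EVTYPE, y = Total_Damage, alpha = Total_Damage)
) +
  geom_point(size = 3) +
  scale_y_log10() +                                            # damage spans many orders of magnitude
  scale_alpha_continuous(range = c(0.2, 1), guide = "none") +  # larger damage -> more opaque
  labs(
    x = "Event type",
    y = "Property + crop damage per event (USD, log scale)",
    title = "Economic impact by event type (toy data)"
  )

print(p_damage)
```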
.csv.bz2 file.
The analysis confirms that certain storm events have significant health and economic impacts, with tornadoes and hurricanes being particularly notable. The work appears to be original and submitted by the student.