Synopsis - Course Project 2 Reproducible Research
Goal: Exploratory analysis intended to answer the following questions:
Across the United States, which types of events are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?
Results: In general, the magnitude of adverse human health effects scales with the magnitude of economic damage caused by different storm types. These results do not exclude outliers or leverage points.
Disclaimer: The EVTYPE column contains many repeated event types, similar spellings, misspellings, and so on. The events could therefore be grouped in many different ways, and exploring them all would take too long for this assignment. I treat every spelling as a distinct event type.
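To give a rough sense of how much of this variation is purely cosmetic, the sketch below normalises case and whitespace before counting distinct labels. It is not part of the analysis in this report, which keeps every spelling separate. The helper `normalize_evtype` and the sample labels are hypothetical.

```r
# Hypothetical helper: removes differences in case and spacing only.
# It does NOT merge genuinely different spellings
# (for example "TSTM WIND" and "THUNDERSTORM WIND" stay separate).
normalize_evtype <- function(x) {
  x <- toupper(trimws(x))   # ignore case and leading/trailing blanks
  gsub("\\s+", " ", x)      # collapse runs of whitespace to a single space
}

# Made-up sample labels, not taken from the dataset
labels <- c("Heavy Rain", "HEAVY RAIN", " heavy  rain",
            "TSTM WIND", "THUNDERSTORM WIND")

n_raw        <- length(unique(labels))                    # distinct raw spellings
n_normalized <- length(unique(normalize_evtype(labels)))  # distinct after cleaning
c(raw = n_raw, normalized = n_normalized)
```

Anything beyond this, such as mapping abbreviations to full names, would need a hand-built lookup and is a judgment call about which events belong together.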
Data processing
library(dplyr)    # %>%, mutate(), group_by(), summarise()
library(ggplot2)  # plots
stormdata_raw <- read.csv(file = "C:/Users/Admin/Downloads/repdata_data_StormData.csv.bz2",
                          header = TRUE, na.strings = "")
head(stormdata_raw)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE EVTYPE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL TORNADO
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL TORNADO
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL TORNADO
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL TORNADO
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL TORNADO
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL TORNADO
## BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1 0 <NA> <NA> <NA> <NA> 0 NA
## 2 0 <NA> <NA> <NA> <NA> 0 NA
## 3 0 <NA> <NA> <NA> <NA> 0 NA
## 4 0 <NA> <NA> <NA> <NA> 0 NA
## 5 0 <NA> <NA> <NA> <NA> 0 NA
## 6 0 <NA> <NA> <NA> <NA> 0 NA
## END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1 0 <NA> <NA> 14.0 100 3 0 0 15 25.0
## 2 0 <NA> <NA> 2.0 150 2 0 0 0 2.5
## 3 0 <NA> <NA> 0.1 123 2 0 0 2 25.0
## 4 0 <NA> <NA> 0.0 100 2 0 0 2 2.5
## 5 0 <NA> <NA> 0.0 150 2 0 0 2 2.5
## 6 0 <NA> <NA> 1.5 177 2 0 0 6 2.5
## PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1 K 0 <NA> <NA> <NA> <NA> 3040 8812
## 2 K 0 <NA> <NA> <NA> <NA> 3042 8755
## 3 K 0 <NA> <NA> <NA> <NA> 3340 8742
## 4 K 0 <NA> <NA> <NA> <NA> 3458 8626
## 5 K 0 <NA> <NA> <NA> <NA> 3412 8642
## 6 K 0 <NA> <NA> <NA> <NA> 3450 8748
## LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3051 8806 <NA> 1
## 2 0 0 <NA> 2
## 3 0 0 <NA> 3
## 4 0 0 <NA> 4
## 5 0 0 <NA> 5
## 6 0 0 <NA> 6
str(stormdata_raw)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr NA NA NA NA ...
## $ BGN_LOCATI: chr NA NA NA NA ...
## $ END_DATE : chr NA NA NA NA ...
## $ END_TIME : chr NA NA NA NA ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr NA NA NA NA ...
## $ END_LOCATI: chr NA NA NA NA ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr NA NA NA NA ...
## $ WFO : chr NA NA NA NA ...
## $ STATEOFFIC: chr NA NA NA NA ...
## $ ZONENAMES : chr NA NA NA NA ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr NA NA NA NA ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
Which events are most harmful?
With respect to human health: Let's define population health impact as the total number of fatalities plus the total number of injuries (TOTAL_HEALTH_IMPACT in the code below).
With respect to economic consequences: Let's define economic consequences as property damage plus crop damage (TOTALDMG in the code below). Each recorded damage figure must first be multiplied by the value of its exponent code (K, M, or B).
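As a worked example of both definitions, the sketch below applies them by hand to the first record shown in the `head()` output above (the 1950 tornado in Mobile, AL). The variable names are made up for illustration and are not used elsewhere in the report.

```r
# Values copied from the first row of the head() output
fatalities <- 0
injuries   <- 15
propdmg    <- 25   # PROPDMG, with PROPDMGEXP = "K" (thousands)
cropdmg    <- 0    # CROPDMG, exponent missing for this row

# Population health impact: fatalities plus injuries
health_impact_row <- fatalities + injuries

# Economic consequences: property damage scaled by its exponent, plus crop damage
# (the crop term is zero here, so its missing exponent does not matter)
econ_damage_row <- propdmg * 1e3 + cropdmg

c(health = health_impact_row, damage_usd = econ_damage_row)
```

Under these definitions this single tornado counts for 15 people harmed and 25,000 USD of damage.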
New data frame for processing:
stormdata <- stormdata_raw
Apply multiplier to economic damage
# Convert property damage exponent
stormdata$PROPDMGEXP <- toupper(stormdata$PROPDMGEXP)
stormdata$PROPDMGEXP[stormdata$PROPDMGEXP %in% c("", "+", "-", "?")] <- "0"
stormdata$PROPDMGEXP[stormdata$PROPDMGEXP == "H"] <- "100"
stormdata$PROPDMGEXP[stormdata$PROPDMGEXP == "K"] <- "1000"
stormdata$PROPDMGEXP[stormdata$PROPDMGEXP == "M"] <- "1000000"
stormdata$PROPDMGEXP[stormdata$PROPDMGEXP == "B"] <- "1000000000"
stormdata$PROPDMGEXP <- as.numeric(stormdata$PROPDMGEXP)
stormdata$PROPDMGEXP[is.na(stormdata$PROPDMGEXP)] <- 0
# Convert crop damage exponent
stormdata$CROPDMGEXP <- toupper(stormdata$CROPDMGEXP)
stormdata$CROPDMGEXP[stormdata$CROPDMGEXP %in% c("", "+", "-", "?")] <- "0"
stormdata$CROPDMGEXP[stormdata$CROPDMGEXP == "K"] <- "1000"
stormdata$CROPDMGEXP[stormdata$CROPDMGEXP == "M"] <- "1000000"
stormdata$CROPDMGEXP[stormdata$CROPDMGEXP == "B"] <- "1000000000"
stormdata$CROPDMGEXP <- as.numeric(stormdata$CROPDMGEXP)
stormdata$CROPDMGEXP[is.na(stormdata$CROPDMGEXP)] <- 0
# Compute total damage
stormdata <- stormdata %>%
  mutate(TOTALPROPDMG = PROPDMG * PROPDMGEXP,
         TOTALCROPDMG = CROPDMG * CROPDMGEXP,
         TOTALDMG = TOTALPROPDMG + TOTALCROPDMG)
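The step-by-step recoding above could also be written once as a small vectorised helper and applied to both exponent columns. The sketch below is one possible way to do that, not the code used for the results in this report. `exp_to_multiplier` is a hypothetical name, and the function is meant to mirror the property-damage rules above, including the fact that a digit code keeps its face value as the multiplier.

```r
# Hypothetical helper mirroring the property-damage recoding above
exp_to_multiplier <- function(code) {
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  code <- toupper(code)
  mult <- unname(lookup[code])            # letter codes; anything else becomes NA
  is_digit <- !is.na(code) & grepl("^[0-9]+$", code)
  mult[is_digit] <- as.numeric(code[is_digit])  # digit codes keep their face value
  mult[is.na(mult)] <- 0                  # missing, "+", "-" and "?" become 0
  mult
}

# Made-up codes covering each case
codes <- c("K", "m", "h", "B", NA, "+", "5")
multipliers <- exp_to_multiplier(codes)
data.frame(code = codes, multiplier = multipliers)
```

Printing the table makes two consequences of the rules visible: records with a missing or symbolic exponent contribute no damage at all, and a code such as "5" multiplies by 5 rather than by a power of ten. Both are assumptions of this analysis rather than documented meanings of the codes.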
Events Most Harmful to Population Health
health_impact <- stormdata %>%
  group_by(EVTYPE) %>%
  summarise(FATALITIES = sum(FATALITIES, na.rm = TRUE),
            INJURIES = sum(INJURIES, na.rm = TRUE)) %>%
  mutate(TOTAL_HEALTH_IMPACT = FATALITIES + INJURIES) %>%
  arrange(desc(TOTAL_HEALTH_IMPACT)) %>%
  top_n(10, TOTAL_HEALTH_IMPACT)
ggplot(health_impact, aes(x = reorder(EVTYPE, -TOTAL_HEALTH_IMPACT), y = TOTAL_HEALTH_IMPACT)) +
  geom_col(fill = "steelblue") +
  labs(title = "Top 10 Weather Events by Population Health Impact",
       x = "Event Type", y = "Total Fatalities + Injuries") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Events With Greatest Economic Consequences
economic_impact <- stormdata %>%
  group_by(EVTYPE) %>%
  summarise(TOTAL_ECONOMIC_DAMAGE = sum(TOTALDMG, na.rm = TRUE)) %>%
  arrange(desc(TOTAL_ECONOMIC_DAMAGE)) %>%
  top_n(10, TOTAL_ECONOMIC_DAMAGE)
ggplot(economic_impact, aes(x = reorder(EVTYPE, -TOTAL_ECONOMIC_DAMAGE), y = TOTAL_ECONOMIC_DAMAGE / 1e9)) +
  geom_col(fill = "darkgreen") +
  labs(title = "Top 10 Weather Events by Economic Damage",
       x = "Event Type", y = "Total Damage (Billion USD)") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
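The synopsis states that health impact tends to scale with economic damage across event types. One way this could be checked is a rank correlation between the two per-event totals. The sketch below shows the idea in base R on made-up records. The numbers are not from the storm data and say nothing about the real result.

```r
# Made-up records, NOT taken from the storm data
toy <- data.frame(
  EVTYPE     = c("A", "A", "B", "B", "C", "D"),
  FATALITIES = c(5, 1, 0, 2, 0, 0),
  INJURIES   = c(20, 4, 3, 1, 1, 0),
  TOTALDMG   = c(4e9, 1e9, 2e8, 3e8, 5e7, 1e6)
)

# One row per event type, summing the same quantities used in this report
per_event <- aggregate(cbind(FATALITIES, INJURIES, TOTALDMG) ~ EVTYPE,
                       data = toy, FUN = sum)
per_event$TOTAL_HEALTH_IMPACT <- per_event$FATALITIES + per_event$INJURIES

# Spearman's rank correlation is less sensitive to a few extreme
# event types than Pearson's r
rho <- cor(per_event$TOTAL_HEALTH_IMPACT, per_event$TOTALDMG,
           method = "spearman")
rho
```

A rank-based measure is used here because, as noted in the synopsis, outliers and leverage points were not excluded, and a handful of very large event types could otherwise dominate a linear correlation.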