Weather Events and their Impacts on Human Health and Economics
Synopsis
A photo of Hurricane Florence taken from the ISS (Courtesy: NASA)
The NOAA Storm Database collects storm data reported by the National Weather Service from across the US. This project aims to quantify the impact of the documented storms from both an economic and a human perspective. The idea is to compare and contrast the various event types to answer the following questions:
Across the United States, which types of events are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?
This work was done as part of a project towards the completion of the Reproducible Research course in the Data Science Specialization. This knitr-generated publication documents all the work done (in R) for that project.
Read the full data documentation
Load up all dependencies and set global variables
Download Data if Missing
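The chunk for this step is not shown in the rendered output. A minimal sketch of a download guard could look like the following. The helper name `download_if_missing` and its arguments are hypothetical, and the URL is left as a parameter because it is not given here.

```r
# Hypothetical helper: fetch `url` into `dest` only when `dest` is absent.
# Returns TRUE if a download was attempted, FALSE if the file already existed.
download_if_missing <- function(url, dest) {
  if (file.exists(dest)) {
    return(FALSE)
  }
  # Create the target directory if it does not exist yet
  dir.create(dirname(dest), recursive = TRUE, showWarnings = FALSE)
  # mode = "wb" keeps a compressed archive byte-exact on Windows
  download.file(url, destfile = dest, mode = "wb")
  TRUE
}
```

Guarding on `file.exists()` keeps re-knitting the report cheap, since the archive is fetched only once.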
Load Data onto R
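The loading chunk is also hidden. As a sketch, assuming the data ships as a bzip2-compressed CSV, base R can read it directly. The function name `load_storm_data` is hypothetical, and `stringsAsFactors = TRUE` is an assumption made because the symbol columns are printed with factor levels later on.

```r
# Hypothetical loader: read.csv() opens the file through file(), which
# detects bzip2 compression and decompresses it on the fly.
load_storm_data <- function(path) {
  read.csv(path, stringsAsFactors = TRUE)
}

# Self-contained demonstration on a tiny compressed file (toy values)
demo_path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(demo_path, open = "w")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,1", "FLOOD,0"), con)
close(con)

df_demo <- load_storm_data(demo_path)
```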
The Data
Table
Columns
| | colnames(df) |
|---|---|
| 1 | STATE__ |
| 2 | BGN_DATE |
| 3 | BGN_TIME |
| 4 | TIME_ZONE |
| 5 | COUNTY |
| 6 | COUNTYNAME |
| 7 | STATE |
| 8 | EVTYPE |
| 9 | BGN_RANGE |
| 10 | BGN_AZI |
| 11 | BGN_LOCATI |
| 12 | END_DATE |
| 13 | END_TIME |
| 14 | COUNTY_END |
| 15 | COUNTYENDN |
| 16 | END_RANGE |
| 17 | END_AZI |
| 18 | END_LOCATI |
| 19 | LENGTH |
| 20 | WIDTH |
| 21 | F |
| 22 | MAG |
| 23 | FATALITIES |
| 24 | INJURIES |
| 25 | PROPDMG |
| 26 | PROPDMGEXP |
| 27 | CROPDMG |
| 28 | CROPDMGEXP |
| 29 | WFO |
| 30 | STATEOFFIC |
| 31 | ZONENAMES |
| 32 | LATITUDE |
| 33 | LONGITUDE |
| 34 | LATITUDE_E |
| 35 | LONGITUDE_ |
| 36 | REMARKS |
| 37 | REFNUM |
Data Processing
Data Subsetting
We are only interested in the following columns, so the data is subset to them:
- Event Type: EVTYPE
- Fatalities: FATALITIES
- Injuries: INJURIES
- Damage to Property: PROPDMG, PROPDMGEXP
- Damage to Crops: CROPDMG, CROPDMGEXP
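The subsetting chunk itself is not shown. Since the later code treats `dmg` as a data.table, one way to sketch the step is below. The toy table `df` and the helper vector `keep` are illustrative assumptions, not the author's code.

```r
library(data.table)

# Toy stand-in for the full storm table (values are illustrative only)
df <- data.table(
  EVTYPE     = c("TORNADO", "FLOOD"),
  FATALITIES = c(1, 0),
  INJURIES   = c(3, 0),
  PROPDMG    = c(25, 1.5),
  PROPDMGEXP = c("K", "M"),
  CROPDMG    = c(0, 10),
  CROPDMGEXP = c("", "K"),
  REMARKS    = c("a", "b")
)

keep <- c("EVTYPE", "FATALITIES", "INJURIES",
          "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")

# The `..` prefix tells data.table to look up `keep` in the calling scope
dmg <- df[, ..keep]
```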
Quantifying Data
Making the Data Visualization-ready
The dataset uses an unusual notation for the damage done to crops and property. Each amount is split across two columns: a numeric value (PROPDMG or CROPDMG) and a magnitude symbol (PROPDMGEXP or CROPDMGEXP).
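To make the notation concrete, here is a small worked sketch. The values are made up for illustration and are not rows from the dataset; the multipliers for K, M and B are the ones assigned in the key further down.

```r
# Illustrative values, not rows from the dataset
mantissa <- c(25, 1.5, 2)
symbol   <- c("K", "M", "B")

# Multipliers for the three most common symbols
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Dollar amount = numeric value times the multiplier its symbol stands for
dollars <- mantissa * unname(multiplier[symbol])
```

Under this reading, a value of 25 paired with K stands for 25,000 dollars.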
Let’s have a look at the "xEXP" values present in the dataset.
[1] K M B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
Levels: - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
[1] M K m B ? 0 k 2
Levels: ? 0 2 B k K m M
Each unique symbol needs a numeric value. Let’s create key-value pairs that map every symbol to a multiplier, so that the symbolic xEXP columns can be converted to numbers.
1. Change all xEXP entries to uppercase.
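The chunk for this step is hidden. A possible sketch with data.table follows; `dmg_demo` and `exp_cols` are hypothetical names, and the toy values only mimic the mixed-case symbols listed above.

```r
library(data.table)

# Toy data with mixed-case symbols (illustrative only)
dmg_demo <- data.table(PROPDMGEXP = factor(c("K", "m", "h")),
                       CROPDMGEXP = factor(c("k", "M", "")))

exp_cols <- c("PROPDMGEXP", "CROPDMGEXP")

# as.character() makes the factor-to-character conversion explicit
# before toupper() is applied to each symbol column
dmg_demo[, (exp_cols) := lapply(.SD, function(x) toupper(as.character(x))),
         .SDcols = exp_cols]
```

Uppercasing first means the key only needs entries for H, K, M and B rather than for both cases.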
2. Map property and crop damage alphanumeric exponents to numeric values.
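# Multiplier for each symbol found in PROPDMGEXP.
# Note: the "\"\"" key is the literal two-character string of two quote marks,
# so it never matches a blank entry. Blank entries (and "?") come back as NA
# from the lookup and are handled after it.
propDmgKey <- c("\"\"" = 10^0,
                "-" = 10^0,
                "+" = 10^0,
                "0" = 10^0,
                "1" = 10^1,
                "2" = 10^2,
                "3" = 10^3,
                "4" = 10^4,
                "5" = 10^5,
                "6" = 10^6,
                "7" = 10^7,
                "8" = 10^8,
                "9" = 10^9,
                "H" = 10^2,
                "K" = 10^3,
                "M" = 10^6,
                "B" = 10^9)
# Multiplier for each symbol found in CROPDMGEXP
cropDmgKey <- c("\"\"" = 10^0,
                "?" = 10^0,
                "0" = 10^0,
                "K" = 10^3,
                "M" = 10^6,
                "B" = 10^9)
3. Replace the values in PROPDMGEXP and CROPDMGEXP with their corresponding numeric values.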
# Look up each symbol's multiplier; unmatched or blank symbols become NA
dmg[, PROPDMGEXP := propDmgKey[as.character(PROPDMGEXP)]]
# Treat unmatched symbols as a multiplier of 1
dmg[is.na(PROPDMGEXP), PROPDMGEXP := 10^0]
dmg[, CROPDMGEXP := cropDmgKey[as.character(CROPDMGEXP)]]
dmg[is.na(CROPDMGEXP), CROPDMGEXP := 10^0]
NOTE: Using mutate_at along with a key-value fetcher function for this task didn’t work as intended; it copied the same values across all rows, which is why it was avoided in this chunk.
4. Use mutate to create columns for PropertyDamage, CropDamage and TotalDamage and get rid of the raw column data
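The chunk for this step is not shown either. A sketch with dplyr might look like this. The toy input is illustrative, and defining TotalCasualties as injuries plus fatalities is an assumption: the summary below uses that column, but its definition does not appear in the text.

```r
library(dplyr)

# Toy input in the state reached after step 3 (illustrative values)
dmg_demo <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD"),
  FATALITIES = c(1, 0),
  INJURIES   = c(3, 2),
  PROPDMG    = c(25, 1.5),
  PROPDMGEXP = c(1e3, 1e6),
  CROPDMG    = c(0, 10),
  CROPDMGEXP = c(1, 1e3)
)

dmg_demo <- dmg_demo %>%
  mutate(PropertyDamage  = PROPDMG * PROPDMGEXP,
         CropDamage      = CROPDMG * CROPDMGEXP,
         TotalDamage     = PropertyDamage + CropDamage,
         # Assumption: total casualties are injuries plus fatalities
         TotalCasualties = INJURIES + FATALITIES) %>%
  # Drop the raw value and multiplier columns
  select(-PROPDMG, -PROPDMGEXP, -CROPDMG, -CROPDMGEXP)
```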
Results
Summary of Data
Let’s first summarize the data to find the total economic and human cost of each type of weather event.
damageReport <- dmg %>%
  group_by(EVTYPE) %>%
  summarize(PropertyDamage = sum(PropertyDamage),
            CropDamage = sum(CropDamage),
            TotalDamage = sum(TotalDamage),
            Injuries = sum(INJURIES),
            Fatalities = sum(FATALITIES),
            TotalCasualties = sum(TotalCasualties))
damageReport
Prepping up the Data for Bar Plotting
Since ggplot plots a set of variables grouped by a fill aesthetic, we need to melt our report data to make it plot-ready. But first, we split it into its human and economic facets:
econLosses <- damageReport[, c("EVTYPE", "PropertyDamage", "CropDamage", "TotalDamage")]
humanLosses <- damageReport[, c("EVTYPE", "Injuries", "Fatalities", "TotalCasualties")]
Then, to order by Total Losses
econLosses <- econLosses[order(econLosses$TotalDamage, decreasing = TRUE), ]
humanLosses <- humanLosses[order(humanLosses$TotalCasualties, decreasing = TRUE), ]
Take the Top 10 most costly events for each. The rest will merely clutter up our graph.
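The truncation chunk is hidden. Because the tables are already sorted in decreasing order, keeping the top rows could be as simple as the sketch below. `econ_demo`, `top_n_events` and `econ_top` are hypothetical names, and the values are toy data.

```r
# Toy report with 12 event types, already sorted in decreasing order
econ_demo <- data.frame(EVTYPE = paste0("EVENT_", 1:12),
                        TotalDamage = seq(1200, 100, by = -100))

# head() keeps the first n rows, so the table must be sorted beforehand
top_n_events <- 10
econ_top <- head(econ_demo, top_n_events)
```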
Finally, melt the data
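The melting chunk is not shown, and neither is the melt implementation that was used. As a sketch, data.table's `melt` produces the `variable` and `value` columns that the plotting code refers to. The table `human_demo` is toy data.

```r
library(data.table)

# Toy wide-format report (illustrative values)
human_demo <- data.table(EVTYPE = c("TORNADO", "FLOOD"),
                         Injuries = c(30, 5),
                         Fatalities = c(4, 2),
                         TotalCasualties = c(34, 7))

# One row per (event type, measure) pair; the default output columns
# are named `variable` and `value`
human_long <- melt(human_demo, id.vars = "EVTYPE")
```

A report produced by dplyr's summarize is a tibble, so it would need `as.data.table()` before being passed to this `melt`.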
Plotting
# Specify Aesthetic mappings
plot <- ggplot(humanLosses, aes(x = reorder(EVTYPE, -value), y = value, fill = variable))
# Specify Bar Chart specs
plot <- plot + geom_bar(stat = "identity", position = "dodge")
# Set x-axis label to blank
plot <- plot + xlab(NULL)
# Set y-axis label
plot <- plot + ylab("Casualties")
# Prevent clutter around the x-axis
plot <- plot + theme(axis.text.x = element_text(angle = 45, hjust = 1), axis.title.x = element_blank())
# Set chart title and center it
plot <- plot + ggtitle("Top 10 Deadliest Events") + theme(plot.title = element_text(hjust = 0.5))
plot
A Bar Plot of the Top 10 Events with the Most Reported Casualties
# Specify Aesthetic mappings
plot <- ggplot(econLosses, aes(x = reorder(EVTYPE, -value), y = value, fill = variable))
# Specify Bar Chart specs
plot <- plot + geom_bar(stat = "identity", position = "dodge")
# Set y-axis label
plot <- plot + ylab("Cost ($)")
# Prevent clutter around the x-axis
plot <- plot + theme(axis.text.x = element_text(angle = 45, hjust = 1), axis.title.x = element_blank())
# Set chart title and center it
plot <- plot + ggtitle("Top 10 Costliest Events") + theme(plot.title = element_text(hjust = 0.5))
plot
A Bar Plot of the Top 10 Costliest Types of Weather Events
The casualty data is clear about the major contributor to weather-related deaths: Tornadoes take a sizeable lead over the rest in both Injuries and Fatalities. The next four events are tied fairly evenly with each other, while the events further down the list become progressively less significant.
The data on Economic Costs, on the other hand, shows a steady progression a la Zipf’s Law, with Floods holding an indisputable lead in damage to Property and Crops, and Typhoons and Tornadoes (no less) not very far behind. Interestingly, a few events show extreme selectivity towards one type of economic resource. Upon closer inspection, however, it seems obvious why.
Concluding Words
Keeping focus and familiarity in mind, I went with the readthedocs-esque layout available for Rmd courtesy of juba. I’ve made my full code available on GitHub. If there are any suggestions, feel free to make them 😄
Created by Zaid Hassan aka alhazen on 2019-07-30