The U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database is a collection of data on weather events, recording the location, date, cost and casualties of incidents from 1966 to 2001. This report identifies which event types cause the most injuries and deaths and which cause the greatest economic damage.
The data is loaded into the stormdata data frame, keeping only the necessary columns, namely:

- EVTYPE, identifying the type of weather event of the entry;
- FATALITIES and INJURIES, for the number of deaths and injuries;
- PROPDMG and PROPDMGEXP, for the value of property damage; and
- CROPDMG and CROPDMGEXP, for the value of crop damage.

Regarding health damage, the most harmful events can be found by computing the total deaths and injuries for each event type and keeping the ones with the most casualties.
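library(dplyr)  # provides %>%, group_by, summarise, arrange and mutate

most.casualties = stormdata %>%
  group_by(EVTYPE) %>%
  summarise(Total.Fatalities = sum(FATALITIES), Total.Injuries = sum(INJURIES)) %>%
  arrange(-(Total.Fatalities + Total.Injuries)) %>%  # worst event types first
  head(5) %>%
  mutate(EVTYPE = factor(EVTYPE, levels = EVTYPE))   # fix level order so the chart keeps this ranking

Those events are shown in the following chart.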
library(lattice)  # provides barchart and the panel functions

barchart(Total.Injuries + Total.Fatalities ~ EVTYPE,
         data = most.casualties,
         origin = 0,
         main = 'Health Impact for 5 Major Types of Events',
         xlab = 'Event', ylab = 'Occurrences',
         ylim = c(-3000, 100001),
         par.settings = list(superpose.polygon = list(col = c('#ED6864','#9587AB'))),
         # scales = list(x = list(rot = 90)),
         key = list(text = list(c('Injuries', 'Fatalities')),
                    space = 'right',
                    rectangles = list(col = c('#ED6864','#9587AB')),
                    lineheight = 1, padding.text = 5),
         panel = function(...){
           panel.grid(v = 0, h = -1)  # horizontal grid lines only
           panel.barchart(...)})

Similarly, the events with the greatest economic consequences can be found by computing the total property and crop damage caused by each of them. However, the damage values are encoded as a base value and an exponent symbol that stands for a multiplier, following the table below.
| Symbol | Multiplier |
|---|---|
| h or H | \(10^2\) |
| k or K | \(10^3\) |
| m or M | \(10^6\) |
| b or B | \(10^9\) |
| 0 to 8 | \(10^0\) to \(10^8\) |
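
As a small illustration of the table, the sketch below decodes a few damage entries by indexing a named vector with the exponent symbol. The `toy.damage` data frame and the `multiplier` vector are hypothetical names with made-up values, not records from the storm database; only the symbol-to-multiplier mapping comes from the table.

```r
# Named vector mapping each exponent symbol to its multiplier (from the table).
multiplier = c(1e2, 1e2, 1e3, 1e3, 1e6, 1e6, 1e9, 1e9, 10^(0:8))
names(multiplier) = c('h', 'H', 'k', 'K', 'm', 'M', 'b', 'B', 0:8)

# Hypothetical toy entries, invented for illustration only.
toy.damage = data.frame(PROPDMG    = c(2.5, 10, 1.2, 3),
                        PROPDMGEXP = c('K', 'm', 'B', '2'),
                        stringsAsFactors = FALSE)

# Indexing by symbol name returns the multiplier for each row;
# unname() drops the symbol names from the result.
toy.damage$Prop.Dollars = toy.damage$PROPDMG * unname(multiplier[toy.damage$PROPDMGEXP])

print(toy.damage)
```

For instance, a base value of 2.5 with symbol K decodes to 2,500 dollars. A symbol that is not among the vector's names yields NA under this kind of lookup, so such rows have to be filtered out or handled explicitly.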
Using this mapping, the costliest event types can be found.
s2exp = c(1E2, 1E2, 1E3, 1E3, 1E6, 1E6, 1E9, 1E9, 10^(0:8))  # multipliers, in the order of the names below
names(s2exp) = c('h', 'H', 'k', 'K', 'm', 'M', 'b', 'B', 0:8)

most.expenses = stormdata %>%
  group_by(EVTYPE) %>%
  filter(PROPDMGEXP %in% names(s2exp),
         CROPDMGEXP %in% names(s2exp)) %>%  # keep rows whose symbols are both in the table
  summarise(Prop.Exp = sum(PROPDMG * s2exp[as.character(PROPDMGEXP)], na.rm = TRUE),
            Crop.Exp = sum(CROPDMG * s2exp[as.character(CROPDMGEXP)], na.rm = TRUE),
            Total.Expenses = Prop.Exp + Crop.Exp) %>%
  arrange(-Total.Expenses) %>%              # costliest event types first
  head(5) %>%
  mutate(EVTYPE = reorder(EVTYPE, -Total.Expenses))

Those events are shown in the chart below.
barchart(Prop.Exp + Crop.Exp ~ EVTYPE,
         data = most.expenses,
         main = 'Economic Impact for 5 Major Types of Events',
         xlab = 'Event', ylab = 'Damage (Dollars)',
         origin = 0, ylim = c(-1e10, 1.5e11),
         par.settings = list(superpose.polygon = list(col = c('#ED6864','#9587AB'))),
         scales = list(y = list(at = pretty(c(0, 1.5e11), 7)),
                       # break compound names such as 'A/B' over two lines
                       x = list(labels = gsub('/', ' &\n ', most.expenses$EVTYPE))),
         key = list(title = 'Type', text = list(c('Property', 'Crops')),
                    rectangles = list(col = c('#ED6864','#9587AB')),
                    space = 'right', lineheight = 1, padding.text = 5),
         panel = function(...){
           panel.grid(v = 0, h = -7)  # horizontal grid lines only
           panel.barchart(...)})

The bar charts clearly show that tornadoes dominate the health impact and floods dominate the economic impact. They are followed by events such as heat, thunderstorms and lightning in terms of casualties, and by hurricanes in terms of expenses.