Weather Events and their Impacts on Human Health and Economics

Synopsis

A photo of Hurricane Florence taken from the ISS (Courtesy: NASA)

The NOAA Storm Database collects storm data reported by the National Weather Service from across the US. This project aims to quantify the impact of the documented storms from both an economic and a human perspective. The idea is to compare and contrast event types to answer the following questions:

  1. Across the United States, which types of events are most harmful with respect to population health?

  2. Across the United States, which types of events have the greatest economic consequences?

This work was done as part of a project towards the completion of the Reproducible Research course in the Data Science Specialization. This knitr-generated publication documents all the work (done in R) towards the completion of that project.

Read the full data documentation

The Data

Columns

colnames(df)
1 STATE__
2 BGN_DATE
3 BGN_TIME
4 TIME_ZONE
5 COUNTY
6 COUNTYNAME
7 STATE
8 EVTYPE
9 BGN_RANGE
10 BGN_AZI
11 BGN_LOCATI
12 END_DATE
13 END_TIME
14 COUNTY_END
15 COUNTYENDN
16 END_RANGE
17 END_AZI
18 END_LOCATI
19 LENGTH
20 WIDTH
21 F
22 MAG
23 FATALITIES
24 INJURIES
25 PROPDMG
26 PROPDMGEXP
27 CROPDMG
28 CROPDMGEXP
29 WFO
30 STATEOFFIC
31 ZONENAMES
32 LATITUDE
33 LONGITUDE
34 LATITUDE_E
35 LONGITUDE_
36 REMARKS
37 REFNUM

Data Processing

Data Subsetting

Since we’re only interested in the following variables, we subset the data to just these columns:

  • Event Type: EVTYPE
  • Fatalities: FATALITIES
  • Injuries: INJURIES
  • Damage to Property: PROPDMG and PROPDMGEXP
  • Damage to Crops: CROPDMG and CROPDMGEXP
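A minimal base R sketch of this subsetting step, assuming the raw data has already been read into a data frame named df (as in the colnames(df) listing above). The toy df below is hypothetical and mimics only a few of the real 37 columns; the original project may have used dplyr for this instead.

```r
# Hypothetical stand-in for the raw storm data (the real df has 37 columns)
df <- data.frame(
  STATE__    = c(1, 1, 2),
  EVTYPE     = c("TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(2, 0, 0),
  INJURIES   = c(10, 1, 0),
  PROPDMG    = c(25, 1.5, 0),
  PROPDMGEXP = c("K", "B", ""),
  CROPDMG    = c(0, 3, 5),
  CROPDMGEXP = c("", "M", "K"),
  REMARKS    = c("", "", ""),
  stringsAsFactors = FALSE
)

# Keep only the columns needed to answer the two research questions
keep  <- c("EVTYPE", "FATALITIES", "INJURIES",
           "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
storm <- df[, keep]

str(storm)
```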

Quantifying Data

Making the Data Visualization-ready

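The dataset uses an unusual notation to represent the amount of damage done to crops and property. Each dollar amount is split across two columns: a number (PROPDMG or CROPDMG) and a symbol giving its magnitude (PROPDMGEXP or CROPDMGEXP), such as K for thousands, M for millions and B for billions.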
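Written as a formula, with e(c) standing for the power of ten assigned to an exponent code c, a dollar amount is recovered as shown below. The value 25 is a hypothetical illustration, K is read as thousands, and the same formula applies to CROPDMG and CROPDMGEXP:

```latex
\text{damage} = \texttt{PROPDMG} \times 10^{\,e(\texttt{PROPDMGEXP})},
\qquad
25 \times 10^{\,e(\mathrm{K})} = 25 \times 10^{3} = 25{,}000
```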

Let’s have a look at the distinct exponent values present in the dataset, first for PROPDMGEXP and then for CROPDMGEXP:

 [1] K M   B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
Levels:  - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
[1]   M K m B ? 0 k 2
Levels:  ? 0 2 B k K m M
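The output format above (distinct values on a "[1]" line followed by a "Levels:" line) is how R prints a factor, so it most likely comes from calling unique() on each factor column. A minimal sketch on a hypothetical vector of codes:

```r
# Hypothetical exponent codes, stored as a factor like the raw columns
prop_exp <- factor(c("K", "M", "", "B", "K", "m", "+", "0", "K"))

# Printing unique() of a factor shows the distinct values in order of first
# appearance, then a "Levels:" line listing every level of the factor
print(unique(prop_exp))

# The distinct codes as plain strings, e.g. for building a lookup table
codes <- as.character(unique(prop_exp))
codes
```

Note that the empty string shows up as a blank between values, which is why the printed line above has a gap after M.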

Corresponding to each unique symbol, we need to assign a numeric exponent. Let’s create key-value pairs, with the symbols as keys and our assigned exponents as values, so that every damage figure can be converted from its current two-column form into a single numeric amount.
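One way to hold these key-value pairs in base R is a pair of parallel vectors queried with match(). The exponents below (h/H to 2, K/k to 3, m/M to 6, B to 9, digits to themselves, and -, ?, + and the empty string to 0) are assumptions for illustration, not necessarily the assignments used in the original project. match() is used instead of a named vector because R never matches "" as a name, so a lookup like x[""] would return NA for the blank code.

```r
# Exponent codes seen in PROPDMGEXP and CROPDMGEXP, with the power of ten
# assumed for each (illustrative assignments only)
exp_codes  <- c("", "-", "?", "+", as.character(0:8),
                "h", "H", "K", "k", "m", "M", "B")
exp_values <- c(0, 0, 0, 0, 0:8,
                2, 2, 3, 3, 6, 6, 9)

# Hypothetical helper: returns one exponent per input code (NA if unknown)
code_to_exp <- function(code) {
  exp_values[match(as.character(code), exp_codes)]
}

code_to_exp(c("K", "", "B", "m"))
```

Unknown codes come back as NA, which makes unexpected symbols easy to spot before any totals are computed.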

Next, replace the values in PROPDMGEXP with their corresponding numeric values.

NOTE: Using mutate_at together with a key-value lookup function did not work as intended for this task; instead, it copied the same value across all rows. This is why it was avoided in this chunk.
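A minimal base R sketch of the replacement step, assuming a code-to-exponent table like the one described above. Because match() is vectorized, each column is converted in a single call with one result per row, which sidesteps the row-copying problem from the note; a plausible cause of that problem (an assumption, since the original chunk is not shown) is a lookup function that returns a single value for the whole column. The data, the exponent table and the PROPCOST and CROPCOST column names are hypothetical.

```r
# Hypothetical subset of the storm data
storm <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "HAIL"),
  PROPDMG    = c(25, 1.5, 0),
  PROPDMGEXP = c("K", "B", ""),
  CROPDMG    = c(0, 3, 5),
  CROPDMGEXP = c("", "M", "K"),
  stringsAsFactors = FALSE
)

# Assumed code-to-exponent table (illustrative, not the author's exact one)
exp_codes  <- c("", "-", "?", "+", as.character(0:8),
                "h", "H", "K", "k", "m", "M", "B")
exp_values <- c(0, 0, 0, 0, 0:8, 2, 2, 3, 3, 6, 6, 9)

# Vectorized replacement: one exponent per row, no per-row function calls
storm$PROPDMGEXP <- exp_values[match(storm$PROPDMGEXP, exp_codes)]
storm$CROPDMGEXP <- exp_values[match(storm$CROPDMGEXP, exp_codes)]

# Dollar amounts: the number times 10 to the assigned exponent
storm$PROPCOST <- storm$PROPDMG * 10^storm$PROPDMGEXP
storm$CROPCOST <- storm$CROPDMG * 10^storm$CROPDMGEXP

storm
```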

Finally, preview the data frame before proceeding.

Results

Summary of Data

Let’s first summarize the data to find the total economic and human damage caused by each type of weather event.
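A minimal base R sketch of this summary, assuming the damage columns have already been converted into dollar amounts, stored here in hypothetical columns PROPCOST and CROPCOST. aggregate() with a cbind() formula sums each measure within every EVTYPE; the original project may well have used dplyr's group_by() and summarise() instead.

```r
# Hypothetical cleaned data: one row per storm report, costs in dollars
storm <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
  FATALITIES = c(2, 0, 1, 0, 3),
  INJURIES   = c(10, 1, 4, 0, 0),
  PROPCOST   = c(25000, 1.5e9, 8000, 0, 2e6),
  CROPCOST   = c(0, 3e6, 0, 5000, 1e5),
  stringsAsFactors = FALSE
)

# Sum every measure within each event type (groups come out sorted by EVTYPE)
report <- aggregate(cbind(FATALITIES, INJURIES, PROPCOST, CROPCOST) ~ EVTYPE,
                    data = storm, FUN = sum)
report
```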

Prepping up the Data for Bar Plotting

Since ggplot works by plotting a set of values grouped by a fill aesthetic, we need to melt our report data into long format to make it fully plot-ready. But first, we split it into its human and economic facets:
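The split can be sketched in base R as column selection on the summary table. The report data frame, its PROPCOST and CROPCOST columns, and the TOTAL column added for ranking are hypothetical names chosen for illustration.

```r
# Hypothetical per-event totals, as produced by the summary step
report <- data.frame(
  EVTYPE     = c("FLOOD", "HAIL", "TORNADO"),
  FATALITIES = c(3, 0, 3),
  INJURIES   = c(1, 0, 14),
  PROPCOST   = c(1.502e9, 0, 33000),
  CROPCOST   = c(3.1e6, 5000, 0),
  stringsAsFactors = FALSE
)

# Human facet: casualties, plus a total used for ranking later
human <- report[, c("EVTYPE", "FATALITIES", "INJURIES")]
human$TOTAL <- human$FATALITIES + human$INJURIES

# Economic facet: damage in dollars, plus a total
economic <- report[, c("EVTYPE", "PROPCOST", "CROPCOST")]
economic$TOTAL <- economic$PROPCOST + economic$CROPCOST
```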

Then, we order each facet by total losses:
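Ordering can be sketched with base R's order(), assuming each facet carries a precomputed total column; the table and its TOTAL column below are hypothetical.

```r
# Hypothetical human-impact table with a precomputed TOTAL column
human <- data.frame(
  EVTYPE     = c("FLOOD", "HAIL", "TORNADO"),
  FATALITIES = c(3, 0, 3),
  INJURIES   = c(1, 0, 14),
  TOTAL      = c(4, 0, 17),
  stringsAsFactors = FALSE
)

# Sort rows from largest to smallest total loss
human <- human[order(human$TOTAL, decreasing = TRUE), ]
human$EVTYPE
```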

Next, we take the top 10 most costly events for each facet; the rest would merely clutter up our graph.
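Once a facet is sorted in descending order, keeping the top 10 is just head(). A sketch on a hypothetical, already-sorted table of 12 event types:

```r
# Hypothetical table of 12 event types, already sorted by TOTAL (descending)
economic <- data.frame(
  EVTYPE = paste0("EVENT_", 1:12),
  TOTAL  = seq(120, 10, by = -10),
  stringsAsFactors = FALSE
)

# Keep only the 10 costliest event types; head() relies on the prior sort
top_economic <- head(economic, 10)
nrow(top_economic)
```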

Finally, we melt the data:
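The document does not show which melt function was used (reshape2::melt is a common companion to ggplot2). To stay dependency-free, this sketch swaps in base R's stack(), which produces the same long layout here: one row per (event, measure) pair, with the measure name in a column that can drive the fill aesthetic. The table is hypothetical.

```r
# Hypothetical top-N human-impact table (wide: one column per measure)
human_top <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD"),
  FATALITIES = c(3, 3),
  INJURIES   = c(14, 1),
  stringsAsFactors = FALSE
)

# Wide -> long: stack() stacks the measure columns one after another
measures <- c("FATALITIES", "INJURIES")
long <- stack(human_top[, measures])           # columns: values, ind

# Re-attach the event type; rows repeat the events once per measure
long$EVTYPE <- rep(human_top$EVTYPE, times = length(measures))
names(long) <- c("value", "variable", "EVTYPE")
long
```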

Plotting

A Bar Plot of the Top 10 Events with the Most Reported Casualties

A Bar Plot of the Top 10 Costliest Types of Weather Events

The casualty data is clear about the major contributor to weather-related deaths and injuries: tornadoes take a sizeable lead over the rest in both injuries and fatalities. The next four events are fairly evenly matched with one another, while events further down the list become progressively insignificant by comparison.

On the other hand, the data on economic costs shows a steady progression à la Zipf’s Law, with floods holding an indisputable lead in combined damage to property and crops, and typhoons and tornadoes (no less) not far behind. Interestingly, a few events show extreme selectivity towards one type of economic resource, damaging almost only property or almost only crops. Upon closer inspection, however, the reason is evident from the nature of those events.

Concluding Words

This work was done as a part of a project towards the completion of the Reproducible Research course in the Data Science Specialization.

Keeping focus and familiarity in mind, I went with the readthedocs-esque layout available for Rmd, courtesy of juba. I’ve made my full code available on GitHub. If you have any suggestions, feel free to share them 😄


Created by Zaid Hassan aka alhazen on 2019-07-30
