Instructions

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities.

This assignment explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

The basic goal is to use the database to answer the questions below and show the code for the entire analysis.

Data

The data for this assignment come as a comma-separated-value (CSV) file compressed with the bzip2 algorithm to reduce its size. The data can be downloaded from here

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.

Synopsis

The analysis below aims to answer two questions:

1. Across the United States, which types of events are the most harmful to population health?
2. Across the United States, which types of events have the greatest economic consequences?

Data Preprocessing

First, we want to load all the necessary packages.

library(readr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)

Then, we import and read the dataset. We call it “weatherData”.

weatherData <- read_csv("~/Downloads/repdata_data_StormData.csv.bz2")
## Rows: 902297 Columns: 37
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (18): BGN_DATE, BGN_TIME, TIME_ZONE, COUNTYNAME, STATE, EVTYPE, BGN_AZI,...
## dbl (18): STATE__, COUNTY, BGN_RANGE, COUNTY_END, END_RANGE, LENGTH, WIDTH, ...
## lgl  (1): COUNTYENDN
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
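
The message above suggests specifying the column types. As a minimal sketch of what that could look like, the block below writes a tiny bzip2-compressed CSV to a temporary file and reads back only the columns this analysis uses. The toy rows, the file, and the names `tmp`, `storm_cols`, and `weatherSmall` are illustrative assumptions, not part of the original analysis.

```r
library(readr)

# Toy stand-in for the real file: three made-up rows in a temporary .csv.bz2
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
writeLines(c(
  "EVTYPE,FATALITIES,INJURIES,PROPDMG,PROPDMGEXP,CROPDMG,CROPDMGEXP,REMARKS",
  "TORNADO,0,15,25,K,0,,first row",
  "HAIL,0,0,2.5,M,1,K,second row",
  "FLOOD,1,2,0,,0,,third row"
), con)
close(con)

# Declare the type of each column we need; cols_only() drops all other columns
storm_cols <- cols_only(
  EVTYPE     = col_character(),
  FATALITIES = col_double(),
  INJURIES   = col_double(),
  PROPDMG    = col_double(),
  PROPDMGEXP = col_character(),
  CROPDMG    = col_double(),
  CROPDMGEXP = col_character()
)

# read_csv() decompresses .bz2 files transparently
weatherSmall <- read_csv(tmp, col_types = storm_cols)
```

Note that the rest of this analysis selects columns by position (for example `23:24`), which assumes all 37 columns are present, so a reduced import like this one would need name-based selection instead.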

Now, we inspect the data.

names(weatherData)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

Data Processing

Find the weather events that are the most harmful to population health

For the sake of the analysis, we measure harm to population health by the fatalities and injuries caused by each weather event. Let’s look at them separately.

First, we create a smaller dataset grouped by event type, so we can see the total number of fatalities caused by each type of weather event.

fatalities_weatherData<-weatherData%>%
  group_by(EVTYPE)%>%
  summarise(totalFATALITIES=sum(FATALITIES))%>%
  arrange(desc(totalFATALITIES))

Now let’s do the same for the total number of injuries, following the same method as above.

injuries_weatherData<-weatherData%>%
  group_by(EVTYPE)%>%
  summarise(totalINJURIES=sum(INJURIES))%>%
  arrange(desc(totalINJURIES))
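
The two summaries above can also be computed in a single pass. The sketch below does so on a small made-up tibble (`toy` and `health_by_event` are hypothetical names, and the numbers are invented for illustration), so both totals sit side by side per event type.

```r
library(dplyr)

# Made-up rows standing in for weatherData
toy <- tibble(
  EVTYPE     = c("TORNADO", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(2, 1, 1, 0),
  INJURIES   = c(10, 5, 0, 4)
)

# One grouped summary that returns both totals per event type
health_by_event <- toy %>%
  group_by(EVTYPE) %>%
  summarise(
    totalFATALITIES = sum(FATALITIES, na.rm = TRUE),
    totalINJURIES   = sum(INJURIES, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(totalFATALITIES))
```

Keeping the two tables separate, as the analysis does, has the advantage that each can be sorted by its own column.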

Now let’s look at which kinds of weather events cause the most injuries AND fatalities combined. For this, we add a per-row total of fatalities and injuries to the dataset and then sum it by event type.

weatherData$total <- rowSums(weatherData[, c("FATALITIES", "INJURIES")], na.rm = TRUE)
topHealth<-weatherData%>%
  group_by(EVTYPE)%>%
  summarise(total=sum(total,na.rm=TRUE))%>%
  arrange(desc(total))

Find the weather events that are the most harmful to the economy.

In this dataset, the economic consequences of weather events are captured through property and crop damage. Each damage amount is stored as a base number (PROPDMG, CROPDMG) together with a magnitude code in the matching “EXP” column (PROPDMGEXP, CROPDMGEXP).

unique(weatherData$PROPDMGEXP)
##  [1] "K" "M" NA  "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"
unique(weatherData$CROPDMGEXP)
## [1] NA  "M" "K" "m" "B" "?" "0" "k" "2"

Note that digits, symbols, and upper- and lower-case letters are all mixed together. Therefore, we need a function that converts each code into the exponent of a power of 10.

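# Map a damage-magnitude code (e.g. "K", "M", "5") to its exponent of 10.
getVal <- function(expType) {
  if (expType %in% c('h', 'H')) {
    return(2)   # hundreds
  } else if (expType %in% c('k', 'K')) {
    return(3)   # thousands
  } else if (expType %in% c('m', 'M')) {
    return(6)   # millions
  } else if (expType %in% c('b', 'B')) {
    return(9)   # billions
  } else if (suppressWarnings(!is.na(as.numeric(expType)))) {
    return(as.numeric(expType))   # a digit is used directly as the exponent
  } else {
    return(0)   # NA, "+", "-", "?": multiplier is 10^0 = 1
  }
}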

c(10**getVal('h'), 10**getVal(4), 10**getVal('B'), 10**getVal('?'))
## [1] 1e+02 1e+04 1e+09 1e+00
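
Because getVal() handles one code at a time, applying it to a whole column requires row-by-row evaluation. As a hedged alternative, the sketch below shows a vectorized lookup that is intended to return the same exponents for the codes seen in this dataset. The function name `exp_to_power` and the lookup vector `letters_map` are hypothetical and not part of the original analysis.

```r
# Hypothetical vectorized counterpart of getVal()
exp_to_power <- function(exp_code) {
  letters_map <- c(H = 2, K = 3, M = 6, B = 9)

  # Normalise case so "k" and "K" are treated alike
  code <- toupper(as.character(exp_code))

  # Letter codes: named lookup gives NA for anything that is not H/K/M/B
  power <- unname(letters_map[code])

  # Digit codes: as.numeric() gives NA for non-numeric strings
  digit <- suppressWarnings(as.numeric(code))

  # Prefer the letter lookup, fall back to the digit, then to 0
  power <- ifelse(is.na(power), digit, power)
  power[is.na(power)] <- 0
  power
}

exp_to_power(c("K", "m", NA, "5", "?", "h"))
```

With such a function, the damage columns could be computed with a plain `mutate()` such as `PROPDMG * 10^exp_to_power(PROPDMGEXP)`, without evaluating row by row.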

Make a table that applies the function and calculates the actual value of property and crop damage.

weatherData_new<-weatherData[,c(8,25:28)]%>%
  rowwise()%>%
  mutate(PROP = PROPDMG*10**getVal(PROPDMGEXP),
         CROP = CROPDMG*10**getVal(CROPDMGEXP))

head(weatherData_new)
## # A tibble: 6 × 7
## # Rowwise: 
##   EVTYPE  PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP  PROP  CROP
##   <chr>     <dbl> <chr>        <dbl> <chr>      <dbl> <dbl>
## 1 TORNADO    25   K                0 <NA>       25000     0
## 2 TORNADO     2.5 K                0 <NA>        2500     0
## 3 TORNADO    25   K                0 <NA>       25000     0
## 4 TORNADO     2.5 K                0 <NA>        2500     0
## 5 TORNADO     2.5 K                0 <NA>        2500     0
## 6 TORNADO     2.5 K                0 <NA>        2500     0
weatherData_new_sum<-weatherData_new[,c(1,6,7)]%>%
  group_by(EVTYPE)%>%
  summarise(PROP = sum(PROP), CROP = sum(CROP))
head(weatherData_new_sum)
## # A tibble: 6 × 3
##   EVTYPE                PROP     CROP
##   <chr>                <dbl>    <dbl>
## 1 ?                     5000        0
## 2 ABNORMAL WARMTH          0        0
## 3 ABNORMALLY DRY           0        0
## 4 ABNORMALLY WET           0        0
## 5 ACCUMULATED SNOWFALL     0        0
## 6 AGRICULTURAL FREEZE      0 28820000

Sort the event types by total property damage, from highest to lowest.

topProperty <- weatherData_new_sum[order(weatherData_new_sum$PROP,decreasing = TRUE),]

Sort the event types by total crop damage, from highest to lowest.

topCrop <- weatherData_new_sum[order(weatherData_new_sum$CROP,decreasing = TRUE),]

Add property and crop damage together and sort the event types by the combined total, from highest to lowest.

weatherData_new_sum$total <- rowSums(weatherData_new_sum[, 2:3])
topEcon <- weatherData_new_sum[order(weatherData_new_sum$total, decreasing = TRUE), ]
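
The sorting steps above keep every event type and rely on taking the first rows later. As a sketch of an alternative, `dplyr::slice_max()` can return only the top rows directly. The tibble `toy_damage` and its values are made up for illustration, and `n = 2` stands in for the 10 used in this report.

```r
library(dplyr)

# Made-up damage totals per event type
toy_damage <- tibble(
  EVTYPE = c("FLOOD", "TORNADO", "HAIL", "DROUGHT"),
  PROP   = c(100, 60, 15, 1),
  CROP   = c(5, 1, 3, 14)
)

# Keep only the n event types with the largest combined damage
top2 <- toy_damage %>%
  mutate(total = PROP + CROP) %>%
  slice_max(total, n = 2, with_ties = FALSE)
```

Setting `with_ties = FALSE` guarantees exactly `n` rows even if two event types share the same total.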

Results

Find the weather events that cause the most injuries and fatalities.

  • Tornados caused the most fatalities, specifically 5,633 total deaths.
head(fatalities_weatherData)
## # A tibble: 6 × 2
##   EVTYPE         totalFATALITIES
##   <chr>                    <dbl>
## 1 TORNADO                   5633
## 2 EXCESSIVE HEAT            1903
## 3 FLASH FLOOD                978
## 4 HEAT                       937
## 5 LIGHTNING                  816
## 6 TSTM WIND                  504
  • Tornados also caused the most injuries, specifically 91,346.
head(injuries_weatherData)
## # A tibble: 6 × 2
##   EVTYPE         totalINJURIES
##   <chr>                  <dbl>
## 1 TORNADO                91346
## 2 TSTM WIND               6957
## 3 FLOOD                   6789
## 4 EXCESSIVE HEAT          6525
## 5 LIGHTNING               5230
## 6 HEAT                    2100

When fatalities and injuries are added together and the event types are ranked by that total, the events most harmful to population health are as follows.

head(topHealth)
## # A tibble: 6 × 2
##   EVTYPE         total
##   <chr>          <dbl>
## 1 TORNADO        96979
## 2 EXCESSIVE HEAT  8428
## 3 TSTM WIND       7461
## 4 FLOOD           7259
## 5 LIGHTNING       6046
## 6 HEAT            3037
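
As a quick consistency check, the combined totals should equal the sum of the separate fatality and injury totals reported above. The sketch below re-enters the printed values by hand for the five event types that appear in both of the earlier tables (FLOOD is left out because its fatality count is not among the rows shown). The vector names are illustrative.

```r
# Values copied from the printed tables above
fatalities <- c(TORNADO = 5633, `EXCESSIVE HEAT` = 1903, `TSTM WIND` = 504,
                LIGHTNING = 816, HEAT = 937)
injuries   <- c(TORNADO = 91346, `EXCESSIVE HEAT` = 6525, `TSTM WIND` = 6957,
                LIGHTNING = 5230, HEAT = 2100)

# Element-wise sum; names are carried over from the first vector
combined <- fatalities + injuries
combined
# TORNADO: 96979, EXCESSIVE HEAT: 8428, TSTM WIND: 7461, LIGHTNING: 6046, HEAT: 3037
```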

The following figure shows the 10 event types most harmful to population health (sum of fatalities and injuries).

ggplot(topHealth[1:10,],aes(reorder(EVTYPE,total), total))+
  geom_col(fill="darkred")+
  coord_flip()+
  labs(title = "Top 10 weather events that cause damage to public health",
       x = "Event type",
       y = "Total injuries and fatalities")

As the figure shows, tornados are by far the most harmful weather events for public health.

Find weather events that have the greatest economic consequences.

When property and crop damage are considered separately, the event types that cause the most damage are as follows.

  • The events that cause the most property damage are floods.
head(topProperty)
## # A tibble: 6 × 3
##   EVTYPE                     PROP       CROP
##   <chr>                     <dbl>      <dbl>
## 1 FLOOD             144657709807  5661968450
## 2 HURRICANE/TYPHOON  69305840000  2607872800
## 3 TORNADO            56947380676.  414953270
## 4 STORM SURGE        43323536000        5000
## 5 FLASH FLOOD        16822723978. 1421317100
## 6 HAIL               15735267513. 3025954473
  • The events that cause the most crop damage are droughts.
head(topCrop)
## # A tibble: 6 × 3
##   EVTYPE               PROP        CROP
##   <chr>               <dbl>       <dbl>
## 1 DROUGHT       1046106000  13972566000
## 2 FLOOD       144657709807   5661968450
## 3 RIVER FLOOD   5118945500   5029459000
## 4 ICE STORM     3944927860   5022113500
## 5 HAIL         15735267513.  3025954473
## 6 HURRICANE    11868319010   2741910000
  • When property and crop damage are combined and the event types are ranked by the total, the events that cause the most economic damage are floods.
head(topEcon)
## # A tibble: 6 × 4
##   EVTYPE                     PROP       CROP         total
##   <chr>                     <dbl>      <dbl>         <dbl>
## 1 FLOOD             144657709807  5661968450 150319678257 
## 2 HURRICANE/TYPHOON  69305840000  2607872800  71913712800 
## 3 TORNADO            56947380676.  414953270  57362333946.
## 4 STORM SURGE        43323536000        5000  43323541000 
## 5 HAIL               15735267513. 3025954473  18761221986.
## 6 FLASH FLOOD        16822723978. 1421317100  18244041078.

The following figure shows the 10 event types that cause the most economic damage (sum of property and crop damage).

library(scales)
## 
## Attaching package: 'scales'
## The following object is masked from 'package:readr':
## 
##     col_factor
ggplot(topEcon[1:10,], aes(reorder(EVTYPE, total), total)) +
  geom_col(fill = "purple") +
  coord_flip() +
  scale_y_continuous(labels = comma) +  # Use scale_y_continuous since total is on y-axis
  labs(
    title = "Top 10 Weather Events Causing Economic Damage",
    x = "Weather Event", 
    y = "Total Property and Crop Damage") 

As the figure shows, floods cause by far the largest amount of combined property and crop damage.