Assignment objectives

1. Across the United States, which types of events are most harmful with respect to population health?

2. Across the United States, which types of events have the greatest economic consequences?

Synopsis

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

We first load and skim the data, then extract the columns needed to answer the two questions. After transforming these columns and summarizing them by event type, we answer each question with a summary table and a figure.

Data Processing

Preparing the data. Download the Storm Data file (repdata_data_StormData.csv.bz2) via the Storm Data link and save it into your working directory.

Setting up R and loading libraries.

## My system locale is Korean, which caused problems reading the CSV; switching the locale to English fixes this (the "English" locale name is Windows-specific).
Sys.setlocale("LC_ALL", "English")
## [1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
## libraries (plyr is not needed: every verb used below comes from dplyr,
## and loading plyr alongside dplyr only produces masking conflicts)
library(dplyr)
library(ggplot2)
library(R.utils)   # provides bunzip2()
library(gridExtra) # provides grid.arrange()

Loading the data.

if(!exists("stormdata")) {
  if(!file.exists("stormdata.csv")) {
    bunzip2("repdata_data_StormData.csv.bz2","stormdata.csv",remove=FALSE)
  }
  stormdata <- read.csv("stormdata.csv", sep = ",", header = TRUE)
}
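As an aside, R's file connections decompress bzip2 transparently, so the intermediate `stormdata.csv` is not strictly necessary: `read.csv()` can read the `.bz2` archive directly. The round trip below is a minimal sketch that writes a tiny hypothetical `toy.csv.bz2` to a temporary directory (not the NOAA file), so it runs without the real download.

```r
# Write a tiny bzip2-compressed CSV (hypothetical toy data, not the NOAA file)
path <- file.path(tempdir(), "toy.csv.bz2")
con <- bzfile(path, open = "wt")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,3", "FLOOD,1"), con)
close(con)

# read.csv() opens the path through file(), which detects bzip2 compression,
# so no explicit decompression step is needed
toy_read <- read.csv(path, stringsAsFactors = FALSE)
print(toy_read)
```

On the real data this would be `read.csv("repdata_data_StormData.csv.bz2")`; decompressing once with `bunzip2()`, as above, trades disk space for faster repeated reads.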

Examining the data.

str(stormdata)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
head(stormdata)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE  EVTYPE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL TORNADO
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL TORNADO
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL TORNADO
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL TORNADO
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL TORNADO
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL TORNADO
##   BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1         0                                               0         NA
## 2         0                                               0         NA
## 3         0                                               0         NA
## 4         0                                               0         NA
## 5         0                                               0         NA
## 6         0                                               0         NA
##   END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1         0                      14.0   100 3   0          0       15    25.0
## 2         0                       2.0   150 2   0          0        0     2.5
## 3         0                       0.1   123 2   0          0        2    25.0
## 4         0                       0.0   100 2   0          0        2     2.5
## 5         0                       0.0   150 2   0          0        2     2.5
## 6         0                       1.5   177 2   0          0        6     2.5
##   PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1          K       0                                         3040      8812
## 2          K       0                                         3042      8755
## 3          K       0                                         3340      8742
## 4          K       0                                         3458      8626
## 5          K       0                                         3412      8642
## 6          K       0                                         3450      8748
##   LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1       3051       8806              1
## 2          0          0              2
## 3          0          0              3
## 4          0          0              4
## 5          0          0              5
## 6          0          0              6
summary(stormdata)
##     STATE__       BGN_DATE           BGN_TIME          TIME_ZONE        
##  Min.   : 1.0   Length:902297      Length:902297      Length:902297     
##  1st Qu.:19.0   Class :character   Class :character   Class :character  
##  Median :30.0   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :31.2                                                           
##  3rd Qu.:45.0                                                           
##  Max.   :95.0                                                           
##                                                                         
##      COUNTY       COUNTYNAME           STATE              EVTYPE         
##  Min.   :  0.0   Length:902297      Length:902297      Length:902297     
##  1st Qu.: 31.0   Class :character   Class :character   Class :character  
##  Median : 75.0   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :100.6                                                           
##  3rd Qu.:131.0                                                           
##  Max.   :873.0                                                           
##                                                                          
##    BGN_RANGE          BGN_AZI           BGN_LOCATI          END_DATE        
##  Min.   :   0.000   Length:902297      Length:902297      Length:902297     
##  1st Qu.:   0.000   Class :character   Class :character   Class :character  
##  Median :   0.000   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :   1.484                                                           
##  3rd Qu.:   1.000                                                           
##  Max.   :3749.000                                                           
##                                                                             
##    END_TIME           COUNTY_END COUNTYENDN       END_RANGE       
##  Length:902297      Min.   :0    Mode:logical   Min.   :  0.0000  
##  Class :character   1st Qu.:0    NA's:902297    1st Qu.:  0.0000  
##  Mode  :character   Median :0                   Median :  0.0000  
##                     Mean   :0                   Mean   :  0.9862  
##                     3rd Qu.:0                   3rd Qu.:  0.0000  
##                     Max.   :0                   Max.   :925.0000  
##                                                                   
##    END_AZI           END_LOCATI            LENGTH              WIDTH         
##  Length:902297      Length:902297      Min.   :   0.0000   Min.   :   0.000  
##  Class :character   Class :character   1st Qu.:   0.0000   1st Qu.:   0.000  
##  Mode  :character   Mode  :character   Median :   0.0000   Median :   0.000  
##                                        Mean   :   0.2301   Mean   :   7.503  
##                                        3rd Qu.:   0.0000   3rd Qu.:   0.000  
##                                        Max.   :2315.0000   Max.   :4400.000  
##                                                                              
##        F               MAG            FATALITIES          INJURIES        
##  Min.   :0.0      Min.   :    0.0   Min.   :  0.0000   Min.   :   0.0000  
##  1st Qu.:0.0      1st Qu.:    0.0   1st Qu.:  0.0000   1st Qu.:   0.0000  
##  Median :1.0      Median :   50.0   Median :  0.0000   Median :   0.0000  
##  Mean   :0.9      Mean   :   46.9   Mean   :  0.0168   Mean   :   0.1557  
##  3rd Qu.:1.0      3rd Qu.:   75.0   3rd Qu.:  0.0000   3rd Qu.:   0.0000  
##  Max.   :5.0      Max.   :22000.0   Max.   :583.0000   Max.   :1700.0000  
##  NA's   :843563                                                           
##     PROPDMG         PROPDMGEXP           CROPDMG         CROPDMGEXP       
##  Min.   :   0.00   Length:902297      Min.   :  0.000   Length:902297     
##  1st Qu.:   0.00   Class :character   1st Qu.:  0.000   Class :character  
##  Median :   0.00   Mode  :character   Median :  0.000   Mode  :character  
##  Mean   :  12.06                      Mean   :  1.527                     
##  3rd Qu.:   0.50                      3rd Qu.:  0.000                     
##  Max.   :5000.00                      Max.   :990.000                     
##                                                                           
##      WFO             STATEOFFIC         ZONENAMES            LATITUDE   
##  Length:902297      Length:902297      Length:902297      Min.   :   0  
##  Class :character   Class :character   Class :character   1st Qu.:2802  
##  Mode  :character   Mode  :character   Mode  :character   Median :3540  
##                                                           Mean   :2875  
##                                                           3rd Qu.:4019  
##                                                           Max.   :9706  
##                                                           NA's   :47    
##    LONGITUDE        LATITUDE_E     LONGITUDE_       REMARKS         
##  Min.   :-14451   Min.   :   0   Min.   :-14455   Length:902297     
##  1st Qu.:  7247   1st Qu.:   0   1st Qu.:     0   Class :character  
##  Median :  8707   Median :   0   Median :     0   Mode  :character  
##  Mean   :  6940   Mean   :1452   Mean   :  3509                     
##  3rd Qu.:  9605   3rd Qu.:3549   3rd Qu.:  8735                     
##  Max.   : 17124   Max.   :9706   Max.   :106220                     
##                   NA's   :40                                        
##      REFNUM      
##  Min.   :     1  
##  1st Qu.:225575  
##  Median :451149  
##  Mean   :451149  
##  3rd Qu.:676723  
##  Max.   :902297  
## 

Skimming the data shows that the analysis only needs the following columns:

1. EVTYPE: event type
2. FATALITIES, INJURIES: impact on population health
3. PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP: economic impact. The dollar amounts are derived by multiplying the numbers in PROPDMG and CROPDMG by the multiplier coded in PROPDMGEXP and CROPDMGEXP, e.g. K (1,000) or M (1,000,000).
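The column selection and the multiplier rule can be sketched on a couple of hypothetical rows shaped like the NOAA records (the values and the extra `OTHER` column are made up for illustration). It uses base R subsetting so it runs without extra packages; on the real data, `stormdata %>% select(...)` does the same job.

```r
# Hypothetical rows shaped like the NOAA records (not real data);
# OTHER stands in for the columns the analysis does not need
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD"),
  FATALITIES = c(0, 1),
  INJURIES   = c(15, 0),
  PROPDMG    = c(25, 2.5),
  PROPDMGEXP = c("K", "M"),
  CROPDMG    = c(0, 10),
  CROPDMGEXP = c("", "K"),
  OTHER      = c("a", "b"),
  stringsAsFactors = FALSE
)

# Keep only the columns needed for the two questions
needed <- toy[, c("EVTYPE", "FATALITIES", "INJURIES",
                  "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]

# Multiplier rule: 25 with "K" -> 25 * 1,000 = 25,000 USD;
# 2.5 with "M" -> 2.5 * 1,000,000 = 2,500,000 USD
multiplier <- c(K = 1e3, M = 1e6)
needed$PROPDMGUSD <- needed$PROPDMG * unname(multiplier[needed$PROPDMGEXP])
print(needed)
```

`stringsAsFactors = FALSE` matters here: indexing `multiplier` with a factor would use its integer codes rather than the letters.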

Impact on health

## Which types of events are most harmful to population health?
fatalities <- stormdata %>%
  select(EVTYPE, FATALITIES) %>%
  group_by(EVTYPE) %>%
  summarise(total = sum(FATALITIES), .groups = "drop") %>%
  arrange(desc(total))
head(fatalities)
## # A tibble: 6 x 2
##   EVTYPE         total
##   <chr>          <dbl>
## 1 TORNADO         5633
## 2 EXCESSIVE HEAT  1903
## 3 FLASH FLOOD      978
## 4 HEAT             937
## 5 LIGHTNING        816
## 6 TSTM WIND        504
injuries <- stormdata %>%
  select(EVTYPE, INJURIES) %>%
  group_by(EVTYPE) %>%
  summarise(total = sum(INJURIES), .groups = "drop") %>%
  arrange(desc(total))
head(injuries)
## # A tibble: 6 x 2
##   EVTYPE         total
##   <chr>          <dbl>
## 1 TORNADO        91346
## 2 TSTM WIND       6957
## 3 FLOOD           6789
## 4 EXCESSIVE HEAT  6525
## 5 LIGHTNING       5230
## 6 HEAT            2100

Impact on economy: the damage columns need some transformation first. According to the link, the exponent codes in PROPDMGEXP and CROPDMGEXP map to multipliers as follows:
(1) K, k - 10^3
(2) M, m - 10^6
(3) H, h - 10^2
(4) B, b - 10^9
(5) -, NA, ? - 0

stormdata$PROPDMGEXP<-tolower(stormdata$PROPDMGEXP)
stormdata$PROPDMGEXP <- gsub("k",1000,stormdata$PROPDMGEXP)
stormdata$PROPDMGEXP <- gsub("m",1000000,stormdata$PROPDMGEXP)  # M = 10^6
stormdata$PROPDMGEXP <- gsub("h",100,stormdata$PROPDMGEXP)
stormdata$PROPDMGEXP <- gsub("b",1000000000,stormdata$PROPDMGEXP)
stormdata$PROPDMGACTUAL <- stormdata$PROPDMG * as.numeric(stormdata$PROPDMGEXP)
## Warning: NAs introduced by coercion
stormdata$PROPDMGACTUAL[is.na(stormdata$PROPDMGACTUAL)] <- 0
stormdata$CROPDMGEXP<-tolower(stormdata$CROPDMGEXP)
stormdata$CROPDMGEXP <- gsub("k",1000,stormdata$CROPDMGEXP)
stormdata$CROPDMGEXP <- gsub("m",1000000,stormdata$CROPDMGEXP)
stormdata$CROPDMGEXP <- gsub("h",100,stormdata$CROPDMGEXP)
stormdata$CROPDMGEXP <- gsub("b",1000000000,stormdata$CROPDMGEXP)
stormdata$CROPDMGACTUAL <- stormdata$CROPDMG * as.numeric(stormdata$CROPDMGEXP)
## Warning: NAs introduced by coercion
stormdata$CROPDMGACTUAL[is.na(stormdata$CROPDMGACTUAL)] <- 0

stormdata$economic.dmg <- stormdata$CROPDMGACTUAL + stormdata$PROPDMGACTUAL
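The chain of `gsub()` calls can also be written as a named lookup vector, which avoids turning the codes into digit strings and parsing them back. Below is a minimal sketch with a hypothetical helper, `exp_to_multiplier()`. One assumption differs from the code above: there, `as.numeric()` turns a digit code such as "5" into a multiplier of 5, whereas this helper maps every code not listed in the rules (digits, "+", "-", "?", empty, NA) to 0, extending rule (5).

```r
# Hypothetical helper: map exponent codes to dollar multipliers.
# Codes not in the lookup (digits, "+", "-", "?", "", NA) become 0,
# an assumption that extends rule (5) above.
exp_to_multiplier <- function(code) {
  lookup <- c(h = 1e2, k = 1e3, m = 1e6, b = 1e9)
  mult <- unname(lookup[tolower(code)])  # unmatched names give NA
  mult[is.na(mult)] <- 0
  mult
}

exp_to_multiplier(c("K", "m", "B", "h", "?", "-", "", NA))
```

On the real data, usage would be `stormdata$PROPDMG * exp_to_multiplier(stormdata$PROPDMGEXP)`, applied to the raw exponent column before any `gsub()` rewriting.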

dmg.total <- stormdata %>%
  select(EVTYPE, economic.dmg) %>%
  group_by(EVTYPE) %>%
  summarise(total = sum(economic.dmg), .groups = "drop") %>%
  arrange(desc(total))
head(dmg.total)
## # A tibble: 6 x 2
##   EVTYPE                   total
##   <chr>                    <dbl>
## 1 FLOOD             131168416250
## 2 HURRICANE/TYPHOON  68490229800
## 3 STORM SURGE        42653104000
## 4 DROUGHT            14079927000
## 5 TORNADO            13725802101
## 6 RIVER FLOOD        10053724500

Results

1. Across the United States, which types of events are most harmful with respect to population health?

The tables above show that tornadoes are the most harmful event type for population health: they rank first in both fatalities (5,633) and injuries (91,346). The following figure makes this easier to see.
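To check that one event type leads on both measures at once, the two summaries can be joined on EVTYPE. A minimal base-R sketch follows, using only the top-six rows printed above, so event types outside either top-six list are not considered; on the full data one would merge `fatalities` and `injuries` directly.

```r
# Top-six totals copied from the two tables above
fatalities_top <- data.frame(
  EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD", "HEAT", "LIGHTNING", "TSTM WIND"),
  total  = c(5633, 1903, 978, 937, 816, 504),
  stringsAsFactors = FALSE
)
injuries_top <- data.frame(
  EVTYPE = c("TORNADO", "TSTM WIND", "FLOOD", "EXCESSIVE HEAT", "LIGHTNING", "HEAT"),
  total  = c(91346, 6957, 6789, 6525, 5230, 2100),
  stringsAsFactors = FALSE
)

# Inner join keeps event types present in both top-six lists
health <- merge(fatalities_top, injuries_top, by = "EVTYPE",
                suffixes = c(".fatalities", ".injuries"))

# Rank within each measure (1 = most harmful) and sort by fatalities
health$rank.fatalities <- rank(-health$total.fatalities)
health$rank.injuries   <- rank(-health$total.injuries)
health <- health[order(health$rank.fatalities), ]
print(health)
```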

## bars sorted from most to least fatalities; one colour per event type
fatality_plot <- ggplot(fatalities[1:5, ], aes(x = reorder(EVTYPE, -total), 
    y = total, fill = EVTYPE)) + geom_col(show.legend = FALSE) + 
    theme(axis.text.x = element_text(angle = 30, hjust = 1)) + 
    xlab("Events") + ylab("No. of fatalities") + 
    ggtitle("Top 5 weather events causing fatalities")


## bars sorted from most to least injuries; one colour per event type
injuries_plot <- ggplot(injuries[1:5, ], aes(x = reorder(EVTYPE, -total), 
    y = total, fill = EVTYPE)) + geom_col(show.legend = FALSE) + 
    theme(axis.text.x = element_text(angle = 30, hjust = 1)) + 
    xlab("Events") + ylab("No. of injuries") + 
    ggtitle("Top 5 weather events causing injuries")

grid.arrange(fatality_plot, injuries_plot, ncol = 2)

2. Across the United States, which types of events have the greatest economic consequences?

The table above shows that floods have the greatest economic consequences. The figure below shows this more clearly.
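Raw dollar totals such as 131168416250 are hard to compare by eye. A small hypothetical helper, `summarise_damage()`, can express each event type's damage in billions of USD and as a share of the overall total. The sketch below runs on made-up totals (not NOAA figures). The share is only meaningful when the input contains every event type, e.g. the full `dmg.total` rather than `dmg.total[1:5, ]`.

```r
# Hypothetical helper: sort by damage, then add billions and share-of-total
summarise_damage <- function(dmg) {
  dmg <- dmg[order(-dmg$total), ]
  dmg$billions <- dmg$total / 1e9
  dmg$share <- dmg$total / sum(dmg$total)
  dmg
}

# Made-up totals for illustration only (not NOAA figures)
toy_dmg <- data.frame(EVTYPE = c("HAIL", "FLOOD", "DROUGHT"),
                      total  = c(2e9, 6e9, 2e9),
                      stringsAsFactors = FALSE)
res <- summarise_damage(toy_dmg)
print(res)
```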

## bars sorted from most to least damage; one colour per event type
ggplot(dmg.total[1:5, ], aes(x = reorder(EVTYPE, -total), y = total, 
    fill = EVTYPE)) + geom_col(show.legend = FALSE) + 
    theme(axis.text.x = element_text(angle = 30, hjust = 1)) + 
    xlab("Events") + ylab("Total damage (USD)") + 
    ggtitle("Top 5 weather events by economic damage")