Reproducible Research Week 2: Assignment 2

Severe weather events affecting public health & economy in US

Based on data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database

Synopsis

This analysis draws on NOAA records of severe weather phenomena from 1950 to November 2011, restricted to events from January 2000 onward. The results show that

  • Tornadoes are responsible for the most fatalities and injuries
  • Floods cause the greatest economic losses
  • Droughts cause high economic damage to crops

Downloading and processing data

library(data.table)
download.file('https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2','NOAA.csv.bz2', method='curl')
data <- as.data.table(read.csv("NOAA.csv.bz2"))
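
Re-running the report downloads the compressed file again each time. A minimal sketch of a guard against that, assuming a helper named fetch_once (hypothetical, not part of the original analysis):

```r
# Hypothetical helper: download a file only if it is not already on disk.
fetch_once <- function(url, dest) {
  if (!file.exists(dest)) {
    # mode = "wb" writes the compressed file in binary mode
    download.file(url, dest, mode = "wb")
  }
  dest
}

# Usage with the URL and file name from the step above (left commented out
# so that this sketch does not trigger a download by itself):
# path <- fetch_once(
#   "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
#   "NOAA.csv.bz2"
# )
# data <- as.data.table(read.csv(path))
```

The helper returns the destination path either way, so it can be passed straight to read.csv.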

Checking the characteristics of data variables in the data table

str(data)
## Classes 'data.table' and 'data.frame':   902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
##  $ BGN_TIME  : Factor w/ 3608 levels "000","0000","0001",..: 152 167 2645 1563 2524 3126 122 1563 3126 3126 ...
##  $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 6 6 6 6 6 6 6 6 6 6 ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 826 826 826 826 826 826 826 826 826 826 ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : Factor w/ 35 levels "","  N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_LOCATI: Factor w/ 54429 levels ""," Christiansburg",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_DATE  : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_TIME  : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_LOCATI: Factor w/ 34506 levels ""," CANTON"," TULIA",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","+","-","0",..: 16 16 16 16 16 16 16 16 16 16 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","0","2","?",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ WFO       : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ ZONENAMES : Factor w/ 25112 levels "","                                                                                                                               "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : Factor w/ 436781 levels "","\t","\t\t",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
##  - attr(*, ".internal.selfref")=<externalptr>

Only data from January 2000 onwards is considered. To filter by date, the variable BGN_DATE (class: factor) is first converted to a date-time format.

data$BGN_DATE <- as.POSIXct(strptime(as.character(data$BGN_DATE), "%m/%d/%Y %H:%M:%S"))
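
The conversion goes through as.character first because a factor stores integer level codes, not the date text. A small sketch on three made-up date strings in the same format (the sample values and the tz = "UTC" setting are assumptions added here for a deterministic result):

```r
# Made-up sample values in the same "%m/%d/%Y %H:%M:%S" layout as BGN_DATE.
raw <- factor(c("4/18/1950 0:00:00", "12/31/1999 0:00:00", "1/1/2000 0:00:00"))

# as.character() recovers the original text; strptime() then parses it.
parsed <- as.POSIXct(
  strptime(as.character(raw), "%m/%d/%Y %H:%M:%S", tz = "UTC")
)

print(format(parsed, "%Y-%m-%d"))
```

Any string that does not match the format would come back as NA, so checking anyNA(parsed) after the conversion is a cheap sanity test.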

Subsetting the data based on time period (January 2000 to November 2011)

required.data <- subset(data, data$BGN_DATE > as.POSIXct("1999-12-31"))
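
The same filter can also be written with data.table's own row-selection syntax. A sketch on a toy table (the rows and the names toy, cutoff and recent are illustrative assumptions), which also shows that a strict > comparison excludes an event stamped exactly at midnight on 1999-12-31:

```r
library(data.table)

# Toy table standing in for the full data set.
toy <- data.table(
  EVTYPE   = c("TORNADO", "FLOOD", "HAIL"),
  BGN_DATE = as.POSIXct(c("1999-12-30", "1999-12-31", "2000-01-01"), tz = "UTC")
)

cutoff <- as.POSIXct("1999-12-31", tz = "UTC")

# Inside [ ], column names can be used directly without the data$ prefix.
recent <- toy[BGN_DATE > cutoff]
print(recent)
```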

Results

Q1. Across the United States, which types of events are most harmful with respect to population health?

casualties <- required.data[, lapply(.SD, sum), by = EVTYPE, .SDcols = c("FATALITIES", 
    "INJURIES")]
summary(casualties)
##                    EVTYPE      FATALITIES         INJURIES      
##     HIGH SURF ADVISORY:  1   Min.   :   0.00   Min.   :    0.0  
##   FLASH FLOOD         :  1   1st Qu.:   0.00   1st Qu.:    0.0  
##   TSTM WIND           :  1   Median :   0.00   Median :    0.0  
##   WATERSPOUT          :  1   Mean   :  30.58   Mean   :  179.2  
##  ABNORMALLY DRY       :  1   3rd Qu.:   2.25   3rd Qu.:    5.0  
##  ABNORMALLY WET       :  1   Max.   :1193.00   Max.   :15213.0  
##  (Other)              :190
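
The aggregation above sums the columns listed in .SDcols within each EVTYPE group. A sketch of the same pattern on a toy table (all values are made up), with an added sort so the most harmful event type comes first:

```r
library(data.table)

# Toy records: two tornado rows, one flood row, one hail row.
toy <- data.table(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(1, 2, 0, 0),
  INJURIES   = c(10, 5, 3, 0)
)

# .SD holds the columns named in .SDcols; lapply(.SD, sum) sums each per group.
totals <- toy[, lapply(.SD, sum), by = EVTYPE,
              .SDcols = c("FATALITIES", "INJURIES")]

# Sort by fatalities, then injuries, both descending.
totals <- totals[order(-FATALITIES, -INJURIES)]
print(totals)
```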

Plotting fatalities against injuries on a scatter plot places the most harmful events toward the upper right of the plot.

par(mfrow = c(1, 1))
plot(casualties$FATALITIES, casualties$INJURIES, ylim = c(0, 16000), xlim = c(0, 
    1300), pch = 19, col = rgb(0, 1, 0, 0.5), cex = 1.5, xlab = "number of fatalities", 
    ylab = "number of injuries", main = "Fatalities and Injuries due to severe weather events")
casualties.sub <- casualties[which(casualties$FATALITIES > 300)]
text(casualties.sub$FATALITIES + 50, casualties.sub$INJURIES + 500, labels = casualties.sub$EVTYPE, 
    cex = 0.75)

Figure 1: Scatter plot with the number of injuries and fatalities due to severe weather events between 2000 and 2011.

casualties.sub
##            EVTYPE FATALITIES INJURIES
## 1:    FLASH FLOOD        600      812
## 2:        TORNADO       1193    15213
## 3: EXCESSIVE HEAT       1013     3708
## 4:      LIGHTNING        466     2993
## 5:    RIP CURRENT        340      208

Conclusion: From the plot, we can see that five events are more harmful than the others, each causing more than 300 fatalities. These are tornadoes, excessive heat, lightning, flash floods, and rip currents.

Among these, tornadoes are the most devastating and harmful weather event, having caused around 1200 fatalities and over 15000 injuries during this period.
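
To make the ranking explicit, the five rows reported above can be re-entered by hand and sorted. This is a sketch for illustration only: the CASUALTIES column (fatalities plus injuries) is an added convenience, not a quantity used in the original analysis.

```r
# Values copied from the casualties.sub output above.
reported <- data.frame(
  EVTYPE     = c("FLASH FLOOD", "TORNADO", "EXCESSIVE HEAT", "LIGHTNING", "RIP CURRENT"),
  FATALITIES = c(600, 1193, 1013, 466, 340),
  INJURIES   = c(812, 15213, 3708, 2993, 208),
  stringsAsFactors = FALSE
)

# Added for illustration: combined count of fatalities and injuries.
reported$CASUALTIES <- reported$FATALITIES + reported$INJURIES

# Rank by fatalities, highest first.
ranked <- reported[order(-reported$FATALITIES), ]
print(ranked, row.names = FALSE)
```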

Q2. Across the United States, which types of events have the greatest economic consequences?

For this question, we have selected the following four variables:

  • Property Damage
    • PROPDMG
    • PROPDMGEXP
  • Crop Damage
    • CROPDMG
    • CROPDMGEXP

Now we convert each damage figure (property and crop) to a numeric value in billions of USD by applying the multiplier encoded in the corresponding EXP column (K = thousands, M = millions, B = billions).

library(car)
required.data$PROPDMG.B <- recode(as.character(required.data$PROPDMGEXP), "'K'=1e-6; 'M'=1e-3;'B'=1;''=0")
required.data$PROPDMG.B <- required.data$PROPDMG.B * required.data$PROPDMG
required.data$CROPDMG.B <- recode(as.character(required.data$CROPDMGEXP), "'K'=1e-6; 'M'=1e-3;'B'=1;''=0")
required.data$CROPDMG.B <- required.data$CROPDMG.B * required.data$CROPDMG
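
The same exponent mapping can be sketched in base R with a named lookup vector. The function name to_billions is hypothetical, and one behaviour is an assumption that differs from the recode call above: any code other than K, M or B (including the empty string) is treated as a multiplier of 0.

```r
# Hypothetical helper: scale damage values to billions of USD.
to_billions <- function(value, exponent) {
  multiplier <- c(K = 1e-6, M = 1e-3, B = 1)
  # Look up each code by name; codes not in the table come back as NA.
  factor_bn <- unname(multiplier[as.character(exponent)])
  # Assumption: unknown or empty codes contribute nothing.
  factor_bn[is.na(factor_bn)] <- 0
  value * factor_bn
}

# Made-up sample: 25 thousand, 2.5 million, 1.2 billion, and no exponent.
damage_bn <- to_billions(c(25, 2.5, 1.2, 7), c("K", "M", "B", ""))
print(damage_bn)
```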

Now we aggregate the data by event type to visualize the events causing major economic losses during this period.

event.cost <- required.data[, lapply(.SD, sum), by = EVTYPE, .SDcols = c("PROPDMG.B", 
    "CROPDMG.B")]
summary(event.cost)
##                    EVTYPE      PROPDMG.B           CROPDMG.B     
##     HIGH SURF ADVISORY:  1   Min.   :  0.00000   Min.   :0.0000  
##   FLASH FLOOD         :  1   1st Qu.:  0.00000   1st Qu.:0.0000  
##   TSTM WIND           :  1   Median :  0.00000   Median :0.0000  
##   WATERSPOUT          :  1   Mean   :  1.68774   Mean   :0.1204  
##  ABNORMALLY DRY       :  1   3rd Qu.:  0.00157   3rd Qu.:0.0000  
##  ABNORMALLY WET       :  1   Max.   :134.69107   Max.   :9.1356  
##  (Other)              :190

Now we plot the variables Crop damage against Property damage to understand which weather phenomena resulted in the highest economic loss.

plot(event.cost$PROPDMG.B, event.cost$CROPDMG.B, ylim = c(0, 10), xlim = c(0, 
    150), pch = 19, col = rgb(0, 1, 0, 0.5), cex = 1.5, xlab = "Property Damage (Billions of USD)", 
    ylab = "Crop Damage(Billions of USD)", main = "Economic losses due to severe weather phenomena")
event.cost.sub <- event.cost[which(event.cost$PROPDMG.B > 40)]
text(event.cost.sub$PROPDMG.B, event.cost.sub$CROPDMG.B + 0.5, labels = event.cost.sub$EVTYPE, 
    cex = 0.75)

Figure 2: Scatter plot showing the economic losses caused by severe weather phenomena between January 2000 and November 2011.

event.cost.sub
##               EVTYPE PROPDMG.B CROPDMG.B
## 1:       STORM SURGE  43.17094  0.000000
## 2:             FLOOD 134.69107  4.221934
## 3: HURRICANE/TYPHOON  69.30584  2.607873

Conclusion: Severe weather events, namely storm surges, hurricanes/typhoons, and floods, cause severe economic losses to the US. Floods alone account for about 135 Bn USD in property damage, or roughly 139 Bn USD once crop damage is included. While floods cause the greatest overall economic loss, droughts cause very high damage to crops.
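
The combined figures can be checked by adding the two damage columns for the three events reported above. A small sketch with the values re-entered by hand (the TOTAL.B column name is an assumption introduced here):

```r
# Values copied from the event.cost.sub output above (billions of USD).
reported <- data.frame(
  EVTYPE    = c("STORM SURGE", "FLOOD", "HURRICANE/TYPHOON"),
  PROPDMG.B = c(43.17094, 134.69107, 69.30584),
  CROPDMG.B = c(0, 4.221934, 2.607873),
  stringsAsFactors = FALSE
)

# Combined property and crop damage per event type.
reported$TOTAL.B <- reported$PROPDMG.B + reported$CROPDMG.B

# Largest combined loss first.
reported <- reported[order(-reported$TOTAL.B), ]
print(reported, row.names = FALSE)
```

Sorting by the combined column puts floods first, followed by hurricanes/typhoons and storm surges.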