Effects of severe weather events on population health and the economy in the United States, based on the NOAA storm database

1. Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

In this report, the effect of weather events on population health and on economic damage (property and crop) was studied. Bar plots were drawn separately for the top 6 weather events causing the most fatalities, the most injuries, and the greatest economic loss. The results indicate that tornadoes caused the most fatalities and injuries, and that floods caused the greatest economic damage.

2. Data Processing

The data were downloaded from https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2 and saved on the local computer. They were then loaded into R using the following code.

# Loading data into R
storm <- read.csv("repdata_data_StormData.csv")
head(storm)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
## 4        2     2.5          K       0                                    
## 5        2     2.5          K       0                                    
## 6        6     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6
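As a side note on the loading step, `read.csv` can read a bzip2-compressed file directly, so the archive does not have to be unpacked first. The sketch below illustrates this on a tiny stand-in file; the temporary file and its two rows are assumptions made for illustration, not part of the storm data.

```r
# Write a tiny, hypothetical stand-in for the storm file as a .csv.bz2 archive
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, "w")
write.csv(data.frame(EVTYPE = c("TORNADO", "FLOOD"),
                     INJURIES = c(15, 0)),
          con, row.names = FALSE)
close(con)

# read.csv detects the compression and decompresses while reading
mini <- read.csv(tmp, stringsAsFactors = FALSE)
str(mini)
```

The same call pattern should apply to the downloaded archive, at the cost of a longer read time for a file of that size.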

3. Results

Question #1: Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

library(plyr)
# Sum injuries by event type, then sort in descending order
injuries = ddply(storm, .(EVTYPE), summarize, sum.injuries = sum(INJURIES,na.rm=TRUE))
injuries = injuries[order(injuries$sum.injuries, decreasing = TRUE), ]
head(injuries, 5)
##             EVTYPE sum.injuries
## 834        TORNADO        91346
## 856      TSTM WIND         6957
## 170          FLOOD         6789
## 130 EXCESSIVE HEAT         6525
## 464      LIGHTNING         5230
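The group-and-sort pattern used here can also be written without plyr. The following is a minimal sketch in base R on a made-up four-row data frame (the `toy` data are an assumption for illustration only), using `aggregate` in place of `ddply`.

```r
# Hypothetical toy data with the same two columns as the storm data
toy <- data.frame(
  EVTYPE   = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  INJURIES = c(15, 3, 2, 0)
)

# Sum injuries within each event type
by_event <- aggregate(INJURIES ~ EVTYPE, data = toy, FUN = sum)

# Sort so that the most harmful event type comes first
by_event <- by_event[order(by_event$INJURIES, decreasing = TRUE), ]
by_event
```

Either approach gives one row per event type; the plyr version in this report simply names the summed column explicitly.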

Tornadoes cause by far the most injuries. The six events with the most injuries are shown in the figure below:

library(ggplot2)
# alpha is a fixed setting, so it is set in geom_bar() rather than mapped inside aes()
ggplot(injuries[1:6, ], aes(EVTYPE, sum.injuries, fill = EVTYPE)) + geom_bar(stat = "identity", alpha = 0.5) + 
  xlab("Event Type") + ylab("Number of Injuries") + ggtitle("Injuries by Event type") + coord_flip()
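By default ggplot2 orders the bars alphabetically by event type. If bars ranked by value are preferred, one option is `reorder()`. The sketch below uses the five injury totals printed in the table above; the data frame name `top_injuries` is introduced here only for illustration.

```r
library(ggplot2)

# Top five injury totals, copied from the printed table above
top_injuries <- data.frame(
  EVTYPE = c("TORNADO", "TSTM WIND", "FLOOD", "EXCESSIVE HEAT", "LIGHTNING"),
  sum.injuries = c(91346, 6957, 6789, 6525, 5230)
)

# reorder() sorts the event types by injury count, so bars appear in rank order
p <- ggplot(top_injuries, aes(reorder(EVTYPE, sum.injuries), sum.injuries)) +
  geom_bar(stat = "identity", alpha = 0.5) +
  xlab("Event Type") + ylab("Number of Injuries") +
  ggtitle("Injuries by Event type") + coord_flip()
p
```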

Next, we look at the events that cause the most fatalities.

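# Sum fatalities by event type (na.rm = TRUE for consistency with the injuries summary)
fatalities = ddply(storm, .(EVTYPE), summarize, sum = sum(FATALITIES, na.rm = TRUE))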
fatalities = fatalities[order(fatalities$sum, decreasing = TRUE), ]
head(fatalities, 5)
##             EVTYPE  sum
## 834        TORNADO 5633
## 130 EXCESSIVE HEAT 1903
## 153    FLASH FLOOD  978
## 275           HEAT  937
## 464      LIGHTNING  816

Tornadoes again rank first, this time for fatalities. The six events with the most fatalities are shown in the figure below:

# alpha is a fixed setting, so it is set in geom_bar() rather than mapped inside aes()
ggplot(fatalities[1:6, ], aes(EVTYPE, sum, fill = EVTYPE)) + geom_bar(stat = "identity", alpha = 0.3) + 
  xlab("Event Type") + ylab("Number of Fatalities") + ggtitle("Fatalities by Event type") + coord_flip()

Question #2: Across the United States, which types of events have the greatest economic consequences?

# Inspect the exponent codes stored in PROPDMGEXP and CROPDMGEXP
unique(storm$PROPDMGEXP)
##  [1] K M   B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels:  - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
unique(storm$CROPDMGEXP)
## [1]   M K m B ? 0 k 2
## Levels:  ? 0 2 B k K m M
# Some codes are lowercase, so we convert them all to uppercase. Empty entries
# and symbols that are neither letters nor digits are replaced with "0".
storm$PROPDMGEXP <- toupper(storm$PROPDMGEXP)
storm$PROPDMGEXP[storm$PROPDMGEXP %in% c("", "+", "-", "?")] = "0"
storm$CROPDMGEXP <- toupper(storm$CROPDMGEXP)
storm$CROPDMGEXP[storm$CROPDMGEXP %in% c("", "?")] = "0"
# Map the letter codes to powers of ten: B = 9, M = 6, K = 3, H = 2
storm$PROPDMGEXP[storm$PROPDMGEXP %in% c("B")] = "9"
storm$PROPDMGEXP[storm$PROPDMGEXP %in% c("M")] = "6"
storm$PROPDMGEXP[storm$PROPDMGEXP %in% c("K")] = "3"
storm$PROPDMGEXP[storm$PROPDMGEXP %in% c("H")] = "2"
storm$CROPDMGEXP[storm$CROPDMGEXP %in% c("B")] = "9"
storm$CROPDMGEXP[storm$CROPDMGEXP %in% c("M")] = "6"
storm$CROPDMGEXP[storm$CROPDMGEXP %in% c("K")] = "3"
storm$CROPDMGEXP[storm$CROPDMGEXP %in% c("H")] = "2"
# Turn each exponent into a multiplier (10^exponent) and scale the PROP and CROP damage values by it
storm$PROPDMGEXP <- 10^(as.numeric(storm$PROPDMGEXP))
damage.property = storm$PROPDMG * storm$PROPDMGEXP
storm$damage.property = damage.property
storm$CROPDMGEXP <- 10^(as.numeric(storm$CROPDMGEXP))
damage.crop = storm$CROPDMG * storm$CROPDMGEXP
storm$damage.crop = damage.crop
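The conversion above can also be expressed as a single lookup. The following is a minimal sketch that applies the same rules as this report (letters map to powers of ten, digits are used as given, anything else counts as exponent 0); the helper name `exp_to_multiplier` is hypothetical and does not appear in the original analysis.

```r
# Hypothetical helper: convert exponent codes to numeric multipliers
exp_to_multiplier <- function(code) {
  code <- toupper(as.character(code))
  letter_power <- c(H = 2, K = 3, M = 6, B = 9)
  # Letter codes come from the lookup table; digit codes are parsed as numbers
  power <- ifelse(code %in% names(letter_power),
                  unname(letter_power[code]),
                  suppressWarnings(as.numeric(code)))
  # Empty strings and symbols such as "+", "-", "?" become exponent 0
  power[is.na(power)] <- 0
  10^power
}

# Example: a recorded value of 25 with code "K" is 25,000
25 * exp_to_multiplier("K")
exp_to_multiplier(c("K", "m", "B", "h", "", "?", "5"))
```

A lookup of this kind keeps the rules in one place, which makes it easier to apply the same mapping to both PROPDMGEXP and CROPDMGEXP.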

Next, we identify the events with the greatest economic damage (property and crop combined).

# Total property damage and total crop damage per event type, each sorted in descending order
Damage.property = ddply(storm, .(EVTYPE), summarize, damage.property = sum(damage.property, na.rm = TRUE))
Damage.property = Damage.property[order(Damage.property$damage.property, decreasing = TRUE), ]
Damage.crop = ddply(storm, .(EVTYPE), summarize, damage.crop = sum(damage.crop, na.rm = TRUE))
Damage.crop = Damage.crop[order(Damage.crop$damage.crop, decreasing = TRUE), ]

# Combined property and crop damage per event, then summed by event type
total.damage = damage.property + damage.crop
storm$total.damage = total.damage
Damage.total = ddply(storm, .(EVTYPE), summarize, damage.total = sum(total.damage, na.rm = TRUE))
Damage.total = Damage.total[order(Damage.total$damage.total, decreasing = TRUE), ]
head(Damage.total)
##                EVTYPE damage.total
## 170             FLOOD 150319678257
## 411 HURRICANE/TYPHOON  71913712800
## 834           TORNADO  57362333947
## 670       STORM SURGE  43323541000
## 244              HAIL  18761221986
## 153       FLASH FLOOD  18243991079
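Totals of eleven or twelve digits are hard to compare at a glance, so it may help to express them in billions. A small sketch using the first three totals from the table above, assuming the figures are in US dollars (the vector name `damage_total` is introduced here for illustration):

```r
# First three totals copied from the printed table above (assumed to be US dollars)
damage_total <- c(
  "FLOOD"             = 150319678257,
  "HURRICANE/TYPHOON" = 71913712800,
  "TORNADO"           = 57362333947
)

# Convert to billions and round to one decimal place
damage_billions <- round(damage_total / 1e9, 1)
damage_billions
```

On this scale floods account for roughly 150 billion, about twice the total for hurricanes and typhoons.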

Floods cause the greatest economic loss. The six events with the greatest economic damage are shown in the figure below:

# alpha is a fixed setting, so it is set in geom_bar() rather than mapped inside aes()
ggplot(Damage.total[1:6, ], aes(EVTYPE, damage.total, fill = EVTYPE)) + geom_bar(stat = "identity", alpha = 0.5) + 
  xlab("Event Type") + ylab("Total damages") + ggtitle("Total damages by Event type") + coord_flip()