Synopsis

Severe weather events, such as Winter Storm, Tsunami and Tornado are known to cause fatalities, injuries, property damage, etc. The U.S. National Oceanic and Atmospheric Administration (NOAA) storm database lists 48 names that most accurately describe the meteorological events leading to such severe weather events. The primary aim of this project was to address the following two questions.

  1. Across the United States, which types of events are most harmful with respect to population health?

  2. Across the United States, which types of events have the greatest economic consequences?

Our results show that, between 1950 and 2011, of the 48 events that cause harm, Tornado stands out top place culprit for Fatalities, Injuries as well as Property Damage, when compared to other event types. Excessive Heat and Flash Flood are the second and third place causes of Fatalities. Flood and Excessive Heat are second and third place causes of Injuries. Hail, Flash Flood and Flood are blamed for Crop Damage (in that order).

Data Processing

Reading in the storm data to R

stormdata<-read.table("repdata-data-StormData.csv.bz2",sep = ",", header = TRUE)

This is a huge data with 902,297 observations and 37 variables.

dim(stormdata)
## [1] 902297     37

Checking some of the names of the data suggests formatting of names to fit for R-Processing

cnames <- readLines("repdata-data-StormData.csv.bz2", 1)
cnames <- strsplit(cnames, ",", fixed = TRUE)
names(stormdata) <- make.names(cnames[[1]])
names(stormdata)<-gsub("X.|[.]","", names(stormdata))
head(stormdata[,3:10])
##   BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE  EVTYPE BGN_RANGE BGN_AZI
## 1     0130       CST     97     MOBILE    AL TORNADO         0        
## 2     0145       CST      3    BALDWIN    AL TORNADO         0        
## 3     1600       CST     57    FAYETTE    AL TORNADO         0        
## 4     0900       CST     89    MADISON    AL TORNADO         0        
## 5     1500       CST     43    CULLMAN    AL TORNADO         0        
## 6     2000       CST     77 LAUDERDALE    AL TORNADO         0

Missing values are common problems in many data analysis and severe storm events are not exceptions. We therefore check the proportion of observations with missing values.

mean(is.na(stormdata))  ## Are missing values important in the analysis to follow? 
## [1] 0.0523

About 5.2 percent of the rows have missing values. Since this proportion is relatively small, we will ignore missing values.This number does not account for missing values other than those reported as NAs.

Reading in the 48 valid names of events

We read to R the file which has the 48 valid names of events,which was downloaded from here, to our working directory, using the following code

event_table<-read.csv("event_table.csv",sep=";")
head(event_table)
##                  EVTYPE Designator
## 1 Astronomical Low Tide         Z 
## 2             Avalanche         Z 
## 3              Blizzard         Z 
## 4         Coastal Flood         Z 
## 5       Cold/Wind Chill         Z 
## 6           Debris Flow         C

One can see that the stormdata and event_table share a common variable EVTYPE. However, in order to merge the events of the two datasets by the same id, EVTYPE, we need to change the events in the EVTYPE column of the data_table to upper case and create a new data.

event_table$EVTYPE<-toupper(event_table$EVTYPE)
data_mgd<-merge(stormdata,event_table,by="EVTYPE")

We will do the rest of our analysis using latest tidy data, data_mgd.

Results

The variables that will address the question with regard to population health are FATALITIES and INJURIES.

The variables that will address the question with regard to economic consequences are Property Damage (listed as PROPDMG ) and Crop Damage (listed as CROPDMG ).

Fatalities due to severe weather events

Here, we will choose the two columns we want (EVTYPE and FATALITIES ), create an aggregated data-frame with total number of fatalities by event type (EVTYPE) across US and report the ten highest number of fatalities by event type (EVTYPE ).

data_fat<-data_mgd[,c(1,23)]
fat <- aggregate(FATALITIES~EVTYPE, data_fat, sum, rm.na=TRUE)
fat<-fat[order(fat$FATALITIES,decreasing = TRUE),]
head(fat,10)
##            EVTYPE FATALITIES
## 38        TORNADO       5634
## 11 EXCESSIVE HEAT       1904
## 13    FLASH FLOOD        979
## 19           HEAT        938
## 27      LIGHTNING        817
## 14          FLOOD        471
## 32    RIP CURRENT        369
## 23      HIGH WIND        249
## 2       AVALANCHE        225
## 45   WINTER STORM        207

The total number of Fatalities between \(1950\) and \(2011\) due to such events may be estimated to be at least

sum(fat$FATALITIES)
## [1] 13101

Following is a Bar chart of the total number of Fatalities between \(1950\) and \(2011\) by the top \(10\) harmful Events.

library(lattice)
barchart(EVTYPE~FATALITIES, data = fat[1:10,],main="Barchart: Fatalities by Top 10 Events")

plot of chunk unnamed-chunk-9

Injuries due to severe weather events

Here, we will choose the two columns we want (EVTYPE and INJURIES), create an aggregated data-frame with total number of fatalities by event type (EVTYPE) across US and report the ten highest number of injuries by event type (EVTYPE).

data_inj<-data_mgd[,c(1,24)]
inj <- aggregate(INJURIES~EVTYPE, data_inj, sum, rm.na=TRUE)
inj<-inj[order(inj$INJURIES,decreasing = TRUE),]
head(inj,10)
##               EVTYPE INJURIES
## 38           TORNADO    91347
## 14             FLOOD     6790
## 11    EXCESSIVE HEAT     6526
## 27         LIGHTNING     5231
## 19              HEAT     2101
## 24         ICE STORM     1976
## 13       FLASH FLOOD     1778
## 37 THUNDERSTORM WIND     1489
## 18              HAIL     1362
## 45      WINTER STORM     1322

The total number of Injuries between 1950 and 2011 due to such events may be estimated to be at least

sum(inj$INJURIES)
## [1] 126736

Following is a Bar-chart of the total number of Injuries between 1950 and 2011 by the top 10 harmful Events for Injuries.

library(lattice)
barchart(EVTYPE~INJURIES, data = inj[1:10,],main="Barchart: Injuries by Top 10 Events")

plot of chunk unnamed-chunk-12

Property Dammage due to severe weather events

For the purpose of analyzing property damage, we will choose the three columns we want (EVTYPE, PROPDMG and PROPDMGEXP), create an aggregated data-frame with total number of property damage by event type (EVTYPE) across US and report the ten highest number of damages by event type (EVTYPE). We also remark here that for the purpose of this analysis, we used the column PROPDMGEXPas a multiplying factor for `PROPDMG’ to reflect the actual expenses as follows: \(B=10^6, M=10^3,\dots \), etc., as indicated in the following codes.

data_pd<-data_mgd[,c(1,25,26)]
data_pd$PROPDMG[data_pd$PROPDMGEXP=="B"]<-(10^9)*data_pd$PROPDMG[data_pd$PROPDMGEXP=="B"]
data_pd$PROPDMG[data_pd$PROPDMGEXP=="M"]<-(10^6)*data_pd$PROPDMG[data_pd$PROPDMGEXP=="M"]
data_pd$PROPDMG[data_pd$PROPDMGEXP=="m"]<-(10^6)*data_pd$PROPDMG[data_pd$PROPDMGEXP=="m"]
data_pd$PROPDMG[data_pd$PROPDMGEXP=="K"]<-(10^3)*data_pd$PROPDMG[data_pd$PROPDMGEXP=="K"]
data_pd$PROPDMG[data_pd$PROPDMGEXP=="H"]<-100*data_pd$PROPDMG[data_pd$PROPDMGEXP=="H"]
data_pd$PROPDMG[data_pd$PROPDMGEXP=="h"]<-100*data_pd$PROPDMG[data_pd$PROPDMGEXP=="h"]
pd<- aggregate(PROPDMG~EVTYPE, data_pd, sum, rm.na=TRUE)
pd<-pd[order(pd$PROPDMG,decreasing = TRUE),]
head(pd,10)
##               EVTYPE PROPDMG
## 38           TORNADO 3212259
## 13       FLASH FLOOD 1420126
## 14             FLOOD  899939
## 37 THUNDERSTORM WIND  876845
## 18              HAIL  688694
## 27         LIGHTNING  603353
## 23         HIGH WIND  324733
## 45      WINTER STORM  132722
## 21        HEAVY SNOW  122253
## 44          WILDFIRE   84460

The total Property Damage between 1950 and 2011 due to such events may be estimated to be at least (in US Dollars):

sum(pd$PROPDMG)
## [1] 8709094

Crop Dammage due to severe weather events

Here, the three columns we want are EVTYPE, PROPDMG and PROPDMGEXP. We create an aggregated data-frame with total number of property damage by event type (EVTYPE) across US and report the ten highest number of damages by event type (EVTYPE). We also remark here that

data_cd<-data_mgd[,c(1,27,28)]
data_cd$CROPDMG[data_cd$CROPDMGEXP=="B"]<-(10^9)*data_cd$CROPDMG[data_cd$CROPDMGEXP=="B"]
data_cd$CROPDMG[data_cd$CROPDMGEXP=="M"]<-(10^6)*data_cd$CROPDMG[data_cd$CROPDMGEXP=="M"]
data_cd$CROPDMG[data_cd$CROPDMGEXP=="m"]<-(10^6)*data_cd$CROPDMG[data_cd$CROPDMGEXP=="m"]
data_cd$CROPDMG[data_cd$CROPDMGEXP=="K"]<-(10^10)*data_cd$CROPDMG[data_cd$CROPDMGEXP=="K"]
data_cd$CROPDMG[data_cd$CROPDMGEXP=="h"]<-1000*data_cd$CROPDMG[data_cd$CROPDMGEXP=="h"]
data_cd$CROPDMG[data_cd$CROPDMGEXP=="H"]<-100*data_cd$CROPDMG[data_cd$CROPDMGEXP=="H"]
cd<- aggregate(CROPDMG~EVTYPE, data_cd, sum, rm.na=TRUE)
cd<-cd[order(cd$CROPDMG,decreasing = TRUE),]
head(cd,10)
##               EVTYPE CROPDMG
## 18              HAIL  579597
## 13       FLASH FLOOD  179201
## 14             FLOOD  168039
## 38           TORNADO  100020
## 37 THUNDERSTORM WIND   66792
## 8            DROUGHT   33900
## 23         HIGH WIND   17284
## 20        HEAVY RAIN   11124
## 16      FROST/FREEZE    7035
## 40    TROPICAL STORM    5900
dim(cd)
## [1] 46  2

The total Crop Damage between 1950 and 2011 due to such events may be estimated to be at least (in US Dollars):

sum(cd$CROPDMG)
## [1] 1188839

Combined Property and Crop Dammage due to severe weather events

To see the combined effects of Property and Crop Damage, we merge the data-frames used before

total<-merge(pd,cd,by="EVTYPE")
head(total)
##                  EVTYPE PROPDMG CROPDMG
## 1 ASTRONOMICAL LOW TIDE     321       1
## 2             AVALANCHE    1625       1
## 3              BLIZZARD   25319     173
## 4         COASTAL FLOOD   12612       1
## 5       COLD/WIND CHILL    1991     601
## 6             DENSE FOG    8226       1

Then we create a new column combined for the combined damage and report the top ten culprits with table and bar-chart.

total$combined<-total$PROPDMG+total$CROPDMG
total<-total[order(total$combined,decreasing = TRUE),]
head(total,10)
##               EVTYPE PROPDMG CROPDMG combined
## 38           TORNADO 3212259  100020  3312279
## 13       FLASH FLOOD 1420126  179201  1599327
## 18              HAIL  688694  579597  1268292
## 14             FLOOD  899939  168039  1067978
## 37 THUNDERSTORM WIND  876845   66792   943638
## 27         LIGHTNING  603353    3582   606934
## 23         HIGH WIND  324733   17284   342017
## 45      WINTER STORM  132722    1980   134702
## 21        HEAVY SNOW  122253    2167   124420
## 44          WILDFIRE   84460    4365    88826
library(lattice)
barchart(EVTYPE~combined, data = total[1:10,],xlab="Property and Crop Damage in US Dollars",main="Barchart: Property and Crop Dammage by Top 10 Events")

plot of chunk unnamed-chunk-18