1. Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

This assignment examines and analyses the database to answer the following questions:

  1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

  2. Across the United States, which types of events have the greatest economic consequences?

2. Data Processing

Loading the required packages

library(plyr)        # provides mutate(), used below
library(data.table)  # loaded but not used in the steps below

Data Preparation

Setting the working directory, loading the raw data and converting event types to upper case so that names differing only in letter case are grouped together.

setwd("C:/Users/rohidah/Google Drive/Data Science/Reproducible Research/Assignment 2")  # adjust to your own machine
datas <- read.csv("repdata-data-StormData.csv.bz2")  # read.csv reads the .bz2 file directly
datas$EVTYPE <- toupper(datas$EVTYPE)

Checking the dimensions and the first few rows gives an overview of the variables available in the data set.

dim(datas)
## [1] 902297     37
head(datas)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
## 4        2     2.5          K       0                                    
## 5        2     2.5          K       0                                    
## 6        6     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6

Extraction of Useful Information

Of the 37 columns above, only the seven needed for the analysis are kept: EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP.

newdataset <- datas[,c("EVTYPE","FATALITIES","INJURIES","PROPDMG", "PROPDMGEXP","CROPDMG","CROPDMGEXP")]

Tidy Up Data

Several event types are recorded under different names even though they describe the same kind of event. This section merges a number of these variants into a single label so that their casualties and damages are totalled together.

# Map variant spellings onto a single canonical event type
newdataset$EVTYPE[newdataset$EVTYPE %in% c("TSTM WIND", "THUNDERSTORM WINDS")] <- "THUNDERSTORM WIND"
newdataset$EVTYPE[newdataset$EVTYPE == "RIVER FLOOD"] <- "FLOOD"
newdataset$EVTYPE[newdataset$EVTYPE %in% c("FLASH FLOODING", "FLOOD/FLASH FLOOD")] <- "FLASH FLOOD"
newdataset$EVTYPE[newdataset$EVTYPE %in% c("HURRICANE/TYPHOON", "HURRICANE")] <- "HURRICANE-TYPHOON"
head(newdataset)
##    EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO          0       15    25.0          K       0           
## 2 TORNADO          0        0     2.5          K       0           
## 3 TORNADO          0        2    25.0          K       0           
## 4 TORNADO          0        2     2.5          K       0           
## 5 TORNADO          0        2     2.5          K       0           
## 6 TORNADO          0        6     2.5          K       0
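The same recoding can also be kept in a single lookup table, which puts every variant spelling in one place and makes the list easy to extend. This is a sketch of an alternative, not the method used above; the helper name `canonical_evtype` and the extra `trimws()` step are assumptions added for illustration.

```r
# Hypothetical lookup table: variant EVTYPE spelling -> canonical label
# (same pairs as the recoding above)
evtype_map <- c(
  "TSTM WIND"          = "THUNDERSTORM WIND",
  "THUNDERSTORM WINDS" = "THUNDERSTORM WIND",
  "RIVER FLOOD"        = "FLOOD",
  "FLASH FLOODING"     = "FLASH FLOOD",
  "FLOOD/FLASH FLOOD"  = "FLASH FLOOD",
  "HURRICANE/TYPHOON"  = "HURRICANE-TYPHOON",
  "HURRICANE"          = "HURRICANE-TYPHOON"
)

# Hypothetical helper: normalise case, trim whitespace (an assumption),
# then replace only the values that appear in the lookup table
canonical_evtype <- function(x, map = evtype_map) {
  x <- trimws(toupper(x))
  hit <- x %in% names(map)
  x[hit] <- unname(map[x[hit]])
  x
}

# Usage on toy values (not taken from the NOAA data)
canonical_evtype(c("tstm wind", " HURRICANE ", "TORNADO"))
```

Applied to the real data this would be `newdataset$EVTYPE <- canonical_evtype(newdataset$EVTYPE)`; unmapped event types pass through unchanged.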

Cost Calculation on Damages

Damage values are recorded as a number (PROPDMG, CROPDMG) plus a multiplier code (PROPDMGEXP, CROPDMGEXP), where H = hundreds, K = thousands, M = millions and B = billions. The two must be combined to obtain the actual cost in US dollars; any other code, including an empty one, is treated as a multiplier of 1.

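# Multiply a damage value by the factor its exponent code stands for;
# codes other than H, K, M and B (e.g. "", "+", digits) fall through to 1
calcdmg <- function(dmg, dmgexp) dmg * switch(toupper(dmgexp), H=100, K=1000, M=1000000, B=1000000000, 1)
newdataset$pdmg <- mapply(calcdmg, newdataset$PROPDMG, newdataset$PROPDMGEXP)  # property damage (USD)
newdataset$cdmg <- mapply(calcdmg, newdataset$CROPDMG, newdataset$CROPDMGEXP)  # crop damage (USD)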
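Because `mapply()` calls `calcdmg()` once per row, it is slow on roughly 900,000 records. A vectorised alternative looks every exponent code up in a named vector in one step. This is a sketch assuming the same multipliers as `calcdmg()` (H, K, M, B, anything else treated as 1); `exp_multiplier` and `calcdmg_vec` are hypothetical names.

```r
# Multipliers for the recognised exponent codes (same values as calcdmg)
exp_multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Vectorised damage calculation: one lookup for the whole column
calcdmg_vec <- function(dmg, dmgexp) {
  mult <- exp_multiplier[toupper(as.character(dmgexp))]
  mult[is.na(mult)] <- 1  # unrecognised or empty codes fall back to 1, as in calcdmg
  dmg * unname(mult)
}

# Toy values: 25K, 2.5m, no code, unknown code
calcdmg_vec(c(25, 2.5, 3, 7), c("K", "m", "", "?"))
```

On the real data it would replace the `mapply()` calls, e.g. `newdataset$pdmg <- calcdmg_vec(newdataset$PROPDMG, newdataset$PROPDMGEXP)`.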

Checking Final Dataset

As a checkpoint, the code below confirms that the final data set contains no missing (NA) values.

sum(is.na(newdataset))
## [1] 0
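A single total of zero confirms there are no missing values, but if the total were non-zero a per-column breakdown would show where the gaps are. A minimal sketch using base R's `colSums()`; the helper name `na_by_column` and the toy data frame are made up for illustration.

```r
# Hypothetical helper: count missing values in each column separately
na_by_column <- function(df) colSums(is.na(df))

# Toy data frame with one missing event type
toy <- data.frame(EVTYPE = c("FLOOD", NA), FATALITIES = c(0, 1))
na_by_column(toy)  # 1 missing value in EVTYPE, 0 in FATALITIES
```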

Analysing Impact on Human Health and Safety

To measure the impact of each event on human health and safety, a new column called totalfatin is added, holding the sum of fatalities and injuries for each record.

fatin <- mutate(newdataset, totalfatin = FATALITIES + INJURIES)

The records are then aggregated by event type into a data frame called fatin2, which holds each event type and its total fatalities and injuries.

fatin2 <- aggregate(totalfatin ~ EVTYPE, fatin, sum)

Finally, fatin2 is sorted in decreasing order to obtain the five event types most harmful to human health.

top5fatin <- head(fatin2[order(fatin2$totalfatin, decreasing=TRUE), ], 5)
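The three steps above (add a total column, sum it by event type, keep the largest totals) can be checked on a small made-up data set. This sketch uses base R's `transform()` in place of `plyr::mutate()`, which gives the same result for a single new column; the helper `top_n_by_event` and the toy rows are hypothetical.

```r
# Hypothetical helper: sum a per-row total by event type and keep the n largest
top_n_by_event <- function(df, total_col, n = 5) {
  f <- as.formula(paste(total_col, "~ EVTYPE"))
  agg <- aggregate(f, data = df, FUN = sum)
  head(agg[order(agg[[total_col]], decreasing = TRUE), ], n)
}

# Toy records (not NOAA data)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  FATALITIES = c(1, 0, 2, 3),
  INJURIES   = c(10, 1, 5, 0)
)
toy <- transform(toy, totalfatin = FATALITIES + INJURIES)
top_n_by_event(toy, "totalfatin", n = 2)  # TORNADO (18), then HEAT (3)
```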

Analysing Impact on the Economy

To measure the economic impact of severe weather, a new column called totalcpdmg is added (in the data frame cpdmg), holding the combined crop and property damage for each record.

cpdmg <- mutate(newdataset, totalcpdmg = cdmg + pdmg)

The records are then aggregated by event type into a data frame called cpdmg2, which holds each event type and its total damage.

cpdmg2 <- aggregate(totalcpdmg ~ EVTYPE, cpdmg, sum)

The data frame cpdmg2 is then sorted in decreasing order to obtain the five event types with the greatest economic consequences.

top5cpdmg <- head(cpdmg2[order(cpdmg2$totalcpdmg, decreasing=TRUE), ], 5)
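Ranking by the combined total hides whether an event's losses come mainly from property or from crops. One way to keep both components is to aggregate them together with `cbind()` on the left-hand side of the formula. This is a sketch on made-up numbers; the column names `pdmg` and `cdmg` match those created earlier, while the helper `damage_by_event` is hypothetical.

```r
# Hypothetical helper: sum property and crop damage separately per event type,
# then rank event types by the combined total
damage_by_event <- function(df) {
  agg <- aggregate(cbind(pdmg, cdmg) ~ EVTYPE, data = df, FUN = sum)
  agg$total <- agg$pdmg + agg$cdmg
  agg[order(agg$total, decreasing = TRUE), ]
}

# Toy records (not NOAA data)
toy <- data.frame(
  EVTYPE = c("FLOOD", "DROUGHT", "FLOOD"),
  pdmg   = c(5e6, 0, 1e6),
  cdmg   = c(1e5, 8e6, 0)
)
damage_by_event(toy)  # DROUGHT (8.0e6, all crops) ranks above FLOOD (6.1e6)
```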

3. Results

Based on the analysis above, two bar charts are plotted: one shows the five event types most harmful to human health, and the other shows the five event types with the greatest economic consequences.

barplot(top5fatin$totalfatin, main="Top 5 Most Harmful Events to Human Health", xlab="events", ylab="total fatalities and injuries", col=rainbow(5))
legend("topright", top5fatin$EVTYPE, cex = 0.6, fill=rainbow(5))

barplot(top5cpdmg$totalcpdmg, main="Top 5 Events with the Greatest Economic Consequences", xlab="events", ylab="total damages (USD)", col=rainbow(5))
legend("topright", top5cpdmg$EVTYPE, cex = 0.6, fill=rainbow(5))
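Because the charts above identify bars only through a colour legend, matching long event names to bars can be awkward. Passing the names to `barplot()` through `names.arg` and rotating them with `las = 2` labels each bar directly. This is a sketch assuming a data frame shaped like `top5fatin` or `top5cpdmg`; the helper name `plot_top5` and the margin sizes are assumptions.

```r
# Hypothetical helper: bar chart with event names written under each bar
plot_top5 <- function(top, value_col, main, ylab) {
  old <- par(mar = c(10, 5, 4, 2))  # extra bottom margin for rotated labels
  on.exit(par(old))                 # restore the previous graphics settings
  barplot(top[[value_col]],
          names.arg = as.character(top$EVTYPE),
          las = 2, cex.names = 0.7,
          main = main, ylab = ylab, col = rainbow(nrow(top)))
}

# Usage with made-up values, drawn to a null PDF device so it runs anywhere
pdf(NULL)
mids <- plot_top5(data.frame(EVTYPE = c("A", "B"), totalfatin = c(9, 4)),
                  "totalfatin", "Example", "total fatalities and injuries")
dev.off()
```

With the real results, `plot_top5(top5fatin, "totalfatin", "Top 5 Most Harmful Events to Human Health", "total fatalities and injuries")` would draw the first chart.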

The following conclusions can therefore be drawn:

  1. The event type most harmful to population health (fatalities plus injuries) is TORNADO.
  2. The event type with the greatest economic consequences (property plus crop damage) is FLOOD.