1. Prologue

The goal of this project is to explore the NOAA Storm Database and answer some basic questions about severe weather events. The analysis uses tables, figures, and other summaries to help readers understand the findings. The code is written in R, but you do not need to know R to follow the study.

Introduction: Severe weather events have become more frequent in recent years. Many studies attribute such extreme events, and the loss of property and lives associated with them, to global warming. This study examines how storms and other severe weather events affect public health and the economy.

The data for this study come from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks when and where major storms and other weather events took place in the United States, along with estimates of any fatalities, injuries, and property damage.

2. Research Questions

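This study addresses the following research questions:

  • Across the United States, which types of events are most harmful with respect to population health?

  • Across the United States, which types of events have the greatest economic consequences?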

This report is prepared for government or municipal managers who are responsible for preparing for severe weather events and who need to prioritize resources across different types of events. However, it refrains from making specific recommendations.

Invoking Required Libraries

library(knitr)
library(markdown)
library(rmarkdown)
library(plyr)
library(stats)

Checking the Working Directory

getwd()
## [1] "C:/Users/nirma/Documents/Coursera/Data Science/Reproducible Research"

3. Loading the Storm Dataset and Understanding Its Structure and Components

stormdata<-read.table("C:/Users/nirma/Documents/GitHub/RepRes_Project2/repdata_data_StormData.csv.bz2",header = TRUE,sep=",")
dim(stormdata)
## [1] 902297     37
head(stormdata)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE  EVTYPE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL TORNADO
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL TORNADO
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL TORNADO
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL TORNADO
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL TORNADO
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL TORNADO
##   BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1         0                                               0         NA
## 2         0                                               0         NA
## 3         0                                               0         NA
## 4         0                                               0         NA
## 5         0                                               0         NA
## 6         0                                               0         NA
##   END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1         0                      14.0   100 3   0          0       15    25.0
## 2         0                       2.0   150 2   0          0        0     2.5
## 3         0                       0.1   123 2   0          0        2    25.0
## 4         0                       0.0   100 2   0          0        2     2.5
## 5         0                       0.0   150 2   0          0        2     2.5
## 6         0                       1.5   177 2   0          0        6     2.5
##   PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1          K       0                                         3040      8812
## 2          K       0                                         3042      8755
## 3          K       0                                         3340      8742
## 4          K       0                                         3458      8626
## 5          K       0                                         3412      8642
## 6          K       0                                         3450      8748
##   LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1       3051       8806              1
## 2          0          0              2
## 3          0          0              3
## 4          0          0              4
## 5          0          0              5
## 6          0          0              6
str(stormdata)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
names(stormdata)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"
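
The loading step above reads a bzip2-compressed CSV. As a minimal sketch on a made-up three-column file (not the real dataset), the snippet below shows that `read.csv()` can read such a file directly from its path and keeps commas inside quoted text fields intact. The temporary file, its contents, and the name `toy` are assumptions for illustration only.

```r
# Write a tiny, made-up CSV to a temporary bzip2-compressed file.
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "wt")
writeLines(c(
  "EVTYPE,FATALITIES,REMARKS",
  "TORNADO,0,\"Roof damage, trees down\"",
  "FLOOD,1,\"\""
), con)
close(con)

# read.csv() detects the compression when given a file path and
# treats the quoted comma as part of the REMARKS field.
toy <- read.csv(tmp, stringsAsFactors = FALSE)
dim(toy)
str(toy)

# Remove the temporary file.
unlink(tmp)
```

Using a path relative to the project directory, instead of an absolute path on one machine, would also make the report easier to reproduce elsewhere.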

Wrangling the Data

Variables of Interest and their Meanings

myVar<-c("EVTYPE","FATALITIES","INJURIES","PROPDMG","PROPDMGEXP",
               "CROPDMG","CROPDMGEXP")
newdata<-stormdata[myVar]
dim(newdata)
## [1] 902297      7
names(newdata)
## [1] "EVTYPE"     "FATALITIES" "INJURIES"   "PROPDMG"    "PROPDMGEXP"
## [6] "CROPDMG"    "CROPDMGEXP"

4. Assessing Damages

a. Calculating Property Damage by Severe Weather Events

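The magnitude of property damage is recorded in the column ‘PROPDMG’, and the column ‘PROPDMGEXP’ holds the exponent code that scales it. To assess the damage, the following steps were taken:

  • Find the property damage exponent codes using the ‘unique()’ function

  • Treat invalid codes such as “+”, “-”, and “?” as a multiplier of 0, which excludes those records from the totals

  • Assign a numeric multiplier to every other code and compute the reported damage in billions of U.S. dollars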

unique(newdata$PROPDMGEXP)
##  [1] "K" "M" ""  "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"
newdata$PROPDMGEXP<-mapvalues(newdata$PROPDMGEXP, from= c("K","M","","B","m","+","0","5","6","?","4","2","3","h","7","H","-","1","8"), to =c(10^3,10^6,1,10^9,10^6,0,1,10^5,10^6,0,10^4,10^2,10^3,10^2,10^7,10^2,0,10,10^8))
newdata$PROPDMGEXP<-as.numeric(as.character(newdata$PROPDMGEXP))
newdata$PROPDMGETOTAL<-(newdata$PROPDMG*newdata$PROPDMGEXP)/1000000000
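
To make the code-to-multiplier conversion concrete, here is a minimal base-R sketch on made-up rows. It mirrors the mapping used above (a digit d means 10^d, the letters H, K, M, B mean hundreds, thousands, millions, billions, and invalid symbols are zeroed out) but uses a named lookup vector instead of `mapvalues()`. The names `toy`, `lookup`, and `exp_to_multiplier` are hypothetical and not part of the original analysis.

```r
# Made-up rows standing in for the PROPDMG / PROPDMGEXP columns.
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 10, 3, 7),
  PROPDMGEXP = c("K", "m", "", "5", "?"),
  stringsAsFactors = FALSE
)

# Lookup table from exponent code to multiplier, mirroring this report's mapping.
lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9,
            "+" = 0, "-" = 0, "?" = 0,
            setNames(10^(0:8), as.character(0:8)))

# Hypothetical helper: convert a vector of codes to numeric multipliers.
exp_to_multiplier <- function(code) {
  code <- toupper(code)          # "k", "m", "h" behave like "K", "M", "H"
  out <- unname(lookup[code])    # codes missing from the table become NA
  out[code == ""] <- 1           # a blank code leaves the value unscaled
  out
}

toy$multiplier <- exp_to_multiplier(toy$PROPDMGEXP)
toy$damage_usd <- toy$PROPDMG * toy$multiplier
toy
```

A lookup vector returns numeric values directly, so no character-to-numeric conversion is needed afterwards, and any unexpected code shows up as `NA` rather than passing through silently.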

b. Calculating Crop Damage by Severe Weather Events

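The magnitude of crop damage is recorded in the column ‘CROPDMG’, and the column ‘CROPDMGEXP’ holds the exponent code that scales it. To assess the damage, the following steps were taken:

  • Find the crop damage exponent codes using the ‘unique()’ function

  • Treat the invalid code “?” as a multiplier of 0, which excludes those records from the totals

  • Assign a numeric multiplier to every other code and compute the reported damage in billions of U.S. dollars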

unique(newdata$CROPDMGEXP)
## [1] ""  "M" "K" "m" "B" "?" "0" "k" "2"
newdata$CROPDMGEXP<-mapvalues(newdata$CROPDMGEXP, from= c("","M", "K", "m", "B", "?", "0", "k","2"), to = c(1,10^6, 10^3, 10^6, 10^9, 0, 1, 10^3, 10^2))
newdata$CROPDMGEXP<-as.numeric(as.character(newdata$CROPDMGEXP))
newdata$CROPDMGETOTAL<-(newdata$CROPDMG*newdata$CROPDMGEXP)/1000000000
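
Because `unique()` lists the codes without saying how often each occurs, a frequency table can help judge how much data the invalid codes actually affect. The sketch below uses a made-up vector of codes rather than the real `CROPDMGEXP` column; the names `codes`, `invalid`, and `share_invalid` are assumptions for illustration.

```r
# Made-up exponent codes standing in for the CROPDMGEXP column.
codes <- c("", "K", "K", "M", "?", "k", "0", "", "B", "2")

# Frequency of each code; unlike unique(), this shows how common each one is.
counts <- table(codes)
counts

# Share of records whose code is treated as invalid (multiplier 0).
invalid <- c("?", "+", "-")
share_invalid <- mean(codes %in% invalid)
share_invalid
```

Run against the real column, a small share would suggest that zeroing out the invalid codes has little effect on the totals.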

5. Results and Analyses

a. Research Question 1: Across the United States, which types of events are most harmful with respect to population health?

To assess the population health consequences of severe weather events, we examine the number of fatalities and injuries they cause.

TotFatalities<-aggregate(FATALITIES~EVTYPE, data=newdata, FUN="sum")
TotInjuries<-aggregate(INJURIES~EVTYPE,data=newdata, FUN="sum")
dim(TotFatalities)
## [1] 985   2
dim(TotInjuries)
## [1] 985   2
# Selecting the top 10 weather events by total fatalities and by total injuries
dedly10fatal<-TotFatalities[order(-TotFatalities$FATALITIES), ][1:10,]
dim(dedly10fatal)
## [1] 10  2
most10injuries<-TotInjuries[order(-TotInjuries$INJURIES), ][1:10,]
dim(most10injuries)
## [1] 10  2
dedly10fatal
##             EVTYPE FATALITIES
## 834        TORNADO       5633
## 130 EXCESSIVE HEAT       1903
## 153    FLASH FLOOD        978
## 275           HEAT        937
## 464      LIGHTNING        816
## 856      TSTM WIND        504
## 170          FLOOD        470
## 585    RIP CURRENT        368
## 359      HIGH WIND        248
## 19       AVALANCHE        224
most10injuries
##                EVTYPE INJURIES
## 834           TORNADO    91346
## 856         TSTM WIND     6957
## 170             FLOOD     6789
## 130    EXCESSIVE HEAT     6525
## 464         LIGHTNING     5230
## 275              HEAT     2100
## 427         ICE STORM     1975
## 153       FLASH FLOOD     1777
## 760 THUNDERSTORM WIND     1488
## 244              HAIL     1361
# Plotting the 10 deadliest and the 10 most injury-causing weather events in the United States
par(mfrow=c(1,2),mar=c(12,4,3,2),mgp=c(2.4, 0.8, 0),cex=0.6,bg='gray')
barplot(dedly10fatal$FATALITIES, names.arg=dedly10fatal$EVTYPE, las=2, main="Top Ten Deadliest Weather Events in the U.S.", ylab="Number of Fatalities", col="blue")
barplot(most10injuries$INJURIES, names.arg=most10injuries$EVTYPE, las=2, main="Top Ten Weather Events In Terms of Total Injuries", ylab="Number of Injuries", col="blue")
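
The aggregate-then-rank pattern used above can be traced by hand on a small made-up table. This is only an illustrative sketch; the data and the names `toy`, `totals`, and `top3` are assumptions, and `head()` is used in place of `[1:10, ]` so that no `NA` rows appear when there are fewer groups than requested.

```r
# Made-up event records standing in for the EVTYPE / FATALITIES columns.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "HAIL"),
  FATALITIES = c(3, 1, 2, 4, 0, 0),
  stringsAsFactors = FALSE
)

# Sum fatalities within each event type.
totals <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)

# Sort from most to fewest fatalities and keep the first three rows.
top3 <- head(totals[order(-totals$FATALITIES), ], 3)
top3
```

Here TORNADO totals 5, HEAT 4, and FLOOD 1, so those three event types are returned in that order.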

Findings

Tornadoes cause the most fatalities and injuries in the United States. Excessive heat causes the second-highest number of deaths but only the fourth-highest number of injuries.

b. Research Question 2: Across the United States, which types of events have the greatest economic consequences?

The direct economic consequence of a severe weather event is naturally measured as its total property damage plus its total crop damage. The two components are aggregated and ranked separately below.

# Aggregating total property and crop damage (in billions of USD) by event type
propertydamage<-aggregate(PROPDMGETOTAL~EVTYPE,data=newdata, FUN="sum")
dim(propertydamage)
## [1] 985   2
cropdamage<-aggregate(CROPDMGETOTAL~EVTYPE,data=newdata, FUN="sum")
dim(cropdamage)
## [1] 985   2
# Identifying the top 10 weather events in terms of property and crop damage
top10propdamage<-propertydamage[order(-propertydamage$PROPDMGETOTAL), ][1:10,]
top10propdamage
##                EVTYPE PROPDMGETOTAL
## 170             FLOOD    144.657710
## 411 HURRICANE/TYPHOON     69.305840
## 834           TORNADO     56.947381
## 670       STORM SURGE     43.323536
## 153       FLASH FLOOD     16.822674
## 244              HAIL     15.735268
## 402         HURRICANE     11.868319
## 848    TROPICAL STORM      7.703891
## 972      WINTER STORM      6.688497
## 359         HIGH WIND      5.270046
top10cropdamage<-cropdamage[order(-cropdamage$CROPDMGETOTAL), ][1:10,]
top10cropdamage
##                EVTYPE CROPDMGETOTAL
## 95            DROUGHT     13.972566
## 170             FLOOD      5.661968
## 590       RIVER FLOOD      5.029459
## 427         ICE STORM      5.022113
## 244              HAIL      3.025954
## 402         HURRICANE      2.741910
## 411 HURRICANE/TYPHOON      2.607873
## 153       FLASH FLOOD      1.421317
## 140      EXTREME COLD      1.292973
## 212      FROST/FREEZE      1.094086
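
The two rankings above treat property and crop damage separately. One way to obtain a single economic figure per event type is to add the two damage columns before aggregating. The sketch below does this on made-up values in billions of USD; the column name `ECONTOTAL` and the objects `toy` and `econ` are hypothetical and not part of the original analysis.

```r
# Made-up per-record damages (billions of USD), reusing this report's column names.
toy <- data.frame(
  EVTYPE        = c("FLOOD", "DROUGHT", "FLOOD", "TORNADO", "DROUGHT"),
  PROPDMGETOTAL = c(10, 0.5, 4, 6, 0.5),
  CROPDMGETOTAL = c(1, 9, 0.5, 0.2, 4),
  stringsAsFactors = FALSE
)

# Combined economic damage per record (hypothetical column).
toy$ECONTOTAL <- toy$PROPDMGETOTAL + toy$CROPDMGETOTAL

# Total combined damage per event type, largest first.
econ <- aggregate(ECONTOTAL ~ EVTYPE, data = toy, FUN = sum)
econ <- econ[order(-econ$ECONTOTAL), ]
econ
```

In this toy table FLOOD totals 15.5, DROUGHT 14, and TORNADO 6.2, so an event type that leads only one of the separate rankings can still rank near the top once both kinds of damage are combined.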

Plotting the top ten weather events by property damage and by crop damage

par(mfrow=c(1,2),mar=c(12,4,3,2),mgp=c(2.4, 0.8, 0),cex=0.6,bg='gray')
barplot(top10propdamage$PROPDMGETOTAL, names.arg=top10propdamage$EVTYPE, las=2, main="Top Ten Weather Events Causing Property Damage", ylab="Total Property Damage (billion USD)", col="blue")

barplot(top10cropdamage$CROPDMGETOTAL, names.arg=top10cropdamage$EVTYPE, las=2, main="Top Ten Weather Events Causing Crop Damage", ylab="Total Crop Damage (billion USD)", col="blue")

Findings

Flood is by far the largest cause of property damage in the United States, followed by hurricane/typhoon, tornado, storm surge, and flash flood. Drought, however, is the biggest cause of crop loss; crop damage from flood, the second-largest cause, is about 40% of that from drought.