Impact of environmental disasters on human life and resources

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

I explored the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, as suggested by the assignment page. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Download the required data file, repdata-data-StormData.csv, from https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2.

Store the data file in the ./data directory under the working directory.
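The download step can be scripted so that the report is reproducible. The following is a minimal sketch, not part of the original analysis: fetch_storm_data and its default arguments are hypothetical names, and it keeps the compressed archive rather than the decompressed CSV that the report reads.

```r
# Hypothetical helper (an assumption, not from the original report):
# download the archive once and reuse the local copy on later runs.
storm_url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"

fetch_storm_data <- function(url = storm_url,
                             dest_dir = "data",
                             dest_name = "repdata-data-StormData.csv.bz2") {
  if (!dir.exists(dest_dir)) {
    dir.create(dest_dir, recursive = TRUE)
  }
  dest <- file.path(dest_dir, dest_name)
  if (!file.exists(dest)) {
    # mode = "wb" keeps the binary archive intact on Windows
    download.file(url, destfile = dest, mode = "wb")
  }
  dest  # path to the local (still compressed) file
}
```

read.csv() can read the .bz2 file directly, because R's file connections detect bzip2 compression when reading, so read.csv(fetch_storm_data()) is one way to skip the manual decompression step.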

Load the necessary libraries.

library(knitr)
library(ggplot2)

Data Preprocessing

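Load the storm data from the repdata-data-StormData.csv file.

# file.path() builds the path portably instead of hard-coding Windows separators
fileUrl <- file.path("data", "repdata-data-StormData.csv")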
stormData <- read.csv(file = fileUrl)

Check the dimensions of the data.

dim(stormData)
## [1] 902297     37

Names of the attributes.

names(stormData)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

Check whether there are any missing (NA) values in the data.

sum(is.na(stormData))
## [1] 1745947
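The total above does not say which attributes hold the missing values. A per-column count is one way to find out. The sketch below uses a made-up toy data frame in place of stormData, so the numbers are illustrative only.

```r
# Toy stand-in for stormData; the values are invented for illustration
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "HAIL"),
  F = c(3, NA, NA),
  FATALITIES = c(0, 1, 0)
)

# Count NA values column by column, then keep only columns that have any
na_per_column <- colSums(is.na(toy))
na_per_column[na_per_column > 0]
```

Running colSums(is.na(stormData)) on the real data would show whether the attributes needed for the analysis are affected.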

Our analysis needs only 8 of the 37 attributes. Select that subset from the storm data.

stormD <- stormData[,c("STATE", "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP","CROPDMG", "CROPDMGEXP")]

Let's check the dimensions of this subset.

dim(stormD)
## [1] 902297      8

Also, let's look at the first and last few rows of the subset.

kable(rbind(head(stormD), tail(stormD)))
       STATE EVTYPE         FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
1      AL    TORNADO                 0       15    25.0 K                0
2      AL    TORNADO                 0        0     2.5 K                0
3      AL    TORNADO                 0        2    25.0 K                0
4      AL    TORNADO                 0        2     2.5 K                0
5      AL    TORNADO                 0        2     2.5 K                0
6      AL    TORNADO                 0        6     2.5 K                0
902292 TN    WINTER WEATHER          0        0     0.0 K                0 K
902293 WY    HIGH WIND               0        0     0.0 K                0 K
902294 MT    HIGH WIND               0        0     0.0 K                0 K
902295 AK    HIGH WIND               0        0     0.0 K                0 K
902296 AK    BLIZZARD                0        0     0.0 K                0 K
902297 AL    HEAVY SNOW              0        0     0.0 K                0 K

A compact structure summary of the subset:

str(stormD)
## 'data.frame':    902297 obs. of  8 variables:
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
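The str() output lists 985 EVTYPE levels, and the first one starts with spaces, which suggests that the same event type may appear under more than one spelling. The sketch below shows one possible cleanup, assuming that only surrounding whitespace, repeated spaces, and letter case should be merged. normalize_evtype is a hypothetical helper that the report does not use, and all labels except the first are made up.

```r
# Hypothetical cleanup helper (an assumption, not part of the report)
normalize_evtype <- function(x) {
  x <- toupper(trimws(as.character(x)))  # drop outer whitespace, unify case
  gsub("[[:space:]]+", " ", x)           # collapse runs of inner whitespace
}

# First label is taken from the str() output; the rest are invented variants
labels <- c("   HIGH SURF ADVISORY", "High Surf Advisory", "TORNADO", "tornado ")
normalize_evtype(labels)
unique(normalize_evtype(labels))
```

Applying such a step before aggregating would merge variant spellings and could change the totals reported below, so it is a design choice rather than a neutral fix.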

Missing values in the sub-set.

sum(is.na(stormD))
## [1] 0

States in the data set.

unique(stormD$STATE)
##  [1] AL AZ AR CA CO CT DE DC FL GA HI ID IL IN IA KS KY LA ME MD MA MI MN
## [24] MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC SD TN TX UT VT VA
## [47] WA WV WI WY PR AK ST AS GU MH VI AM LC PH GM PZ AN LH LM LE LS SL LO
## [70] PM PK XX
## 72 Levels: AK AL AM AN AR AS AZ CA CO CT DC DE FL GA GM GU HI IA ID ... XX

Question (1) :

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

The analysis covers:

        a. Event wise fatality statistics
        b. State wise fatality statistics
        c. Event and State wise fatality statistics
        
        d. Event wise injury statistics
        e. State wise injury statistics
        f. Event and State wise injury statistics
        

Event wise fatality statistics

  1. Calculate the sum of FATALITIES grouped by EVTYPE and save it in fatal.
  2. Order fatal by FATALITIES, largest first.
fatal <- aggregate (FATALITIES~EVTYPE, stormD, sum)
fatal <- fatal [order(fatal$FATALITIES, decreasing=TRUE),]

The event type with the most fatalities is TORNADO.

ggplot(fatal[1:10,], aes(x = EVTYPE, y = FATALITIES))+
    geom_bar(stat="identity", aes(fill = EVTYPE))+
    theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5))+
    labs(x = "Event Type", y = "Fatality Count",
         title = "Top 10 Fatality-Causing Events")
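The same aggregate-then-order pattern recurs for every statistic in this report. As a sketch, it could be wrapped in one function. top_groups is a hypothetical helper, not something the report defines, and the toy data frame is made up.

```r
# Hypothetical helper: sum `value` within `groups`, return the n largest totals
top_groups <- function(data, value, groups, n = 10) {
  # Builds a formula such as FATALITIES ~ EVTYPE or FATALITIES ~ EVTYPE + STATE
  f <- reformulate(groups, response = value)
  totals <- aggregate(f, data = data, FUN = sum)
  totals <- totals[order(totals[[value]], decreasing = TRUE), ]
  head(totals, n)
}

# Toy data with invented counts
toy <- data.frame(
  STATE = c("AL", "AL", "TX", "TX", "IL"),
  EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "TORNADO", "HEAT"),
  FATALITIES = c(1, 0, 2, 3, 4)
)

top_groups(toy, "FATALITIES", "EVTYPE", n = 2)
top_groups(toy, "FATALITIES", c("EVTYPE", "STATE"), n = 1)
```

With the real data, top_groups(stormD, "FATALITIES", "EVTYPE") would correspond to the first ten rows of fatal.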

State wise fatality statistics

  1. Calculate the sum of FATALITIES grouped by STATE and save it in fatalStatewise.
  2. Order fatalStatewise by FATALITIES, largest first.
fatalStatewise <- aggregate (FATALITIES~STATE, stormD, sum)

fatalStatewise <- fatalStatewise[order(fatalStatewise$FATALITIES, decreasing = TRUE),]

The state with the most fatalities is IL.

ggplot(fatalStatewise[1:10,], aes(x = STATE, y = FATALITIES))+
    geom_bar(stat = "identity", aes(fill = STATE))+
    theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5))+
    labs(x = "State", y = "Fatality Count",
         title = "Top 10 States with the Highest Fatality Counts")
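ggplot2 places discrete categories on the axis in factor-level order, so the bars in these charts appear alphabetically rather than by height. One possible adjustment, shown here as a sketch on made-up counts, is to reorder the factor levels by the plotted value with reorder() from the stats package.

```r
# Toy data with invented counts
top_states <- data.frame(
  STATE = c("AL", "IL", "TX"),
  FATALITIES = c(5, 9, 7)
)

# Negating the value puts the largest total first in the level order
top_states$STATE <- reorder(top_states$STATE, -top_states$FATALITIES)
levels(top_states$STATE)
```

Mapping this reordered STATE to the x axis would draw the bars from tallest to shortest. The same effect is available inline with aes(x = reorder(STATE, -FATALITIES), y = FATALITIES).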

Event and State wise fatality statistics

  1. Calculate the sum of FATALITIES grouped by EVTYPE and STATE and save it in fatalState.
  2. Order fatalState by FATALITIES, largest first.
fatalState <- aggregate (FATALITIES~EVTYPE+STATE, stormD, sum)

fatalState <- fatalState[order(fatalState$FATALITIES, decreasing = TRUE),]

firstGroup <- fatalState[1:10,]
firstGroup$EVT_STATE <- paste(firstGroup$EVTYPE, firstGroup$STATE,sep = "@")

The event type and state combination with the most fatalities is HEAT in IL.

ggplot(firstGroup, aes(x = EVT_STATE, y = FATALITIES))+
    geom_bar(stat = "identity", aes(fill = EVT_STATE))+
    theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5))+
    labs(x = "Event@State", y = "Fatality Count",
         title = "Top 10 Event and State Combinations by Fatality Count")

Event wise injury statistics

  1. Calculate the sum of INJURIES grouped by EVTYPE and save it in injur.
  2. Order injur by INJURIES, largest first.
injur <- aggregate (INJURIES~EVTYPE, stormD, sum)
injur <- injur [order(injur$INJURIES, decreasing=TRUE),]

The event type with the most injuries is TORNADO.

ggplot(injur[1:10,], aes(x = EVTYPE, y = INJURIES))+
    geom_bar(stat="identity", aes(fill = EVTYPE))+
    theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5))+
    labs(x = "Event Type", y = "Injury Count",
         title = "Top 10 Injury-Causing Events")

State wise injury statistics

  1. Calculate the sum of INJURIES grouped by STATE and save it in injurStatewise.
  2. Order injurStatewise by INJURIES, largest first.
injurStatewise <- aggregate (INJURIES~STATE, stormD, sum)
dim(injurStatewise)
## [1] 72  2
injurStatewise <- injurStatewise[order(injurStatewise$INJURIES, decreasing = TRUE),]

The state with the most injuries is TX.

ggplot(injurStatewise[1:10,], aes(x = STATE, y = INJURIES))+
    geom_bar(stat = "identity", aes(fill = STATE))+
    theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5))+
    labs(x = "State", y = "Injury Count",
         title = "Top 10 States with the Highest Injury Counts")

Event and State wise injury statistics

  1. Calculate the sum of INJURIES grouped by EVTYPE and STATE and save it in injurState.
  2. Order injurState by INJURIES, largest first.
injurState <- aggregate (INJURIES~EVTYPE+STATE, stormD, sum)
dim(injurState)
## [1] 4258    3
injurState <- injurState[order(injurState$INJURIES, decreasing = TRUE),]

firstGroup <- injurState[1:10,]
firstGroup$EVT_STATE <- paste(firstGroup$EVTYPE, firstGroup$STATE,sep = "@")

The event type and state combination with the most injuries is TORNADO in TX.

ggplot(firstGroup, aes(x = EVT_STATE, y = INJURIES))+
    geom_bar(stat = "identity", aes(fill = EVT_STATE))+
    theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5))+
    labs(x = "Event@State", y = "Injury Count",
         title = "Top 10 Event and State Combinations by Injury Count")

Question(2) :

Across the United States, which types of events have the greatest economic consequences?

The property damage value is split across two columns: PROPDMG contains a number and PROPDMGEXP contains a symbol that encodes a power of ten (for example, K for thousands). The property damage for a row is therefore PROPDMG*10^e, where e is the exponent that the PROPDMGEXP symbol stands for. Crop damage is encoded the same way in the CROPDMG and CROPDMGEXP columns.

The PROPDMGEXP values are,

sort(unique(stormD$PROPDMGEXP))
##  [1]   - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
## Levels:  - ? + 0 1 2 3 4 5 6 7 8 B h H K m M

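Map each symbol to its power of ten. Digits are used as the exponent directly; h, k, m, and b (in either case) stand for 2, 3, 6, and 9; the remaining symbols (empty, +, -, ?) get an exponent of 0, that is, a multiplier of 1.

symbol <- c("", "+", "-", "?", 0:9, "h", "H", "k", "K", "m", "M", "b", "B")
# named exponent rather than factor, to avoid masking base::factor
exponent <- c(rep(0,4), 0:9, 2, 2, 3, 3, 6, 6, 9, 9)
multiplier <- data.frame (symbol, exponent)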

Calculate property damage as PROPDMG*10^e, where e is the exponent looked up for the PROPDMGEXP symbol, and calculate crop damage the same way from CROPDMG and CROPDMGEXP. The total damage is the sum of the two.

stormD$damage.prop <- stormD$PROPDMG*10^multiplier[match(stormD$PROPDMGEXP,multiplier$symbol),2]
stormD$damage.crop <- stormD$CROPDMG*10^multiplier[match(stormD$CROPDMGEXP,multiplier$symbol),2]
stormD$damage <- stormD$damage.prop + stormD$damage.crop
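As a worked example of the lookup, the sketch below applies the same symbol-to-exponent table to a few made-up rows. to_dollars is a hypothetical wrapper around the match() call used above, and the input values are illustrative.

```r
# Same symbol table as in the report, kept as two parallel vectors
symbol <- c("", "+", "-", "?", 0:9, "h", "H", "k", "K", "m", "M", "b", "B")
exponent <- c(rep(0, 4), 0:9, 2, 2, 3, 3, 6, 6, 9, 9)

# Hypothetical wrapper: look up each code's exponent and scale the value
to_dollars <- function(value, code) {
  value * 10^exponent[match(as.character(code), symbol)]
}

# 25 K, 2.5 M, 1.5 B, 4 with no symbol, 7 with the digit symbol "2"
to_dollars(c(25, 2.5, 1.5, 4, 7), c("K", "M", "B", "", "2"))
```

The five results are 25000, 2500000, 1500000000, 4, and 700 dollars. A code that is missing from the table would yield NA, which is worth checking for before summing.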

The analysis covers:

a. Event wise damage statistics
b. State wise damage statistics
c. Event and State wise damage statistics

Event wise damage statistics

  1. Calculate the sum of damage grouped by EVTYPE and save it in damage.
  2. Convert the totals to billions of dollars and order damage by that value, largest first.
damage <- aggregate (damage~EVTYPE, stormD, sum)
damage$bilion <- damage$damage / 1e9
damage <- damage [order(damage$bilion, decreasing=TRUE),]

The event type with the largest total damage (in billions of dollars) is FLOOD.

ggplot(damage[1:10,], aes(EVTYPE, bilion))+
    geom_bar(stat="identity", aes(fill = EVTYPE))+
    theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5))+
    labs(x = "Event Type", y = "Damage (Billions of Dollars)",
         title = "Top 10 Events by Total Damage")

State wise damage statistics

  1. Calculate the sum of damage grouped by STATE and save it in damageStatewise.
  2. Convert the totals to billions of dollars and order damageStatewise by that value, largest first.
damageStatewise <- aggregate (damage~STATE, stormD, sum)
damageStatewise$bilion <- damageStatewise$damage / 1e9
damageStatewise <- damageStatewise[order(damageStatewise$bilion, decreasing=TRUE),]

The state with the largest total damage (in billions of dollars) is CA.

ggplot(damageStatewise[1:10,], aes(STATE, bilion))+
    geom_bar(stat="identity", aes(fill = STATE))+
    theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5))+
    labs(x = "State", y = "Damage (Billions of Dollars)",
         title = "Top 10 States with the Highest Damage")

Event and State wise damage statistics

  1. Calculate the sum of damage grouped by EVTYPE and STATE and save it in damageState.
  2. Convert the totals to billions of dollars and order damageState by that value, largest first.
damageState <- aggregate (damage~EVTYPE+STATE, stormD, sum)
damageState$bilion <- damageState$damage / 1e9
damageState <- damageState[order(damageState$bilion, decreasing=TRUE),]

firstGroup <- damageState[1:10,]
firstGroup$EVT_STATE <- paste(firstGroup$EVTYPE, firstGroup$STATE,sep = "@")

The event type and state combination with the largest total damage (in billions of dollars) is FLOOD in CA.

ggplot(firstGroup, aes(EVT_STATE, bilion))+
    geom_bar(stat="identity", aes(fill = EVT_STATE))+
    theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5))+
    labs(x = "Event@State", y = "Damage (Billions of Dollars)",
         title = "Top 10 Event and State Combinations by Damage")

Conclusion

Different events affect different states differently. Tornadoes cause the most fatalities and the most injuries of any event type. Interestingly, the state with the highest fatality count is Illinois, whereas Texas has the most injuries. Heat and heat-related events in Illinois form the single event and state combination with the most fatalities, and tornadoes in Texas form the combination with the most injuries. In short, tornadoes are the number one danger in terms of fatalities and injuries.

In terms of damage (property and crop damage combined), floods are the most costly event type, and California is the state most affected by floods. The second most damaging event type is hurricane or typhoon, which has a large impact in Florida. Some states therefore require more safety measures and awareness to save lives, and some require better strategy and resource management to minimize damage.