SYNOPSIS

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

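The data analysis addresses two questions:

  1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

  2. Across the United States, which types of events have the greatest economic consequences?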

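ASSUMPTIONS ON WHICH ANALYSIS IS BASED

  1. Harm to population health is measured by the FATALITIES and INJURIES columns.

  2. Economic consequences are measured by property damage (PROPDMG) and crop damage (CROPDMG).

  3. Damage values are scaled only for the exponent codes h/H (hundreds), k/K (thousands), m/M (millions) and b/B (billions). Records with any other exponent, including a blank one, are excluded from the corresponding damage total.

  4. Event types are taken exactly as recorded in EVTYPE; differently spelled labels for similar events are not merged.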

DATA PROCESSING

#Install the Required Packages
#install.packages("dplyr"); install.packages("ggplot2"); install.packages("gridExtra")

#Loading Required Libraries
library(dplyr); library(ggplot2); library(gridExtra);
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine
#Reading the data file StormData.csv
s <- read.csv("StormData.csv",sep = ",", header = TRUE)
#Number of rows and columns in the StormData.csv dataset
dim(s)
## [1] 902297     37
#Names of the columns in the StormData.csv dataset
names(s)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

For this analysis we consider the seven columns below; the column numbers are given in parentheses.

EVTYPE(8), FATALITIES(23), INJURIES(24), PROPDMG(25), PROPDMGEXP(26), CROPDMG(27), CROPDMGEXP(28)
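Selecting columns by position only works while the file keeps exactly this layout. A minimal sketch, assuming the column names printed by names(s) above, of the same selection done by name; the toy data frame `storm` and the vector `keep` are hypothetical stand-ins, not part of the original analysis.

```r
# Toy stand-in for a few StormData.csv columns (invented values, not NOAA data)
storm <- data.frame(
  STATE = c("AL", "TX"), EVTYPE = c("TORNADO", "HAIL"),
  FATALITIES = c(0, 1), INJURIES = c(15, 0),
  PROPDMG = c(25, 2.5), PROPDMGEXP = c("K", "M"),
  CROPDMG = c(0, 1), CROPDMGEXP = c("", "K"),
  stringsAsFactors = FALSE
)

# The seven columns used in this analysis, referenced by name
keep <- c("EVTYPE", "FATALITIES", "INJURIES",
          "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")

# Stop early if an expected column is missing, then subset by name
stopifnot(all(keep %in% names(storm)))
storm <- storm[, keep]
```

With `s` in place of `storm`, `s <- s[, keep]` would select the same columns as `s[, c(8, 23:28)]` for this file, and fail loudly rather than silently if the layout ever changes.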

#Fetching only the required Columns
s<-s[,c(8,23:28)]
head(s)
##    EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO          0       15    25.0          K       0           
## 2 TORNADO          0        0     2.5          K       0           
## 3 TORNADO          0        2    25.0          K       0           
## 4 TORNADO          0        2     2.5          K       0           
## 5 TORNADO          0        2     2.5          K       0           
## 6 TORNADO          0        6     2.5          K       0
  1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

Data Processing Related to Population Health

Harm to population health is measured here by the FATALITIES and INJURIES columns.
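Because both measures are simply summed per event type, the two separate aggregate() calls and the later merge could be collapsed into a single step. A minimal base-R sketch on a hypothetical toy data frame `events`, assuming neither column contains missing values (the formula interface of aggregate() drops rows with an NA in any summed column):

```r
# Hypothetical toy records (not NOAA data)
events <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(2, 1, 3, 0),
  INJURIES   = c(10, 5, 1, 6),
  stringsAsFactors = FALSE
)

# Sum both columns per event type with one aggregate() call
health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = events, FUN = sum)

# Combined number of people affected, sorted in descending order
health$TOTAL <- health$FATALITIES + health$INJURIES
health <- health[order(-health$TOTAL), ]
print(health)
```

Run on `s`, this should give the same FATALITIES, INJURIES and TOTAL columns as the merge_health table, though that has not been checked here.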

##Data Processing for FATALITIES (one of the factors affecting population health)
agg_fatalities <- aggregate(FATALITIES~EVTYPE, s, FUN=sum, na.rm=TRUE)
agg_fatalities <- arrange(agg_fatalities, desc(FATALITIES))
agg_fatalities5 <- agg_fatalities[1:5,]
head(agg_fatalities5)
##           EVTYPE FATALITIES
## 1        TORNADO       5633
## 2 EXCESSIVE HEAT       1903
## 3    FLASH FLOOD        978
## 4           HEAT        937
## 5      LIGHTNING        816
#Script to plot Total Fatalities with respect to Event Type
#fill is mapped in geom_col(); ggplot2 aesthetic names are lower case
ft <- ggplot(agg_fatalities5, aes(x=reorder(EVTYPE, -FATALITIES), y=FATALITIES))
ft <- ft + geom_col(aes(fill=EVTYPE))
#Constant label positions are set outside aes()
ft <- ft + geom_text(aes(label=round(FATALITIES,0)), hjust=0.5, vjust=-0.7, size=3.5)
ft <- ft + labs(x = "\n EVENT TYPE \n")
ft <- ft + labs(y = "\n FATALITIES \n")
ft <- ft + labs(title = "\n EVENTS CAUSING FATALITIES \n (TOP 5)\n")
ft <- ft + coord_cartesian(ylim=c(0,6000))
ft <- ft + theme(axis.text.x = element_text(angle=45, hjust=1, size=10, face = "bold"))
ft <- ft + theme(legend.position="none")
ft <- ft + theme(plot.title = element_text(size=11, face = "bold"))
##Data Processing for INJURIES (one of the factors affecting population health)
agg_injuries <- aggregate(INJURIES~EVTYPE, s, FUN=sum, na.rm=TRUE)
agg_injuries <- arrange(agg_injuries, desc(INJURIES))
agg_injuries5 <- agg_injuries[1:5,]
head(agg_injuries5)
##           EVTYPE INJURIES
## 1        TORNADO    91346
## 2      TSTM WIND     6957
## 3          FLOOD     6789
## 4 EXCESSIVE HEAT     6525
## 5      LIGHTNING     5230
###Script to plot Total Injuries with respect to Event Type
#fill is mapped in geom_col(); ggplot2 aesthetic names are lower case
ij <- ggplot(agg_injuries5, aes(x=reorder(EVTYPE, -INJURIES), y=INJURIES))
ij <- ij + geom_col(aes(fill=EVTYPE))
#Constant label positions are set outside aes()
ij <- ij + geom_text(aes(label=round(INJURIES,0)), hjust=0.5, vjust=-0.7, size=3.5)
ij <- ij + labs(x="\n EVENT TYPE \n")
ij <- ij + labs(y="\n INJURIES \n")
ij <- ij + labs(title = "\n EVENTS CAUSING INJURIES \n (TOP 5)\n")
ij <- ij + coord_cartesian(ylim=c(0,100000))
ij <- ij + theme(axis.text.x = element_text(angle=45, hjust=1, size=10, face = "bold"))
ij <- ij + theme(legend.position="none")
ij <- ij + theme(plot.title = element_text(size=11, face = "bold"))
#Panel Plot for Events affecting Population Health
grid.arrange(ft, ij, ncol=2)

###Data Processing to merge the data sets of Total Fatalities and Total Injuries
merge_health <- merge(agg_fatalities, agg_injuries, all=TRUE)
merge_health[is.na(merge_health)] <- 0
merge_health <- mutate(merge_health, TOTAL = FATALITIES + INJURIES)
merge_health <- arrange(merge_health, desc(TOTAL))
#Top Five Events affecting Public Health (Fatalities and Injuries)
head(merge_health,5)
##           EVTYPE FATALITIES INJURIES TOTAL
## 1        TORNADO       5633    91346 96979
## 2 EXCESSIVE HEAT       1903     6525  8428
## 3      TSTM WIND        504     6957  7461
## 4          FLOOD        470     6789  7259
## 5      LIGHTNING        816     5230  6046
#Fetching Top 5 Events
merge_health <- merge_health[1:5,]

###Script to plot Total Fatalities and Injuries with respect to Event Type
ph <- ggplot(merge_health, aes(x=reorder(EVTYPE, -TOTAL), y=TOTAL))
ph <- ph + geom_bar(aes(fill=EVTYPE),stat="identity")
ph <- ph + geom_text(aes(label=round(TOTAL,0),  hjust=0.5, vjust=-0.7),size=3.5)
ph <- ph + labs(x="\n EVENT TYPE \n")
ph <- ph + labs(y="\n TOTAL AFFECTED\n")
ph <- ph + labs(title = "\n Top 5 Events Affecting Public Health \n (Considering FATALITIES and INJURIES together)\n")
ph <- ph + coord_cartesian(ylim=c(0,110000))
ph <- ph + theme(axis.text.x = element_text(angle=45, hjust=1, size=12, face = "bold"))
ph <- ph + theme(legend.position="none")
ph <- ph + theme(plot.title = element_text(size=14, face = "bold"))

#Plot for Top 5 Events Affecting Public Health
ph

  2. Across the United States, which types of events have the greatest economic consequences?

Data Processing related to Economic Consequences

Economic consequences are measured by property damage (PROPDMG) and crop damage (CROPDMG).

#Data Processing for Property Damage (PROPDMG)

#Frequency of each exponent code in the PROPDMGEXP column
table(s$PROPDMGEXP)
## 
##             -      ?      +      0      1      2      3      4      5 
## 465934      1      8      5    216     25     13      4      4     28 
##      6      7      8      B      h      H      K      m      M 
##      4      5      1     40      1      6 424665      7  11330
p <- filter(s, PROPDMGEXP %in% c("h", "H", "k", "K","m","M","b", "B"))
table(p$PROPDMGEXP)
## 
##             -      ?      +      0      1      2      3      4      5 
##      0      0      0      0      0      0      0      0      0      0 
##      6      7      8      B      h      H      K      m      M 
##      0      0      0     40      1      6 424665      7  11330
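The filter keeps only rows whose exponent is one of the letter codes, so rows with a blank, numeric or symbolic exponent are dropped even when PROPDMG is nonzero. A minimal sketch of how one might count such rows; `dmg` and `accepted` are hypothetical names and the toy values are invented for illustration. Running the same check on `s` would show how much recorded damage this assumption leaves out; that figure is not reported in this analysis.

```r
# Hypothetical toy records (not NOAA data)
dmg <- data.frame(
  PROPDMG    = c(25, 2.5, 0, 3, 10),
  PROPDMGEXP = c("K", "M", "", "", "5"),
  stringsAsFactors = FALSE
)

# Exponent codes accepted by the filter() call above
accepted <- c("h", "H", "k", "K", "m", "M", "b", "B")

# Rows the filter would drop that still record a nonzero damage value
dropped <- !(dmg$PROPDMGEXP %in% accepted)
n_lost  <- sum(dropped & dmg$PROPDMG > 0)
n_lost
```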
#Multiplying PROPDMG by the value each exponent code stands for (h/H = 10^2, k/K = 10^3, m/M = 10^6, b/B = 10^9)
p$PROPDMG <- ifelse((p$PROPDMGEXP =="h"| p$PROPDMGEXP =="H"),p$PROPDMG*10^2,p$PROPDMG)
p$PROPDMG <- ifelse((p$PROPDMGEXP =="k"| p$PROPDMGEXP =="K"),p$PROPDMG*10^3,p$PROPDMG)
p$PROPDMG <- ifelse((p$PROPDMGEXP =="m"| p$PROPDMGEXP =="M"),p$PROPDMG*10^6,p$PROPDMG)
p$PROPDMG <- ifelse((p$PROPDMGEXP =="b"| p$PROPDMGEXP =="B"),p$PROPDMG*10^9,p$PROPDMG)
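The four ifelse() calls can also be written as a single named lookup vector. A minimal sketch; `scale_damage` is a hypothetical helper, not part of the original code. After the filter above it could be applied as `p$PROPDMG <- scale_damage(p$PROPDMG, p$PROPDMGEXP)`, and the same helper would cover CROPDMG.

```r
# Hypothetical helper: scale a damage value by its exponent code.
# Only h/H, k/K, m/M and b/B are mapped; any other code (including "") gives NA.
scale_damage <- function(value, code) {
  multiplier <- c(h = 1e2, k = 1e3, m = 1e6, b = 1e9)
  # as.character() handles the case where the code column was read as a factor
  value * unname(multiplier[tolower(as.character(code))])
}

scale_damage(c(25, 2.5, 1.5, 4), c("K", "M", "b", "?"))
```

Returning NA for unknown codes, rather than passing the value through unchanged, makes any code the filter missed visible instead of silently summing it unscaled.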

#Aggregating Total Property Damage by Event Type
agg_propdmg <- aggregate(PROPDMG~EVTYPE, p, FUN=sum, na.rm=TRUE)

#Sorting the data so that Event causing Maximum Property Damage is on Top
agg_propdmg <- arrange(agg_propdmg, desc(PROPDMG))

#Selecting the top 5 Events causing maximum Property Damage
agg_propdmg5 <- agg_propdmg[1:5,]
#Top Five Events Causing Property Damage
head(agg_propdmg5)
##              EVTYPE      PROPDMG
## 1             FLOOD 144657709800
## 2 HURRICANE/TYPHOON  69305840000
## 3           TORNADO  56937160480
## 4       STORM SURGE  43323536000
## 5       FLASH FLOOD  16140811510
#Data Processing for Crop Damage (CROPDMG)

#Frequency of each exponent code in the CROPDMGEXP column
table(s$CROPDMGEXP)
## 
##             ?      0      2      B      k      K      m      M 
## 618413      7     19      1      9     21 281832      1   1994
c <- filter(s, CROPDMGEXP %in% c("h", "H", "k", "K","m","M","b", "B"))

#Multiplying CROPDMG by the value each exponent code stands for (h/H = 10^2, k/K = 10^3, m/M = 10^6, b/B = 10^9)
c$CROPDMG <- ifelse((c$CROPDMGEXP =="h"| c$CROPDMGEXP =="H"),c$CROPDMG*10^2,c$CROPDMG)
c$CROPDMG <- ifelse((c$CROPDMGEXP =="k"| c$CROPDMGEXP =="K"),c$CROPDMG*10^3,c$CROPDMG)
c$CROPDMG <- ifelse((c$CROPDMGEXP =="m"| c$CROPDMGEXP =="M"),c$CROPDMG*10^6,c$CROPDMG)
c$CROPDMG <- ifelse((c$CROPDMGEXP =="b"| c$CROPDMGEXP =="B"),c$CROPDMG*10^9,c$CROPDMG)

##Aggregating Total Crop Damage by Event Type
agg_cropdmg <- aggregate(CROPDMG~EVTYPE, c, FUN=sum, na.rm=TRUE)

#Sorting the data so that Event causing maximum Crop Damage is on Top
agg_cropdmg <- arrange(agg_cropdmg, desc(CROPDMG))

#Selecting the top 5 Events causing maximum Crop Damage
agg_cropdmg5 <- agg_cropdmg[1:5,]

#Top Five Events Causing Crop Damage
head(agg_cropdmg5)
##        EVTYPE     CROPDMG
## 1     DROUGHT 13972566000
## 2       FLOOD  5661968450
## 3 RIVER FLOOD  5029459000
## 4   ICE STORM  5022113500
## 5        HAIL  3025954450
##Data Processing to combine PROPDMG and CROPDMG to get the TOTAL

#Merging the data sets for Property Damage and Crop Damage
merge_economic <- merge(agg_propdmg, agg_cropdmg, all=TRUE)
merge_economic[is.na(merge_economic)] <- 0
merge_economic <- mutate(merge_economic, TOTAL = PROPDMG + CROPDMG)
merge_economic <- arrange(merge_economic, desc(TOTAL))
#Top Five Events with the Greatest Economic Consequences (Property and Crop Damage)
head(merge_economic,5)
##              EVTYPE      PROPDMG    CROPDMG        TOTAL
## 1             FLOOD 144657709800 5661968450 150319678250
## 2 HURRICANE/TYPHOON  69305840000 2607872800  71913712800
## 3           TORNADO  56937160480  414953110  57352113590
## 4       STORM SURGE  43323536000       5000  43323541000
## 5              HAIL  15732267220 3025954450  18758221670
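Property and crop damage are aggregated separately, so some event types appear in only one of the two tables; without the NA-to-zero step their TOTAL would be NA and they would sort unpredictably. A small sketch with hypothetical totals (not NOAA figures) shows what `all=TRUE` followed by the replacement does:

```r
# Hypothetical per-event totals (not NOAA figures)
prop <- data.frame(EVTYPE = c("FLOOD", "TORNADO"), PROPDMG = c(100, 50))
crop <- data.frame(EVTYPE = c("FLOOD", "DROUGHT"), CROPDMG = c(10, 30))

# Outer join on EVTYPE: event types present in only one table get NA
both <- merge(prop, crop, by = "EVTYPE", all = TRUE)

# Treat "no recorded damage" as zero so the sum is defined
both[is.na(both)] <- 0
both$TOTAL <- both$PROPDMG + both$CROPDMG
both[order(-both$TOTAL), ]
```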
#Fetching Top 5 Events
merge_economic <- merge_economic[1:5,]

###Script to plot Total Economic Damage (Property and Crop) with respect to Event Type
#fill is mapped in geom_col(); ggplot2 aesthetic names are lower case
ec <- ggplot(merge_economic, aes(x=reorder(EVTYPE, -TOTAL), y=TOTAL/10^9))
ec <- ec + geom_col(aes(fill=EVTYPE))
#Labels show the total rounded to whole billions of dollars
ec <- ec + geom_text(aes(label=round(TOTAL/10^9,0)), hjust=0.5, vjust=-0.7, size=3.5)
ec <- ec + labs(x="\n EVENT TYPE \n")
ec <- ec + labs(y="\n TOTAL DAMAGE (in Billion $) \n")
ec <- ec + labs(title = "\n Top 5 Events With Greatest Economic Consequences \n (Considering PROPERTY and CROP Damage together)\n")
ec <- ec + coord_cartesian(ylim=c(0,165))
ec <- ec + theme(axis.text.x = element_text(angle=45, hjust=1, size=11, face = "bold"))
ec <- ec + theme(legend.position="none")
ec <- ec + theme(plot.title = element_text(size=14, face = "bold"))

#Plot for Top 5 Events With Greatest Economic Consequences
ec

RESULTS

  1. Across the United States, the most harmful types of events with respect to population health, ranked by the combined number of FATALITIES and INJURIES, are:

    1. TORNADO (96,979)

    2. EXCESSIVE HEAT (8,428)

    3. TSTM WIND (7,461)

    4. FLOOD (7,259)

    5. LIGHTNING (6,046)

  2. Across the United States, the types of events with the greatest economic consequences, ranked by combined property and crop damage, are:

    1. FLOOD (150 Billion Dollars)

    2. HURRICANE/TYPHOON (72 Billion Dollars)

    3. TORNADO (57 Billion Dollars)

    4. STORM SURGE (43 Billion Dollars)

    5. HAIL (19 Billion Dollars)
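The billion-dollar figures come from dividing each TOTAL by 10^9 and rounding to whole billions, the same conversion used for the plot labels. A quick check in R using the TOTAL values printed by head(merge_economic, 5):

```r
# TOTAL values (US dollars) copied from the head(merge_economic, 5) output
total <- c(FLOOD = 150319678250, "HURRICANE/TYPHOON" = 71913712800,
           TORNADO = 57352113590, "STORM SURGE" = 43323541000,
           HAIL = 18758221670)

# Same conversion and rounding as the plot labels: round(TOTAL/10^9, 0)
billions <- round(total / 10^9, 0)
billions
```

HAIL is about 18.76 billion dollars before rounding, so whole-billion rounding gives 19.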