Synopsis

This report analyses the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database with respect to population health and economic consequences. The data cover 1950 to November 2011 and contain 902,297 observations of 37 variables. The report includes both the analysis and the R code used to perform it; the data were processed and analysed using R packages. Tornadoes cause the most fatalities of all weather events, with 5,633 deaths, followed by excessive heat (1,903) and flash floods (978). Tornadoes also cause the most injuries, with 91,346 reported cases, followed by TSTM WIND (6,957) and FLOOD (6,789). In terms of economic damage, floods are the most devastating event: about $144.7 billion in property damage and $5.7 billion in crop damage, roughly $150.3 billion in total.

1 Data Processing

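# Packages used for the plots in the Results section
library(ggplot2)
library(gridExtra)
library(grid)
# Download the compressed file once; the local name must match the file read below
filename <- "repdata_data_StormData.csv.bz2"
if (!file.exists(filename)) {
  fileURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
  download.file(fileURL, filename, mode = "wb")
}
stormdata <- read.csv(bzfile(filename), sep = ",", header = TRUE)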

1.1 Understanding the data layout

str(stormdata)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
##  $ BGN_TIME  : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
##  $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : Factor w/ 35 levels "","  N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_DATE  : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_TIME  : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ WFO       : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ ZONENAMES : Factor w/ 25112 levels "","                                                                                                                               "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : Factor w/ 436781 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
names(stormdata)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

1.2 Data Reshaping/Pruning

subStormdat <- stormdata[ , c('EVTYPE', 'FATALITIES', 'INJURIES', 'PROPDMG', 'PROPDMGEXP', 
                         'CROPDMG', 'CROPDMGEXP')]

1.3 Finding the event types most harmful to population health across the US

head(subStormdat)
##    EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO          0       15    25.0          K       0           
## 2 TORNADO          0        0     2.5          K       0           
## 3 TORNADO          0        2    25.0          K       0           
## 4 TORNADO          0        2     2.5          K       0           
## 5 TORNADO          0        2     2.5          K       0           
## 6 TORNADO          0        6     2.5          K       0

1.3.1 Aggregating fatalities against each event type

fatalitys <- aggregate(FATALITIES ~ EVTYPE, data = subStormdat, sum)
fatalitys <- fatalitys[fatalitys$FATALITIES > 0, ]
fatalitys <- fatalitys[order(fatalitys$FATALITIES, decreasing = TRUE), ][1:10, ]
head(fatalitys)
##             EVTYPE FATALITIES
## 834        TORNADO       5633
## 130 EXCESSIVE HEAT       1903
## 153    FLASH FLOOD        978
## 275           HEAT        937
## 464      LIGHTNING        816
## 856      TSTM WIND        504
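
The same aggregate, filter, sort and truncate steps are repeated for fatalities, injuries and both damage columns. A minimal sketch of that pattern as a reusable helper, run on a small made-up data frame; the name `top_events` and the toy values are hypothetical and not part of the original analysis:

```r
# Hypothetical helper: total one numeric column per event type,
# drop event types whose total is zero, and return the n largest rows.
top_events <- function(data, column, n = 10) {
  totals <- aggregate(data[[column]], by = list(EVTYPE = data$EVTYPE), FUN = sum)
  names(totals)[2] <- column
  totals <- totals[totals[[column]] > 0, ]
  totals <- totals[order(totals[[column]], decreasing = TRUE), ]
  head(totals, n)
}

# Toy data, made up for illustration only
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(3, 2, 4, 1, 0)
)
top_fatal <- top_events(toy, "FATALITIES", n = 2)
print(top_fatal)
```

With the real data, a call such as `top_events(subStormdat, "INJURIES")` would be expected to play the role of the three-line block used for each measure.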

1.3.2 Aggregating injuries against each event type

injurys <- aggregate(INJURIES ~ EVTYPE, data = subStormdat, sum)
injurys <- injurys[injurys$INJURIES > 0, ]
injurys <- injurys[order(injurys$INJURIES, decreasing = TRUE), ][1:10, ]
head(injurys)
##             EVTYPE INJURIES
## 834        TORNADO    91346
## 856      TSTM WIND     6957
## 170          FLOOD     6789
## 130 EXCESSIVE HEAT     6525
## 464      LIGHTNING     5230
## 275           HEAT     2100

1.4 Examining the factor levels for property damage estimates

unique(subStormdat$PROPDMGEXP)
##  [1] K M   B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels:  - ? + 0 1 2 3 4 5 6 7 8 B h H K m M

The output above and the codebook show that (i) the variable mixes upper- and lower-case letters and (ii) K stands for thousands, M for millions, and B for billions. To compare damages across events, the recorded amounts must be converted to actual dollar values by applying these multipliers. Amounts with any other code are left unchanged.
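
Listing the levels shows which codes exist but not how often each occurs. A sketch, on made-up codes, of tabulating the column to see how many records carry each code; the vector `codes` is hypothetical and stands in for `subStormdat$PROPDMGEXP`:

```r
# Made-up exponent codes standing in for the real column
codes <- c("K", "K", "M", "", "k", "?", "K")

# Fold case first so that "k" and "K" are counted together
counts <- sort(table(toupper(codes)), decreasing = TRUE)
print(counts)
```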

1.4.1 Converting exponent codes to actual values for property damage

subStormdat[subStormdat$PROPDMGEXP == "K", ]$PROPDMG <- subStormdat[subStormdat$PROPDMGEXP == "K", ]$PROPDMG * 1000
subStormdat[subStormdat$PROPDMGEXP == "M", ]$PROPDMG <- subStormdat[subStormdat$PROPDMGEXP == "M", ]$PROPDMG * 1e+06
subStormdat[subStormdat$PROPDMGEXP == "m", ]$PROPDMG <- subStormdat[subStormdat$PROPDMGEXP == "m", ]$PROPDMG * 1e+06
subStormdat[subStormdat$PROPDMGEXP == "B", ]$PROPDMG <- subStormdat[subStormdat$PROPDMGEXP == "B", ]$PROPDMG * 1e+09
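
The per-letter assignments can also be written as a single lookup. A sketch on made-up values, assuming (as the code above does) that only K, M and B carry a multiplier and every other code leaves the amount unchanged; here the letters are matched in either case, and `exp_multiplier` is a hypothetical name:

```r
# Hypothetical helper: map exponent codes to multipliers.
# Codes that are not K, M or B (in either case) fall back to 1.
exp_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(as.character(code))]
  mult[is.na(mult)] <- 1
  unname(mult)
}

# Toy amounts and codes, made up for illustration only
dmg  <- c(25, 2.5, 1, 3, 7)
code <- c("K", "m", "B", "", "?")
dollars <- dmg * exp_multiplier(code)
print(dollars)
```

Because the multiplier is computed for the whole column at once, the same function could be applied to both PROPDMGEXP and CROPDMGEXP.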

1.4.2 Aggregating the property damage across event type

propDmg<- aggregate(PROPDMG ~ EVTYPE, subStormdat, sum)
propDmg <- propDmg[propDmg$PROPDMG > 0,]
propDmg <- propDmg[order(propDmg$PROPDMG, decreasing = TRUE),][1:10,]
head(propDmg)
##                EVTYPE      PROPDMG
## 170             FLOOD 144657709807
## 411 HURRICANE/TYPHOON  69305840000
## 834           TORNADO  56937160779
## 670       STORM SURGE  43323536000
## 153       FLASH FLOOD  16140812067
## 244              HAIL  15732267048

1.4.3 Examining the factor levels for crop damage estimates

unique(subStormdat$CROPDMGEXP)
## [1]   M K m B ? 0 k 2
## Levels:  ? 0 2 B k K m M

1.4.4 Converting exponent codes to actual values for crop damage

subStormdat[subStormdat$CROPDMGEXP == "K", ]$CROPDMG <- subStormdat[subStormdat$CROPDMGEXP == "K", ]$CROPDMG * 1000
subStormdat[subStormdat$CROPDMGEXP == "k", ]$CROPDMG <- subStormdat[subStormdat$CROPDMGEXP == "k", ]$CROPDMG * 1000
subStormdat[subStormdat$CROPDMGEXP == "M", ]$CROPDMG <- subStormdat[subStormdat$CROPDMGEXP == "M", ]$CROPDMG * 1e+06
subStormdat[subStormdat$CROPDMGEXP == "m", ]$CROPDMG <- subStormdat[subStormdat$CROPDMGEXP == "m", ]$CROPDMG * 1e+06
subStormdat[subStormdat$CROPDMGEXP == "B", ]$CROPDMG <- subStormdat[subStormdat$CROPDMGEXP == "B", ]$CROPDMG * 1e+09

1.4.5 Aggregating the crop damage across event type

cropDmg <- aggregate(CROPDMG ~ EVTYPE, subStormdat, sum)
cropDmg <- cropDmg[cropDmg$CROPDMG > 0,]
cropDmg <- cropDmg[order(cropDmg$CROPDMG, decreasing = TRUE),][1:10,]
head(cropDmg)
##          EVTYPE     CROPDMG
## 95      DROUGHT 13972566000
## 170       FLOOD  5661968450
## 590 RIVER FLOOD  5029459000
## 427   ICE STORM  5022113500
## 244        HAIL  3025954473
## 402   HURRICANE  2741910000

1.4.6 Arriving at total damages

totalDmg <- merge(propDmg, cropDmg, by = "EVTYPE")
totalDmg$total <- totalDmg$PROPDMG + totalDmg$CROPDMG
totalDmg <- totalDmg[order(totalDmg$total, decreasing = TRUE),][1:5,]
head(totalDmg)
##              EVTYPE      PROPDMG    CROPDMG        total
## 2             FLOOD 144657709807 5661968450 150319678257
## 5 HURRICANE/TYPHOON  69305840000 2607872800  71913712800
## 3              HAIL  15732267048 3025954473  18758221521
## 1       FLASH FLOOD  16140812067 1421317100  17562129167
## 4         HURRICANE  11868319010 2741910000  14610229010
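
`merge()` performs an inner join by default, so the table above contains only the event types that appear in both top-ten lists; TORNADO, for example, ranks third for property damage but is absent here because it is not in the crop top ten. A sketch on made-up numbers of how an outer join keeps such events, treating a missing side as zero (an assumption for illustration, not the method used in this report):

```r
# Toy per-event totals, made up for illustration only
prop <- data.frame(EVTYPE = c("FLOOD", "TORNADO"), PROPDMG = c(100, 60))
crop <- data.frame(EVTYPE = c("FLOOD", "DROUGHT"), CROPDMG = c(5, 14))

# Default merge is an inner join: only FLOOD is in both tables
inner <- merge(prop, crop, by = "EVTYPE")

# all = TRUE keeps event types present in either table
outer <- merge(prop, crop, by = "EVTYPE", all = TRUE)
outer$PROPDMG[is.na(outer$PROPDMG)] <- 0
outer$CROPDMG[is.na(outer$CROPDMG)] <- 0
outer$total <- outer$PROPDMG + outer$CROPDMG
outer <- outer[order(outer$total, decreasing = TRUE), ]
print(outer)
```

Merging the full per-event aggregates, before they are cut to ten rows, would avoid losing partial amounts as well.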

2 Results

Further analysis involves plotting exploratory graphs to answer the questions asked in the assignment:
1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
2. Across the United States, which types of events have the greatest economic consequences?

2.1 Plots for number of fatalities/injuries caused by various events in the US

# Plotting the Number of Fatalities By the Most Harmful Event Types
fatalitys$EVTYPE <- factor(fatalitys$EVTYPE, levels = fatalitys$EVTYPE)
gfat<-ggplot(head(fatalitys, 10), aes(reorder(EVTYPE, FATALITIES), FATALITIES))  + 
  geom_bar(stat = "identity", fill = rainbow(n=length(fatalitys$FATALITIES))) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) + coord_flip()+
  xlab("EVENT TYPE") + ylab("FATALITIES") +
  ggtitle("Fatalities by event type across the US")
injurys$EVTYPE <- factor(injurys$EVTYPE, levels = injurys$EVTYPE)
ginj<-ggplot(head(injurys, 10), aes(reorder(EVTYPE, INJURIES), INJURIES))  + 
  geom_bar(stat = "identity", fill = rainbow(n=length(injurys$INJURIES))) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) + coord_flip()+
  xlab("EVENT TYPE") + ylab("INJURIES") +
  ggtitle("Injuries by event type across the US")
grid.arrange(gfat, ginj, nrow = 2)
grid.text("Fig 1 : Plot of top ten harmful events leading to most fatalities and injuries in the US", x = unit(0.5, "npc"), y = unit(0, "npc"), vjust = -0.10, gp = gpar(cex = 0.80))

intersect(fatalitys[1:10, 1], injurys[1:10, 1])
## [1] "TORNADO"        "EXCESSIVE HEAT" "FLASH FLOOD"    "HEAT"          
## [5] "LIGHTNING"      "TSTM WIND"      "FLOOD"

The analysis shows that TORNADO is the most harmful event type across the US for both fatalities and injuries. Seven event types appear in both top-ten lists.

2.2 Plots for property/crop damage and total damage caused by various events across the US

2.2.1 Plots for property/crop damage

propDmg$EVTYPE <- factor(propDmg$EVTYPE, levels = propDmg$EVTYPE)
gprop<-ggplot(head(propDmg, 10), aes(reorder(EVTYPE, PROPDMG), PROPDMG)) + 
    geom_bar(stat = "identity", fill = rainbow(n=length(propDmg$PROPDMG))) + coord_flip()+
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
    xlab("Event Type") + ylab("Property Damages (US$)") +
  ggtitle("Damages to Property by Weather Events across the US")
cropDmg$EVTYPE <- factor(cropDmg$EVTYPE, levels = cropDmg$EVTYPE)
gcrop<-ggplot(head(cropDmg, 10), aes(reorder(EVTYPE, CROPDMG), CROPDMG)) + 
    geom_bar(stat = "identity", fill = rainbow(n=length(cropDmg$CROPDMG))) + coord_flip()+
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
    xlab("Event Type") + ylab("Crop Damages (US$)") +
  ggtitle("Damages to Crop by Weather Events across the US")
grid.arrange(gprop, gcrop, nrow = 2)
grid.text("Fig 2 : Plot of top ten harmful events leading to property & crop damages in the US",      x = unit(0.5, "npc"), y = unit(0, "npc"),vjust = -0.10, gp = gpar(cex=0.80))

intersect(propDmg[1:10, 1], cropDmg[1:10, 1])
## [1] "FLOOD"             "HURRICANE/TYPHOON" "FLASH FLOOD"      
## [4] "HAIL"              "HURRICANE"

The analysis shows that FLOOD is the most economically damaging weather event across the US. Five event types appear in both the property and crop top-ten lists.

2.2.2 Plots for total damages

totalDmg$EVTYPE <- factor(totalDmg$EVTYPE, levels = totalDmg$EVTYPE)
ggplot(head(totalDmg, 10), aes(reorder(EVTYPE, total), total)) +
  geom_bar(stat = "identity", fill = rainbow(n = length(totalDmg$total))) + coord_flip() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  xlab("EVENT TYPE") + ylab("DAMAGES (US$)") +
  ggtitle("Total Damages to Property & Crop by Weather Events across the US")