Synopsis

An analysis of the NOAA storm event database shows that tornadoes are the weather events most harmful to population health, causing the most fatalities and injuries. Flash floods account for the largest total property damage.

Description

This project explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. The database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. Project page: https://www.coursera.org/learn/reproducible-research/peer/OMZ37/course-project-2

Data Processing

First, we load the data into R.

## assumes the CSV file is in the working directory
df <- read.csv("repdata%2Fdata%2FStormData.csv")
head(df)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
## 4        2     2.5          K       0                                    
## 5        2     2.5          K       0                                    
## 6        6     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6
names(df)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

Question 1

To find which types of events are most harmful to population health, we total “FATALITIES” and “INJURIES” for each event type.

First, we sum “FATALITIES” for each value of “EVTYPE”.

fatalByEvt <- aggregate(FATALITIES ~ EVTYPE, data = df, FUN = sum)
fatalByEvt <- fatalByEvt[order(-fatalByEvt$FATALITIES),]

The 10 event types that caused the most “FATALITIES” are:

head(fatalByEvt, 10)
##             EVTYPE FATALITIES
## 834        TORNADO       5633
## 130 EXCESSIVE HEAT       1903
## 153    FLASH FLOOD        978
## 275           HEAT        937
## 464      LIGHTNING        816
## 856      TSTM WIND        504
## 170          FLOOD        470
## 585    RIP CURRENT        368
## 359      HIGH WIND        248
## 19       AVALANCHE        224

Next, we sum “INJURIES” for each event type.

injByEvt <- aggregate(INJURIES ~ EVTYPE, data = df, FUN = sum)
injByEvt <- injByEvt[order(-injByEvt$INJURIES),]

The 10 event types that caused the most “INJURIES” are:

head(injByEvt, 10)
##                EVTYPE INJURIES
## 834           TORNADO    91346
## 856         TSTM WIND     6957
## 170             FLOOD     6789
## 130    EXCESSIVE HEAT     6525
## 464         LIGHTNING     5230
## 275              HEAT     2100
## 427         ICE STORM     1975
## 153       FLASH FLOOD     1777
## 760 THUNDERSTORM WIND     1488
## 244              HAIL     1361
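
The two totals above can also be computed in one pass. The sketch below is illustrative only: the `toy` data frame and its values are made up, and the unweighted `TOTAL` column (fatalities plus injuries) is an assumption about how one might rank overall harm, not a measure used in this report.

```r
# Toy stand-in for the storm data; the values are invented for illustration.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 0, 2, 1),
  INJURIES   = c(10, 5, 2, 0, 4)
)

# Sum both outcome columns per event type in a single aggregate() call.
harmByEvt <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE,
                       data = toy, FUN = sum)

# Hypothetical combined score: fatalities plus injuries, unweighted.
harmByEvt$TOTAL <- harmByEvt$FATALITIES + harmByEvt$INJURIES
harmByEvt <- harmByEvt[order(-harmByEvt$TOTAL), ]
print(harmByEvt)
```

Keeping both columns in one table makes it easier to see whether an event type ranks high on fatalities, injuries, or both.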

Question 2

Now let’s find out which types of events have the greatest economic consequences.

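The column PROPDMG records property damage and CROPDMG records crop damage. Each amount is scaled by a magnitude code stored in PROPDMGEXP or CROPDMGEXP (for example, K for thousands, M for millions, and B for billions). We convert each code to a power of ten, compute the damage amounts, and then total them by event type to see which has the largest impact.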

## map a damage magnitude code to a power of ten
exp_unit <- function(e) {
        if (e %in% c('h', 'H'))
                return(2)   # hundreds
        else if (e %in% c('k', 'K'))
                return(3)   # thousands
        else if (e %in% c('m', 'M'))
                return(6)   # millions
        else if (e %in% c('b', 'B'))
                return(9)   # billions
        else if (!is.na(as.numeric(e)))
                return(as.numeric(e))   # for a factor this is the level index, not the printed digit
        else if (e %in% c('', ' ', '-', '+', '?'))
                return(0)   # blank or unknown code: no scaling
        else {
                stop("invalid exponent value")
        }
}
prog_dm_exp <- sapply(df$PROPDMGEXP, FUN = exp_unit)
df$prog_dmg <- df$PROPDMG * (10 ** prog_dm_exp)
crop_dm_exp <- sapply(df$CROPDMGEXP, FUN = exp_unit)
df$crop_dmg <- df$CROPDMG * (10 ** crop_dm_exp)
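
Because `as.numeric()` applied to a factor returns the internal level index rather than the printed digit, the result of the mapping can depend on how the column was read. A character-based lookup is one way to avoid that. The sketch below is not the code that produced the tables in this report, so totals computed with it may differ. The names `exp_lookup` and `exp_power` are hypothetical, and treating digit codes as literal exponents and unknown codes as zero are assumptions.

```r
# Hypothetical helper: map magnitude codes to powers of ten via a named lookup.
exp_lookup <- c(h = 2, k = 3, m = 6, b = 9)

exp_power <- function(codes) {
  # Convert to character first so factors are read by label, not level index.
  codes <- tolower(trimws(as.character(codes)))
  power <- unname(exp_lookup[codes])              # letter codes; NA when unmatched
  is_digit <- grepl("^[0-9]$", codes)
  power[is_digit] <- as.numeric(codes[is_digit])  # assumption: digit = exponent
  power[is.na(power)] <- 0                        # assumption: blank, "-", "+", "?" mean no scaling
  power
}

# Made-up example values, stored as a factor on purpose.
codes   <- factor(c("K", "M", "", "B", "5", "h", "?"))
amounts <- c(25, 2.5, 10, 1, 3, 4, 7)
powers  <- exp_power(codes)
damage  <- amounts * 10^powers
print(data.frame(codes, powers, damage))
```

The function is vectorized, so it can be applied to a whole column at once instead of calling `sapply()` on each element.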

The top 10 event types by total property damage are:

PDMG <- aggregate(prog_dmg ~ EVTYPE, data = df, FUN = sum)
PDMG <- PDMG[order(-PDMG$prog_dmg), ]
head(PDMG, 10)
##                 EVTYPE     prog_dmg
## 153        FLASH FLOOD 6.820237e+13
## 786 THUNDERSTORM WINDS 2.086532e+13
## 834            TORNADO 1.078951e+12
## 244               HAIL 3.157558e+11
## 464          LIGHTNING 1.729433e+11
## 170              FLOOD 1.446577e+11
## 411  HURRICANE/TYPHOON 6.930584e+10
## 185           FLOODING 5.920826e+10
## 670        STORM SURGE 4.332354e+10
## 310         HEAVY SNOW 1.793259e+10
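
Several labels in the tables of this report look like spelling variants of one event, for example “TSTM WIND”, “THUNDERSTORM WIND”, and “THUNDERSTORM WINDS”, or “FLOOD” and “FLOODING”. Since totals are computed per label, variants split one event's total across several rows. The sketch below shows one possible cleanup step on made-up data. The function `normalize_evtype` is hypothetical, and the choice of which labels to merge is an assumption, not part of the analysis above.

```r
# Hypothetical cleanup of event labels before aggregation (illustrative only).
normalize_evtype <- function(x) {
  x <- toupper(trimws(as.character(x)))
  x <- gsub("\\s+", " ", x)                            # collapse repeated spaces
  x <- sub("^TSTM WINDS?$", "THUNDERSTORM WIND", x)    # abbreviation
  x <- sub("^THUNDERSTORM WINDS$", "THUNDERSTORM WIND", x)  # plural form
  x <- sub("^FLOODING$", "FLOOD", x)                   # assumption: same event
  x
}

# Toy data with invented damage values.
toy <- data.frame(
  EVTYPE = c("TSTM WIND", "Thunderstorm Winds", " FLOODING",
             "FLOOD", "THUNDERSTORM WIND"),
  dmg    = c(1, 2, 3, 4, 5)
)
toy$EVTYPE <- normalize_evtype(toy$EVTYPE)

cleanTotals <- aggregate(dmg ~ EVTYPE, data = toy, FUN = sum)
print(cleanTotals)
```

With the labels merged, the five toy rows collapse into two event types, so each total reflects every record of that event.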

The top 10 event types by total crop damage are:

CDMG <- aggregate(crop_dmg ~ EVTYPE, data = df, FUN = sum)
CDMG <- CDMG[order(-CDMG$crop_dmg), ]
head(CDMG, 10)
##                EVTYPE    crop_dmg
## 95            DROUGHT 13972566000
## 170             FLOOD  5661968450
## 590       RIVER FLOOD  5029459000
## 427         ICE STORM  5022113500
## 244              HAIL  3025974480
## 402         HURRICANE  2741910000
## 411 HURRICANE/TYPHOON  2607872800
## 153       FLASH FLOOD  1421317100
## 140      EXTREME COLD  1292973000
## 212      FROST/FREEZE  1094086000
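
Property and crop damage are ranked separately above. If a single measure of economic impact is wanted, one option is to add the two amounts per record before aggregating. The sketch below uses a made-up `toy` data frame with the same column names as above. The `total_dmg` column and the `TDMG` table are hypothetical and are not computed in this report.

```r
# Toy stand-in with invented dollar amounts.
toy <- data.frame(
  EVTYPE   = c("FLOOD", "DROUGHT", "FLOOD", "HAIL", "DROUGHT"),
  prog_dmg = c(5e6, 0, 2e6, 1e6, 1e5),
  crop_dmg = c(1e6, 8e6, 0, 5e5, 2e6)
)

# Hypothetical combined measure: property plus crop damage per record.
toy$total_dmg <- toy$prog_dmg + toy$crop_dmg

# Total by event type, largest first.
TDMG <- aggregate(total_dmg ~ EVTYPE, data = toy, FUN = sum)
TDMG <- TDMG[order(-TDMG$total_dmg), ]
print(TDMG)
```

In the toy data, drought ranks first on the combined measure even though its property damage is small, which shows how the ranking can change once crop losses are included.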

Results

We now plot the results calculated above. The first plot shows the 10 event types with the most fatalities.

library(ggplot2)
library(scales)
topFatal <- head(fatalByEvt, 10)
topInjur <- head(injByEvt, 10)
topPDMG <- head(PDMG, 10)
topCDMG <- head(CDMG, 10)
## horizontal bar chart, largest total at the top
F <- ggplot(data = topFatal, aes(x = reorder(EVTYPE, FATALITIES),
                                 y = FATALITIES))
F + geom_bar(width = .5, stat = "identity",
             fill = "dark blue") + coord_flip() + xlab("TYPE") + ggtitle("Top 10 Fatalities")

The plot below shows the 10 event types with the most injuries.

G <- ggplot(data = topInjur, aes(x = reorder(EVTYPE, INJURIES),
                                 y = INJURIES))
G + geom_bar(width = .5, stat = "identity",
             fill = "dark blue") + coord_flip() + xlab("TYPE") + ggtitle("Top 10 Injuries")

The plot below shows the 10 event types with the greatest property damage. Damage is plotted on a log scale because the totals differ by several orders of magnitude.

H <- ggplot(data = topPDMG, aes(x = reorder(EVTYPE, prog_dmg),
                                y = log10(prog_dmg)))
H + geom_bar(width = .5, stat = "identity",
             fill = "dark blue") + coord_flip() + xlab("TYPE") + ggtitle("Top 10 Property Damage") + ylab("Property damage in dollars (log10 scale)")