Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern. This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. The report tries to answer the following two questions:

Q1. Across the United States, which types of events are most harmful with respect to population health?

Q2. Across the United States, which types of events have the greatest economic consequences?

Importing libraries

library(dplyr)
library(knitr)
library(data.table)
setwd("D:/R working directory")

Processing Data

Downloading Data

From the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database we obtain Storm Data in the form of a comma-separated-value file compressed via the bzip2 algorithm.

if(!file.exists("./StormData.csv.bz2"))
    {
    download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                  destfile = "./StormData.csv.bz2")
    }

Reading Data

The events in the database start in the year 1950 and end in November 2011.

f <- fread(
    "./StormData.csv.bz2",
    sep = ",",
    header = TRUE
    )
dim(f)
## [1] 902297     37
head(f)
##    STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1:       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2:       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3:       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4:       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5:       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6:       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##     EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1: TORNADO         0                                               0         NA
## 2: TORNADO         0                                               0         NA
## 3: TORNADO         0                                               0         NA
## 4: TORNADO         0                                               0         NA
## 5: TORNADO         0                                               0         NA
## 6: TORNADO         0                                               0         NA
##    END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1:         0                      14.0   100 3   0          0       15    25.0
## 2:         0                       2.0   150 2   0          0        0     2.5
## 3:         0                       0.1   123 2   0          0        2    25.0
## 4:         0                       0.0   100 2   0          0        2     2.5
## 5:         0                       0.0   150 2   0          0        2     2.5
## 6:         0                       1.5   177 2   0          0        6     2.5
##    PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1:          K       0                                         3040      8812
## 2:          K       0                                         3042      8755
## 3:          K       0                                         3340      8742
## 4:          K       0                                         3458      8626
## 5:          K       0                                         3412      8642
## 6:          K       0                                         3450      8748
##    LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1:       3051       8806              1
## 2:          0          0              2
## 3:          0          0              3
## 4:          0          0              4
## 5:          0          0              5
## 6:          0          0              6

Extracting Required Data

The extracted dataset includes:

  • Event Types (EVTYPE)
  • Fatalities
  • Injuries
  • Property Damage (PROPDMG)
  • Property Damage Exponent symbol (PROPDMGEXP)
  • Crop Damage (CROPDMG)
  • Crop Damage Exponent symbol (CROPDMGEXP)

f <- f[,c(8,23:28)]
head(f)
##     EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1: TORNADO          0       15    25.0          K       0           
## 2: TORNADO          0        0     2.5          K       0           
## 3: TORNADO          0        2    25.0          K       0           
## 4: TORNADO          0        2     2.5          K       0           
## 5: TORNADO          0        2     2.5          K       0           
## 6: TORNADO          0        6     2.5          K       0
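
The selection above picks columns by position, which depends on the column order of the file. As a minimal sketch, the same subset can be chosen by name; the toy data frame below is a made-up stand-in for the real table, and the variable names toy, keep, and toy_small are illustrative only.

```r
# Toy stand-in for the storm table; the real data has 37 columns.
toy <- data.frame(
    STATE = c("AL", "AL"),
    EVTYPE = c("TORNADO", "HAIL"),
    FATALITIES = c(0, 1),
    INJURIES = c(15, 0),
    PROPDMG = c(25, 2.5),
    PROPDMGEXP = c("K", "M"),
    CROPDMG = c(0, 1),
    CROPDMGEXP = c("", "K"),
    stringsAsFactors = FALSE
)

# Columns needed for the analysis, listed by name.
keep <- c("EVTYPE", "FATALITIES", "INJURIES",
          "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")

# Fail early if any expected column is missing.
stopifnot(all(keep %in% names(toy)))

toy_small <- toy[, keep]
names(toy_small)
```

For a data.table such as the one returned by fread, the equivalent selection would be f[, ..keep].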

The PROPDMGEXP and CROPDMGEXP columns give the magnitude by which the damage value should be multiplied to obtain the damage amount in dollars. For example, K indicates multiplying by 10^3, M by 10^6, and B by 10^9.
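
To make the multiplier idea concrete, here is a small sketch using a named lookup vector. The helper to_dollars and the vector multiplier are hypothetical names, not part of the report. Unlike the report's own code, which maps unrecognised codes to 0, this sketch returns NA for them so they remain visible.

```r
# Multipliers for the letter codes (H, K, M, B).
multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: convert a damage value and its exponent code to dollars.
# Codes that are not in the lookup yield NA instead of silently becoming 0.
to_dollars <- function(value, exp_code) {
    value * unname(multiplier[toupper(exp_code)])
}

# Lower-case codes are treated like their upper-case versions.
res <- to_dollars(c(25, 2.5, 1.2, 3), c("K", "k", "B", "?"))
res
```

Here 25 with code K becomes 25000 and 1.2 with code B becomes 1.2 billion, while the "?" entry stays NA.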

Finding Property Damage

table(f$PROPDMGEXP)
## 
##             -      ?      +      0      1      2      3      4      5      6 
## 465934      1      8      5    216     25     13      4      4     28      4 
##      7      8      B      h      H      K      m      M 
##      5      1     40      1      6 424665      7  11330
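f <- f %>%
    mutate(
        # Letter codes are powers of ten. Digit codes 1-8 are used as plain
        # multipliers here, and every other code (blank, "-", "?", "+", "0")
        # falls through to 0.
        PROPDMG = PROPDMG * case_when(
            PROPDMGEXP == "B" | PROPDMGEXP == "b" ~ 10^9,
            PROPDMGEXP == "M" | PROPDMGEXP == "m" ~ 10^6,
            PROPDMGEXP == "K" | PROPDMGEXP == "k" ~ 10^3,
            PROPDMGEXP == "H" | PROPDMGEXP == "h" ~ 10^2,
            PROPDMGEXP == 1 ~ 1,
            PROPDMGEXP == 2 ~ 2,
            PROPDMGEXP == 3 ~ 3,
            PROPDMGEXP == 4 ~ 4,
            PROPDMGEXP == 5 ~ 5,
            PROPDMGEXP == 6 ~ 6,
            PROPDMGEXP == 7 ~ 7,
            PROPDMGEXP == 8 ~ 8,
            TRUE ~ 0
            ),
        # Drop the exponent column once it has been applied.
        PROPDMGEXP = NULL
        )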
head(f)
##     EVTYPE FATALITIES INJURIES PROPDMG CROPDMG CROPDMGEXP
## 1: TORNADO          0       15   25000       0           
## 2: TORNADO          0        0    2500       0           
## 3: TORNADO          0        2   25000       0           
## 4: TORNADO          0        2    2500       0           
## 5: TORNADO          0        2    2500       0           
## 6: TORNADO          0        6    2500       0

Finding Crop Damage

table(f$CROPDMGEXP)
## 
##             ?      0      2      B      k      K      m      M 
## 618413      7     19      1      9     21 281832      1   1994
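f <- f %>%
    mutate(
        # Same scheme as for property damage: letter codes are powers of ten,
        # the digit code 2 is a plain multiplier, and blank, "?", and "0" give 0.
        CROPDMG = CROPDMG * case_when(
            CROPDMGEXP == "B" | CROPDMGEXP == "b" ~ 10^9,
            CROPDMGEXP == "M" | CROPDMGEXP == "m" ~ 10^6,
            CROPDMGEXP == "K" | CROPDMGEXP == "k" ~ 10^3,
            CROPDMGEXP == 2 ~ 2,
            TRUE ~ 0
            ),
        # Drop the exponent column once it has been applied.
        CROPDMGEXP = NULL
        )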
head(f)
##     EVTYPE FATALITIES INJURIES PROPDMG CROPDMG
## 1: TORNADO          0       15   25000       0
## 2: TORNADO          0        0    2500       0
## 3: TORNADO          0        2   25000       0
## 4: TORNADO          0        2    2500       0
## 5: TORNADO          0        2    2500       0
## 6: TORNADO          0        6    2500       0

Analysing Data

Q1. Across the United States, which types of events are most harmful with respect to population health?

Finding Aggregate Fatalities per event type

total_fatalities <- aggregate(FATALITIES~EVTYPE,f, sum) %>%
    arrange(desc(FATALITIES))
total_fatalities <- total_fatalities[1:10,]
total_fatalities
##            EVTYPE FATALITIES
## 1         TORNADO       5633
## 2  EXCESSIVE HEAT       1903
## 3     FLASH FLOOD        978
## 4            HEAT        937
## 5       LIGHTNING        816
## 6       TSTM WIND        504
## 7           FLOOD        470
## 8     RIP CURRENT        368
## 9       HIGH WIND        248
## 10      AVALANCHE        224

Finding Aggregate Injuries per event type

total_injuries <- aggregate(INJURIES~EVTYPE,f, sum) %>%
    arrange(desc(INJURIES))
total_injuries <- total_injuries[1:10,]
total_injuries
##               EVTYPE INJURIES
## 1            TORNADO    91346
## 2          TSTM WIND     6957
## 3              FLOOD     6789
## 4     EXCESSIVE HEAT     6525
## 5          LIGHTNING     5230
## 6               HEAT     2100
## 7          ICE STORM     1975
## 8        FLASH FLOOD     1777
## 9  THUNDERSTORM WIND     1488
## 10              HAIL     1361
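
The injuries table lists TSTM WIND and THUNDERSTORM WIND as separate event types, so closely related labels are counted separately in these totals. The report does not merge such labels. The sketch below shows, on made-up numbers, how a simple clean-up step before aggregation could combine them. The rules are illustrative assumptions, and the real data would need many more.

```r
# Toy rows with two labels for the same kind of event, plus untidy casing.
toy <- data.frame(
    EVTYPE = c("TSTM WIND", "THUNDERSTORM WIND", "TORNADO", " tornado "),
    INJURIES = c(10, 5, 20, 1),
    stringsAsFactors = FALSE
)

# Hypothetical clean-up: trim whitespace, upper-case, and expand the
# "TSTM" abbreviation at the start of a label.
clean <- toupper(trimws(toy$EVTYPE))
clean <- sub("^TSTM", "THUNDERSTORM", clean)
toy$EVTYPE <- clean

# Aggregate as in the report, then sort from most to fewest injuries.
totals <- aggregate(INJURIES ~ EVTYPE, toy, sum)
totals <- totals[order(-totals$INJURIES), ]
totals
```

After the clean-up, the four toy rows collapse into two event types.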

Plotting events with highest fatalities and injuries

par(
    mfrow = c(1, 2),
    mar = c(12, 4, 3, 2),
    cex = 0.8
    )
barplot(
    total_fatalities$FATALITIES,
    names.arg = total_fatalities$EVTYPE,
    las = 3,
    main = "Events with Highest Fatalities",
    ylab = "Number of fatalities",
    col = "red"
    )
barplot(
    total_injuries$INJURIES,
    names.arg = total_injuries$EVTYPE,
    las = 3,
    main = "Events with Highest Injuries",
    ylab = "Number of injuries",
    col = "red"
    )

Q2. Across the United States, which types of events have the greatest economic consequences?

Finding Aggregate Property Damage per event type

total_propdmg <- aggregate(PROPDMG~EVTYPE,f, sum) %>%
    arrange(desc(PROPDMG))
total_propdmg <- total_propdmg[1:10,]
total_propdmg
##               EVTYPE      PROPDMG
## 1              FLOOD 144657709800
## 2  HURRICANE/TYPHOON  69305840000
## 3            TORNADO  56937160991
## 4        STORM SURGE  43323536000
## 5        FLASH FLOOD  16140812087
## 6               HAIL  15732267370
## 7          HURRICANE  11868319010
## 8     TROPICAL STORM   7703890550
## 9       WINTER STORM   6688497250
## 10         HIGH WIND   5270046260

Finding Aggregate Crop Damage per event type

total_cropdmg <- aggregate(CROPDMG~EVTYPE,f, sum) %>%
    arrange(desc(CROPDMG))
total_cropdmg <- total_cropdmg[1:10,]
total_cropdmg
##               EVTYPE     CROPDMG
## 1            DROUGHT 13972566000
## 2              FLOOD  5661968450
## 3        RIVER FLOOD  5029459000
## 4          ICE STORM  5022113500
## 5               HAIL  3025954450
## 6          HURRICANE  2741910000
## 7  HURRICANE/TYPHOON  2607872800
## 8        FLASH FLOOD  1421317100
## 9       EXTREME COLD  1292973000
## 10      FROST/FREEZE  1094086000

Plotting events with highest property and crop damage

par(
    mfrow = c(1, 2),
    mar = c(12, 4, 3, 2),
    cex = 0.8
    )
barplot(
    total_propdmg$PROPDMG/10^9,
    names.arg = total_propdmg$EVTYPE,
    las = 3,
    main = "Events with Highest Property Damage",
    ylab = "Damage Cost (in billion $)",
    col = "green"
    )
barplot(
    total_cropdmg$CROPDMG/10^9,
    names.arg = total_cropdmg$EVTYPE,
    las = 3,
    main = "Events with Highest Crop Damage",
    ylab = "Damage Cost (in billion $)",
    col = "green"
    )

Finding total economic damage per event type

economic_damage <- aggregate(PROPDMG+CROPDMG~EVTYPE, f, sum)
names(economic_damage) <- c("EVTYPE","Total_Damage")
economic_damage <- arrange(economic_damage,desc(Total_Damage))
economic_damage <- economic_damage[1:15,]
economic_damage
##               EVTYPE Total_Damage
## 1              FLOOD 150319678250
## 2  HURRICANE/TYPHOON  71913712800
## 3            TORNADO  57352114101
## 4        STORM SURGE  43323541000
## 5               HAIL  18758221820
## 6        FLASH FLOOD  17562129187
## 7            DROUGHT  15018672000
## 8          HURRICANE  14610229010
## 9        RIVER FLOOD  10148404500
## 10         ICE STORM   8967041310
## 11    TROPICAL STORM   8382236550
## 12      WINTER STORM   6715441250
## 13         HIGH WIND   5908617560
## 14          WILDFIRE   5060586800
## 15         TSTM WIND   5038935790
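
The total above adds property and crop damage for each record and then sums within each event type. A minimal sketch of the same idea on made-up numbers, with the sum stored in an explicit column first so that no renaming is needed afterwards:

```r
# Toy damage records; the values are invented for illustration.
toy <- data.frame(
    EVTYPE = c("FLOOD", "FLOOD", "DROUGHT", "HAIL"),
    PROPDMG = c(100, 50, 5, 30),
    CROPDMG = c(10, 0, 80, 20),
    stringsAsFactors = FALSE
)

# Add the two damage columns row by row, then sum within each event type.
toy$Total_Damage <- toy$PROPDMG + toy$CROPDMG
totals <- aggregate(Total_Damage ~ EVTYPE, toy, sum)

# Sort from most to least damaging.
totals <- totals[order(totals$Total_Damage, decreasing = TRUE), ]
totals
```

In this toy data FLOOD ranks first because its two records add up, even though DROUGHT has the largest single crop loss.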

Plotting events with highest total economic damage

par(
    mfrow = c(1, 1),
    mar = c(12, 4, 3, 2),
    cex = 0.8
    )
barplot(
    economic_damage$Total_Damage/10^9,
    names.arg = economic_damage$EVTYPE,
    las = 3,
    main = "Events with Highest Total Economic Damage",
    ylab = "Damage Cost (in billion $)",
    col = "yellow"
    )

Results

1. Across the United States, which types of events are most harmful with respect to population health?

Tornadoes caused the most fatalities (5,633) and the most injuries (91,346). They were followed by excessive heat for fatalities (1,903) and thunderstorm wind (TSTM WIND) for injuries (6,957).

2. Across the United States, which types of events have the greatest economic consequences?

Floods caused the most property damage (about $144.7 billion), whereas drought caused the most crop damage (about $14.0 billion). The second most damaging event types were hurricanes/typhoons for property damage (about $69.3 billion) and floods for crop damage (about $5.7 billion).

Overall, floods caused the greatest total economic damage (about $150.3 billion), followed by hurricanes/typhoons (about $71.9 billion).