Synopsis

This report analyzes the NOAA Storm Database to answer questions about severe weather events across the United States.

The database documents the occurrence of storms and other weather events that cause loss of life, injuries, property damage, and/or disruption to commerce. We use it to identify the top event types by their impact on population health (deaths and injuries) and by the economic damage they cause (property and crop).

The analysis shows that tornadoes are the most harmful event type for population health (5,633 fatalities and 91,346 injuries) and that floods have the greatest economic consequences (about $150 billion in combined property and crop damage).

This report is useful for a government or municipal manager who might be responsible for preparing for severe weather events and will need to prioritize resources for different types of events.

Loading and Processing the Raw Data

The data for this assignment can be downloaded from the course web site: Storm Data [47Mb]. The events in the database start in the year 1950 and end in November 2011.

In the earlier years there are generally fewer events recorded, while the records in recent years are more complete.

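setwd("F:/Data Science Course/ProgammingAssignment53")
# Check whether the compressed data file "repdata_data_StormData.csv.bz2" already exists
if (!file.exists("repdata_data_StormData.csv.bz2")) {
        # If not, download it. setInternet2() is Windows-only and intended for
        # older R versions (before 3.3.0), so it is skipped everywhere else.
        if (.Platform$OS.type == "windows" && getRversion() < "3.3.0") {
                setInternet2(use = TRUE)
        }
        # mode = "wb" keeps the binary .bz2 archive intact on Windows
        download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                      "repdata_data_StormData.csv.bz2", mode = "wb")
}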

# Read data from downloaded file
storm.raw.data <- read.csv(bzfile("repdata_data_StormData.csv.bz2"), 
                    header = TRUE, 
                    nrows = -1,
                    sep = ",",
                    stringsAsFactors = FALSE)
print(nrow(storm.raw.data))
## [1] 902297
print(ncol(storm.raw.data))
## [1] 37
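
The observation that earlier years contain fewer records can be checked by counting rows per year. The sketch below is illustrative only: it assumes the raw data has a BGN_DATE column formatted like "4/18/1950 0:00:00", and it runs on a small made-up data frame (toy.dates) rather than on storm.raw.data. The helper name events.per.year is hypothetical.

```r
# Toy stand-in for storm.raw.data; the BGN_DATE format is an assumption
toy.dates <- data.frame(
    BGN_DATE = c("4/18/1950 0:00:00", "6/8/1951 0:00:00",
                 "11/28/2011 0:00:00", "11/28/2011 0:00:00"),
    stringsAsFactors = FALSE)

# Hypothetical helper: count records per calendar year
events.per.year <- function(df) {
    # as.Date() parses the leading month/day/year and ignores the time part
    dates <- as.Date(df$BGN_DATE, format = "%m/%d/%Y")
    years <- as.integer(format(dates, "%Y"))
    table(years)
}

year.counts <- events.per.year(toy.dates)
print(year.counts)
```

On the real data, events.per.year(storm.raw.data) would give one count per year, which could then be plotted to see how coverage grows over time.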

Exploratory Analysis

The raw data frame contains 902,297 observations and 37 fields.

To reduce the computational load, we keep only the rows where fatalities, injuries, property damage or crop damage are greater than 0, and select the 7 columns used in this analysis (EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP).

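# Select columns by name rather than by position (8, 23:28) so the code does not depend on column order
storm.data <- subset(storm.raw.data, FATALITIES > 0 | INJURIES > 0 | PROPDMG > 0 | CROPDMG > 0,
                     select = c(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP))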
message("There are ", nrow(storm.data), " selected observations and ", ncol(storm.data), " fields")
## There are 254633 selected observations and 7 fields
#check out the structure of selected columns
str(storm.data)
## 'data.frame':    254633 obs. of  7 variables:
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...

Data Processing

Impact on population health

To analyze the impact on population health, we prepare the dataset to include both injuries and fatalities.

# Calculating events with fatalities
fatalities.data <- aggregate(FATALITIES ~ EVTYPE, data = storm.data, sum, na.rm = TRUE)
names(fatalities.data) <- c("EVENT_TYPE", "FATALITIES")
fatalities.data <- fatalities.data[order(-fatalities.data$FATALITIES), ]
fatalities.data[1:15, ]
##            EVENT_TYPE FATALITIES
## 407           TORNADO       5633
## 61     EXCESSIVE HEAT       1903
## 73        FLASH FLOOD        978
## 151              HEAT        937
## 258         LIGHTNING        816
## 423         TSTM WIND        504
## 86              FLOOD        470
## 306       RIP CURRENT        368
## 200         HIGH WIND        248
## 11          AVALANCHE        224
## 481      WINTER STORM        206
## 307      RIP CURRENTS        204
## 153         HEAT WAVE        172
## 67       EXTREME COLD        160
## 364 THUNDERSTORM WIND        133
# Calculating events with injuries
injuries.data <- aggregate(INJURIES ~ EVTYPE, data = storm.data, sum, na.rm = TRUE)
names(injuries.data) <- c("EVENT_TYPE", "INJURIES")
injuries.data <- injuries.data[order(-injuries.data$INJURIES), ]
injuries.data[1:15, ]
##            EVENT_TYPE INJURIES
## 407           TORNADO    91346
## 423         TSTM WIND     6957
## 86              FLOOD     6789
## 61     EXCESSIVE HEAT     6525
## 258         LIGHTNING     5230
## 151              HEAT     2100
## 238         ICE STORM     1975
## 73        FLASH FLOOD     1777
## 364 THUNDERSTORM WIND     1488
## 134              HAIL     1361
## 481      WINTER STORM     1321
## 224 HURRICANE/TYPHOON     1275
## 200         HIGH WIND     1137
## 170        HEAVY SNOW     1021
## 471          WILDFIRE      911
# Calculating events with fatalities and injuries combined
impact.on.health <- aggregate(FATALITIES + INJURIES ~ EVTYPE, data = storm.data, sum, 
    na.rm = TRUE)
names(impact.on.health) <- c("EVENT_TYPE", "FATALITIES.AND.INJURIES")
impact.on.health <- impact.on.health[order(-impact.on.health$FATALITIES.AND.INJURIES), ]

#Merge the data for future use in results section
casualties0 <- merge(fatalities.data, injuries.data)
casualties.data <- merge(casualties0,impact.on.health)
casualties.data <- casualties.data[order(-casualties.data$FATALITIES.AND.INJURIES), ]
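
The tables above list some event types under more than one label (for example RIP CURRENT and RIP CURRENTS, or TSTM WIND and THUNDERSTORM WIND), so their totals are split across rows. The analysis in this report does not merge them. The sketch below only illustrates how such labels could be normalized before aggregating; the function normalize.evtype and its two rules are assumptions chosen for the example, not a complete cleaning scheme, and it runs on made-up data.

```r
# Hypothetical helper: collapse a few obviously equivalent EVTYPE labels
normalize.evtype <- function(x) {
    x <- toupper(trimws(x))                       # unify case, drop stray spaces
    x <- sub("^TSTM", "THUNDERSTORM", x)          # expand the TSTM abbreviation
    x <- sub("^RIP CURRENTS$", "RIP CURRENT", x)  # singular/plural duplicate
    x
}

# Toy data standing in for storm.data
toy.events <- data.frame(
    EVTYPE = c("TSTM WIND", "Thunderstorm Wind", "RIP CURRENTS", "RIP CURRENT "),
    FATALITIES = c(1, 2, 3, 4),
    stringsAsFactors = FALSE)

toy.events$EVTYPE <- normalize.evtype(toy.events$EVTYPE)
toy.totals <- aggregate(FATALITIES ~ EVTYPE, data = toy.events, sum)
print(toy.totals)
```

Applied to the real data, this kind of normalization would change the rankings slightly, so any rule set should be reviewed against the full list of EVTYPE values first.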

Impact on economy

To analyze the impact on the economy, we prepare the dataset to include both property and crop damage in order to find the events with the greatest economic consequences.

The damage value is represented by two parts, “-DMG” (numeric) and “-DMGEXP” (alphanumeric), so we use the following steps:

Keeping only the rows whose exponent codes are recognized

expData <- storm.data[storm.data$PROPDMGEXP %in% c("", "K", "M", "B") & storm.data$CROPDMGEXP %in% c("", "K", "M", "B"), ]
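
Before discarding rows, it can be useful to see which exponent codes occur and how many rows the filter removes. The following is a minimal sketch on made-up data (toy.exp); the unusual codes in it are invented for illustration and are not taken from the real database.

```r
# Toy data with a mix of recognized and unrecognized exponent codes
toy.exp <- data.frame(
    PROPDMGEXP = c("K", "M", "", "5", "h", "B"),
    CROPDMGEXP = c("",  "K", "", "",  "",  "?"),
    stringsAsFactors = FALSE)

valid.codes <- c("", "K", "M", "B")

# Frequency of each property-damage code, including the empty string
print(table(toy.exp$PROPDMGEXP, useNA = "ifany"))

# Same condition as the filter above: both codes must be recognized
keep <- toy.exp$PROPDMGEXP %in% valid.codes & toy.exp$CROPDMGEXP %in% valid.codes
rows.dropped <- sum(!keep)
message(rows.dropped, " of ", nrow(toy.exp), " rows would be dropped")
```

Running the same two steps on storm.data would show how much of the data is excluded from the economic totals.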

We need to transform the retained exponent codes (“”, K, M, B) for both crop and property damage into numeric multipliers (1, 1,000, 1,000,000 and 1,000,000,000) and multiply them by the damage figures. Rows with other codes (such as 1 or H) were excluded in the previous step. The following function converts an exponent code to its multiplier and returns the total damage, where the formula is DMG * multiplier.

convExponent <- function(dmg, exp) {
    if (exp == "K") {
        dmg * 1000
    } else if (exp == "M") {
        dmg * 1e+06
    } else if (exp == "B") {
        dmg * 1e+09
    } else if (exp == "") {
        dmg
    } else {
        stop("NOT VALID DATA")
    }
}

We apply the conversion function to PROPDMG and CROPDMG and add two new fields holding the resulting damage amounts in dollars.

expData$PROP_DMG <- mapply(convExponent, expData$PROPDMG, expData$PROPDMGEXP)
expData$CROP_DMG <- mapply(convExponent, expData$CROPDMG, expData$CROPDMGEXP)
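
Calling convExponent once per row through mapply works, but the same conversion can also be written as a single vectorized lookup. The sketch below is an alternative under the same rules (codes “”, K, M, B, error on anything else); the name conv.exponent.vec is hypothetical and is not used elsewhere in this report.

```r
# Hypothetical vectorized variant of convExponent
conv.exponent.vec <- function(dmg, exp) {
    codes       <- c("", "K", "M", "B")
    multipliers <- c(1, 1e3, 1e6, 1e9)
    idx <- match(exp, codes)          # position of each code, NA if unknown
    if (anyNA(idx)) {
        stop("NOT VALID DATA")        # same behaviour as convExponent
    }
    dmg * multipliers[idx]
}

# Worked example: 25K, 2.5M, 1B and a plain 3 dollars
example.dmg <- conv.exponent.vec(c(25, 2.5, 1, 3), c("K", "M", "B", ""))
print(example.dmg)
```

With this variant, the two assignments above could be written as conv.exponent.vec(expData$PROPDMG, expData$PROPDMGEXP) and the equivalent call for crops, which avoids a function call per row.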

Next we total the crop and property damage per event type to find the events with the greatest economic consequences, and express the totals in millions of dollars.

# calculate the crop damages
crop.damage <- aggregate(CROP_DMG ~ EVTYPE, data = expData, sum, na.rm = TRUE)
names(crop.damage) <- c("EVENT_TYPE", "CROP_TOTAL_DMG")
crop.damage <- crop.damage[order(-crop.damage$CROP_TOTAL_DMG),]
crop.damage$cropMILLS <- crop.damage$CROP_TOTAL_DMG/10^6
crop.damage[1:15,c(1,3)]
##            EVENT_TYPE  cropMILLS
## 48            DROUGHT 13972.5660
## 84              FLOOD  5661.9685
## 305       RIVER FLOOD  5029.4590
## 233         ICE STORM  5022.1100
## 131              HAIL  3000.5375
## 210         HURRICANE  2741.9100
## 219 HURRICANE/TYPHOON  2607.8728
## 72        FLASH FLOOD  1420.7271
## 66       EXTREME COLD  1292.9730
## 111      FROST/FREEZE  1094.0860
## 156        HEAVY RAIN   733.3998
## 411    TROPICAL STORM   678.3460
## 196         HIGH WIND   638.5713
## 417         TSTM WIND   554.0073
## 60     EXCESSIVE HEAT   492.4020
# calculate the property damages
property.damage <- aggregate(PROP_DMG ~ EVTYPE, data = expData, sum, na.rm = TRUE)
names(property.damage) <- c("EVENT_TYPE", "PROP_TOTAL_DMG")
property.damage <- property.damage[order(-property.damage$PROP_TOTAL_DMG),]
property.damage$propMILLS <- property.damage$PROP_TOTAL_DMG/10^6
property.damage[1:15,c(1,3)]
##            EVENT_TYPE  propMILLS
## 84              FLOOD 144657.710
## 219 HURRICANE/TYPHOON  69305.840
## 401           TORNADO  56925.485
## 345       STORM SURGE  43323.536
## 72        FLASH FLOOD  16140.812
## 131              HAIL  15727.166
## 210         HURRICANE  11868.319
## 411    TROPICAL STORM   7703.891
## 475      WINTER STORM   6688.497
## 196         HIGH WIND   5270.046
## 305       RIVER FLOOD   5118.945
## 465          WILDFIRE   4765.114
## 346  STORM SURGE/TIDE   4641.188
## 417         TSTM WIND   4484.928
## 233         ICE STORM   3944.928
# calculate the combined crop and property damages
# (data is passed with "=", not "<-", so no stray global variable is created)
economic.damage <- aggregate(CROP_DMG + PROP_DMG ~ 
    EVTYPE, data = expData, sum, na.rm = TRUE)
names(economic.damage) <- c("EVENT_TYPE", "CROP_PROP_TOTAL_DMG")
economic.damage<- economic.damage[order(-economic.damage$CROP_PROP_TOTAL_DMG),]
economic.damage$ECODMGMILLS <- economic.damage$CROP_PROP_TOTAL_DMG/10^6

#merge data for future use in result section
economic0 <- merge(crop.damage,property.damage)
economic.data <- merge(economic0,economic.damage)
economic.data <- economic.data[order(-economic.data$CROP_PROP_TOTAL_DMG), ]
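
The three aggregate calls followed by two merges can also be expressed as one aggregation over both damage columns. The sketch below shows that alternative on made-up numbers; the data frame toy.dmg and the column names TOTAL_DMG and TOTAL_MILLIONS are assumptions for the example and do not replace economic.data.

```r
# Toy data standing in for expData after the exponent conversion
toy.dmg <- data.frame(
    EVTYPE   = c("FLOOD", "FLOOD", "HAIL"),
    PROP_DMG = c(2e6, 3e6, 1e6),
    CROP_DMG = c(1e6, 0,   5e5),
    stringsAsFactors = FALSE)

# cbind() on the left-hand side sums both columns per event type in one pass
toy.econ <- aggregate(cbind(CROP_DMG, PROP_DMG) ~ EVTYPE, data = toy.dmg, FUN = sum)
toy.econ$TOTAL_DMG <- toy.econ$CROP_DMG + toy.econ$PROP_DMG
toy.econ <- toy.econ[order(-toy.econ$TOTAL_DMG), ]
toy.econ$TOTAL_MILLIONS <- toy.econ$TOTAL_DMG / 1e6
print(toy.econ)
```

This keeps crop, property and combined totals in a single data frame from the start, so no merge on EVENT_TYPE is needed.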

Results

Impact on Population Health

Question 1. Across the United States, which types of events (as indicated in the [EVTYPE] variable) are most harmful with respect to human health?

casualties.data[1:15, ]
##            EVENT_TYPE FATALITIES INJURIES FATALITIES.AND.INJURIES
## 407           TORNADO       5633    91346                   96979
## 61     EXCESSIVE HEAT       1903     6525                    8428
## 423         TSTM WIND        504     6957                    7461
## 86              FLOOD        470     6789                    7259
## 258         LIGHTNING        816     5230                    6046
## 151              HEAT        937     2100                    3037
## 73        FLASH FLOOD        978     1777                    2755
## 238         ICE STORM         89     1975                    2064
## 364 THUNDERSTORM WIND        133     1488                    1621
## 481      WINTER STORM        206     1321                    1527
## 200         HIGH WIND        248     1137                    1385
## 134              HAIL         15     1361                    1376
## 224 HURRICANE/TYPHOON         64     1275                    1339
## 170        HEAVY SNOW        127     1021                    1148
## 471          WILDFIRE         75      911                     986

The following plot shows that tornadoes are the most harmful weather event for population health (91,346 injuries and 5,633 fatalities, 96,979 casualties in total).

library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.1.2
ggplot(impact.on.health[1:15, ], aes(x = reorder(EVENT_TYPE, FATALITIES.AND.INJURIES), y = FATALITIES.AND.INJURIES)) + 
    geom_bar(stat = "identity",fill="red") + coord_flip() + 
     labs(x = "Event types", y = "Fatalities & Injuries", title = "Top 15 Weather Events with Fatalities & Injuries" )

Economic Impact

Question 2. Across the United States, which types of events (as indicated in the [EVTYPE] variable) have the greatest economic consequences?

economic.data[1:15,c(1,3,5,7)]
##            EVENT_TYPE  cropMILLS  propMILLS ECODMGMILLS
## 84              FLOOD  5661.9685 144657.710  150319.678
## 219 HURRICANE/TYPHOON  2607.8728  69305.840   71913.713
## 401           TORNADO   364.9501  56925.485   57290.436
## 345       STORM SURGE     0.0050  43323.536   43323.541
## 131              HAIL  3000.5375  15727.166   18727.703
## 72        FLASH FLOOD  1420.7271  16140.812   17561.539
## 48            DROUGHT 13972.5660   1046.106   15018.672
## 210         HURRICANE  2741.9100  11868.319   14610.229
## 305       RIVER FLOOD  5029.4590   5118.945   10148.405
## 233         ICE STORM  5022.1100   3944.928    8967.038
## 411    TROPICAL STORM   678.3460   7703.891    8382.237
## 475      WINTER STORM    26.9440   6688.497    6715.441
## 196         HIGH WIND   638.5713   5270.046    5908.618
## 465          WILDFIRE   295.4728   4765.114    5060.587
## 417         TSTM WIND   554.0073   4484.928    5038.936

The following plot shows that floods are the most harmful weather event for the economy (crop damage of $5,661.97M and property damage of $144,657.71M, for a total economic impact of $150,319.68M).

ggplot(economic.damage[1:15, ], aes(x = reorder(EVENT_TYPE, ECODMGMILLS), y = ECODMGMILLS)) + 
    geom_bar(stat = "identity",fill="red") + coord_flip() + labs(x = "Event types", y = "Crop & Property Damage (in Millions) ", 
    title = "Top 15 Weather Events with Crop & Property Damages")

Conclusions of the analysis: