Synopsis

This report identifies which types of weather events have the greatest economic consequences and the greatest impact on public health across the United States of America. The data comes from the NOAA storm database and is aggregated by event type for damage and health indicators. The results show that tornadoes have the greatest human cost, while floods have the greatest economic consequences.

Data Processing

1. Extract and load the data

# Download zip file if it doesn't exist in the working directory.
if(!file.exists("StormData.csv.bz2")){
    download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile = "StormData.csv.bz2", method = "curl")
}

# Read the compressed CSV directly (read.csv can read .bz2 files without unpacking)
stormData <- read.csv("StormData.csv.bz2", stringsAsFactors = FALSE)

2. Transformations

  1. Have a look at the structure of the dataset.
    The columns that are used to produce this report are:
  • EVTYPE - the type of weather event.
  • FATALITIES - number of fatalities.
  • INJURIES - number of injuries.
  • PROPDMG - estimate of property damage.
  • PROPDMGEXP - magnitude / exponent of property damage.
  • CROPDMG - estimate of crop damage.
  • CROPDMGEXP - magnitude / exponent of crop damage.
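
To make the estimate and magnitude columns concrete: the dollar value of a record is its estimate multiplied by the factor that its exponent code stands for. The sketch below assumes the usual reading K = thousand, M = million, B = billion; the sample values and the `multiplier` lookup are illustrative, not taken from the analysis.

```r
# Hypothetical lookup: exponent code -> multiplier (assumed K/M/B meanings)
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Illustrative estimates and their exponent codes
estimate <- c(25, 2.5, 3)
exponent <- c("K", "K", "M")

# Dollar value = estimate * multiplier for its code
dollars <- estimate * unname(multiplier[exponent])
dollars
```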

Start by loading the dplyr library, then select only the columns that we want to use.

library(dplyr)
## Warning: package 'dplyr' was built under R version 4.0.2
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
stormData <- select(stormData, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)

# Show the structure of the dataset.
str(stormData)
## 'data.frame':    902297 obs. of  7 variables:
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
  2. First, compute the dollar values of the property damage and crop damage estimates.
# Create a conversion table 
expFac <- data.frame(c("","B","m","M","K","H","h", "1","2","3","4","5","6","7","8","0","+","-","?","k"),
                        c(1,1e+09,1e+06,1e+06,1000,100,100,10,100,1000,10000,1e+05,1e+06,1e+07,1e+08,
                          1,0,0,0,1000))

# Extract property and crop damage data
PD <- stormData$PROPDMG
PDE <- stormData$PROPDMGEXP
CD <- stormData$CROPDMG
CDE <- stormData$CROPDMGEXP

# Create new variables and populate the value of the damage to property and crops
rows <- length(PD)
PROPDMGVAL <- 1:rows
CROPDMGVAL <- 1:rows

for(i in 1:rows){
      PROPDMGVAL[i] <- PD[i] * expFac[expFac[,1] == PDE[i],2]
      CROPDMGVAL[i] <- CD[i] * expFac[expFac[,1] == CDE[i],2]
}

# Add values of damages to stormData  
stormData <- cbind(stormData, PROPDMGVAL) 
stormData <- cbind(stormData, CROPDMGVAL) 
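
The row-by-row loop above can be slow on roughly 900,000 records. A vectorized lookup with match() is one possible alternative. The sketch below uses a shortened conversion table (`codes`, `factors`) and toy data rather than the full dataset, and `to_dollars` is a hypothetical helper that is not part of the original analysis.

```r
# Shortened conversion table (a subset of the codes used above)
codes   <- c("", "K", "k", "M", "m", "B", "H", "h")
factors <- c(1, 1e3, 1e3, 1e6, 1e6, 1e9, 100, 100)

# Hypothetical helper: look up every exponent code at once with match()
to_dollars <- function(value, exponent) {
  value * factors[match(exponent, codes)]
}

# Toy data standing in for PROPDMG / PROPDMGEXP
PD  <- c(25, 2.5, 0, 10)
PDE <- c("K", "M", "", "B")

PROPDMGVAL <- to_dollars(PD, PDE)
PROPDMGVAL
```

One difference worth noting: with match(), a code that is missing from the table yields NA for that row, whereas the loop stops with an error on it.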
  3. Sum the human costs and the economic costs of the events.
stormData$HCOST <- stormData$FATALITIES + stormData$INJURIES
stormData$ECOST <- stormData$PROPDMGVAL + stormData$CROPDMGVAL
  4. Subset the data to aggregate the human costs and the economic costs of the events.
h_cost <- subset(stormData, select = c(EVTYPE, FATALITIES, INJURIES, HCOST ))
e_cost <- subset(stormData, select = c(EVTYPE, PROPDMGVAL, CROPDMGVAL, ECOST))
  5. Aggregate the costs by event type and sort in descending order.
# Aggregates for fatalities and injuries
totFatalities <- aggregate(FATALITIES ~ EVTYPE, data = h_cost, FUN = sum)
totFatalities <- arrange(totFatalities, desc(FATALITIES))
totInjuries <- aggregate(INJURIES ~ EVTYPE, data = h_cost, FUN = sum)
totInjuries <- arrange(totInjuries, desc(INJURIES))
totHCost <- aggregate(HCOST ~ EVTYPE, data = h_cost, FUN = sum)
totHCost <- arrange(totHCost, desc(HCOST))

# Aggregates for property and crop damage (these columns are in e_cost, not h_cost)
totPropDmg <- aggregate(PROPDMGVAL ~ EVTYPE, data = e_cost, FUN = sum)
totPropDmg <- arrange(totPropDmg, desc(PROPDMGVAL))
totCropDmg <- aggregate(CROPDMGVAL ~ EVTYPE, data = e_cost, FUN = sum)
totCropDmg <- arrange(totCropDmg, desc(CROPDMGVAL))
totECost <- aggregate(ECOST ~ EVTYPE, data = e_cost, FUN = sum)
totECost <- arrange(totECost, desc(ECOST))
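
The three human-cost aggregates could also be produced in a single call by putting several columns on the left-hand side of the formula with cbind(). A minimal sketch on toy data follows; the data frame `toy` and its values are invented for illustration.

```r
# Toy stand-in for h_cost
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  FATALITIES = c(1, 0, 2, 3),
  INJURIES   = c(10, 4, 5, 0),
  stringsAsFactors = FALSE
)
toy$HCOST <- toy$FATALITIES + toy$INJURIES

# Sum all three measures per event type in one aggregate() call
totals <- aggregate(cbind(FATALITIES, INJURIES, HCOST) ~ EVTYPE,
                    data = toy, FUN = sum)

# Sort by total human cost, largest first (base R counterpart of arrange(desc()))
totals <- totals[order(-totals$HCOST), ]
totals
```

This gives one table instead of three, at the cost of a single sort order; separate tables, as used in this report, are still needed when each measure should be ranked independently.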

Results

Questions to answer:

  1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
head(totFatalities)
##           EVTYPE FATALITIES
## 1        TORNADO       5633
## 2 EXCESSIVE HEAT       1903
## 3    FLASH FLOOD        978
## 4           HEAT        937
## 5      LIGHTNING        816
## 6      TSTM WIND        504
head(totInjuries)
##           EVTYPE INJURIES
## 1        TORNADO    91346
## 2      TSTM WIND     6957
## 3          FLOOD     6789
## 4 EXCESSIVE HEAT     6525
## 5      LIGHTNING     5230
## 6           HEAT     2100
head(totHCost)
##           EVTYPE HCOST
## 1        TORNADO 96979
## 2 EXCESSIVE HEAT  8428
## 3      TSTM WIND  7461
## 4          FLOOD  7259
## 5      LIGHTNING  6046
## 6           HEAT  3037

From the tables, we can deduce that tornadoes are the most harmful with respect to population health: they lead in fatalities, in injuries, and in the combined total.
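
As a quick consistency check on the tables above, the tornado HCOST should equal its fatalities plus its injuries, and the ratio to the second-ranked event shows how large the gap is. The figures are copied from the output above; the variable names are illustrative.

```r
# Figures copied from the tables above
tornado_fatalities   <- 5633
tornado_injuries     <- 91346
tornado_hcost        <- 96979
excessive_heat_hcost <- 8428

# HCOST is defined as FATALITIES + INJURIES
stopifnot(tornado_fatalities + tornado_injuries == tornado_hcost)

# How many times larger is the tornado total than the runner-up?
ratio <- tornado_hcost / excessive_heat_hcost
round(ratio, 1)
```

The tornado total is about 11.5 times that of excessive heat, the next event type in the list.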

  2. Across the United States, which types of events have the greatest economic consequences?
head(totPropDmg)
##              EVTYPE   PROPDMGVAL
## 1             FLOOD 144657709807
## 2 HURRICANE/TYPHOON  69305840000
## 3           TORNADO  56947380616
## 4       STORM SURGE  43323536000
## 5       FLASH FLOOD  16822673978
## 6              HAIL  15735267513
head(totCropDmg)
##        EVTYPE  CROPDMGVAL
## 1     DROUGHT 13972566000
## 2       FLOOD  5661968450
## 3 RIVER FLOOD  5029459000
## 4   ICE STORM  5022113500
## 5        HAIL  3025954473
## 6   HURRICANE  2741910000
head(totECost)
##              EVTYPE        ECOST
## 1             FLOOD 150319678257
## 2 HURRICANE/TYPHOON  71913712800
## 3           TORNADO  57362333886
## 4       STORM SURGE  43323541000
## 5              HAIL  18761221986
## 6       FLASH FLOOD  18243991078

From the tables, even though droughts cause more crop damage than floods, floods cause the greatest overall economic consequences once property damage is added.
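
The same reasoning can be checked arithmetically with the figures from the tables above: the flood ECOST is the sum of its property and crop damage, and crop damage is only a small part of that total. The variable names below are illustrative.

```r
# Figures copied from the tables above
flood_prop   <- 144657709807
flood_crop   <- 5661968450
flood_ecost  <- 150319678257
drought_crop <- 13972566000

# ECOST is defined as PROPDMGVAL + CROPDMGVAL
stopifnot(flood_prop + flood_crop == flood_ecost)

# Droughts lead on crop damage alone...
drought_crop > flood_crop

# ...but crop damage is only a small share of the flood total
crop_share_pct <- 100 * flood_crop / flood_ecost
round(crop_share_pct, 1)
```

Crop damage makes up under 4 percent of the flood total, so the overall ranking is driven almost entirely by property damage.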

Plot Data

The plot below shows the Top 10 Total Human Costs (Fatalities + Injuries) by Types of Weather Event.

par(mar= c(8, 5, 4, 2) + 0.1)
barplot(height = totHCost$HCOST[1:10], names.arg = totHCost$EVTYPE[1:10], las = 2, cex.names= 0.7, col = heat.colors (10), main = "Top 10 Total Human Costs (Fatalities + Injuries) by Event Type")

The plot below shows the 10 types of weather events with the greatest economic consequences.

par(mar= c(8, 6, 4, 2) + 0.1)
barplot(height = totECost$ECOST[1:10], names.arg = totECost$EVTYPE[1:10], las = 2, cex.names= 0.7, col = heat.colors (10), main = "Top 10 Total Economic Costs (Property + Crop) by Event Type")
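
Dollar totals of this size tend to be drawn with scientific-notation axis labels. One option is to rescale to billions of dollars before plotting. The sketch below uses the six ECOST values shown in the Results table rather than the full `totECost` object; `evtype`, `ecost`, and `ecost_billions` are illustrative names.

```r
# Top six event types and ECOST values copied from the Results table
evtype <- c("FLOOD", "HURRICANE/TYPHOON", "TORNADO",
            "STORM SURGE", "HAIL", "FLASH FLOOD")
ecost  <- c(150319678257, 71913712800, 57362333886,
            43323541000, 18761221986, 18243991078)

# Rescale to billions of US dollars for a more readable axis
ecost_billions <- ecost / 1e9

par(mar = c(8, 6, 4, 2) + 0.1)
barplot(height = ecost_billions, names.arg = evtype, las = 2, cex.names = 0.7,
        col = heat.colors(6), ylab = "Total cost (billions of USD)",
        main = "Total Economic Costs (Property + Crop) by Event Type")
```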

Conclusions

The results show that tornadoes have the greatest human cost, while floods have the greatest economic consequences.