Synopsis

This document presents an analysis of various meteorological events across the USA and its human health (injuries and fatalities) and economic (property and crop damage) impact. The data was compiled from NOAA (via Coursera) for the years from 1950 thru November 2011.

The questions being analyzed are the type of events most harmful to population health, and type of events that had the greatest economic consequences.

The highest human fatalities are from Tornoadoes at 5,633. The highest injuries are also from Tornadoes at 91,346.

The highest property damage is from Flood at USD 144.7 billion, while the highest crop damage is from Drought at USD 14 billion. The highest combined economic damage is from Flood at USD 150.3 billion.

Data Processing

The weather event data is compiled from National Weather Service. Prior to 1993, the data is available for fewer weather events such as, Hail, Tornado or Tsunami Winds only. This may cause the results to be skewed towards those events.

Load the full dataset into a data frame evWeatherFull and examine evWeatherFull.

evWeatherFull <- read.csv(bzfile("repdata_data_StormData.csv.bz2"))

str(evWeatherFull)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
##  $ BGN_TIME  : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
##  $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : Factor w/ 35 levels "","  N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_DATE  : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_TIME  : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ WFO       : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ ZONENAMES : Factor w/ 25112 levels "","                                                                                                                               "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : Factor w/ 436781 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
head(evWeatherFull)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
## 4        2     2.5          K       0                                    
## 5        2     2.5          K       0                                    
## 6        6     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6

Data Cleaning.

  • EVTYPE - Type of meteorological event.
  • FATALITIES - Fatalities caused by the event. Unit of measure is assumed as count.
  • INJURIES - Injuries caused by the event. Unit of measure is assumed as count.
  • PROPDMG - Property damage caused by the event.
  • PROPDMGEXP - Unit of measure for property damage, deemed as economic damage in USD.
  • CROPDMG - Crop damage caused by the event.
  • CROPDMGEXP- Unit of measure for crop damage, deemed as economic damage in USD.

Create a data frame that includes only the above fields.

evWeather <- data.frame(evWeatherFull$EVTYPE, evWeatherFull$FATALITIES, evWeatherFull$INJURIES, 
   evWeatherFull$PROPDMG, evWeatherFull$PROPDMGEXP, evWeatherFull$CROPDMG, evWeatherFull$CROPDMGEXP)

Cleaning the Data

The following data is cleaned up for missing values and applying unit of measure to the amounts make the data frame evWeather amenable to further analysis.

  • Missing values are replaced wityh 0.
  • Unit of measure fields PROPDMGEXP and CROPDMGEXP contain values h, k, m, b to signify hundreds, thousands, millions or billions. Instead of letters, the fields may contain numbers to signify the most significant digits. These measures are applied to PROPDMG and CROPDMG fields respectively to get actual USD value.

The unique values for PROPDMGEXP are

unique(evWeather$evWeatherFull.PROPDMGEXP )
##  [1] K M   B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels:  - ? + 0 1 2 3 4 5 6 7 8 B h H K m M

The unique values for CROPDMGEXP are

unique(evWeather$evWeatherFull.CROPDMGEXP )
## [1]   M K m B ? 0 k 2
## Levels:  ? 0 2 B k K m M
# Impute zero for missing values
evWeather$evWeatherFull.FATALITIES[(evWeather$evWeatherFull.FATALITIES == "")] <- 0
evWeather$evWeatherFull.INJURIES[(evWeather$evWeatherFull.INJURIES == "")] <- 0
evWeather$evWeatherFull.PROPDMG[(evWeather$evWeatherFull.PROPDMG == "")] <- 0
evWeather$evWeatherFull.CROPDMG[(evWeather$evWeatherFull.CROPDMG == "")] <- 0

# Apply unit of measure.
# Convert numeric units of measure to character for interpretation.
evWeather$evWeatherFull.PROPDMGEXP <- as.character(evWeather$evWeatherFull.PROPDMGEXP)
evWeather$evWeatherFull.CROPDMGEXP <- as.character(evWeather$evWeatherFull.CROPDMGEXP)

# interpret units of measure.
evWeather$evWeatherFull.PROPDMGEXP[(evWeather$evWeatherFull.PROPDMGEXP == "")] <- 0
evWeather$evWeatherFull.PROPDMGEXP[(evWeather$evWeatherFull.PROPDMGEXP == "+") | 
   (evWeather$evWeatherFull.PROPDMGEXP == "-") | (evWeather$evWeatherFull.PROPDMGEXP == "?")] <- 1
evWeather$evWeatherFull.PROPDMGEXP[(evWeather$evWeatherFull.PROPDMGEXP == "h") | 
   (evWeather$evWeatherFull.PROPDMGEXP == "H")] <- 2
evWeather$evWeatherFull.PROPDMGEXP[(evWeather$evWeatherFull.PROPDMGEXP == "k") | 
   (evWeather$evWeatherFull.PROPDMGEXP == "K")] <- 3
evWeather$evWeatherFull.PROPDMGEXP[(evWeather$evWeatherFull.PROPDMGEXP == "m") | 
   (evWeather$evWeatherFull.PROPDMGEXP == "M")] <- 6
evWeather$evWeatherFull.PROPDMGEXP[(evWeather$evWeatherFull.PROPDMGEXP == "B")] <- 9

evWeather$evWeatherFull.CROPDMGEXP[(evWeather$evWeatherFull.CROPDMGEXP == "")] <- 0
evWeather$evWeatherFull.CROPDMGEXP[(evWeather$evWeatherFull.CROPDMGEXP == "+") | 
   (evWeather$evWeatherFull.CROPDMGEXP == "-") | (evWeather$evWeatherFull.CROPDMGEXP == "?")] <- 1
evWeather$evWeatherFull.CROPDMGEXP[(evWeather$evWeatherFull.CROPDMGEXP == "h") | 
   (evWeather$evWeatherFull.CROPDMGEXP == "H")] <- 2
evWeather$evWeatherFull.CROPDMGEXP[(evWeather$evWeatherFull.CROPDMGEXP == "k") | 
   (evWeather$evWeatherFull.CROPDMGEXP == "K")] <- 3
evWeather$evWeatherFull.CROPDMGEXP[(evWeather$evWeatherFull.CROPDMGEXP == "m") | 
   (evWeather$evWeatherFull.CROPDMGEXP == "M")] <- 6
evWeather$evWeatherFull.CROPDMGEXP[(evWeather$evWeatherFull.CROPDMGEXP == "B")] <- 9

# Convert units of measure - PROPDMGEXP and CROPDMGEXP - to numeric, so it can be multiplied to 
# PROPDMG and CROPDMG.
evWeather$evWeatherFull.PROPDMGEXP <- as.numeric(evWeather$evWeatherFull.PROPDMGEXP)
evWeather$evWeatherFull.CROPDMGEXP <- as.numeric(evWeather$evWeatherFull.CROPDMGEXP)

# Multiply the units of measure to get actual values for total economic damage.
propertyDamage <- (evWeather$evWeatherFull.PROPDMG * 10^evWeather$evWeatherFull.PROPDMGEXP) 
evWeather <- cbind(evWeather, propertyDamage)

cropDamage <- (evWeather$evWeatherFull.CROPDMG * 10^evWeather$evWeatherFull.CROPDMGEXP)
evWeather <- cbind(evWeather, cropDamage)

TOTALDMG <- (evWeather$evWeatherFull.PROPDMG * 10^evWeather$evWeatherFull.PROPDMGEXP) + 
   (evWeather$evWeatherFull.CROPDMG * 10^evWeather$evWeatherFull.CROPDMGEXP)
evWeather <- cbind(evWeather, TOTALDMG)

Results

Belos is aggregate of FATALITIES, INJURIES, PROPDMG, CROPDMG and TOTALDMG by EVTYPE and sorted descending to get the highest values by EVTYPE.

totalDamage is the sum of PROPDMG and CROPDMG.

totalFatalities <- sort(tapply(evWeather$evWeatherFull.FATALITIES, evWeather$evWeatherFull.EVTYPE, sum), 
   decreasing = T)
totalInjuries <- sort(tapply(evWeather$evWeatherFull.INJURIES, evWeather$evWeatherFull.EVTYPE, sum), 
   decreasing = T)

totalPropDamage <- sort(tapply(evWeather$propertyDamage, evWeather$evWeatherFull.EVTYPE, sum), 
   decreasing = T)
totalCropDamage <- sort(tapply(evWeather$cropDamage, evWeather$evWeatherFull.EVTYPE, sum), 
   decreasing = T)
totalDamage <- sort(tapply(evWeather$TOTALDMG, evWeather$evWeatherFull.EVTYPE, sum), decreasing = T)

library(ggplot2)

Human Consequences of Meteorological Events

The events resulting in highest fatalities are :

head(format(round(totalFatalities, 0), digits=12, nsmall=0, big.mark=","), 12)
##        TORNADO EXCESSIVE HEAT    FLASH FLOOD           HEAT      LIGHTNING 
##        "5,633"        "1,903"        "  978"        "  937"        "  816" 
##      TSTM WIND          FLOOD    RIP CURRENT      HIGH WIND      AVALANCHE 
##        "  504"        "  470"        "  368"        "  248"        "  224" 
##   WINTER STORM   RIP CURRENTS 
##        "  206"        "  204"
events <- as.vector(names(head(totalFatalities, 10)))
damage <- as.vector(head(totalFatalities, 10))
plotData <- data.frame(events, damage)
ggplot(data = plotData, aes(x = plotData$events, y = plotData$damage)) + geom_bar(stat = "identity") +
   theme(axis.text.x = element_text(angle = 90, hjust = 1)) + xlab("Top Ten Meteorological Events") + 
   ylab("Total Fatalities") + ggtitle("Ten Highest Total Fatalities by Meteorological Events - 1950-2011") 

The events resulting in highest injuries are :

head(format(round(totalInjuries, 0), digits=12, nsmall=0, big.mark=","), 12)
##           TORNADO         TSTM WIND             FLOOD    EXCESSIVE HEAT 
##          "91,346"          " 6,957"          " 6,789"          " 6,525" 
##         LIGHTNING              HEAT         ICE STORM       FLASH FLOOD 
##          " 5,230"          " 2,100"          " 1,975"          " 1,777" 
## THUNDERSTORM WIND              HAIL      WINTER STORM HURRICANE/TYPHOON 
##          " 1,488"          " 1,361"          " 1,321"          " 1,275"
events <- as.vector(names(head(totalInjuries, 10)))
damage <- as.vector(head(totalInjuries, 10))
plotData <- data.frame(events, damage)
ggplot(data = plotData, aes(x = plotData$events, y = plotData$damage)) + geom_bar(stat = "identity") + 
   theme(axis.text.x = element_text(angle = 90, hjust = 1)) + xlab("Top Ten Meteorological Events") + 
   ylab("Total Injuries") + ggtitle("Ten Highest Total Injuries by Meteorological Events - 1950-2011") 

Economic Consequences of Meteorological Events

The events resulting in highest property damage are :

head(format(round(totalPropDamage, 0), digits=12, nsmall=0, big.mark=","), 12)
##             FLOOD HURRICANE/TYPHOON           TORNADO       STORM SURGE 
## "144,657,709,807" " 69,305,840,000" " 56,947,381,216" " 43,323,536,000" 
##       FLASH FLOOD              HAIL         HURRICANE    TROPICAL STORM 
## " 16,822,673,978" " 15,735,267,513" " 11,868,319,010" "  7,703,890,550" 
##      WINTER STORM         HIGH WIND       RIVER FLOOD          WILDFIRE 
## "  6,688,497,251" "  5,270,046,610" "  5,118,945,500" "  4,765,114,000"

The events resulting in highest crop damage are :

head(format(round(totalCropDamage, 0), digits=12, nsmall=0, big.mark=","), 12)
##           DROUGHT             FLOOD       RIVER FLOOD         ICE STORM 
##  "13,972,566,000"  " 5,661,968,450"  " 5,029,459,000"  " 5,022,113,500" 
##              HAIL         HURRICANE HURRICANE/TYPHOON       FLASH FLOOD 
##  " 3,025,954,473"  " 2,741,910,000"  " 2,607,872,800"  " 1,421,317,100" 
##      EXTREME COLD      FROST/FREEZE        HEAVY RAIN    TROPICAL STORM 
##  " 1,292,973,000"  " 1,094,086,000"  "   733,399,800"  "   678,346,000"

The events resulting in highest damage (combined property and crop) are :

head(format(round(totalDamage, 0), digits=12, nsmall=0, big.mark=","), 12)
##             FLOOD HURRICANE/TYPHOON           TORNADO       STORM SURGE 
## "150,319,678,257" " 71,913,712,800" " 57,362,334,486" " 43,323,541,000" 
##              HAIL       FLASH FLOOD           DROUGHT         HURRICANE 
## " 18,761,221,986" " 18,243,991,078" " 15,018,672,000" " 14,610,229,010" 
##       RIVER FLOOD         ICE STORM    TROPICAL STORM      WINTER STORM 
## " 10,148,404,500" "  8,967,041,360" "  8,382,236,550" "  6,715,441,251"
events <- as.vector(names(head(totalDamage, 10)))
damage <- as.vector(head(totalDamage/(10^6), 10))
plotData <- data.frame(events, damage)
ggplot(data = plotData, aes(x = plotData$events, y = plotData$damage)) + geom_bar(stat = "identity") +
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) + xlab("Top Ten Meteorological Events") + 
    ylab("Total Damage (USD Millions)") + ggtitle("Ten Highest Combined Damage by Meteorological Events - 1950-2011")