This document presents an analysis of various meteorological events across the USA and its human health (injuries and fatalities) and economic (property and crop damage) impact. The data was compiled from NOAA (via Coursera) for the years from 1950 thru November 2011.
The questions being analyzed are the type of events most harmful to population health, and type of events that had the greatest economic consequences.
The highest human fatalities are from Tornoadoes at 5,633. The highest injuries are also from Tornadoes at 91,346.
The highest property damage is from Flood at USD 144.7 billion, while the highest crop damage is from Drought at USD 14 billion. The highest combined economic damage is from Flood at USD 150.3 billion.
The weather event data is compiled from National Weather Service. Prior to 1993, the data is available for fewer weather events such as, Hail, Tornado or Tsunami Winds only. This may cause the results to be skewed towards those events.
evWeatherFull <- read.csv(bzfile("repdata_data_StormData.csv.bz2"))
str(evWeatherFull)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
## $ BGN_TIME : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
## $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
## $ STATE : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : Factor w/ 35 levels ""," N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_DATE : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_TIME : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ WFO : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ ZONENAMES : Factor w/ 25112 levels ""," "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : Factor w/ 436781 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
head(evWeatherFull)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO 0 0
## 2 TORNADO 0 0
## 3 TORNADO 0 0
## 4 TORNADO 0 0
## 5 TORNADO 0 0
## 6 TORNADO 0 0
## COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1 NA 0 14.0 100 3 0 0
## 2 NA 0 2.0 150 2 0 0
## 3 NA 0 0.1 123 2 0 0
## 4 NA 0 0.0 100 2 0 0
## 5 NA 0 0.0 150 2 0 0
## 6 NA 0 1.5 177 2 0 0
## INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1 15 25.0 K 0
## 2 0 2.5 K 0
## 3 2 25.0 K 0
## 4 2 2.5 K 0
## 5 2 2.5 K 0
## 6 6 2.5 K 0
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3040 8812 3051 8806 1
## 2 3042 8755 0 0 2
## 3 3340 8742 0 0 3
## 4 3458 8626 0 0 4
## 5 3412 8642 0 0 5
## 6 3450 8748 0 0 6
Create a data frame that includes only the above fields.
evWeather <- data.frame(evWeatherFull$EVTYPE, evWeatherFull$FATALITIES, evWeatherFull$INJURIES,
evWeatherFull$PROPDMG, evWeatherFull$PROPDMGEXP, evWeatherFull$CROPDMG, evWeatherFull$CROPDMGEXP)
The following data is cleaned up for missing values and applying unit of measure to the amounts make the data frame evWeather amenable to further analysis.
The unique values for PROPDMGEXP are
unique(evWeather$evWeatherFull.PROPDMGEXP )
## [1] K M B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels: - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
The unique values for CROPDMGEXP are
unique(evWeather$evWeatherFull.CROPDMGEXP )
## [1] M K m B ? 0 k 2
## Levels: ? 0 2 B k K m M
# Impute zero for missing values
evWeather$evWeatherFull.FATALITIES[(evWeather$evWeatherFull.FATALITIES == "")] <- 0
evWeather$evWeatherFull.INJURIES[(evWeather$evWeatherFull.INJURIES == "")] <- 0
evWeather$evWeatherFull.PROPDMG[(evWeather$evWeatherFull.PROPDMG == "")] <- 0
evWeather$evWeatherFull.CROPDMG[(evWeather$evWeatherFull.CROPDMG == "")] <- 0
# Apply unit of measure.
# Convert numeric units of measure to character for interpretation.
evWeather$evWeatherFull.PROPDMGEXP <- as.character(evWeather$evWeatherFull.PROPDMGEXP)
evWeather$evWeatherFull.CROPDMGEXP <- as.character(evWeather$evWeatherFull.CROPDMGEXP)
# interpret units of measure.
evWeather$evWeatherFull.PROPDMGEXP[(evWeather$evWeatherFull.PROPDMGEXP == "")] <- 0
evWeather$evWeatherFull.PROPDMGEXP[(evWeather$evWeatherFull.PROPDMGEXP == "+") |
(evWeather$evWeatherFull.PROPDMGEXP == "-") | (evWeather$evWeatherFull.PROPDMGEXP == "?")] <- 1
evWeather$evWeatherFull.PROPDMGEXP[(evWeather$evWeatherFull.PROPDMGEXP == "h") |
(evWeather$evWeatherFull.PROPDMGEXP == "H")] <- 2
evWeather$evWeatherFull.PROPDMGEXP[(evWeather$evWeatherFull.PROPDMGEXP == "k") |
(evWeather$evWeatherFull.PROPDMGEXP == "K")] <- 3
evWeather$evWeatherFull.PROPDMGEXP[(evWeather$evWeatherFull.PROPDMGEXP == "m") |
(evWeather$evWeatherFull.PROPDMGEXP == "M")] <- 6
evWeather$evWeatherFull.PROPDMGEXP[(evWeather$evWeatherFull.PROPDMGEXP == "B")] <- 9
evWeather$evWeatherFull.CROPDMGEXP[(evWeather$evWeatherFull.CROPDMGEXP == "")] <- 0
evWeather$evWeatherFull.CROPDMGEXP[(evWeather$evWeatherFull.CROPDMGEXP == "+") |
(evWeather$evWeatherFull.CROPDMGEXP == "-") | (evWeather$evWeatherFull.CROPDMGEXP == "?")] <- 1
evWeather$evWeatherFull.CROPDMGEXP[(evWeather$evWeatherFull.CROPDMGEXP == "h") |
(evWeather$evWeatherFull.CROPDMGEXP == "H")] <- 2
evWeather$evWeatherFull.CROPDMGEXP[(evWeather$evWeatherFull.CROPDMGEXP == "k") |
(evWeather$evWeatherFull.CROPDMGEXP == "K")] <- 3
evWeather$evWeatherFull.CROPDMGEXP[(evWeather$evWeatherFull.CROPDMGEXP == "m") |
(evWeather$evWeatherFull.CROPDMGEXP == "M")] <- 6
evWeather$evWeatherFull.CROPDMGEXP[(evWeather$evWeatherFull.CROPDMGEXP == "B")] <- 9
# Convert units of measure - PROPDMGEXP and CROPDMGEXP - to numeric, so it can be multiplied to
# PROPDMG and CROPDMG.
evWeather$evWeatherFull.PROPDMGEXP <- as.numeric(evWeather$evWeatherFull.PROPDMGEXP)
evWeather$evWeatherFull.CROPDMGEXP <- as.numeric(evWeather$evWeatherFull.CROPDMGEXP)
# Multiply the units of measure to get actual values for total economic damage.
propertyDamage <- (evWeather$evWeatherFull.PROPDMG * 10^evWeather$evWeatherFull.PROPDMGEXP)
evWeather <- cbind(evWeather, propertyDamage)
cropDamage <- (evWeather$evWeatherFull.CROPDMG * 10^evWeather$evWeatherFull.CROPDMGEXP)
evWeather <- cbind(evWeather, cropDamage)
TOTALDMG <- (evWeather$evWeatherFull.PROPDMG * 10^evWeather$evWeatherFull.PROPDMGEXP) +
(evWeather$evWeatherFull.CROPDMG * 10^evWeather$evWeatherFull.CROPDMGEXP)
evWeather <- cbind(evWeather, TOTALDMG)
Belos is aggregate of FATALITIES, INJURIES, PROPDMG, CROPDMG and TOTALDMG by EVTYPE and sorted descending to get the highest values by EVTYPE.
totalDamage is the sum of PROPDMG and CROPDMG.
totalFatalities <- sort(tapply(evWeather$evWeatherFull.FATALITIES, evWeather$evWeatherFull.EVTYPE, sum),
decreasing = T)
totalInjuries <- sort(tapply(evWeather$evWeatherFull.INJURIES, evWeather$evWeatherFull.EVTYPE, sum),
decreasing = T)
totalPropDamage <- sort(tapply(evWeather$propertyDamage, evWeather$evWeatherFull.EVTYPE, sum),
decreasing = T)
totalCropDamage <- sort(tapply(evWeather$cropDamage, evWeather$evWeatherFull.EVTYPE, sum),
decreasing = T)
totalDamage <- sort(tapply(evWeather$TOTALDMG, evWeather$evWeatherFull.EVTYPE, sum), decreasing = T)
library(ggplot2)
The events resulting in highest fatalities are :
head(format(round(totalFatalities, 0), digits=12, nsmall=0, big.mark=","), 12)
## TORNADO EXCESSIVE HEAT FLASH FLOOD HEAT LIGHTNING
## "5,633" "1,903" " 978" " 937" " 816"
## TSTM WIND FLOOD RIP CURRENT HIGH WIND AVALANCHE
## " 504" " 470" " 368" " 248" " 224"
## WINTER STORM RIP CURRENTS
## " 206" " 204"
events <- as.vector(names(head(totalFatalities, 10)))
damage <- as.vector(head(totalFatalities, 10))
plotData <- data.frame(events, damage)
ggplot(data = plotData, aes(x = plotData$events, y = plotData$damage)) + geom_bar(stat = "identity") +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) + xlab("Top Ten Meteorological Events") +
ylab("Total Fatalities") + ggtitle("Ten Highest Total Fatalities by Meteorological Events - 1950-2011")
The events resulting in highest injuries are :
head(format(round(totalInjuries, 0), digits=12, nsmall=0, big.mark=","), 12)
## TORNADO TSTM WIND FLOOD EXCESSIVE HEAT
## "91,346" " 6,957" " 6,789" " 6,525"
## LIGHTNING HEAT ICE STORM FLASH FLOOD
## " 5,230" " 2,100" " 1,975" " 1,777"
## THUNDERSTORM WIND HAIL WINTER STORM HURRICANE/TYPHOON
## " 1,488" " 1,361" " 1,321" " 1,275"
events <- as.vector(names(head(totalInjuries, 10)))
damage <- as.vector(head(totalInjuries, 10))
plotData <- data.frame(events, damage)
ggplot(data = plotData, aes(x = plotData$events, y = plotData$damage)) + geom_bar(stat = "identity") +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) + xlab("Top Ten Meteorological Events") +
ylab("Total Injuries") + ggtitle("Ten Highest Total Injuries by Meteorological Events - 1950-2011")
The events resulting in highest property damage are :
head(format(round(totalPropDamage, 0), digits=12, nsmall=0, big.mark=","), 12)
## FLOOD HURRICANE/TYPHOON TORNADO STORM SURGE
## "144,657,709,807" " 69,305,840,000" " 56,947,381,216" " 43,323,536,000"
## FLASH FLOOD HAIL HURRICANE TROPICAL STORM
## " 16,822,673,978" " 15,735,267,513" " 11,868,319,010" " 7,703,890,550"
## WINTER STORM HIGH WIND RIVER FLOOD WILDFIRE
## " 6,688,497,251" " 5,270,046,610" " 5,118,945,500" " 4,765,114,000"
The events resulting in highest crop damage are :
head(format(round(totalCropDamage, 0), digits=12, nsmall=0, big.mark=","), 12)
## DROUGHT FLOOD RIVER FLOOD ICE STORM
## "13,972,566,000" " 5,661,968,450" " 5,029,459,000" " 5,022,113,500"
## HAIL HURRICANE HURRICANE/TYPHOON FLASH FLOOD
## " 3,025,954,473" " 2,741,910,000" " 2,607,872,800" " 1,421,317,100"
## EXTREME COLD FROST/FREEZE HEAVY RAIN TROPICAL STORM
## " 1,292,973,000" " 1,094,086,000" " 733,399,800" " 678,346,000"
The events resulting in highest damage (combined property and crop) are :
head(format(round(totalDamage, 0), digits=12, nsmall=0, big.mark=","), 12)
## FLOOD HURRICANE/TYPHOON TORNADO STORM SURGE
## "150,319,678,257" " 71,913,712,800" " 57,362,334,486" " 43,323,541,000"
## HAIL FLASH FLOOD DROUGHT HURRICANE
## " 18,761,221,986" " 18,243,991,078" " 15,018,672,000" " 14,610,229,010"
## RIVER FLOOD ICE STORM TROPICAL STORM WINTER STORM
## " 10,148,404,500" " 8,967,041,360" " 8,382,236,550" " 6,715,441,251"
events <- as.vector(names(head(totalDamage, 10)))
damage <- as.vector(head(totalDamage/(10^6), 10))
plotData <- data.frame(events, damage)
ggplot(data = plotData, aes(x = plotData$events, y = plotData$damage)) + geom_bar(stat = "identity") +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) + xlab("Top Ten Meteorological Events") +
ylab("Total Damage (USD Millions)") + ggtitle("Ten Highest Combined Damage by Meteorological Events - 1950-2011")