Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
According to NWS Directive 10-1605, all 48 event types have been recorded only since 1996. From 1955 through 1992, only tornado, thunderstorm wind, and hail events were keyed from the paper publications into digital data, and from 1993 to 1995 only those same three event types were extracted from the Unformatted Text Files. We therefore restrict the analysis to data from 1996 onward.
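One way to sanity-check the 1996 cutoff is to count the distinct event types recorded per year and then keep only the later years. The following is a minimal sketch on a toy data frame; `storms`, `year`, `types_per_year`, and `recent` are illustrative names, and the rows are made up rather than taken from the NOAA file.

```r
# Toy stand-in for the storm data (hypothetical rows, same column names as the NOAA file).
storms <- data.frame(
  BGN_DATE = c("4/18/1950 0:00:00", "6/1/1994 0:00:00", "6/2/1994 0:00:00",
               "1/6/1996 0:00:00", "2/9/1996 0:00:00", "7/4/1996 0:00:00"),
  EVTYPE   = c("TORNADO", "HAIL", "TSTM WIND",
               "WINTER STORM", "FLOOD", "EXCESSIVE HEAT"),
  stringsAsFactors = FALSE
)

# Extract the year; as.Date() parses the date part and ignores the trailing time.
storms$year <- as.integer(format(as.Date(storms$BGN_DATE, format = "%m/%d/%Y"), "%Y"))

# Number of distinct event types recorded in each year.
types_per_year <- tapply(storms$EVTYPE, storms$year, function(x) length(unique(x)))

# Keep only the years in which all event types are recorded.
recent <- storms[storms$year >= 1996, ]
```

On the real data, a sharp rise in `types_per_year` in the mid-1990s would be consistent with the recording history described above.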
To the questions asked:
1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
In the current dataset, which has been sanitized to 80%, the most harmful events are “TORNADO”, followed by “EXCESSIVE HEAT”.
2. Across the United States, which types of events have the greatest economic consequences?
In the current dataset, which has been sanitized to 80%, the events with the greatest economic consequences are “HURRICANE/TYPHOON”, followed by “STORM SURGE”.
| Event | Total Loss, Property + Crop (USD) | Event | Fatalities + Injuries |
|---|---|---|---|
| HURRICANE/TYPHOON | 86,467,941,810 | TORNADO | 22,178 |
| STORM SURGE | 43,193,541,000 | EXCESSIVE HEAT | 8,188 |
| FLOOD | 34,034,611,950 | FLOOD | 7,172 |
| TORNADO | 24,900,370,720 | THUNDERSTORM WIND | 5,400 |
| HAIL | 17,071,172,870 | LIGHTNING | 4,792 |
| FLASH FLOOD | 16,557,155,610 | FLASH FLOOD | 2,561 |
| DROUGHT | 14,413,667,000 | WINTER STORM | 1,483 |
| THUNDERSTORM WIND | 8,821,057,230 | HEAT | 1,459 |
| TROPICAL STORM | 8,320,186,550 | HURRICANE/TYPHOON | 1,446 |
| HIGH WIND | 5,881,921,660 | HIGH WIND | 1,318 |
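The dollar figures in the table are built from a damage value combined with an exponent code (PROPDMGEXP and CROPDMGEXP in the NOAA data). The sketch below illustrates that idea on made-up rows. `exp_to_multiplier` is a hypothetical helper, and it makes a simplifying assumption: only the K, M, and B codes are decoded, and every other code contributes zero.

```r
# Hypothetical helper: map an exponent code to a numeric multiplier.
# Assumption: only K/M/B (either case) are decoded; anything else becomes 0.
exp_to_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  m <- lookup[toupper(as.character(code))]
  m[is.na(m)] <- 0
  unname(m)
}

# Made-up rows for illustration only.
toy <- data.frame(
  eventType  = c("FLOOD", "FLOOD", "HAIL"),
  PROPDMG    = c(2.5, 10, 500),
  PROPDMGEXP = c("B", "M", "K"),
  stringsAsFactors = FALSE
)

# Dollar value of each record, then the total per event type, largest first.
toy$proploss <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP)
totals <- aggregate(proploss ~ eventType, data = toy, FUN = sum)
totals <- totals[order(-totals$proploss), ]

# Human-readable figures with thousands separators, as in the table.
totals$formatted <- format(totals$proploss, big.mark = ",",
                           scientific = FALSE, trim = TRUE)
```

Treating unknown codes as zero is the conservative choice in this sketch; a different convention for those codes would change the totals only where such codes occur.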
The NOAA dataset contains 902,297 obs. of 37 variables, which primarily include the date and time of recording.
Please be patient while the code runs, as the dataset is large.
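Because the file is large, one option is to load only the columns the analysis needs. The sketch below shows the idea with base R's `colClasses` argument, where the class `"NULL"` tells `read.csv` to skip a column. It runs on a tiny inline CSV; `csv_text`, `keep`, and `small` are illustrative names, and this is not how the report itself reads the file.

```r
# Tiny inline stand-in for the real CSV (hypothetical rows).
csv_text <- "STATE__,BGN_DATE,EVTYPE,FATALITIES,REMARKS
1,4/18/1950 0:00:00,TORNADO,0,long text
1,1/6/1996 0:00:00,WINTER STORM,1,more text"

# Columns to keep, with the class each should be read as.
keep <- c(BGN_DATE = "character", EVTYPE = "character", FATALITIES = "numeric")

# Read just the header to learn the column names and their order.
header <- names(read.csv(text = csv_text, nrows = 1))

# Default every column to "NULL" (skip), then switch on the ones we want.
col_classes <- setNames(rep("NULL", length(header)), header)
col_classes[names(keep)] <- keep

small <- read.csv(text = csv_text, colClasses = col_classes)
```

For the real file, `text = csv_text` would be replaced by the file name; skipping the long free-text REMARKS column is likely where most of the saving would come from.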
# Read the CSV File.
NOAA <- read.csv("repdata_data_StormData.csv")
## Convert BGN_DATE to a date format for analysis.
NOAA$ds_bgn_date <- as.POSIXct(NOAA$BGN_DATE,format="%m/%d/%Y %H:%M:%S") # Correct Date
NOAA$ds_bgn_year <- as.integer(format(NOAA$ds_bgn_date, "%Y")) # Extract the year; format() works across R versions
## Change Name of 1st Column
names(NOAA)[1] <- "STATE1"
##
# Data Cleaning
# 48 event types are recorded as defined in NWS Directive 10-1605 only from 1996
# Hence consider Data only from 1996.
# Dataset now becomes 201318 obs of 15 variables.
##
NOAA <- subset(NOAA,
ds_bgn_year >= 1996 & (PROPDMG != 0 | CROPDMG != 0 |
FATALITIES != 0 | INJURIES != 0),
select = c("STATE", "BGN_DATE", "BGN_TIME", "TIME_ZONE", "EVTYPE", "FATALITIES", "INJURIES",
"PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP", "REFNUM", "REMARKS",
"ds_bgn_date", "ds_bgn_year"))
NOAA$ds_bgn_year <- as.factor(NOAA$ds_bgn_year)
NOAA$PROPDMGEXP <- as.factor(NOAA$PROPDMGEXP)
NOAA$CROPDMGEXP <- as.factor(NOAA$CROPDMGEXP)
NOAA$EVTYPE <- as.factor(NOAA$EVTYPE)
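# NAPA Correction: REFNUM 605943 (Napa flood) carries exponent "B" (billions),
# which appears to be a data-entry error; treat the damage as millions instead.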
NOAA$PROPDMGEXP[NOAA$REFNUM==605943] <- "M"
## Build a lookup of multipliers for the exponent codes (PROPDMGEXP & CROPDMGEXP)
## Then compute actual value of Damage
multiplier<-c('K'=10^3, 'k'=10^3, # Thousands
'M'=10^6, 'm'=10^6, # Millions
'B'=10^9, 'b'=10^9, '0'=1, # Billions & Zero
'1'=10^1, '2'=10^2, '3'=10^3, '4'=10^4, '5'=10^5, # Exponent 1 to 5
'6'=10^6, '7'=10^7, '8'=10^8, '9'=10^9, '10'=10^10, # Exponent 6 to 10
'?'=10^0, '+'=10^0, '-'=10^0) # Unknowns = 1
NOAA$proploss <- NOAA$PROPDMG * ifelse(NOAA$PROPDMGEXP=="",0,
(multiplier[as.character(NOAA$PROPDMGEXP)]))
NOAA$croploss <- NOAA$CROPDMG * ifelse(NOAA$CROPDMGEXP=="",0,
(multiplier[as.character(NOAA$CROPDMGEXP)]))
NOAA$ttlloss <- NOAA$proploss + NOAA$croploss
rm(multiplier)
#
# Cleanup EVTYPE ; Several More Cleanups can possibly be done.
#
NOAA$eventType <- toupper(as.character(NOAA$EVTYPE)) # Convert all Events to Upper Case
NOAA$eventType <- gsub("^ *","",NOAA$eventType) # Remove Leading Spaces
NOAA$eventType <- gsub("S$","",NOAA$eventType) # Remove Trailing "S"'s
NOAA$eventType <- gsub("COASTALSTORM","COASTAL STORM",NOAA$eventType)
NOAA$eventType <- gsub("ICE ROAD","ICE ON ROAD",NOAA$eventType)
NOAA$eventType <- gsub("ICY ROAD","ICE ON ROAD",NOAA$eventType)
NOAA$eventType <- gsub("LAKE-EFFECT SNOW","LAKE EFFECT SNOW",NOAA$eventType)
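# Lookaheads keep already-expanded names from being expanded twice (e.g. "LIGHT SNOWFALLFALL")
NOAA$eventType <- gsub("LIGHT SNOW(?!FALL)","LIGHT SNOWFALL",NOAA$eventType, perl=TRUE)
NOAA$eventType <- gsub("MARINE TSTM WIND","MARINE THUNDERSTORM WIND",NOAA$eventType)
NOAA$eventType <- gsub("MIXED PRECIP(?!ITATION)","MIXED PRECIPITATION",NOAA$eventType, perl=TRUE)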
NOAA$eventType <- gsub("MUDSLIDE","MUD SLIDE",NOAA$eventType)
NOAA$eventType <- gsub("NON-TSTM WIND","NON TSTM WIND",NOAA$eventType)
NOAA$eventType <- gsub("NON-THUNDERSTORM WIND","NON THUNDERSTORM WIND",NOAA$eventType)
NOAA$eventType <- gsub("TSTM WIND","THUNDERSTORM WIND",NOAA$eventType)
NOAA$eventType <- gsub("THUNDERSTORM WIND 40","THUNDERSTORM WIND (G40)",NOAA$eventType)
NOAA$eventType <- gsub("THUNDERSTORM WIND 45","THUNDERSTORM WIND (G45)",NOAA$eventType)
NOAA$eventType <- gsub("THUNDERSTORM WIND G45","THUNDERSTORM WIND (G45)",NOAA$eventType)
NOAA$eventType <- gsub("WINTER WEATHER/MIX","WINTER WEATHER MIX",NOAA$eventType)
## Merging Hurricane/Typhoon with Hurricane on review of 1st Graph
NOAA$eventType <- gsub("^HURRICANE$","HURRICANE/TYPHOON",NOAA$eventType)
NOAA$eventType <- as.factor(NOAA$eventType)
# Write Tidy Data Set
# NOAA was reduced to the selected columns above, so write every remaining column except REMARKS.
NOAA_PRINT <- NOAA[, setdiff(names(NOAA), "REMARKS")]
write.csv(NOAA_PRINT,"storm_data_with_no_remarks.csv")
rm(NOAA_PRINT)
# End Write Tidy Data Set
library(plyr)
library(lattice)
# Q1. Across the United States, which types of events (as indicated in the EVTYPE variable)
# are most harmful with respect to population health?
# -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
# get Worst Events by Population Health (FATALITIES / INJURIES)
# -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
health_loss_ttl <- aggregate((FATALITIES+INJURIES) ~ eventType, data = NOAA, sum)
names(health_loss_ttl) <- c("eventType","Total_Injuries")
health_loss_ttl <- health_loss_ttl[order(-health_loss_ttl[,"Total_Injuries"]), ][1:10,] # top 10 most harmful
health_loss_ttl$eventType <- as.character(health_loss_ttl$eventType)
health_loss_ttl$eventType <- as.factor(health_loss_ttl$eventType ) # refactor the eventType
# Now subset data again for biggest disasters.
NOAA_health_ttl<-subset(NOAA,
(NOAA$eventType %in% health_loss_ttl$eventType),
select = c("STATE", "BGN_DATE", "BGN_TIME", "TIME_ZONE",
"EVTYPE", "FATALITIES", "INJURIES",
"PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP",
"REFNUM", "REMARKS",
"ds_bgn_date", "ds_bgn_year", "proploss", "croploss", "ttlloss", "eventType"))
NOAA_health_ttl$eventType <- as.character(NOAA_health_ttl$eventType)
NOAA_health_ttl$eventType <- as.factor(NOAA_health_ttl$eventType)
health <- ddply(NOAA_health_ttl,
.(ds_bgn_year, eventType),
summarize,
ttl_loss = sum(FATALITIES + INJURIES, na.rm = TRUE))
xyplot((ttl_loss) ~ ds_bgn_year | eventType, health,
type = "p",
layout = c(2, 5),
auto.key = list(space = "right"),
ylab = "Fatalities + Injuries",
xlab = "Years",
main = "Most harmful Events w/ respect to population health")
#
# Q2. Across the United States, which types of events have the greatest economic consequences?
# -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
# get Worst Events by Total Loss (Property + Crop)
#
event_loss_ttl <- aggregate(ttlloss ~ eventType,
data = NOAA, sum) # aggregate ttlDamage by eventType only
event_loss_ttl <- event_loss_ttl[order(-event_loss_ttl[,"ttlloss"]), ][1:10,] # top 10 mostCostly
event_loss_ttl$eventType <- as.character(event_loss_ttl$eventType)
event_loss_ttl$eventType <- factor(event_loss_ttl$eventType) # re-level the eventType
# Now subset data again for biggest disasters.
NOAA_loss_ttl<-subset(NOAA,
(NOAA$eventType %in% event_loss_ttl$eventType),
select = c("STATE", "BGN_DATE", "BGN_TIME", "TIME_ZONE",
"EVTYPE", "FATALITIES", "INJURIES",
"PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP",
"REFNUM", "REMARKS", "ds_bgn_date", "ds_bgn_year",
"proploss", "croploss", "ttlloss", "eventType"))
NOAA_loss_ttl$eventType <- as.character(NOAA_loss_ttl$eventType)
NOAA_loss_ttl$eventType <- as.factor(NOAA_loss_ttl$eventType)
t2 <- ddply(NOAA_loss_ttl,
.(ds_bgn_year, eventType),
summarize,
ttl_loss = sum(ttlloss, na.rm = TRUE))
xyplot((ttl_loss/10^9) ~ ds_bgn_year | eventType, t2, # plot the summary computed above (t2)
type = "p",
layout = c(2, 5),
auto.key = list(space = "right"),
ylab = "Loss (in Billions)",
xlab = "Years",
main = "Total Loss (Property+Crop)")
In terms of population health, the most harmful events are “TORNADO” followed by “EXCESSIVE HEAT”. In terms of economic consequences, the costliest events are “HURRICANE/TYPHOON” followed by “STORM SURGE”, as the table above shows.
Caution is needed when reading the population-health graph, which suggests that some data is still missing: Hurricane/Typhoon data appears to be missing for 2006 and 2010, and Heat data for 1996, 1999, and 2002-2005.
The same caution applies to the economic-loss graph: Storm Surge data appears to be missing for 2007-2011, and Hurricane data for 2006 and 2010.
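Gaps like these can also be listed programmatically rather than read off the graphs. The following is a minimal sketch on made-up observations: it cross-tabulates event type against every year in the range and reports the combinations with no records. The names `obs`, `counts`, and `gaps` are illustrative.

```r
# Made-up observations: HEAT has no record for 1998.
obs <- data.frame(
  year      = c(1996, 1997, 1999, 1996, 1997, 1998, 1999),
  eventType = c("HEAT", "HEAT", "HEAT", rep("TORNADO", 4)),
  stringsAsFactors = FALSE
)

# Use every year in the range as a factor level so empty years still get a cell.
all_years <- seq(min(obs$year), max(obs$year))
counts <- table(factor(obs$eventType), factor(obs$year, levels = all_years))

# Long format: one row per (eventType, year) pair, keeping only the empty pairs.
gaps <- as.data.frame(counts, stringsAsFactors = FALSE)
names(gaps) <- c("eventType", "year", "n")
gaps <- gaps[gaps$n == 0, c("eventType", "year")]
```

Since the analysis keeps only records with non-zero damage, fatalities, or injuries, an empty year may mean that no such events were recorded for that type rather than that data is missing.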