The purpose of the project was to determine which storm event(s) had the most significant impacts in economic and health.
Severe weather causes property damage, crop damage, injury and even death, bringing severe economy and possible health impact. The purpose of this work is to determine which severe weather types have the greatest economic and health effects. Economic effects were operationalized as the degree of property and crop damage. Health effects were operationalized as number of fatalities and injuries.
The data used thru the study comes from NOAA (National Oceanic and Atmospheric Administration’s) database. The data, as presented isn’t ready for analisys so, trasformations were required to set it on the Tidy data principles.
The report begins with initial data processing followed by a subsequent analysis with the most important results plotted following the assigniment rules.
library(reshape2)
library(plyr)
library(dplyr)
library(ggplot2)
library(knitr)
Data was retrieved via Coursera’s Cloudfont Link and unzipped into the working directory.
if(!file.exists("stormData.csv.bz2")) {
download.file("http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", "stormData.csv.bz2")
}
Dt_all <- read.csv(bzfile("stormData.csv.bz2"), strip.white = TRUE)
Data from and after 1996 have all the 48 event types that are in current use. Thus, this subset of data was considered best for analysis because event types were most evenly distributed.
Before subsetting by date, the date variable was reshaped to reflect just the date stripping off hour information. Due the actual variable naming convension (all Uppercase abbreviated jointted workds) all new variable (columns) were named all in uppercase and as abbreviated as possible.
Also, in some code chunks the output supression option include=FALSE were used to make the report easier to read.
Dt_all$BGN_DATE <- as.Date(Dt_all$BGN_DATE, format = "%m/%d/%Y")
filteredData <- subset(Dt_all, select = c("EVTYPE","FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP", "BGN_DATE", "REMARKS"))
filteredData <- subset(filteredData, filteredData$BGN_DATE >= as.Date("1996-01-01"))
ncnt <- 0
for (colName in colnames(filteredData)) {
ncnt <- as.numeric(sum(is.na(filteredData[,colName])))
if(ncnt > 0) {
message(colName, ":", NAcnt)
}
}
print(ncnt)
## [1] 0
The original data present financial values formated using a letter in order to indicate unit (“K” ( thousands),“M” (millions), and “B” (billions)), the idea is to merge all this data into one variable for all observations and also converting it proporly to a numeric type to enable the propper computation in order to determine damage by monetary value.
lettersToNumbers <- function(x){
ifelse (as.character(x) == "B", as.numeric(10^9),
ifelse(as.character(x) == "M", as.numeric(10^6),
ifelse(as.character(x) == "K", as.numeric(10^3), 0)))
}
filteredData$PROPDMGEXP <- toupper(filteredData$PROPDMGEXP)
filteredData$PROPDMGEXP <- lettersToNumbers(filteredData$PROPDMGEXP)
A new column PROPDMGDOL was created to represent the total amount of property damage in USD. The same calculation for crop damage, using a new variable CROPDMGDOL. As we can see the calculations are quite straight forward.
filteredData$PROPDMGDOL <- as.numeric(filteredData$PROPDMG*filteredData$PROPDMGEXP)
filteredData$CROPDMGEXP <- toupper(filteredData$CROPDMGEXP)
filteredData$CROPDMGEXP <- lettersToNumbers(filteredData$CROPDMGEXP)
filteredData$CROPDMGDOL <- as.numeric(filteredData$CROPDMG*filteredData$CROPDMGEXP)
I looked for individual values that comprised 8% or more of the data. There were two data points that fit this cutoff.
psum <- sum(filteredData$PROPDMGDOL)
prop_outliers <- filteredData[filteredData$PROPDMGDOL/psum > 0.08, ]
#adjusting the number for B... this is far too slow to be good but...
filteredData[filteredData$PROPDMGDOL == 1.15e+11,c("PROPDMGDOL")] <- 70000000
Checking outliers in the Crop (None found)
csum <- sum(filteredData$CROPDMGDOL)
crop_outliers <- filteredData[filteredData$CROPDMGDOL/csum > 0.08, ]
nrow(crop_outliers) > 0
## [1] FALSE
#With all outliers fixed, I calculated the total damage in US dollars and added it to a new variable.
filteredData$TOTALDMGDOL <- filteredData$CROPDMGDOL + filteredData$PROPDMGDOL
This data contained a total of 516 unique event types, which is much higher than the standard 48 event types outlined by NOAA, so to the the majory of events we will convert to the Nooa event list based on the EVTYPE convention reference.
filteredData$EVTYPE <- toupper(filteredData$EVTYPE)
length(unique(filteredData$EVTYPE))
## [1] 438
The list of 48 event list used by the NOAA since 1996.
noaaEvtList <- read.table("48events.txt",
col.names=c("EVTYPE"),
sep="\n",
strip.white=TRUE)
head(noaaEvtList)
## EVTYPE
## 1 Astronomical Low Tide
## 2 Avalanche
## 3 Blizzard
## 4 Coastal Flood
## 5 Cold/Wind Chill
## 6 Debris Flow
Fuction to help clean up the event type data
list_unique <- function(col) {
y <- toupper(noaaEvtList$EVTYPE)
no_std <- setdiff(col,y)
l <- length(no_std)
no_std
}
Understanding and cleaning up the TSTM Nooa event types and it’s frequency
tstm <- subset(Dt_all,select = c("EVTYPE","STATE"))
tstm <- tstm[grepl("TSTM", tstm$EVTYPE),]
tstm <- as.data.frame(table(tstm$EVTYPE, tstm$STATE))
colnames(tstm) <- c("EVTYPE", "STATE","FREQ")
tstm <- tstm[order(-tstm$FREQ),]
head(tstm)
## EVTYPE STATE FREQ
## 61926 TSTM WIND TX 16713
## 21541 TSTM WIND KS 11162
## 48136 TSTM WIND OK 11053
## 47151 TSTM WIND OH 9005
## 36316 TSTM WIND MO 8627
## 13661 TSTM WIND GA 8558
filteredData$EVTYPE[grepl("MARINE TSTM WIND", filteredData$EVTYPE)]<-"MARINE THUNDERSTORM WIND"
filteredData$EVTYPE[grepl("NON-TSTM WIND|NON TSTM WIND", filteredData$EVTYPE)]<-"STRONG WIND"
filteredData$EVTYPE[grepl("TSTM WIND", filteredData$EVTYPE)]<-"THUNDERSTORM WIND"
head(tstm[tstm$EVTYPE=="TSTM WIND/HAIL",])
## EVTYPE STATE FREQ
## 13678 TSTM WIND/HAIL GA 102
## 12693 TSTM WIND/HAIL FL 100
## 18603 TSTM WIND/HAIL ID 71
## 49138 TSTM WIND/HAIL OR 64
## 7768 TSTM WIND/HAIL CA 63
## 38303 TSTM WIND/HAIL MT 52
To keep things to a certain order, the EVTYPE clean up by matching noaaEvtList in alphabetical order. If an event type matching was non-obvious, remarks column were used to back it.
Consult the remarks to replace four ambiguous Events: -Wind and Wave -Gradient Wind -Whirlwind -Marine Accident
Added them to their appropriate categories below.
unique(filteredData$EVTYPE[grepl(toupper("wind"), filteredData$EVTYPE)])
## [1] "THUNDERSTORM WIND" "HIGH WIND"
## [3] "EXTREME COLD/WIND CHILL" "STRONG WIND"
## [5] "WIND" "WIND DAMAGE"
## [7] "WINDS" "HEAVY RAIN AND WIND"
## [9] "HEAVY RAIN/WIND" "STRONG WINDS"
## [11] "WHIRLWIND" "GUSTY WIND"
## [13] "GRADIENT WIND" "COLD/WIND CHILL"
## [15] "HAIL/WIND" "GUSTY WIND/RAIN"
## [17] "GUSTY WIND/HVY RAIN" "GUSTY WINDS"
## [19] "STRONG WIND GUST" "FLOOD/STRONG WIND"
## [21] "HEAVY SURF AND WIND" "HIGH WINDS"
## [23] "HIGH WIND (G40)" "WAKE LOW WIND"
## [25] "WIND ADVISORY" "GUSTY WIND/HAIL"
## [27] "WIND AND WAVE" " WIND"
## [29] "NON-SEVERE WIND DAMAGE" "THUNDERSTORM WIND (G40)"
## [31] "WIND GUSTS" "GUSTY LAKE WIND"
## [33] "GUSTY THUNDERSTORM WINDS" "MARINE THUNDERSTORM WIND"
## [35] "GUSTY THUNDERSTORM WIND" "MARINE HIGH WIND"
## [37] "MARINE STRONG WIND"
filteredData[filteredData$EVTYPE=="WIND AND WAVE",c("REMARKS")]
## [1] Douglas Lake Marina was hit by waves estimated at 4 to 6 feet and then a wind gust estimated at 45-55 mph. One 300 foot section of the 3-section dock was totally destroyed. 15-20 boats were either damaged or destroyed. Storm survey revealed no tree or structure damage on adjacent or upwind shoreline.
## 436781 Levels: \t \t\t ... Zones 22 and 23 were added to the high wind warning of January 26. Peak winds Sitka 55MPH, Cape Decision 58MPH, and Cape Spencer 64MPH.\n
filteredData[filteredData$EVTYPE=="GRADIENT WIND",c("REMARKS")]
## [1] Strong gradient winds combined with saturated soils to blow down several trees in Jackson and Madison counties during the early morning hours. Wind speeds did not quite meet high wind criteria.
## [2] Gusty west and northwest gradient winds associated with a passing cold front lifted the roof from a pawn shop in Wewoka. Although the winds were sustained at 20 to 25 mph and gusting to only 30-35 mph at the time, the roof was lifted and flipped into an adjacent alley. The roof hit several power lines, cutting power to most of the northern half of Wewoka. \n\nThe roof was poorly constructed, consisting of tin nailed to old wooden beams, which apparently were not well attached to the main structure.\n\n
## [3] Gradient induced winds flipped an 18-wheeler on FM 529 near Rosenberg.\n
## [4] Gradient induced wind blew a tin roof off a patio in Shady Acres Subdivision.\n
## [5] Gradient induced wind blew trees down in Memorial Park.\n
## [6] Gradient induced wind blew trees down in Hempstead.\n
## [7] Strong gradient winds blew powerlines and trees down in Bellville and throughout county.\n
## [8] \nSummary for 4/23/01 gradient wind event:\nA vigorous low pressure system in Wisconsin brought a trailing cold front through Iowa and Illinois. The system brought strong gradient winds to western Illinois. From late morning through sunset, a southwest wind gusted over 40 mph across the area. The highest measured speed in northwest Illinois was 46 mph at the Quad City Airport in Moline. There was no significant damage reported.
## [9] \nSummary for 4/23/01 gradient wind event:\nA vigorous low pressure system in Wisconsin brought a trailing cold front through Iowa and Illinois. During the early morning, around 12 AM CST, law enforcement reported a semi-trailer truck blown over along Interstate 80 in Iowa County. A south wind gusted between 40 and 50 mph over a small part of east central Iowa during the early morning, including the Washington, Cedar Rapids, and Iowa City areas. The area of strong winds moved north of the area by 1 AM CST. Then from late morning through sunset, a southwest wind gusted over 40 mph across eastern Iowa. The highest measured speed was 48 mph at the Clinton Municipal Airport. No signficant damage was reported.
## 436781 Levels: \t \t\t ... Zones 22 and 23 were added to the high wind warning of January 26. Peak winds Sitka 55MPH, Cape Decision 58MPH, and Cape Spencer 64MPH.\n
filteredData[filteredData$EVTYPE=="WHIRLWIND",c("REMARKS")]
## [1] A sky diver inadvertently crossed paths with a dust devil which aused his parachute to collapse about 30 feet above the ground. The man died from injuries sustained in the resulting fall. M39OU
## [2] Strong winds from a dust devil ripped through a mobile home park in North Las Vegas tearing awnings and shingles from residences and hurling them hundreds of feet through the air.
## [3] A "whirlwind", occurring when a fairly strong sea breeze interacted with mechanical atmospheric mixing under clear, dry conditions, produced considerable damage at a home in the Town 'N' Country section of Hillsborough County. Three cement support poles were lifted from the ground, and a metal rod was reportedly driven through the home's roof.\n
## 436781 Levels: \t \t\t ... Zones 22 and 23 were added to the high wind warning of January 26. Peak winds Sitka 55MPH, Cape Decision 58MPH, and Cape Spencer 64MPH.\n
filteredData[filteredData$EVTYPE=="MARINE ACCIDENT",c("REMARKS")]
## [1] Heavy seas capsized a fishing vessel off of Fern Canyon when it was caught abeam by a large wave. Two crew members were rescued but one drown. M35BO
## 436781 Levels: \t \t\t ... Zones 22 and 23 were added to the high wind warning of January 26. Peak winds Sitka 55MPH, Cape Decision 58MPH, and Cape Spencer 64MPH.\n
filteredData$EVTYPE[grepl("^THUNDERSTORM WIND$|GUSTY THUNDERSTORM WINDS|GUSTY THUNDERSTORM WIND|G40",filteredData$EVTYPE)]<-"THUNDERSTORM WIND"
filteredData$EVTYPE[grepl("WHIRLWIND",filteredData$EVTYPE)]<-"MARINE STRONG WIND"
filteredData$EVTYPE[grepl("MARINE ACCIDENT",filteredData$EVTYPE)]<-"MARINE HIGH WIND"
filteredData$EVTYPE[grepl("SEAS$|ROGUE WAVE",filteredData$EVTYPE)]<-"MARINE THUNDERSTORM WIND"
filteredData$EVTYPE[grepl("HIGH WIND|^WINDS$|GRADIENT WIND|GUSTY WIND|WIND DAMAGE|GUSTY WINDS|HIGH WINDS|HIGH WIND (G40)|WIND ADVISORY|WIND GUSTS|GUSTY LAKE WIND|WIND AND WAVE|DOWNBURST|MICROBURST",filteredData$EVTYPE)]<-"HIGH WIND"
filteredData$EVTYPE[grepl("^WIND$|^ WIND$|STRONG WINDS|STRONG WIND GUST|WND|WAKE LOW WIND",filteredData$EVTYPE)]<-"STRONG WIND"
To identify event types (as indicated in the EVTYPE variable) that are most dangerous economicaly wise, a summarization of the total damage in US Dollars by event type for property damage, crop damage and total damage creating a new data frame to enable each plot.
econ_impact <- summarise(group_by(filteredData, EVTYPE), TOTAL_PROPDMG = sum(PROPDMGDOL), TOTAL_CROPDMG = sum(CROPDMGDOL))
econ_impact <- mutate(econ_impact, TOTAL_ECON_LOSS = TOTAL_PROPDMG + TOTAL_CROPDMG)
arrange(econ_impact, desc(TOTAL_PROPDMG))[1:10, 1:2]
## # A tibble: 10 × 2
## EVTYPE TOTAL_PROPDMG
## <chr> <dbl>
## 1 HURRICANE/TYPHOON 69305840000
## 2 STORM SURGE 43193536000
## 3 FLOOD 29014833550
## 4 TORNADO 24616945710
## 5 FLASH FLOOD 15222203910
## 6 HAIL 14595143420
## 7 HURRICANE 11812819010
## 8 THUNDERSTORM WIND 7913573880
## 9 TROPICAL STORM 7642475550
## 10 HIGH WIND 5252476370
ggplot(data=arrange(econ_impact, desc(TOTAL_ECON_LOSS))[1:10, c(1, 4)],
aes(x=factor(EVTYPE), y=TOTAL_ECON_LOSS)) +
geom_bar(stat="identity") +
xlab("Storm Event") +
ylab("Total Damage (USD)") +
ggtitle("Economic Impact") +
theme_light()
To determine which types of Events are more harmful to the population health, The total amount of injuries and fatalities by event type where agregated. . The top 10 Events with the highest amount of total injuries and fatalities were subsetted and plotted.
fatalities <- aggregate(FATALITIES ~ EVTYPE,filteredData, sum)
injuries <- aggregate(INJURIES ~ EVTYPE,filteredData, sum)
storm_health <- join_all(list(fatalities, injuries), by = "EVTYPE")
storm_health$TOTAL <- storm_health$FATALITIES + storm_health$INJURIES
storm_health <- storm_health[order(-storm_health$TOTAL),][1:10,]
storm_health <- melt(storm_health, id=c("EVTYPE"), measure.vars=c("FATALITIES","INJURIES"))
storm_health$EVTYPE <- as.factor(storm_health$EVTYPE)
ggplot(data=storm_health, aes(EVTYPE, value, fill =variable)) +
geom_bar(stat="identity")+
xlab("Storm Event") +
ylab("Total") +
ggtitle("Injuries and Fatalities by Storm Event") +
theme_light()
1.Hurricanes are the most economical danegerous event.
1.Tornadoes are the most fatal event caused.