Analysis on Weather and Economic Impact

NOTICE

Analysis on Weather and Economic Impact
Copyright © 2017 Michael Garcia. All Rights Reserved.
Inquiries: mgar_datascience at protonmail.com

This program is distributed WITHOUT ANY WARRANTY; without even the
implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR 
PURPOSE.

"ggplot2" is used under the GNU General Public License, Version 2.
See https://cran.r-project.org/web/licenses/GPL-2 for full text.
H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2009.

Main Report Section

Synopsis

This report uses NOAA storm data to describe patterns in severe weather. It examines disastrous weather events and their attributes, and asks which events cost the most, both in economic damage and in harm to people. The analysis reveals that the most frequent events are not necessarily the most damaging, and that the most economically costly events are not the ones that do the most harm to population health.

Tornadoes, heat, and flash floods are the top 3 causes of fatalities. Hail, thunderstorm winds, and tornadoes are the top 3 most frequent events. Floods, hurricanes, and tornadoes are the top 3 most costly events.

Data Processing

The data were fetched from NOAA (see Source Citation and Resource for the URLs). The code that downloads, processes, and stores the data is shown below. The code for the data tables and graphics is provided within the R chunks. A few values in the text are computed by inline R chunks; the RMD file shows which sentences use inline calculations.

### Week 1 Project ###
#library(data.table)
webURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
destURL <- file.path(getwd(),paste("repdata_data_StormData.csv",".bz2",sep = ""))
download.file(webURL,destfile = destURL)
trying URL 'https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2'
Content type 'application/bzip2' length 49177144 bytes (46.9 MB)
==================================================
downloaded 46.9 MB
#unzip(destURL)
#localFileURL <- file.path(getwd(),paste("repdata_data_StormData",".csv",sep = ""))
repdata <- read.csv(destURL)
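
Because the archive is about 47 MB, re-running the report downloads it again each time. A minimal sketch of one way to avoid that is shown below. The helper name fetch_if_missing is hypothetical and not part of the original analysis; it only wraps the same download.file() call in a file-existence check.

```r
# Hypothetical helper (not part of the original report): download a file only
# when no local copy exists, then return the local path.
fetch_if_missing <- function(url, dest) {
  if (!file.exists(dest)) {
    # mode = "wb" keeps the compressed archive byte-for-byte intact on Windows
    download.file(url, destfile = dest, mode = "wb")
  }
  dest
}

# Usage with the URL from the report (commented out so the sketch runs offline).
# read.csv() reads the .bz2 archive directly, as the report's own code does.
# webURL  <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# destURL <- fetch_if_missing(webURL, file.path(getwd(), "repdata_data_StormData.csv.bz2"))
# repdata <- read.csv(destURL)
```

The first run downloads the file; later runs reuse the local copy, so only the read step is repeated.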

Results

Summary of Each Variable.

Events most harmful to population health

This chunk aggregates the data by event type (number of events, total fatalities, total injuries) and sorts the result by the number of events.

eventFreq <- aggregate(repdata$EVTYPE, list(repdata$EVTYPE), length)
names(eventFreq) <- c("EVTYPE","Qty")
eventFatal <- aggregate(repdata$FATALITIES, list(repdata$EVTYPE), sum)
names(eventFatal) <- c("EVTYPE","FATALITIES")
eventsInjur <- aggregate(repdata$INJURIES, list(repdata$EVTYPE), sum)
names(eventsInjur) <- c("EVTYPE","INJURIES")
#a <- eventFreq[order(eventFreq$Qty, decreasing = TRUE),]
eventDamag <- merge(eventFreq,eventFatal, by.x = "EVTYPE", all = TRUE)
eventDamag <- merge(eventDamag, eventsInjur, by.x = "EVTYPE", all = TRUE)
eventDamag <- eventDamag[order(eventDamag$"Qty", decreasing = TRUE),]
eventDamag
#plot(eventFreq$EVTYPE,eventFreq$Qty,type ="h")

This table shows the event types ordered by frequency. HAIL is the most frequent type at 288661 events, while WND, at 1 event, is at the other end of the ordering.

This table sorts the event types by total fatalities.
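
The per-event aggregation can be tried on a small scale. The sketch below uses an invented toy data frame in place of repdata (the values are illustrative only, not NOAA figures) and sums both health columns in a single aggregate() call, which is an alternative to the separate calls used in the report.

```r
# Toy stand-in for repdata; the values are invented for illustration only.
toy <- data.frame(
  EVTYPE     = c("HAIL", "TORNADO", "HAIL", "FLOOD", "TORNADO", "HAIL"),
  FATALITIES = c(0, 3, 0, 1, 2, 0),
  INJURIES   = c(1, 10, 0, 2, 5, 0),
  stringsAsFactors = FALSE
)

# Sum fatalities and injuries per event type in one call
toySums <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Count the records per event type
toyFreq <- as.data.frame(table(EVTYPE = toy$EVTYPE), responseName = "Qty",
                         stringsAsFactors = FALSE)

# Join on the shared key and order by frequency, most frequent first
toyDamag <- merge(toyFreq, toySums, by = "EVTYPE", all = TRUE)
toyDamag <- toyDamag[order(toyDamag$Qty, decreasing = TRUE), ]
toyDamag
```

In the toy result HAIL is the most frequent type but has no fatalities, which mirrors the report's point that frequency and harm are separate rankings.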

ftl <- eventDamag[order(eventDamag$FATALITIES, decreasing = TRUE),]
ftl

TORNADO is the leading cause of fatalities, at 5633. WND has the fewest fatalities, at 0.

This table displays the ten most frequent events. HAIL occurs most often, at 288661 events. HEAVY SNOW is the least frequent of the top ten, at 15708 events.

Events with the greatest economic consequences

This chunk transforms the abbreviated Property Damage Cost and Crop Damage Cost into full dollar amounts. The letter in the EXP column determines the multiplier (the number of zeros to add). The chunk then displays a table sorted by Total Damage Cost from greatest to least. This data frame is used further down in later code snippets.

#a <- head(eventDamag[order(eventDamag$"Qty", decreasing = TRUE),],5)
#plot(a$EVTYPE, as.numeric(a$Qty/1000), type = "h")
#   “K” for thousands, “M” for millions, and “B” for billions
#PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
      # totalDmg does not work because the exponent differs between property and crop damage
##totalDmg <- aggregate((repdata$PROPDMG+repdata$CROPDMG), list(repdata$EVTYPE,repdata$PROPDMGEXP), sum)
##colnames(totalDmg) <- c("EVTYPE","TOTALEXP","TOTALDMG")
# This chunk transforms the damage values into full dollar amounts
bil <- 1000000000
mil <- 1000000
thous <- 1000
repDataConv <- repdata
propDmg <- aggregate((ifelse(repdata$PROPDMGEXP == "B",repdata$PROPDMG*bil
                             ,ifelse(repdata$PROPDMGEXP == "M",repdata$PROPDMG*mil
                                     , ifelse(repdata$PROPDMGEXP == "K"
                                              ,repdata$PROPDMG*thous
                                              ,0
                                              )
                                     )
                          )
                      )
                             , list(repdata$EVTYPE
                                    #,repdata$PROPDMGEXP
                                    ), sum)
#colnames(propDmg) <- c("EVTYPE","EXP","PROPDMG")
colnames(propDmg) <- c("EVTYPE","PROPDMG")
cropDmg <- aggregate((ifelse(repdata$CROPDMGEXP == "B",repdata$CROPDMG*bil
                             ,ifelse(repdata$CROPDMGEXP == "M",repdata$CROPDMG*mil
                                     , ifelse(repdata$CROPDMGEXP == "K",repdata$CROPDMG*thous
                                              ,0
                                              )
                                     )
                          )
                      ), list(repdata$EVTYPE
                              #,repdata$CROPDMGEXP
                              ), sum)
#colnames(cropDmg) <- c("EVTYPE","EXP","CROPDMG")
colnames(cropDmg) <- c("EVTYPE","CROPDMG")
eventDamagProp <- merge(eventDamag, propDmg,by.x = "EVTYPE", all = TRUE)
eventDmgPrp <- merge(eventDamagProp, cropDmg,by.x = "EVTYPE", all = FALSE)
                                                  #c("EVTYPE","EXP")
eventDmgPrp$TOTLDMG <- (eventDmgPrp$PROPDMG + eventDmgPrp$CROPDMG)
# Express the amounts in thousands of dollars
 #a <- eventDmgPrp[which(eventDmgPrp$EXP == "B"),]
# head(a[order(a$TOTLDMG, decreasing = TRUE),],10)
 #head(eventDmgPrp[order(eventDmgPrp$TOTLDMG, decreasing = TRUE),],10)
eventDmgPrp$PROPDMG <- round(eventDmgPrp$PROPDMG/1000,1)
eventDmgPrp$CROPDMG <- round(eventDmgPrp$CROPDMG/1000,1)
eventDmgPrp$TOTLDMG <- round(eventDmgPrp$TOTLDMG/1000,1)
eventDmgPrp[order(eventDmgPrp$TOTLDMG, decreasing = TRUE),]

This table is sorted by total damage cost (property plus crop) from most to least, expressed in thousands (’000). The values were converted to full dollar amounts based on the EXP value before being summed.

This chart shows the top 10 most economically damaging events, from most to least. FLOOD is the most costly and ICE STORM is the least costly of the ten.

This chunk aggregates injuries and fatalities for the panel plot. The final data frame in this chunk is used by the time-plot code further down.
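
The exponent conversion can also be written with a lookup vector instead of nested ifelse() calls. The sketch below is an alternative formulation, not the report's code; the toy data, the mult vector, and the to_dollars helper are all hypothetical names introduced for illustration. As in the report, any code other than K, M, or B is treated as 0.

```r
# Toy rows; the values are invented for illustration only.
toy <- data.frame(
  EVTYPE     = c("FLOOD", "FLOOD", "HAIL", "HAIL"),
  PROPDMG    = c(2.5, 10, 50, 3),
  PROPDMGEXP = c("B", "M", "K", ""),
  stringsAsFactors = FALSE
)

# Named lookup of multipliers: thousands, millions, billions
mult <- c(K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: scale each value by the multiplier for its exponent code
to_dollars <- function(value, exp_code) {
  m <- mult[as.character(exp_code)]  # NA where the code is not K, M, or B
  m[is.na(m)] <- 0                   # unknown codes contribute nothing
  value * unname(m)
}

toy$PROPDOLLARS <- to_dollars(toy$PROPDMG, toy$PROPDMGEXP)
toyProp <- aggregate(PROPDOLLARS ~ EVTYPE, data = toy, FUN = sum)
toyProp
```

Keeping the multipliers in one vector makes it easy to see, and to change, which codes are recognised.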

Results

This chunk prepares the data to panel plot Fatalities and Injuries by Month and Year.

BGN_DATEP <- as.POSIXct(unlist(tmEventFtlInj$BGN_DATE), tz = "GMT")
BGN_MONTH <- strftime(BGN_DATEP, "%m")
BGN_YEAR <- strftime(BGN_DATEP, "%Y")
BGN_DATES <- data.frame(BGN_MONTH,BGN_YEAR,BGN_DATEP)
tmeventSeries <- cbind(BGN_DATES ,tmEventFtlInj)
tmSeriesAgF <- aggregate(FATALITIES~ BGN_MONTH+ BGN_YEAR, tmeventSeries, FUN = sum)
tmSeriesAgI <- aggregate(INJURIES~ BGN_MONTH+ BGN_YEAR, tmeventSeries, FUN = sum)
tmSeriesAg <- merge(tmSeriesAgF,tmSeriesAgI, by = c("BGN_MONTH", "BGN_YEAR"))
tmSeriesAg$MOYR <- as.POSIXct(paste(tmSeriesAg$BGN_YEAR, tmSeriesAg$BGN_MONTH, "01", sep = "-"))
#plot(tmSeriesAg$MOYR[order(tmSeriesAg$MOYR)], tmSeriesAg$INJURIES, col = "green", type = "l",pch = 20
#       , xlab = "Event Date", ylab = "Qty of People")
# lines(tmSeriesAg$MOYR[order(tmSeriesAg$MOYR)],tmSeriesAg$FATALITIES, col = "red",type = "l", pch = 20)
 
 #par(mfrow=c(1,2), mai = c(.5, .5, 0.2, 0.1))
 par(mfrow=c(2,1), mai = c(0.80, 0.80, 0.2, 0.1))
 # Sort the rows by date so that each x value stays paired with its own y value
 tmSeriesAg <- tmSeriesAg[order(tmSeriesAg$MOYR),]
 plot(tmSeriesAg$MOYR, tmSeriesAg$INJURIES, col = "dodgerblue", type = "l",pch = 20
       , xlab = "", ylab = "Number of Injuries")
 plot(tmSeriesAg$MOYR, tmSeriesAg$FATALITIES, col = "green", type = "l",pch = 20
       , xlab = "Event Date", ylab = "Number of Fatalities")
 #plot(tmSeriesAg$MOYR[order(tmSeriesAg$MOYR)], tmSeriesAg$INJURIES, col = "green", type = "l",pch = 20
#       , xlab = "Event Date", ylab = "Injuries and Fatalities")
# lines(tmSeriesAg$MOYR[order(tmSeriesAg$MOYR)],tmSeriesAg$FATALITIES, col = "red",type = "l", pch = 20)
 
 mtext("Fatalities and Injuries", side=3, outer=TRUE, line=-1, col = "blue",font = 2) 
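
The month-and-year aggregation can be checked on a few rows. The sketch below is an illustration on invented toy data, not the report's own preparation of tmEventFtlInj, which is not shown here. The date format string is an assumption about how BGN_DATE appears in the raw file.

```r
# Toy rows; the values are invented for illustration only.
# The "month/day/year hour:minute:second" layout is an assumption.
toy <- data.frame(
  BGN_DATE   = c("4/18/1950 0:00:00", "4/25/1950 0:00:00", "5/2/1950 0:00:00"),
  FATALITIES = c(1, 2, 4),
  INJURIES   = c(10, 0, 3),
  stringsAsFactors = FALSE
)

# Parse the text dates, then collapse each one to the first day of its month
dates <- as.POSIXct(toy$BGN_DATE, format = "%m/%d/%Y %H:%M:%S", tz = "GMT")
toy$MOYR <- as.Date(format(dates, "%Y-%m-01"))

# Sum both measures per month and keep the rows in date order for line plots
toyMonthly <- aggregate(cbind(FATALITIES, INJURIES) ~ MOYR, data = toy, FUN = sum)
toyMonthly <- toyMonthly[order(toyMonthly$MOYR), ]
toyMonthly
```

Grouping on a single month-start date avoids the later step of pasting year and month back together.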

This plot presents the top 10 costliest events.

pl <- head(eventDmgPrp[order(eventDmgPrp$TOTLDMG, decreasing = TRUE),],10)
pl$TOTLDMG <- as.numeric(pl$TOTLDMG)
pl$EVTYPE <- factor(pl$EVTYPE)
#plot(pl$EVTYPE,(pl$TOTLDMG/1000), type = "h", xlab = "EVTYPE", ylab = "Total Damage Amount")
library(ggplot2)
 ggplot(pl, aes(x=reorder(EVTYPE, -TOTLDMG), y=TOTLDMG)) +
          ggtitle("Top 10 Costly Events")+
          geom_bar(stat = "identity", fill = "#0099FF")+ 
          # Keep y numeric so bar heights are drawn to scale; format only the axis labels
          scale_y_continuous(labels = function(v) format(v, scientific = FALSE))+
            theme(axis.text.x = element_text(angle = 45, hjust = 1)
                               ,axis.title.x = element_text(face="bold", size=12)
                               ,legend.position="none")+ 
                                    labs(x="Event Type",y="Damage Amount (000's)") 

                                    
This plot displays the top 10 most frequently occurring events.

dm <- head(eventDamag[order(eventDamag$Qty, decreasing = TRUE),],10)
dm$Qty <- as.numeric(dm$Qty)
dm$EVTYPE <- factor(dm$EVTYPE)
#plot(pl$EVTYPE,(pl$TOTLDMG/1000), type = "h", xlab = "EVTYPE", ylab = "Total Damage Amount")
library(ggplot2)
ggplot(dm, aes(x=reorder(EVTYPE, -Qty), y=Qty)) + 
            ggtitle("Top 10 Frequent Events")+
            geom_bar(stat = "identity", fill = "#0099FF")+ 
            # Keep y numeric so bar heights are drawn to scale; format only the axis labels
            scale_y_continuous(labels = function(v) format(v, scientific = FALSE))+
            theme(axis.text.x = element_text(angle = 45, hjust = 1)
                               ,axis.title.x = element_text(face="bold", size=12)
                               ,legend.position="none")+
                                    labs(x="Event Type",y="Event Frequency") 
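
Both plots start from the same step: take the rows with the largest values in one column. A minimal sketch of that step as a reusable function follows. The name top_by and the toy data are hypothetical and not part of the original report.

```r
# Hypothetical helper: the n rows of df with the largest values in column `col`
top_by <- function(df, col, n = 10) {
  head(df[order(df[[col]], decreasing = TRUE), , drop = FALSE], n)
}

# Toy stand-in for eventDamag; the values are invented for illustration only.
toy <- data.frame(
  EVTYPE = c("HAIL", "WND", "TORNADO", "FLOOD"),
  Qty    = c(30, 1, 12, 20),
  stringsAsFactors = FALSE
)

top2 <- top_by(toy, "Qty", n = 2)

# Fixing the factor levels in ranked order keeps a bar chart sorted most to least
top2$EVTYPE <- factor(top2$EVTYPE, levels = top2$EVTYPE)
top2
```

Setting the factor levels in ranked order is an alternative to calling reorder() inside aes(); either approach yields bars sorted from most to least.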

Source Citation and Resource:

National Weather Service Instruction 10-1605, August 17, 2007. Operations and Services, Performance, NWSPD 10-16: Storm Data Preparation.


Storm Data Documentation: https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf
FAQ: https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2FNCDC%20Storm%20Events-FAQ%20Page.pdf

