## Synopsis

The code and figures below identify the most damaging weather events in the U.S. Starting from the downloaded data file, the code groups the records by event type and sums fatalities, injuries, property damage and crop damage for each type. Figure 1 lists the leading twelve events by fatalities, with their injury totals. Figure 2 lists the leading twelve events by property damage, with their crop damage totals. Figure 3 plots the distribution of fatalities across the U.S., using time zones as a proxy for location.

Load libraries and zipped file

The compressed file was read directly into R; read.csv() decompresses .bz2 files itself, so no separate unzip step is needed.

library(data.table)
library(formattable)
library(readr)
library(dplyr)
library(ggplot2)

Storm.data <- read.csv("repdata_data_StormData.csv.bz2")
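The synopsis mentions a download step that the code above does not show. A minimal sketch of one way to make that step reproducible follows. The helper name `fetch_if_missing` and the URL are placeholders introduced for illustration, not part of the original analysis.

```r
# Hypothetical helper: download the compressed file only when it is not
# already present in the working directory.
# The URL is a placeholder (an assumption); substitute the real source.
storm_url  <- "https://example.com/StormData.csv.bz2"
storm_file <- "repdata_data_StormData.csv.bz2"

fetch_if_missing <- function(url, destfile) {
  if (!file.exists(destfile)) {
    # mode = "wb" keeps the binary .bz2 file intact on Windows
    download.file(url, destfile, mode = "wb")
  }
  invisible(destfile)
}

# Usage (left commented out because it needs network access):
# fetch_if_missing(storm_url, storm_file)
# Storm.data <- read.csv(storm_file)
```

Skipping the download when the file already exists keeps repeated runs of the report fast and avoids refetching a large file.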

Transformation of the dataset

The raw data was subset to reduce its size without changing any of the underlying values. Only the columns needed for the analysis are kept, selected by position.

##na.omit(Storm.data)
storm_dmg <- Storm.data[ ,c(1,4,7,8,23:25,27,34)]
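Selecting columns by position works only while the file layout stays the same. As a hedged alternative, the sketch below selects by name and fails early if a column is missing. The toy data frame and its values are made up, and the column names are an assumption about the storm file based on the names used later in this report.

```r
# Toy stand-in for Storm.data; values are invented for illustration and
# the column names are assumed to match the storm file.
toy <- data.frame(
  STATE__    = c(1, 1),
  TIME_ZONE  = c("CST", "EST"),
  STATE      = c("AL", "NY"),
  EVTYPE     = c("TORNADO", "FLOOD"),
  FATALITIES = c(2, 0),
  INJURIES   = c(15, 1),
  PROPDMG    = c(25, 2.5),
  CROPDMG    = c(0, 10),
  REMARKS    = c("", "")
)

# Columns the analysis actually uses
keep <- c("TIME_ZONE", "STATE", "EVTYPE",
          "FATALITIES", "INJURIES", "PROPDMG", "CROPDMG")

# Stop with an error if an expected column is absent, instead of
# silently picking the wrong column by position.
missing_cols <- setdiff(keep, names(toy))
stopifnot(length(missing_cols) == 0)

toy_dmg <- toy[, keep]
```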

Analyse the data for health impacts by event type

The dataset was grouped by event type, and the sums of fatalities, injuries, property damage and crop damage were then computed in R.

plotdat <- storm_dmg %>% group_by(EVTYPE) %>% summarise(sum(FATALITIES))
## `summarise()` ungrouping output (override with `.groups` argument)
plotdat1 <- storm_dmg%>% group_by(EVTYPE) %>% summarise(sum(INJURIES))
## `summarise()` ungrouping output (override with `.groups` argument)
##reorder 
table_1 <- plotdat[order(-plotdat[,2]),]
table_1 <- head(table_1, 50)
table_2 <- plotdat1[order(-plotdat1[,2]),]
table_2 <- head(table_2, 50)

Analyse the data for economic damage

plotdmg <- storm_dmg %>% group_by(EVTYPE) %>% summarise(sum(PROPDMG))
## `summarise()` ungrouping output (override with `.groups` argument)
plotcrp <- storm_dmg%>% group_by(EVTYPE) %>% summarise(sum(CROPDMG))
## `summarise()` ungrouping output (override with `.groups` argument)
##reorder 
table_3 <- plotdmg[order(-plotdmg[,2]),]
table_3 <- head(table_3, 50)
table_4 <- plotcrp[order(-plotcrp[,2]),]
table_4 <- head(table_4, 50)

#merge
mergeCols <- c("EVTYPE")
inner_dplyr <- inner_join(table_1, table_2, by = mergeCols)
plot_1<- inner_dplyr[1:12,]

mergeCols <- c("EVTYPE")
inner_dplyr2 <- inner_join(table_3, table_4, by = mergeCols)
plot_2<- inner_dplyr2[1:12,]
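The steps above build four separate summaries and then join the top-50 lists. A possible simplification is to compute all four totals in one table and rank that table directly. The sketch below uses base R on made-up data. The helper `top_n_by` is hypothetical, and the approach differs slightly from the original: no event can be dropped by the join because there is no join.

```r
# Toy stand-in for storm_dmg; all values are invented for illustration.
toy_dmg <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, 0),
  INJURIES   = c(10, 2, 5, 1, 3),
  PROPDMG    = c(25, 5, 10, 0, 2.5),
  CROPDMG    = c(0, 10, 1, 0, 4)
)

# Sum all four measures per event type in a single call.
# Note: the formula interface drops rows that contain NA values.
totals <- aggregate(
  cbind(FATALITIES, INJURIES, PROPDMG, CROPDMG) ~ EVTYPE,
  data = toy_dmg,
  FUN  = sum
)

# Hypothetical helper: rank by one column (largest first), keep n rows.
top_n_by <- function(df, column, n = 12) {
  head(df[order(-df[[column]]), ], n)
}

health_top <- top_n_by(totals, "FATALITIES", n = 2)
print(health_top)
```

Because every total lives in one table, the same helper can rank by `INJURIES`, `PROPDMG` or `CROPDMG` without recomputing anything.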

Output

FIGURE 1 EVENTS IMPACTING HEALTH: DEATHS AND INJURIES ACROSS THE U.S.

formattable(plot_1)
| EVTYPE | sum(FATALITIES) | sum(INJURIES) |
|---|---:|---:|
| TORNADO | 5633 | 91346 |
| EXCESSIVE HEAT | 1903 | 6525 |
| FLASH FLOOD | 978 | 1777 |
| HEAT | 937 | 2100 |
| LIGHTNING | 816 | 5230 |
| TSTM WIND | 504 | 6957 |
| FLOOD | 470 | 6789 |
| RIP CURRENT | 368 | 232 |
| HIGH WIND | 248 | 1137 |
| AVALANCHE | 224 | 170 |
| WINTER STORM | 206 | 1321 |
| RIP CURRENTS | 204 | 297 |

FIGURE 2 EVENTS CAUSING ECONOMIC DAMAGE: PROPERTY DAMAGE AND CROP DAMAGE ACROSS THE U.S. (SUMS OF THE RAW PROPDMG AND CROPDMG VALUES)

formattable(plot_2)
| EVTYPE | sum(PROPDMG) | sum(CROPDMG) |
|---|---:|---:|
| TORNADO | 3212258.16 | 100018.52 |
| FLASH FLOOD | 1420124.59 | 179200.46 |
| TSTM WIND | 1335965.61 | 109202.60 |
| FLOOD | 899938.48 | 168037.88 |
| THUNDERSTORM WIND | 876844.17 | 66791.45 |
| HAIL | 688693.38 | 579596.28 |
| LIGHTNING | 603351.78 | 3580.61 |
| THUNDERSTORM WINDS | 446293.18 | 18684.93 |
| HIGH WIND | 324731.56 | 17283.21 |
| WINTER STORM | 132720.59 | 1978.99 |
| HEAVY SNOW | 122251.99 | 2165.72 |
| WILDFIRE | 84459.34 | 4364.20 |
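The totals in Figure 2 are sums of the PROPDMG and CROPDMG columns as stored. In the storm file these figures are usually paired with magnitude codes in columns named PROPDMGEXP and CROPDMGEXP, which the column selection in this report does not appear to carry over. That pairing, and the code-to-multiplier mapping below, are assumptions about the file rather than something this document shows. Under those assumptions, a dollar conversion might look like this sketch on made-up data:

```r
# Assumed mapping from magnitude code to multiplier. Codes not listed
# (blank, digits, symbols) are treated as 1, which is a simplification.
exp_multiplier <- function(code) {
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(as.character(code))]
  mult[is.na(mult)] <- 1
  unname(mult)
}

# Toy records; values are invented for illustration.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "TORNADO"),
  PROPDMG    = c(25, 500, 2.5),
  PROPDMGEXP = c("K", "", "M")
)

# Scale each record before summing, so 2.5 with code "M" counts as 2.5 million.
toy$PROP_USD <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP)
usd_totals   <- aggregate(PROP_USD ~ EVTYPE, data = toy, FUN = sum)
print(usd_totals)
```

Scaling before summing matters because a ranking built on unscaled values can differ from one built on dollar amounts.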

Data Processing cont'd

To see whether fatalities were evenly distributed across U.S. regions, fatalities were plotted against the time zone information.

FIGURE 3 DISTRIBUTION OF FATALITIES ACROSS U.S. TIMEZONES

Group by timezone and print a plot

fattz <- group_by(storm_dmg,TIME_ZONE)
data.plot <- summarise(fattz, fatalities1 = sum(FATALITIES)/100)
## `summarise()` ungrouping output (override with `.groups` argument)
g <- ggplot(data.plot, aes(TIME_ZONE, fatalities1)) +
        geom_point() +
        ggtitle("Distribution of weather event fatalities across the U.S. by time zone") +
        labs(x = "Time zone (A-Z)", y = "Fatalities (hundreds)")

print(g)

Results

The data show that the tornado is the weather event with the greatest impact on health, with 5,633 fatalities and 91,346 injuries, and it also has the largest property damage total in Figure 2. The distribution of fatalities across the U.S. due to weather events shows the highest concentration in the Central Standard Time zone.
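The time zone observation can also be checked numerically rather than read off the plot. The sketch below computes each zone's share of fatalities and picks the largest. The data are made up for illustration, so the output says nothing about the real dataset.

```r
# Toy stand-in for storm_dmg; values are invented for illustration.
toy <- data.frame(
  TIME_ZONE  = c("CST", "EST", "CST", "MST", "PST", "EST"),
  FATALITIES = c(5, 2, 3, 1, 0, 4)
)

# Total fatalities per time zone
by_zone <- aggregate(FATALITIES ~ TIME_ZONE, data = toy, FUN = sum)

# Share of all fatalities, sorted so the largest share comes first
by_zone$share <- by_zone$FATALITIES / sum(by_zone$FATALITIES)
by_zone <- by_zone[order(-by_zone$share), ]

top_zone <- by_zone$TIME_ZONE[1]
print(by_zone)
```

Reporting a share alongside the plot makes it easier to judge how concentrated the fatalities are, not only which zone ranks first.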