## Synopsis
The following code and figures show the most damaging weather events in the U.S. Starting with a download of the data, the code groups the records by event type and sums fatalities, injuries, property damage and crop damage for each weather event. Figure 1 lists the twelve leading events for deaths and injuries. Figure 2 lists the twelve leading events for property and crop damage. Figure 3 plots the distribution of fatalities across the U.S., using the timezones as a proxy for location.
The compressed .bz2 file was read directly into R; read.csv decompresses it on the fly.
library(data.table)
library(formattable)
library(readr)
library(dplyr)
library(ggplot2)
Storm.data <- read.csv("repdata_data_StormData.csv.bz2")
The raw data was subset to reduce the size of the dataset without changing any of the underlying values. Only the variables required for the analysis are kept.
##na.omit(Storm.data)
storm_dmg <- Storm.data[ ,c(1,4,7,8,23:25,27,34)]
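Selecting columns by position works, but it breaks silently if the column order of the file ever changes. A minimal sketch of the same kind of subset done by name is shown here. It uses a small made-up data frame (`toy_storm`) in place of Storm.data; the values, the `REMARKS` filler column and the `keep_cols` vector are illustrative assumptions, not part of the original analysis.

```r
library(dplyr)

# Toy stand-in for Storm.data; all values are invented for illustration.
toy_storm <- data.frame(
  TIME_ZONE  = c("CST", "EST", "CST"),
  EVTYPE     = c("TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(2, 0, 0),
  INJURIES   = c(10, 1, 0),
  PROPDMG    = c(25, 5, 1.5),
  CROPDMG    = c(0, 2, 10),
  REMARKS    = c("a", "b", "c")   # a column the analysis does not need
)

# Columns to keep, listed by name rather than by index.
keep_cols <- c("TIME_ZONE", "EVTYPE", "FATALITIES",
               "INJURIES", "PROPDMG", "CROPDMG")

# all_of() raises an error if any requested column is missing.
toy_dmg <- toy_storm %>% select(all_of(keep_cols))
names(toy_dmg)
```

Because `all_of()` fails loudly on a missing name, a change in the input file shows up as an error rather than as a wrong column in the results.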
Analyse the data for health impacts by event type
The dataset was grouped by event type, and the totals of deaths, injuries, property damage and crop damage were then derived in R.
plotdat <- storm_dmg %>% group_by(EVTYPE) %>% summarise(sum(FATALITIES))
## `summarise()` ungrouping output (override with `.groups` argument)
plotdat1 <- storm_dmg %>% group_by(EVTYPE) %>% summarise(sum(INJURIES))
## `summarise()` ungrouping output (override with `.groups` argument)
##reorder
table_1 <- plotdat[order(-plotdat[,2]),]
table_1 <- head(table_1, 50)
table_2 <- plotdat1[order(-plotdat1[,2]),]
table_2 <- head(table_2, 50)
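The two summaries above could also be produced in a single pass. The sketch below is one possible arrangement, run on a small invented data frame (`toy_dmg`) rather than on the storm data; the output column names `fatalities` and `injuries` are choices made here for illustration. Setting `.groups = "drop"` also avoids the "ungrouping output" message.

```r
library(dplyr)

# Invented records standing in for storm_dmg.
toy_dmg <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "HEAT", "FLOOD", "HEAT"),
  FATALITIES = c(3, 2, 4, 1, 0),
  INJURIES   = c(20, 5, 1, 2, 0)
)

health <- toy_dmg %>%
  group_by(EVTYPE) %>%
  summarise(
    fatalities = sum(FATALITIES),   # named columns instead of `sum(FATALITIES)`
    injuries   = sum(INJURIES),
    .groups    = "drop"             # return an ungrouped table, no message
  ) %>%
  arrange(desc(fatalities))         # largest fatality totals first

head(health, 2)
```

With both totals in one table there is no need for a later join, and named columns are easier to reference than headers such as `sum(FATALITIES)`.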
Analyse the data for economic damage
plotdmg <- storm_dmg %>% group_by(EVTYPE) %>% summarise(sum(PROPDMG))
## `summarise()` ungrouping output (override with `.groups` argument)
plotcrp <- storm_dmg %>% group_by(EVTYPE) %>% summarise(sum(CROPDMG))
## `summarise()` ungrouping output (override with `.groups` argument)
##reorder
table_3 <- plotdmg[order(-plotdmg[,2]),]
table_3 <- head(table_3, 50)
table_4 <- plotcrp[order(-plotcrp[,2]),]
table_4 <- head(table_4, 50)
## merge the rankings on event type and keep the first twelve rows of each
mergeCols <- c("EVTYPE")
## fatalities and injuries
inner_dplyr <- inner_join(table_1, table_2, by = mergeCols)
plot_1 <- inner_dplyr[1:12,]
## property and crop damage
inner_dplyr2 <- inner_join(table_3, table_4, by = mergeCols)
plot_2 <- inner_dplyr2[1:12,]
FIGURE 1 EVENTS IMPACTING HEALTH: DEATHS AND INJURIES ACROSS THE U.S.
formattable(plot_1)
| EVTYPE | sum(FATALITIES) | sum(INJURIES) |
|---|---|---|
| TORNADO | 5633 | 91346 |
| EXCESSIVE HEAT | 1903 | 6525 |
| FLASH FLOOD | 978 | 1777 |
| HEAT | 937 | 2100 |
| LIGHTNING | 816 | 5230 |
| TSTM WIND | 504 | 6957 |
| FLOOD | 470 | 6789 |
| RIP CURRENT | 368 | 232 |
| HIGH WIND | 248 | 1137 |
| AVALANCHE | 224 | 170 |
| WINTER STORM | 206 | 1321 |
| RIP CURRENTS | 204 | 297 |
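The table lists RIP CURRENT (368 fatalities) and RIP CURRENTS (204 fatalities) as separate events; combined they would total 572. Variant spellings like these split what is arguably one event type across several rows. The sketch below shows one way such labels might be normalised before grouping. The data frame and the specific rules (trimming, upper-casing, expanding TSTM, dropping a trailing S) are assumptions for illustration, not a complete cleaning scheme for EVTYPE.

```r
library(dplyr)

# Invented records with the kinds of label variants seen in the tables.
toy_events <- data.frame(
  EVTYPE     = c("RIP CURRENT", "RIP CURRENTS", "TSTM WIND",
                 "THUNDERSTORM WINDS", " Tornado"),
  FATALITIES = c(3, 2, 1, 5, 7)
)

clean_events <- toy_events %>%
  mutate(
    EVTYPE = toupper(trimws(EVTYPE)),                  # fix case and stray spaces
    EVTYPE = sub("^TSTM", "THUNDERSTORM", EVTYPE),     # expand the abbreviation
    EVTYPE = sub("(CURRENT|WIND)S$", "\\1", EVTYPE)    # drop the plural "S"
  ) %>%
  group_by(EVTYPE) %>%
  summarise(fatalities = sum(FATALITIES), .groups = "drop") %>%
  arrange(desc(fatalities))

clean_events
```

Any such mapping changes the rankings, so the rules used should be stated alongside the results.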
FIGURE 2 EVENTS CAUSING ECONOMIC DAMAGE: PROPERTY DAMAGE AND CROP DAMAGE ACROSS THE U.S. (SUMS OF THE PROPDMG AND CROPDMG FIELDS AS RECORDED)
formattable(plot_2)
| EVTYPE | sum(PROPDMG) | sum(CROPDMG) |
|---|---|---|
| TORNADO | 3212258.16 | 100018.52 |
| FLASH FLOOD | 1420124.59 | 179200.46 |
| TSTM WIND | 1335965.61 | 109202.60 |
| FLOOD | 899938.48 | 168037.88 |
| THUNDERSTORM WIND | 876844.17 | 66791.45 |
| HAIL | 688693.38 | 579596.28 |
| LIGHTNING | 603351.78 | 3580.61 |
| THUNDERSTORM WINDS | 446293.18 | 18684.93 |
| HIGH WIND | 324731.56 | 17283.21 |
| WINTER STORM | 132720.59 | 1978.99 |
| HEAVY SNOW | 122251.99 | 2165.72 |
| WILDFIRE | 84459.34 | 4364.20 |
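The figures above are sums of the PROPDMG and CROPDMG fields as recorded. In the storm database each damage figure is paired with a magnitude code (assumed here to be the columns PROPDMGEXP and CROPDMGEXP, which are not among the columns selected in this analysis), so a dollar total would need each value scaled before summing. The sketch below illustrates the idea on invented data. The K/M/B mapping and the fallback of 1 for any other code are assumptions that should be checked against the dataset documentation.

```r
library(dplyr)

# Invented records; PROPDMGEXP is assumed to hold a magnitude code.
toy_dmg <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "FLOOD", "HAIL"),
  PROPDMG    = c(25, 5, 1.5, 10),
  PROPDMGEXP = c("K", "M", "B", "")
)

# Assumed mapping from code to multiplier; unrecognised codes fall back to 1.
exp_multiplier <- function(code) {
  code <- toupper(code)
  case_when(
    code == "K" ~ 1e3,
    code == "M" ~ 1e6,
    code == "B" ~ 1e9,
    TRUE        ~ 1
  )
}

prop_totals <- toy_dmg %>%
  mutate(prop_usd = PROPDMG * exp_multiplier(PROPDMGEXP)) %>%  # scale each record
  group_by(EVTYPE) %>%
  summarise(prop_usd = sum(prop_usd), .groups = "drop") %>%
  arrange(desc(prop_usd))

prop_totals
```

In this toy example the ranking by scaled value differs from the ranking by raw PROPDMG, which is why the scaling step matters for any statement about dollar cost.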
Data Processing cont'd
To see whether fatalities were evenly distributed across U.S. regions, the fatalities were summed by timezone and plotted, with the timezone serving as a proxy for location.
FIGURE 3 DISTRIBUTION OF FATALITIES ACROSS U.S. TIMEZONES
Group by timezone and print a plot
fattz <- group_by(storm_dmg,TIME_ZONE)
data.plot <- summarise(fattz, fatalities1 = sum(FATALITIES)/100)
## `summarise()` ungrouping output (override with `.groups` argument)
g <- ggplot(data.plot, aes(TIME_ZONE, fatalities1)) +
  geom_point() +
  ggtitle("Distribution of weather event fatalities across the U.S. by timezone") +
  labs(x = "Timezone (A-Z)", y = "Fatalities ('00s)")
print(g)
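Because the timezone is a categorical variable, a bar chart ordered by total can make the largest category easier to spot than points in alphabetical order. The sketch below is one possible alternative, using invented totals in a hypothetical `toy_tz` data frame; it is not the plot produced by this report.

```r
library(ggplot2)

# Invented fatality totals per timezone code, for illustration only.
toy_tz <- data.frame(
  TIME_ZONE  = c("CST", "EST", "MST", "PST"),
  fatalities = c(70, 40, 8, 12)
)

# reorder() sorts the x axis by descending total instead of alphabetically.
g_bar <- ggplot(toy_tz, aes(x = reorder(TIME_ZONE, -fatalities), y = fatalities)) +
  geom_col() +
  labs(x = "Timezone", y = "Fatalities")
print(g_bar)
```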
From the data we can see that the tornado is the weather event with the greatest impact on health, with 5,633 fatalities and 91,346 injuries, and that it also has the largest property damage total in Figure 2; hail has the largest crop damage total. The distribution of fatalities across the U.S. due to weather events shows the highest concentration in the Central Standard Time zone.