This report explores the NOAA Storm Database and answers some basic questions about severe weather events. We use the database to answer the questions below and show the code used for the entire analysis. The analysis consists of tables, figures and other summaries. The following packages were used to support the analysis: dplyr, reshape2, data.table and ggplot2.
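The analysis addresses two questions: across the United States, which types of weather events are most harmful to population health, and which have the greatest economic consequences?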
The results show that tornadoes kill more people than any other weather event and also cause more property damage than any other event. Hail causes more crop damage than any other weather event.
Here we read the data into R with read.csv and store it in “storm”.
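# Set the working directory to the folder holding the downloaded data file (adjust for your machine)
setwd("~/courses/05_ReproducibleResearch")
# read.csv() decompresses the .bz2 archive transparently; the file is large, so loading can take a while
storm <- read.csv("repdata-data-StormData.csv.bz2")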
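The call above relies on read.csv() reading a bzip2-compressed file directly, with no separate unzip step. The sketch below checks that behaviour on a tiny hypothetical CSV written to a temporary file (it does not touch the NOAA data): it writes through a `bzfile()` connection and reads the result back with a plain read.csv() call.

```r
# Hypothetical toy file, not the NOAA data: write a small CSV through a
# bzip2-compressing connection, then read it back with read.csv().
tmp <- tempfile(fileext = ".csv.bz2")

con <- bzfile(tmp, open = "w")
writeLines(c("EVTYPE,FATALITIES,INJURIES",
             "TORNADO,2,10",
             "HAIL,0,1"), con)
close(con)

# read.csv() opens the path with file(), which detects bzip2 compression when reading
toy <- read.csv(tmp, stringsAsFactors = FALSE)
str(toy)
```
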
We then use group_by, summarize and mutate from dplyr to create a dataset with the total fatalities and injuries for each event type, plus a variable with their sum (all negative health outcomes). We keep the 15 event types with the most fatalities and then use melt from reshape2 to reshape this dataset into long form.
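library(dplyr)
# Keep only the event type and the two health-outcome columns
Health <- as_tibble(select(storm, EVTYPE, FATALITIES, INJURIES))
# Total fatalities and injuries per event type, plus their combined sum
TotalHealth <- Health %>%
  group_by(EVTYPE) %>%
  summarize(FATALITIES = sum(FATALITIES), INJURIES = sum(INJURIES)) %>%
  mutate(Sum = FATALITIES + INJURIES)
# Sort by fatalities (descending) and keep the 15 deadliest event types
TotalHealth <- TotalHealth[order(TotalHealth$FATALITIES, decreasing = TRUE), ]
TotalHealth <- TotalHealth[1:15, ]
library(reshape2)
# Long form: one row per (event type, outcome) pair, as ggplot2 expects
TotalHealth1 <- reshape2::melt(TotalHealth, id = c("Sum", "EVTYPE"))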
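To make the summarize-then-melt steps concrete, here is a minimal sketch on a handful of hypothetical toy records (not NOAA values) with the same column names. It shows the one-row-per-event summary and the long form that melt() produces, with one row per event type and outcome.

```r
library(dplyr)
library(reshape2)

# Hypothetical toy records using the NOAA column names EVTYPE, FATALITIES, INJURIES
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "HEAT", "HAIL"),
  FATALITIES = c(3, 1, 2, 0),
  INJURIES   = c(10, 5, 1, 2),
  stringsAsFactors = FALSE
)

# One row per event type with summed counts and their total, deadliest first
toyHealth <- toy %>%
  group_by(EVTYPE) %>%
  summarize(FATALITIES = sum(FATALITIES), INJURIES = sum(INJURIES)) %>%
  mutate(Sum = FATALITIES + INJURIES) %>%
  arrange(desc(FATALITIES))

# Long form: FATALITIES and INJURIES become rows labelled by the "variable" column
toyHealth1 <- reshape2::melt(as.data.frame(toyHealth), id = c("Sum", "EVTYPE"))
toyHealth1
```

TORNADO ends up first with 4 fatalities, 15 injuries and a Sum of 19. The long form has six rows: three event types times two outcomes.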
Next we use group_by, summarize and mutate from dplyr to create a dataset with the total property and crop damage for each event type, plus a variable with their sum (all negative economic outcomes). We rename the damage columns with setnames from data.table, keep the 15 event types with the most property damage, and use melt from reshape2 to reshape this dataset into long form.
require(dplyr)
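# Keep only the event type and the two damage columns
Damage <- as_tibble(select(storm, EVTYPE, PROPDMG, CROPDMG))
# Total property and crop damage per event type, plus their combined sum
TotalDamage <- Damage %>% group_by(EVTYPE) %>%
  summarize(PROPDMG = sum(PROPDMG), CROPDMG = sum(CROPDMG)) %>%
  mutate(Sum = PROPDMG + CROPDMG)
# Sort by property damage (descending)
TotalDamage <- TotalDamage[order(TotalDamage$PROPDMG, decreasing = TRUE), ]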
require(data.table)
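# Rename the damage columns for readable plot labels (setnames modifies the object in place)
setnames(TotalDamage, c("PROPDMG", "CROPDMG"), c("Property", "Crop"))
# Keep the 15 event types with the most property damage
TotalDamage <- TotalDamage[1:15, ]
# Call reshape2's melt explicitly: with data.table attached, a bare melt() may resolve to data.table::melt
TotalDamage1 <- reshape2::melt(TotalDamage, id = c("Sum", "EVTYPE"))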
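The rename-then-melt step can be checked in isolation. The sketch below uses two hypothetical rows (not NOAA values) laid out like TotalDamage. setnames() renames the columns in place with no reassignment, and the explicitly namespaced reshape2::melt() turns Property and Crop into labels in the "variable" column, which is what the fill legend of the damage plot later displays.

```r
library(data.table)

# Hypothetical toy damage totals with the same column layout as TotalDamage
toyDamage <- data.frame(
  EVTYPE  = c("TORNADO", "HAIL"),
  PROPDMG = c(100, 40),
  CROPDMG = c(10, 60),
  stringsAsFactors = FALSE
)
toyDamage$Sum <- toyDamage$PROPDMG + toyDamage$CROPDMG

# setnames() changes the column names by reference; no "toyDamage <-" is needed
setnames(toyDamage, c("PROPDMG", "CROPDMG"), c("Property", "Crop"))

# Namespaced call so the reshape2 method is used even though data.table is attached
toyDamage1 <- reshape2::melt(toyDamage, id = c("Sum", "EVTYPE"))
toyDamage1
```
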
The table below lists the 15 event types with the most fatalities, in descending order, together with their injuries and combined total. Tornadoes cause by far the most deaths and injuries.
TotalHealth
## Source: local data frame [15 x 4]
##
## EVTYPE FATALITIES INJURIES Sum
## (fctr) (dbl) (dbl) (dbl)
## 1 TORNADO 5633 91346 96979
## 2 EXCESSIVE HEAT 1903 6525 8428
## 3 FLASH FLOOD 978 1777 2755
## 4 HEAT 937 2100 3037
## 5 LIGHTNING 816 5230 6046
## 6 TSTM WIND 504 6957 7461
## 7 FLOOD 470 6789 7259
## 8 RIP CURRENT 368 232 600
## 9 HIGH WIND 248 1137 1385
## 10 AVALANCHE 224 170 394
## 11 WINTER STORM 206 1321 1527
## 12 RIP CURRENTS 204 297 501
## 13 HEAT WAVE 172 309 481
## 14 EXTREME COLD 160 231 391
## 15 THUNDERSTORM WIND 133 1488 1621
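The table is ranked by fatalities alone, so its order differs from a ranking by the combined Sum column. The sketch below copies the first five rows from the printed output above and re-ranks them by Sum. It shows that lightning, with fewer deaths but many injuries, moves ahead of heat and flash floods.

```r
library(dplyr)

# First five rows of the health table above (values copied from the printed output)
top5 <- tibble(
  EVTYPE     = c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD", "HEAT", "LIGHTNING"),
  FATALITIES = c(5633, 1903, 978, 937, 816),
  INJURIES   = c(91346, 6525, 1777, 2100, 5230)
) %>%
  mutate(Sum = FATALITIES + INJURIES)

# Re-rank by combined deaths + injuries instead of by deaths alone
byTotal <- top5 %>% arrange(desc(Sum)) %>% pull(EVTYPE)
byTotal
# → TORNADO, EXCESSIVE HEAT, LIGHTNING, HEAT, FLASH FLOOD
```
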
The bar plot below shows the same 15 event types, with fatalities and injuries as side-by-side bars.
require(ggplot2)
ggplot(TotalHealth1, aes(x = EVTYPE, y = value, fill = variable)) +
  # Dodged bars: fatalities and injuries side by side for each event type
  geom_col(position = position_dodge(),
           colour = "black",  # black bar outlines
           linewidth = .1) +  # thin outlines (use size = .1 on ggplot2 < 3.4.0)
  scale_fill_brewer(palette = "Set1") +
  xlab("Weather Event") +
  ylab("Deaths, Injuries") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +  # vertical x-axis labels
  ggtitle("Deaths & Injuries from Weather Events")
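By default ggplot2 orders the x axis alphabetically by EVTYPE, not in the order of the table. One common fix, offered here as an optional variant rather than what the report does, is to reorder the factor levels by the Sum column before plotting. The sketch uses hypothetical long-form toy data shaped like TotalHealth1.

```r
library(ggplot2)

# Hypothetical long-form data shaped like TotalHealth1 (Sum repeated per outcome row)
toyLong <- data.frame(
  EVTYPE   = rep(c("HAIL", "TORNADO", "HEAT"), each = 2),
  Sum      = rep(c(2, 19, 3), each = 2),
  variable = rep(c("FATALITIES", "INJURIES"), times = 3),
  value    = c(0, 2, 4, 15, 2, 1)
)

# reorder() sorts the levels by -Sum, so the event with the largest total comes first
toyLong$EVTYPE <- reorder(toyLong$EVTYPE, -toyLong$Sum)

p <- ggplot(toyLong, aes(x = EVTYPE, y = value, fill = variable)) +
  geom_col(position = position_dodge()) +
  xlab("Weather Event")
```

Printing `p` draws the bars in the order TORNADO, HEAT, HAIL. Applying the same reorder() call to TotalHealth1$EVTYPE would make the chart follow descending totals.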
The table below lists the 15 event types with the most property damage, in descending order. Tornadoes again top the list, but flash floods, thunderstorm winds (TSTM WIND) and floods also cause heavy damage.
TotalDamage
## Source: local data frame [15 x 4]
##
## EVTYPE Property Crop Sum
## (fctr) (dbl) (dbl) (dbl)
## 1 TORNADO 3212258.16 100018.52 3312276.68
## 2 FLASH FLOOD 1420124.59 179200.46 1599325.05
## 3 TSTM WIND 1335965.61 109202.60 1445168.21
## 4 FLOOD 899938.48 168037.88 1067976.36
## 5 THUNDERSTORM WIND 876844.17 66791.45 943635.62
## 6 HAIL 688693.38 579596.28 1268289.66
## 7 LIGHTNING 603351.78 3580.61 606932.39
## 8 THUNDERSTORM WINDS 446293.18 18684.93 464978.11
## 9 HIGH WIND 324731.56 17283.21 342014.77
## 10 WINTER STORM 132720.59 1978.99 134699.58
## 11 HEAVY SNOW 122251.99 2165.72 124417.71
## 12 WILDFIRE 84459.34 4364.20 88823.54
## 13 ICE STORM 66000.67 1688.95 67689.62
## 14 STRONG WIND 62993.81 1616.90 64610.71
## 15 HIGH WINDS 55625.00 1759.60 57384.60
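The claim that hail causes the most crop damage can be read off the Crop column. The sketch below copies that column from the printed table above and picks the maximum with dplyr's slice_max(). This checks only the 15 rows shown, which are the top 15 by property damage, not every event type in the database.

```r
library(dplyr)

# Crop column of the damage table above (values copied from the printed output)
crop <- tibble(
  EVTYPE = c("TORNADO", "FLASH FLOOD", "TSTM WIND", "FLOOD", "THUNDERSTORM WIND",
             "HAIL", "LIGHTNING", "THUNDERSTORM WINDS", "HIGH WIND", "WINTER STORM",
             "HEAVY SNOW", "WILDFIRE", "ICE STORM", "STRONG WIND", "HIGH WINDS"),
  Crop   = c(100018.52, 179200.46, 109202.60, 168037.88, 66791.45,
             579596.28, 3580.61, 18684.93, 17283.21, 1978.99,
             2165.72, 4364.20, 1688.95, 1616.90, 1759.60)
)

# Row with the largest crop damage among these 15 event types
topCrop <- crop %>% slice_max(Crop, n = 1)
topCrop
```
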
The bar plot below shows property and crop damage side by side for the same 15 event types.
g <- ggplot(TotalDamage1, aes(x = EVTYPE, y = value, fill = variable))
# Dodged bars: property and crop damage side by side for each event type
g <- g + geom_col(position = position_dodge(),
                  colour = "black",  # black bar outlines
                  linewidth = .1)    # thin outlines (use size = .1 on ggplot2 < 3.4.0)
g <- g + scale_fill_brewer(palette = "Set3")
g <- g + xlab("Weather Event")
g <- g + ylab("Property, Crop Damage")
g <- g + theme(axis.text.x = element_text(angle = 90, hjust = 1))  # vertical x-axis labels
g <- g + labs(fill = "Damage Type")
g <- g + ggtitle("Economic Impact of Weather Events")
g