Storms are bad. If you can’t stop them, try to live better with them.
This project explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. The 985 event labels will be grouped together into 7 Broad Categories. The costs for each Broad Category will be totaled and presented for comparison.
Questions
This data analysis addresses the following questions:
1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
The Dataset has columns for “FATALITIES” and “INJURIES”. Those will be the categories for population health.
2. Across the United States, which types of events have the greatest economic consequences?
The Dataset has columns for Property Damage (“PROPDMG”) and Crop Damage (“CROPDMG”). Those will be the categories for economic consequences.
The dataset records 902297 separate events, each assigned an Event Type. There are 985 different Event Type names.
Broadly speaking, the Events can be grouped into 7 categories, a much more manageable number than 985. Some of the 985 event names are typos or redundancies, evidence of data entry error. The importance of a broad category could be lost if its events were divided across labels that should be joined.
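To illustrate how spelling variants inflate the number of labels, here is a minimal sketch. The vector `toy_ev` is made up for illustration and is not drawn from the dataset.

```r
# Hypothetical event labels showing case and whitespace variants
toy_ev <- c("TSTM WIND", "tstm wind", " TSTM WIND", "HAIL", "Hail")

# Distinct labels as entered
n_raw <- length(unique(toy_ev))

# Distinct labels after trimming whitespace and upper-casing
n_clean <- length(unique(toupper(trimws(toy_ev))))
```

Normalizing case and whitespace only removes the simplest duplicates; misspellings and synonyms still need something like the keyword grouping used in this report.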
Get the data from the internet.
file_loc <- 'https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2'
filename <-'Storm_data.csv.bz2'
download.file(file_loc,filename)
data <- read.csv(filename)
ev_names <- data$EVTYPE
dim(data)
## [1] 902297 37
Most of the columns are not needed for this analysis. Pare down to just the variables that might be needed.
var_names <- names(data)
names_to_pair <- var_names[c(7,8,22,23,24,25,27)]
# [1] "STATE" "EVTYPE" "MAG" "FATALITIES"
# [5] "INJURIES" "PROPDMG" "CROPDMG"
data <- data[names_to_pair]
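In the NOAA file, the PROPDMG and CROPDMG figures are accompanied by PROPDMGEXP and CROPDMGEXP columns holding magnitude codes, which the subset above does not keep. The following is a sketch, not the method used in this report, of how those codes could be turned into dollar amounts. The toy rows and the rule that only K, M and B are scaled are assumptions made for illustration.

```r
# Hypothetical rows; the *EXP column holds magnitude codes such as K, M, B
toy <- data.frame(PROPDMG = c(25, 2.5, 1, 10),
                  PROPDMGEXP = c("K", "M", "B", ""),
                  stringsAsFactors = FALSE)

# Assumption: only K/M/B are scaled; any other code is treated as 1
mult <- c(K = 1e3, M = 1e6, B = 1e9)
scale <- mult[toupper(toy$PROPDMGEXP)]
scale[is.na(scale)] <- 1

# Damage in dollars = recorded value times its multiplier
toy$prop_dollars <- toy$PROPDMG * unname(scale)
```

Totals computed from scaled values can rank categories differently from totals of the raw PROPDMG and CROPDMG numbers.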
The keyword lists below define six broad categories. Any event type that matches none of them falls into a seventh category, “other”.
storm_names <-c("wind", "tstm", "tornado", "tropical", "storm", "torrent", "thunder", "hurric", "blizzard")
big_wind_names <-c("tornado", "hurric", "gustnado", "funnel")
precip_names <-c("hail", "rain", "snow", "shower", "precip", "sleet")
mud_names <- c("mud")
flood_names <-c("flood","surf","surge","current")
temper_names <- c("cold","heat","hot","blizzard","freez","frost")
Add a new category column to the data data.frame, based on the six keyword lists, with “other” for anything that doesn’t fit. A helper function, get_tf, flags the event names that contain any of a category’s keywords.
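# Return TRUE for each event name that contains any keyword in sub_names.
get_tf <- function(sub_names, ev_names) {
  tf <- logical(length(ev_names))
  for (n in sub_names) {
    # Case-insensitive substring match for this keyword
    tf <- tf | grepl(n, tolower(ev_names))
  }
  tf
}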
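The same matching can be done without a loop by joining the keywords into one regular expression. This is a sketch of an equivalent alternative; the name `get_tf_regex` and the toy event names are hypothetical.

```r
# Alternative to the loop: join the keywords into one regular expression
get_tf_regex <- function(sub_names, ev_names) {
  pattern <- paste(sub_names, collapse = "|")
  grepl(pattern, ev_names, ignore.case = TRUE)
}

# Made-up event names to try the helper on
toy_ev <- c("FLASH FLOOD", "High Surf", "TORNADO", "RIP CURRENT")
flood_hits <- get_tf_regex(c("flood", "surf", "surge", "current"), toy_ev)
```

Because the keywords are treated as regular expressions in both versions, a keyword containing characters such as `.` or `(` would need escaping.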
storm_tf <- get_tf(storm_names,ev_names)
big_wind_tf <- get_tf(big_wind_names,ev_names)
precip_tf <- get_tf(precip_names,ev_names)
mud_tf <- get_tf(mud_names,ev_names)
flood_tf <- get_tf(flood_names,ev_names)
temper_tf <- get_tf(temper_names,ev_names)
categ <- character(length(ev_names))
categ[storm_tf] <-"storm"
categ[big_wind_tf] <- "big_wind"
categ[precip_tf] <- "precip"
categ[mud_tf] <- "mud"
categ[flood_tf] <- "flood"
categ[temper_tf] <- "temper"
categ[!nzchar(categ)] <- "other"
data$Broad_categ <- categ
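Because the assignments run in sequence, an event that matches several keyword lists keeps the label of the last list that matched. The sketch below shows this on made-up event names, using shortened patterns rather than the full keyword lists.

```r
# Hypothetical event names; patterns are shortened versions of the lists above
toy_ev <- c("BLIZZARD", "THUNDERSTORM WINDS/HAIL", "TORNADO", "DENSE FOG")
toy_lower <- tolower(toy_ev)
toy_categ <- character(length(toy_ev))

# Same order as the report: later assignments overwrite earlier ones
toy_categ[grepl("storm|blizzard", toy_lower)] <- "storm"
toy_categ[grepl("tornado", toy_lower)] <- "big_wind"
toy_categ[grepl("hail", toy_lower)] <- "precip"
toy_categ[grepl("blizzard", toy_lower)] <- "temper"
toy_categ[!nzchar(toy_categ)] <- "other"
```

Here “BLIZZARD” is first labeled “storm” and then relabeled “temper”, and the thunderstorm-with-hail event ends up under “precip”, so the order of the assignments is part of the category definition.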
Split up the data based on the broad categories. A “pairs” plot on the original “data” takes forever and might be useless. “pairs” plots on the separate broad categories also take a long time. Save them to file if you wish. The entire “for” loop took more than 45 minutes to make the files.
data_sub_categ_split <- split(data,data$Broad_categ)
categ_names <- names(data_sub_categ_split)
# Optional and slow: save a pairs plot for each broad category.
# for (n in categ_names) {
#   plot_name <- paste0("plot_", n, ".png")
#   png(filename = plot_name)
#   pairs(data_sub_categ_split[[n]])
#   dev.off()
# }
# pairs(data_sub_categ_split$other)
Now sum INJURIES, FATALITIES, PROPDMG and CROPDMG for each of the 7 broad categories.
FATALITIES <- data.frame(Category = categ_names)
FATALITIES$Total <-0
row.names(FATALITIES) <- categ_names
INJURIES <- FATALITIES
PROPDMG <- FATALITIES
CROPDMG <- FATALITIES
for(n in categ_names){
FATALITIES[n,"Total"] <- sum(data_sub_categ_split[[n]]$FATALITIES)
INJURIES[n,"Total"] <- sum(data_sub_categ_split[[n]]$INJURIES)
PROPDMG[n,"Total"] <- sum(data_sub_categ_split[[n]]$PROPDMG)
CROPDMG[n,"Total"] <- sum(data_sub_categ_split[[n]]$CROPDMG)
}
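The per-category totals can also be computed in one call with base R’s `aggregate`. This is a sketch on a made-up data frame, `toy`, with only two of the four variables; it is an alternative to the loop, not the code used for the figures.

```r
# Hypothetical miniature of the categorized data
toy <- data.frame(
  Broad_categ = c("flood", "flood", "storm", "other"),
  FATALITIES  = c(1, 2, 0, 4),
  INJURIES    = c(0, 5, 3, 1)
)

# One row per category, one summed column per variable
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ Broad_categ,
                    data = toy, FUN = sum)
```

The result has one row per category, sorted by category name, which can be passed to ggplot directly.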
Create Bar charts for the 4 variables of interest. Point out interesting things in the Analysis section.
library(ggplot2)
# Install gridExtra only if it is missing
if (!requireNamespace("gridExtra", quietly = TRUE)) {
  install.packages("gridExtra", repos = "http://cran.us.r-project.org")
}
library(gridExtra)
p<-ggplot(data=FATALITIES, aes(x=Category, y=Total)) +
geom_bar(stat="identity")+ggtitle("FATALITIES")
p2<-ggplot(data=INJURIES, aes(x=Category, y=Total)) +
geom_bar(stat="identity")+ggtitle("INJURIES")
grid.arrange(p,p2,nrow = 1)
Figure 1: Fatalities and Injuries Bar Charts
p3<-ggplot(data=PROPDMG, aes(x=Category, y=Total)) +
geom_bar(stat="identity")+ggtitle("PROPDMG")
p4<-ggplot(data=CROPDMG, aes(x=Category, y=Total)) +
geom_bar(stat="identity")+ggtitle("CROPDMG")
grid.arrange(p3,p4,nrow = 1)
Figure 2: Property Damage and Crop Damage Bar Charts
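Bar charts like these are easier to compare when the bars are sorted by height. One way, sketched here on a made-up totals table, is to reorder the factor levels of the Category column before plotting. The name `toy_totals` and its values are hypothetical.

```r
# Hypothetical totals table shaped like FATALITIES, INJURIES, etc.
toy_totals <- data.frame(Category = c("flood", "storm", "other"),
                         Total = c(10, 50, 5))

# Order the category levels from largest to smallest total
toy_totals$Category <- reorder(toy_totals$Category, -toy_totals$Total)
bar_order <- levels(toy_totals$Category)
```

ggplot draws a discrete x axis in factor-level order, so a table prepared this way would show its tallest bar first.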
In Figure 1 we see that “Big Wind” type events, like tornadoes and hurricanes, are the leading cause of Fatalities, followed by extreme “Temperature” type weather events (with labels such as “Cold”, “Blizzard” and “Hot”).
Figure 1 also shows that “Big Wind” type events are the clear leading cause of weather-induced Injuries.
In Figure 2 we see that Property Damage has several major causes, such as “Big Wind” type events and “Storms” (wind storms, thunderstorms, tropical storms). I suspect that those two event types are basically the same thing, differing only in degree. Individual “Storm” events probably cause less damage than “Big Wind” events, but are probably more frequent.
We also see in Figure 2 that Precipitation events (hail, rain, snow, shower, precip, sleet) are the leading cause of Crop Damage, with Flood damage a distant second.