The U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database records many characteristics of severe weather events observed across the United States from 1950 to November 2011. In my analysis, I used this database to answer two questions: 1) Across the U.S., which types of events are most harmful with respect to human health? 2) Across the U.S., which types of events have the greatest economic consequences? To answer these questions, I analyzed the fatalities, injuries, property damage, and crop damage recorded by NOAA.
The first step is to download the file containing the NOAA data and save it to my working directory.
# Download the compressed storm data only if it is not already present
if(!file.exists("Raw_Data.bz2")){
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                "Raw_Data.bz2")
}
library(dplyr)
I then read the data into R. read.csv() can read the bzip2-compressed file directly, so it does not need to be decompressed first.
weather_data <- read.csv("Raw_Data.bz2")
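A small round trip can confirm that read.csv() handles the compression without help. This is a minimal sketch under stated assumptions. It writes a hypothetical two-row table to a temporary .bz2 file with bzfile(), then reads it back with read.csv() and does no manual decompression. The names tmp, con and toy are for illustration only.

```r
# Write a hypothetical two-row table to a temporary bzip2-compressed CSV
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(1, 0)),
          con, row.names = FALSE)
close(con)

# read.csv() opens the path with file(), which detects bzip2 compression
toy <- read.csv(tmp, stringsAsFactors = FALSE)
print(toy)
unlink(tmp)
```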
To answer the first question, I construct a for loop over every distinct event type (EVTYPE) in the original data set. For each event type, the loop subsets the data to that event and sums the fatalities and injuries recorded for it. The loop then writes these totals into a data frame next to the name of the event type.
# One row per event type: total fatalities and injuries
Event_sums <- data.frame("Event"=NA, "Fatalities"=NA, "Injuries"=NA)
count <- 0
for(event in unique(weather_data$EVTYPE)){
  count <- count + 1
  # Subset to the rows recorded for this event type
  temp_key <- weather_data$EVTYPE == event
  temp_data <- weather_data[temp_key,]
  fat_sum <- sum(temp_data$FATALITIES)
  inj_sum <- sum(temp_data$INJURIES)
  Event_sums[count,] <- c(event, fat_sum, inj_sum)
}
# c() above stores every value as text, so convert the totals back to numbers once
Event_sums[,2] <- as.numeric(Event_sums[,2])
Event_sums[,3] <- as.numeric(Event_sums[,3])
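dplyr is already loaded, so the same per-event totals can also be computed without an explicit loop. The sketch below runs on hypothetical toy rows; the names toy and health_sums are for illustration only. It groups by EVTYPE, sums FATALITIES and INJURIES, and sorts by fatalities. It assumes dplyr 1.0 or later, which provides the .groups argument of summarise().

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical toy rows standing in for weather_data (same column names)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD"),
  FATALITIES = c(2, 1, 3, 0),
  INJURIES   = c(10, 4, 5, 2),
  stringsAsFactors = FALSE
)

# One row per event type with summed fatalities and injuries,
# sorted so the deadliest event type comes first
health_sums <- toy %>%
  group_by(EVTYPE) %>%
  summarise(Fatalities = sum(FATALITIES),
            Injuries   = sum(INJURIES),
            .groups = "drop") %>%
  arrange(desc(Fatalities))

print(health_sums)
```

On the full data set, this pipeline would replace both the loop and the later arrange() calls. The loop above is easier to follow step by step.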
I will use this new data frame to answer the first question.
Next, to answer the second question, I first need to combine the property and crop damage values (PROPDMG, CROPDMG) with their exponent codes (PROPDMGEXP, CROPDMGEXP). These codes give the multiplier; for example, "K" means thousands of dollars. Combining the two gives the full dollar value of the damage for each record.
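# Dollar multiplier for each valid exponent code. check.names = FALSE stops
# data.frame() from renaming columns such as "0" to "X0", which would break lookups
exp_key <- data.frame("0"=1, "1"=10, "2"=100, "3"=10^3, "4"=10^4, "5"=10^5, "6"=10^6, "7"=10^7, "8"=10^8, "9"=10^9, "H"=100, "K"=1000, "M"=10^6, "B"=10^9, check.names = FALSE)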
exp_lbl<- c("0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "H", "K", "M", "B")
I can now use these labels to drop rows whose exponent codes are blank or cannot be interpreted, such as "?", "+" or "-".
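# Upper-case the codes so that, for example, "m" and "M" both mean millions
weather_data$PROPDMGEXP <- toupper(as.character(weather_data$PROPDMGEXP))
weather_data$CROPDMGEXP <- toupper(as.character(weather_data$CROPDMGEXP))
# Keep only rows whose property AND crop exponent codes appear in exp_lbl
# (both tests are computed on weather_data, so the logical index lines up with its rows)
remove_na <- weather_data$PROPDMGEXP %in% exp_lbl & weather_data$CROPDMGEXP %in% exp_lbl
weather_dmg <- weather_data[remove_na,]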
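A few hypothetical records are enough to check the conversion from exponent code to dollars. This sketch differs from the code above in one respect. It stores the multipliers in a named numeric vector (multiplier, an illustrative name) rather than a one-row data frame, so a vector of codes indexes its multipliers directly. Lowercase codes are upper-cased first, and unknown codes such as "?" are dropped, mirroring the filtering above.

```r
# Named vector mapping exponent codes to dollar multipliers
multiplier <- c("0" = 1, "1" = 10, "2" = 100, "3" = 1e3, "4" = 1e4,
                "5" = 1e5, "6" = 1e6, "7" = 1e7, "8" = 1e8, "9" = 1e9,
                "H" = 100, "K" = 1e3, "M" = 1e6, "B" = 1e9)

# Hypothetical toy damage records
dmg <- data.frame(
  PROPDMG    = c(25, 1.5, 10, 3),
  PROPDMGEXP = c("K", "m", "?", "B"),
  stringsAsFactors = FALSE
)

# Normalise case, then keep only codes that have a known multiplier
dmg$PROPDMGEXP <- toupper(dmg$PROPDMGEXP)
dmg <- dmg[dmg$PROPDMGEXP %in% names(multiplier), ]

# Indexing the named vector by the code column returns one multiplier per row
dmg$PROP_USD <- dmg$PROPDMG * unname(multiplier[dmg$PROPDMGEXP])
print(dmg)
```

After filtering, three records remain: 25 "K" becomes 25,000 dollars, 1.5 "m" becomes 1.5 million, and 3 "B" becomes 3 billion.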
Finally, I again use a for loop to subset the cleaned data to one event type at a time and calculate the total cost of property and crop damage for that event type. Inside the loop, I use exp_key from above to convert each exponent code into its multiplier, which gives the full dollar amount of each damage record.
Event_dmgs <- data.frame("Event"=NA, "Property_Damages"=NA, "Crop_Damage"=NA)
count <- 0
for(event in unique(weather_dmg$EVTYPE)){
  count <- count + 1
  # Rows of the cleaned damage data for this event type
  temp_key <- weather_dmg$EVTYPE == event
  temp_data <- weather_dmg[temp_key,]
  temp_data <- temp_data[!is.na(temp_data$PROPDMG),]
  prop_sum <- 0
  crop_sum <- 0
  # seq_len() yields no iterations when temp_data has zero rows
  for(n in seq_len(nrow(temp_data))){
    # Index temp_data (not weather_dmg) so each value is paired with its own exponent code
    prop_sum <- prop_sum + temp_data[n, "PROPDMG"] * exp_key[ , as.character(temp_data[n, "PROPDMGEXP"])]
    crop_sum <- crop_sum + temp_data[n, "CROPDMG"] * exp_key[ , as.character(temp_data[n, "CROPDMGEXP"])]
  }
  Event_dmgs[count,] <- c(event, prop_sum, crop_sum)
}
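Looping row by row is slow on a data set of this size. A vectorised alternative is sketched below on hypothetical toy rows that use only the H/K/M/B multipliers; the names multiplier, dmg, prop_tot, crop_tot and Event_dmgs_fast are for illustration only. It converts every row to dollars in one step, then uses tapply() to sum the results within each event type.

```r
# Subset of the exponent-code multipliers, enough for the toy data below
multiplier <- c(H = 100, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical cleaned damage rows (exponent codes already validated)
dmg <- data.frame(
  EVTYPE     = c("FLOOD", "HAIL", "FLOOD", "HAIL"),
  PROPDMG    = c(2, 50, 1, 0),
  PROPDMGEXP = c("M", "K", "B", "K"),
  CROPDMG    = c(0, 5, 3, 20),
  CROPDMGEXP = c("K", "K", "M", "K"),
  stringsAsFactors = FALSE
)

# Convert every row to dollars in one vectorised step
prop_usd <- dmg$PROPDMG * unname(multiplier[dmg$PROPDMGEXP])
crop_usd <- dmg$CROPDMG * unname(multiplier[dmg$CROPDMGEXP])

# tapply() sums each dollar vector within each event type
prop_tot <- tapply(prop_usd, dmg$EVTYPE, sum)
crop_tot <- tapply(crop_usd, dmg$EVTYPE, sum)

Event_dmgs_fast <- data.frame(
  Event            = names(prop_tot),
  Property_Damages = as.numeric(prop_tot),
  Crop_Damage      = as.numeric(crop_tot[names(prop_tot)]),
  stringsAsFactors = FALSE
)
print(Event_dmgs_fast)
```

In this toy data, FLOOD totals 1.002 billion dollars of property damage (2 million plus 1 billion) and 3 million of crop damage. HAIL totals 50,000 and 25,000.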
bad<-is.na(Event_dmgs[,2])
Event_dmgs<-Event_dmgs[!bad,]
To interpret the results, I first sort the data frames I created so that the event types with the most fatalities and injuries are at the top. This shows which events cause the greatest number of these outcomes.
fat <- arrange(Event_sums, desc(Fatalities))
inj <- arrange(Event_sums, desc(Injuries))
From this analysis, I determined that TORNADO causes both the most fatalities and the most injuries. I also included plots of the top 15 event types in each of these categories.
par(mfrow = c(2,1),mar=c(3,13,1,1),las=2)
barplot(fat[1:15,"Fatalities"], names.arg = fat[1:15,"Event"], horiz = TRUE, main = "Fatalities")
barplot(inj[1:15,"Injuries"], names.arg = inj[1:15,"Event"], horiz = TRUE, main = "Injuries")
It is also important to look at which events cause the largest numbers of combined fatalities and injuries, as illustrated below.
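# Combined fatalities and injuries per event type (named health_total to avoid masking base::sum)
health_total <- data.frame("Event" = Event_sums$Event, "total" = Event_sums$Fatalities + Event_sums$Injuries, stringsAsFactors = FALSE)
health_total <- arrange(health_total, desc(total))
par(mfrow = c(1,1), mar = c(3,13,1,1), las = 2)
barplot(health_total[1:15, "total"], names.arg = health_total[1:15, "Event"], horiz = TRUE, main = "Injuries & Fatalities")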
This shows that TORNADO is the most devastating event type when fatalities and injuries are combined.
Finally, I summed the property and crop damage costs for each event type and plotted the 15 most costly event types.
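# Convert the damage totals back to numbers (c() in the loop stored them as text)
Event_dmgs[,2] <- as.numeric(Event_dmgs[,2])
Event_dmgs[,3] <- as.numeric(Event_dmgs[,3])
prop <- arrange(Event_dmgs, desc(Property_Damages))
crop <- arrange(Event_dmgs, desc(Crop_Damage))
# Total economic cost per event type (named dmg_total to avoid masking base::sum)
dmg_total <- data.frame("Event" = Event_dmgs$Event, "total" = Event_dmgs$Property_Damages + Event_dmgs$Crop_Damage, stringsAsFactors = FALSE)
dmg_total <- arrange(dmg_total, desc(total))
par(mfrow = c(1,1), mar = c(3,13,1,1), las = 2)
barplot(dmg_total[1:15, "total"], names.arg = dmg_total[1:15, "Event"], horiz = TRUE, main = "Total Damage Cost")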
This plot shows that FLASH FLOOD has caused the most costly damage.