In this project we try to answer two main questions: (1) Which types of events are most harmful to population health? (2) Which types of events have the greatest economic consequences? We use an extract from the National Oceanic and Atmospheric Administration (NOAA) Storm Database to answer them. The analysis shows that tornadoes caused the highest number of both fatalities and injuries, and that floods caused the highest total value of property and crop damage.
Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. The data can be downloaded from the following link: Storm Data
There is also some documentation of the database available. Here you will find how some of the variables are constructed/defined.
National Weather Service Storm Data Documentation
National Climatic Data Center Storm Events FAQ
The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(RColorBrewer)
The following code reads the data into R, selects the required columns, and then converts the character exponent codes for property damage (PROPDMGEXP) and crop damage (CROPDMGEXP) into numeric multipliers of the corresponding magnitude.
# Load the data with read.csv, which reads the bzip2-compressed file directly.
stormdata <- read.csv("repdata_data_StormData.csv.bz2")
# Keep only the columns needed for the health and economic analysis.
storm_selected_data <- stormdata[c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
storm_selected_data$PROPEXP[storm_selected_data$PROPDMGEXP == "K"] <- 1000
storm_selected_data$PROPEXP[storm_selected_data$PROPDMGEXP == "M"] <- 1000000
storm_selected_data$PROPEXP[storm_selected_data$PROPDMGEXP == "" ] <- 1
storm_selected_data$PROPEXP[storm_selected_data$PROPDMGEXP == "B"] <- 1000000000
storm_selected_data$PROPEXP[storm_selected_data$PROPDMGEXP == "m"] <- 1000000
storm_selected_data$PROPEXP[storm_selected_data$PROPDMGEXP == "+"] <- 0
storm_selected_data$PROPEXP[storm_selected_data$PROPDMGEXP == "0"] <- 1
storm_selected_data$PROPEXP[storm_selected_data$PROPDMGEXP == "5"] <- 100000
storm_selected_data$PROPEXP[storm_selected_data$PROPDMGEXP == "6"] <- 1000000
storm_selected_data$PROPEXP[storm_selected_data$PROPDMGEXP == "?"] <- 0
storm_selected_data$PROPEXP[storm_selected_data$PROPDMGEXP == "4"] <- 10000
storm_selected_data$PROPEXP[storm_selected_data$PROPDMGEXP == "2"] <- 100
storm_selected_data$PROPEXP[storm_selected_data$PROPDMGEXP == "3"] <- 1000
storm_selected_data$PROPEXP[storm_selected_data$PROPDMGEXP == "h"] <- 100
storm_selected_data$PROPEXP[storm_selected_data$PROPDMGEXP == "7"] <- 10000000
storm_selected_data$PROPEXP[storm_selected_data$PROPDMGEXP == "H"] <- 100
storm_selected_data$PROPEXP[storm_selected_data$PROPDMGEXP == "-"] <- 0
storm_selected_data$PROPEXP[storm_selected_data$PROPDMGEXP == "1"] <- 10
storm_selected_data$PROPEXP[storm_selected_data$PROPDMGEXP == "8"] <- 100000000
storm_selected_data$CROPEXP[storm_selected_data$CROPDMGEXP == ""] <- 1
storm_selected_data$CROPEXP[storm_selected_data$CROPDMGEXP == "M"] <- 1000000
storm_selected_data$CROPEXP[storm_selected_data$CROPDMGEXP == "K"] <- 1000
storm_selected_data$CROPEXP[storm_selected_data$CROPDMGEXP == "m"] <- 1000000
storm_selected_data$CROPEXP[storm_selected_data$CROPDMGEXP == "B"] <- 1000000000
storm_selected_data$CROPEXP[storm_selected_data$CROPDMGEXP == "?"] <- 0
storm_selected_data$CROPEXP[storm_selected_data$CROPDMGEXP == "0"] <- 1
storm_selected_data$CROPEXP[storm_selected_data$CROPDMGEXP == "k"] <- 1000
storm_selected_data$CROPEXP[storm_selected_data$CROPDMGEXP == "2"] <- 100
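The same mapping can be sketched more compactly as a single helper function, which makes it harder for the property and crop rules to drift apart. This is only an illustrative alternative, not the code used for the results in this report. The function name exp_to_multiplier is hypothetical, and the sketch assumes the same rules as the assignments above: letters are case-insensitive, a single digit d means 10^d, an empty code means 1, and "+", "-" and "?" mean 0. Any other code is left as NA.

```r
# Hypothetical helper: convert exponent codes to numeric multipliers.
exp_to_multiplier <- function(codes) {
  codes <- toupper(as.character(codes))
  mult <- rep(NA_real_, length(codes))        # unknown codes stay NA

  mult[codes == ""] <- 1                      # no exponent given
  mult[codes %in% c("+", "-", "?")] <- 0      # treated as zero, as above

  # Letter codes: hundreds, thousands, millions, billions.
  letter_map <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  is_letter <- codes %in% names(letter_map)
  mult[is_letter] <- unname(letter_map[codes[is_letter]])

  # Single-digit codes are read as powers of ten.
  is_digit <- grepl("^[0-9]$", codes)
  mult[is_digit] <- 10^as.numeric(codes[is_digit])

  mult
}

example_mult <- exp_to_multiplier(c("K", "m", "", "B", "5", "?", "h"))
```

With such a helper, both multiplier columns could be filled by one call each, for example exp_to_multiplier(storm_selected_data$PROPDMGEXP).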
The following code totals the fatalities and injuries for each EVTYPE in the selected data, sorts the totals in decreasing order, and keeps the top 5 event types for each.
fatal <- aggregate(FATALITIES ~ EVTYPE, data = storm_selected_data, FUN = sum)
injury <- aggregate(INJURIES ~ EVTYPE, data = storm_selected_data, FUN = sum)
fatal10 <- fatal[order(-fatal$FATALITIES),][1:5, ]
injury10 <- injury[order(-injury$INJURIES),][1:5, ]
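To make the aggregate-then-order pattern concrete, the following sketch applies it to a small made-up data frame. The toy values are invented for illustration only and are not taken from the storm database.

```r
# Toy data (made up): five events across three event types.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, 0),
  stringsAsFactors = FALSE
)

# Sum fatalities within each event type.
toy_fatal <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)

# Sort by decreasing total and keep the top 2 rows.
toy_top <- toy_fatal[order(-toy_fatal$FATALITIES), ][1:2, ]
```

Here TORNADO totals 5 and HEAT totals 4, so those two rows are kept, in that order.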
The following code draws two barplots for the first question, showing the five event types with the most fatalities and the five with the most injuries.
par(mfrow = c(1, 2), mar = c(12, 4, 3, 2), mgp = c(3, 1, 0), las=3,cex = 0.8)
barplot(fatal10$FATALITIES, names.arg = fatal10$EVTYPE, ylim = c(0, 8000),
        col = brewer.pal(5, "Purples"), ylab = "Number of Fatalities",
        main = "Top 5 Events with Highest Fatalities")
barplot(injury10$INJURIES, names.arg = injury10$EVTYPE, ylim = c(0, 90000),
        col = brewer.pal(5, "YlGn"), ylab = "Number of Injuries",
        main = "Top 5 Events with Highest Injuries")
The following code finds the total property/crop damage according to each EVTYPE in the selected data and arranges them in decreasing order selecting the top 10.
storm_selected_data$PROPDMGVAL <- storm_selected_data$PROPDMG * storm_selected_data$PROPEXP
storm_selected_data$CROPDMGVAL <- storm_selected_data$CROPDMG * storm_selected_data$CROPEXP
storm_selected_data$ALLDMGVAL <- storm_selected_data$PROPDMGVAL + storm_selected_data$CROPDMGVAL
propcropdmg <- aggregate(ALLDMGVAL ~ EVTYPE, data = storm_selected_data, FUN = sum)
propcropdmg10<-propcropdmg[order(-propcropdmg$ALLDMGVAL), ][1:10,]
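As a worked example of this calculation, the sketch below uses a small made-up data frame (the values are invented, not taken from the storm database). It also illustrates one behaviour worth keeping in mind: the formula interface of aggregate omits rows with NA values by default, so an event whose multiplier was never assigned would drop out of the totals without warning. Whether any such rows exist in the real data is not checked here.

```r
# Toy data (made up): damage amounts with already-converted multipliers.
toy <- data.frame(
  EVTYPE  = c("FLOOD", "FLOOD", "HAIL"),
  PROPDMG = c(2.5, 1, 10),
  PROPEXP = c(1e6, 1e3, 1e3),
  CROPDMG = c(10, 0, 5),
  CROPEXP = c(1e3, 1, NA),      # NA mimics an unmapped exponent code
  stringsAsFactors = FALSE
)

# Damage in dollars = amount * multiplier, summed over property and crops.
toy$ALLDMGVAL <- toy$PROPDMG * toy$PROPEXP + toy$CROPDMG * toy$CROPEXP

# Rows with NA in ALLDMGVAL are omitted by the formula method's default.
toy_total <- aggregate(ALLDMGVAL ~ EVTYPE, data = toy, FUN = sum)
```

The first FLOOD row is 2.5 * 1,000,000 + 10 * 1,000 = 2,510,000 and the second is 1,000, so FLOOD totals 2,511,000. The HAIL row has a missing multiplier and therefore does not appear in toy_total at all.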
The following code draws a barplot for the second question, showing the ten event types with the highest combined property and crop damage.
par(mfrow = c(1, 1), mar = c(12, 4, 3, 2), mgp = c(3, 1, 0), las=3,cex = 0.8, cex.main = 0.9)
# Convert dollars to billions of dollars for a readable axis.
barplot(propcropdmg10$ALLDMGVAL / 1e9,
        names.arg = propcropdmg10$EVTYPE,
        col = brewer.pal(10, "Set3"),
        ylab = "Property and Crop Damage ($ billions)",
        main = "Top 10 Events Causing Highest Property/Crop Damage Value")
The analysis shows that tornadoes caused the highest number of both fatalities and injuries, and that floods caused the highest total value of property and crop damage.