Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
The data analysis in this document addresses the following questions:
1. Which types of events are most harmful with respect to population health?
2. Which types of events have the greatest economic consequences?
# Load packages quietly; the masking notices printed by dplyr are startup messages, not warnings
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(knitr))
The data for this assignment come as a comma-separated-value (CSV) file compressed with bzip2 (StormData.csv.bz2), and are documented in the National Weather Service Storm Data Documentation. The events in the database start in the year 1950 and end in November 2011.
Loading data
data <- read.csv("StormData.csv.bz2")
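read.csv() can read the compressed file directly because R file connections detect bzip2 compression when a file is opened for reading. A minimal sketch of that behaviour, using a tiny made-up file (the temporary file and its two rows are illustrative only, not taken from the storm database):

```r
# Write a tiny bzip2-compressed CSV to a temporary file (made-up rows).
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
writeLines(c("EVTYPE,FATALITIES,INJURIES",
             "TORNADO,1,12",
             "FLOOD,0,3"), con)
close(con)

# read.csv() decompresses transparently; no explicit bzfile() call is needed.
toy <- read.csv(tmp, stringsAsFactors = FALSE)
str(toy)
unlink(tmp)
```

Reading the full storm file this way can take a while, so caching the chunk when knitting may be worth considering.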
Subsetting required data
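# Columns needed for the health and economic analyses
# (named keep_cols to avoid masking the base function colnames())
keep_cols <- c("BGN_DATE", "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
Storm_data <- data[, keep_cols]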
Tidying data for event types
length(unique(Storm_data$EVTYPE))
## [1] 985
This shows 985 distinct event type labels, while the Storm Data documentation lists only 48 major types. The following code converts the labels to upper case and merges the main variants of the major event types.
Storm_data$EVTYPE <- toupper(Storm_data$EVTYPE)
# Merge variant labels into one canonical label per event type
Storm_data$EVTYPE[Storm_data$EVTYPE %in% c("FLOOD", "RIVER FLOOD")] <- "FLOOD"
Storm_data$EVTYPE[Storm_data$EVTYPE %in% c("HURRICANE/TYPHOON", "HURRICANE", "HURRICANE OPAL")] <- "HURRICANE/TYPHOON"
Storm_data$EVTYPE[Storm_data$EVTYPE %in% c("RIP CURRENT", "RIP CURRENTS")] <- "RIP CURRENT"
Storm_data$EVTYPE[Storm_data$EVTYPE %in% c("STORM SURGE/TIDE", "STORM SURGE")] <- "STORM SURGE/TIDE"
# "THUNDERTORM WINDS" is matched exactly as spelled (presumably a misspelling in the raw labels)
Storm_data$EVTYPE[Storm_data$EVTYPE %in% c("THUNDERSTORM WINDS", "THUNDERTORM WINDS", "TSTM WIND (G45)", "TSTM WIND")] <- "THUNDERSTORM WIND"
Storm_data$EVTYPE[Storm_data$EVTYPE %in% c("TORNADOES, TSTM WIND, HAIL")] <- "TORNADO"
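These groupings were chosen iteratively: the analysis was run several times and the results inspected, so that variant labels of the event types appearing in the top 10 are grouped together. Labels outside the top 10 were left as they are.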
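The same kind of grouping can be kept in one place as a lookup table, which makes it easier to extend when another variant label turns up. A minimal sketch, assuming a hypothetical named vector evtype_map and helper recode_evtype (neither is part of the analysis above), shown here with only a subset of the merges:

```r
# Hypothetical lookup: names are raw labels, values are the canonical type.
evtype_map <- c(
  "RIVER FLOOD"        = "FLOOD",
  "HURRICANE"          = "HURRICANE/TYPHOON",
  "RIP CURRENTS"       = "RIP CURRENT",
  "STORM SURGE"        = "STORM SURGE/TIDE",
  "TSTM WIND"          = "THUNDERSTORM WIND",
  "THUNDERSTORM WINDS" = "THUNDERSTORM WIND"
)

# Upper-case the labels, recode those found in the lookup,
# and leave every other label unchanged.
recode_evtype <- function(x, map = evtype_map) {
  x <- toupper(x)
  hit <- x %in% names(map)
  x[hit] <- unname(map[x[hit]])
  x
}

raw <- c("Tstm Wind", "River Flood", "HAIL", "Hurricane")
recoded <- recode_evtype(raw)
recoded
```

Labels that are not in the lookup, such as HAIL here, pass through untouched, which matches the approach of merging only the variants that matter for the top 10.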
Refining data for Damage cost calculation
Storm_data$PROPDMGEXP <- as.character(Storm_data$PROPDMGEXP)
Storm_data$PROPDMGEXP[Storm_data$PROPDMGEXP %in% c("","?","-")] <- "0"
Storm_data$PROPDMGEXP[Storm_data$PROPDMGEXP %in% c("+")] <- "1"
Storm_data$PROPDMGEXP[Storm_data$PROPDMGEXP %in% c("h","H")] <- "100"
Storm_data$PROPDMGEXP[Storm_data$PROPDMGEXP %in% c("k","K")] <- "1000"
Storm_data$PROPDMGEXP[Storm_data$PROPDMGEXP %in% c("m","M")] <- "1000000"
Storm_data$PROPDMGEXP[Storm_data$PROPDMGEXP %in% c("b","B")] <- "1000000000"
Storm_data$PROPDMGEXP <- as.numeric(Storm_data$PROPDMGEXP)
Storm_data$CROPDMGEXP <- as.character(Storm_data$CROPDMGEXP)
Storm_data$CROPDMGEXP[Storm_data$CROPDMGEXP %in% c("","?","-")] <- "0"
Storm_data$CROPDMGEXP[Storm_data$CROPDMGEXP %in% c("+")] <- "1"
Storm_data$CROPDMGEXP[Storm_data$CROPDMGEXP %in% c("h","H")] <- "100"
Storm_data$CROPDMGEXP[Storm_data$CROPDMGEXP %in% c("k","K")] <- "1000"
Storm_data$CROPDMGEXP[Storm_data$CROPDMGEXP %in% c("m","M")] <- "1000000"
Storm_data$CROPDMGEXP[Storm_data$CROPDMGEXP %in% c("b","B")] <- "1000000000"
Storm_data$CROPDMGEXP <- as.numeric(Storm_data$CROPDMGEXP)
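Since the property and crop exponent columns receive identical treatment, the mapping can also be written once as a function and applied to both. A sketch under the same rules as the code above; exp_to_multiplier is a hypothetical helper name, not something defined elsewhere in this analysis:

```r
# Convert exponent codes to numeric multipliers, following the rules above:
# "", "?", "-" -> 0; "+" -> 1; H -> 100; K -> 1e3; M -> 1e6; B -> 1e9;
# single digits keep their face value (as as.numeric() does above).
exp_to_multiplier <- function(code) {
  code <- toupper(as.character(code))
  lookup <- c("?" = 0, "-" = 0, "+" = 1,
              "H" = 1e2, "K" = 1e3, "M" = 1e6, "B" = 1e9)
  out <- unname(lookup[code])              # NA for codes not in the lookup
  out[code %in% ""] <- 0                   # blank codes map to 0
  digit <- grepl("^[0-9]$", code)
  out[digit] <- as.numeric(code[digit])    # digits are used as-is
  out
}

multipliers <- exp_to_multiplier(c("K", "m", "", "5", "+", "B"))
multipliers
```

Note that, as in the code above, a blank code maps to 0, so any damage figure recorded without an exponent contributes nothing to the totals.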
injury <- aggregate(INJURIES ~ EVTYPE, data = Storm_data, FUN = sum)
fatality <- aggregate(FATALITIES ~ EVTYPE, data = Storm_data, FUN = sum)
Total_health_data <- merge(injury, fatality, by = "EVTYPE")
# Sum of total harm (injuries plus fatalities)
Total_health_data$sum <- Total_health_data$INJURIES + Total_health_data$FATALITIES
Total_health_data <- Total_health_data[order(Total_health_data$sum, decreasing = TRUE), ]
# Top 10 events
Total_health_harm <- Total_health_data[1:10, ]
Total_health_harm
##                EVTYPE INJURIES FATALITIES   sum
## 751           TORNADO    91346       5658 97004
## 680 THUNDERSTORM WIND     9356        702 10058
## 116    EXCESSIVE HEAT     6525       1903  8428
## 154             FLOOD     6791        472  7263
## 416         LIGHTNING     5230        816  6046
## 243              HEAT     2100        937  3037
## 138       FLASH FLOOD     1777        978  2755
## 385         ICE STORM     1975         89  2064
## 878      WINTER STORM     1321        206  1527
## 370 HURRICANE/TYPHOON     1322        126  1448
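From the above result, tornadoes are by far the most harmful event type for both injuries (91,346) and fatalities (5,658). Their combined total of 97,004 is nearly ten times that of the next event type, thunderstorm wind (10,058).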
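The two aggregate() calls followed by merge() can also be expressed as a single aggregation over both columns. A small sketch on made-up data (the toy data frame and its values are purely illustrative):

```r
# Made-up example rows, not taken from the storm database.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  INJURIES   = c(10, 2, 5, 0),
  FATALITIES = c(1, 0, 2, 3)
)

# cbind() on the left-hand side sums both columns per event type in one pass.
health <- aggregate(cbind(INJURIES, FATALITIES) ~ EVTYPE, data = toy, FUN = sum)
health$sum <- health$INJURIES + health$FATALITIES
health <- health[order(health$sum, decreasing = TRUE), ]
head(health, 10)
```

On this toy input TORNADO ranks first with a combined total of 18, followed by HEAT and FLOOD.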
Plot of total population health harm
g<- ggplot(Total_health_harm, aes(EVTYPE, sum)) +
labs(title="Total Harm to Population Health") +
xlab("Top 10 population harm events") + ylab("Sum of injuries and fatalities")
plot1<- g + geom_bar(stat="identity",aes(fill = EVTYPE)) + theme(axis.text.x = element_text(angle = 90, hjust = 1))
plot1
Storm_data$PROPDMG <- Storm_data$PROPDMGEXP * Storm_data$PROPDMG
Storm_data$CROPDMG <- Storm_data$CROPDMGEXP * Storm_data$CROPDMG
# Sum of total damage (property plus crop)
Storm_data$Total_damage_sum <- Storm_data$PROPDMG + Storm_data$CROPDMG
Total_damage <- aggregate(Total_damage_sum ~ EVTYPE, data = Storm_data, FUN = sum)
Total_damage <- Total_damage[order(Total_damage$Total_damage_sum, decreasing = TRUE), ]
# Top 10 events
Total_damage <- Total_damage[1:10, ]
Total_damage
##                EVTYPE Total_damage_sum
## 154             FLOOD     160468082750
## 370 HURRICANE/TYPHOON      89715787810
## 751           TORNADO      58954614161
## 595  STORM SURGE/TIDE      47965579000
## 212              HAIL      18758221820
## 138       FLASH FLOOD      17562129187
## 84            DROUGHT      15018672000
## 680 THUNDERSTORM WIND      10864367818
## 385         ICE STORM       8967041310
## 764    TROPICAL STORM       8382236550
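From the above, floods cause the greatest economic damage: about 160 billion U.S. dollars in combined property and crop damage, followed by hurricanes/typhoons (about 90 billion) and tornadoes (about 59 billion).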
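Totals of this size are easier to compare when expressed in billions. A small sketch using the three largest values from the table above (the vector name damage is only for illustration):

```r
# Three largest totals, copied from the table above.
damage <- c("FLOOD"             = 160468082750,
            "HURRICANE/TYPHOON" = 89715787810,
            "TORNADO"           = 58954614161)

# Convert to billions and round to one decimal place.
billions <- round(damage / 1e9, 1)
billions
```

The same division could be applied to Total_damage$Total_damage_sum before plotting so that the axis labels avoid scientific notation.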
Plot of total economic damage
# Built with ggplot() rather than qplot(), in the same style as the health plot above
p <- ggplot(Total_damage, aes(EVTYPE, Total_damage_sum)) +
  geom_bar(stat = "identity", aes(fill = EVTYPE)) +
  labs(title = "Economic damage due to severe weather events\nin the U.S. from 1950-2011") +
  xlab("Top 10 economic consequences events") + ylab("Economic damage")
p + theme(axis.text.x = element_text(angle = 90, hjust = 1))
We can conclude that tornadoes cause the most harm to population health and floods cause the most economic damage.