Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
The data come as a comma-separated-value (CSV) file compressed with the bzip2 algorithm to reduce its size; the download URL appears in the code below. The events in the database start in 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
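library(downloader)
# The file is bzip2-compressed, so it is saved with a .csv.bz2 extension rather than .zip
download("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile = "StormData.csv.bz2", mode = "wb")
# Reading the dataset takes about 3 minutes, depending on your system speed;
# read.csv decompresses the bzip2 file transparently
storm_data <- read.csv("StormData.csv.bz2")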
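Because the download and read steps are slow, it can help to skip the download when the file is already on disk. The following is a minimal sketch, not part of the original analysis: `fetch_if_missing` is a hypothetical helper name, and the `fetch` argument is an assumption added only so the logic can be tried without network access.

```r
# Hypothetical helper: download `url` to `dest` only when `dest` does not exist.
# `fetch` defaults to base R's download.file; it is a parameter so that the
# skip-if-present logic can be exercised with a stand-in function.
fetch_if_missing <- function(url, dest, fetch = utils::download.file) {
  if (!file.exists(dest)) {
    fetch(url, destfile = dest, mode = "wb")
  }
  invisible(dest)  # return the local path so it can be passed to read.csv
}

# Example use (commented out because it needs network access):
# storm_file <- fetch_if_missing(
#   "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
#   "StormData.csv.bz2"
# )
# storm_data <- read.csv(storm_file)
```

Re-running the report then only pays the download cost once; the read step still runs each time.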
In this section we will use the dplyr package to take a glimpse of the data and subset the data to include only the columns needed for this study.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
glimpse(storm_data)
## Observations: 902,297
## Variables: 37
## $ STATE__ <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
## $ BGN_DATE <fctr> 4/18/1950 0:00:00, 4/18/1950 0:00:00, 2/20/1951 0:...
## $ BGN_TIME <fctr> 0130, 0145, 1600, 0900, 1500, 2000, 0100, 0900, 20...
## $ TIME_ZONE <fctr> CST, CST, CST, CST, CST, CST, CST, CST, CST, CST, ...
## $ COUNTY <dbl> 97, 3, 57, 89, 43, 77, 9, 123, 125, 57, 43, 9, 73, ...
## $ COUNTYNAME <fctr> MOBILE, BALDWIN, FAYETTE, MADISON, CULLMAN, LAUDER...
## $ STATE <fctr> AL, AL, AL, AL, AL, AL, AL, AL, AL, AL, AL, AL, AL...
## $ EVTYPE <fctr> TORNADO, TORNADO, TORNADO, TORNADO, TORNADO, TORNA...
## $ BGN_RANGE <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ BGN_AZI <fctr> , , , , , , , , , , , , , , , , , , , , , , , ,
## $ BGN_LOCATI <fctr> , , , , , , , , , , , , , , , , , , , , , , , ,
## $ END_DATE <fctr> , , , , , , , , , , , , , , , , , , , , , , , ,
## $ END_TIME <fctr> , , , , , , , , , , , , , , , , , , , , , , , ,
## $ COUNTY_END <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ COUNTYENDN <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,...
## $ END_RANGE <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ END_AZI <fctr> , , , , , , , , , , , , , , , , , , , , , , , ,
## $ END_LOCATI <fctr> , , , , , , , , , , , , , , , , , , , , , , , ,
## $ LENGTH <dbl> 14.0, 2.0, 0.1, 0.0, 0.0, 1.5, 1.5, 0.0, 3.3, 2.3, ...
## $ WIDTH <dbl> 100, 150, 123, 100, 150, 177, 33, 33, 100, 100, 400...
## $ F <int> 3, 2, 2, 2, 2, 2, 2, 1, 3, 3, 1, 1, 3, 3, 3, 4, 1, ...
## $ MAG <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ FATALITIES <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 4, 0, ...
## $ INJURIES <dbl> 15, 0, 2, 2, 2, 6, 1, 0, 14, 0, 3, 3, 26, 12, 6, 50...
## $ PROPDMG <dbl> 25.0, 2.5, 25.0, 2.5, 2.5, 2.5, 2.5, 2.5, 25.0, 25....
## $ PROPDMGEXP <fctr> K, K, K, K, K, K, K, K, K, K, M, M, K, K, K, K, K,...
## $ CROPDMG <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ CROPDMGEXP <fctr> , , , , , , , , , , , , , , , , , , , , , , , ,
## $ WFO <fctr> , , , , , , , , , , , , , , , , , , , , , , , ,
## $ STATEOFFIC <fctr> , , , , , , , , , , , , , , , , , , , , , , , ,
## $ ZONENAMES <fctr> , , , , , , , , , , , , , , , , , , , , , , , ,
## $ LATITUDE <dbl> 3040, 3042, 3340, 3458, 3412, 3450, 3405, 3255, 333...
## $ LONGITUDE <dbl> 8812, 8755, 8742, 8626, 8642, 8748, 8631, 8558, 874...
## $ LATITUDE_E <dbl> 3051, 0, 0, 0, 0, 0, 0, 0, 3336, 3337, 3402, 3404, ...
## $ LONGITUDE_ <dbl> 8806, 0, 0, 0, 0, 0, 0, 0, 8738, 8737, 8644, 8640, ...
## $ REMARKS <fctr> , , , , , , , , , , , , , , , , , , , , , , , ,
## $ REFNUM <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, ...
storm_data_subset <- select(storm_data, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
sum(is.na(storm_data_subset))
## [1] 0
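A total of zero from `sum(is.na(...))` only rules out `NA` values; blank strings, such as those visible in the exponent columns of the glimpse output, are not counted. As a small sketch on made-up data (the `toy` data frame below is an illustrative assumption, not taken from the storm database), missing and blank values can be counted per column like this:

```r
# Toy data frame standing in for two of the storm columns (values invented)
toy <- data.frame(
  PROPDMG    = c(25, 2.5, NA),
  PROPDMGEXP = c("K", "", "M"),
  stringsAsFactors = FALSE
)

# Number of NA values in each column
na_count <- colSums(is.na(toy))

# Number of empty strings in each column; NA comparisons are ignored
blank_count <- colSums(toy == "", na.rm = TRUE)

print(na_count)
print(blank_count)
```

Applied to `storm_data_subset`, the same two calls would show which columns carry blanks that need handling later.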
I will use the total fatalities and total injuries for each event type to show the impact of the events on population health.
library(plyr)
## -------------------------------------------------------------------------
## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)
## -------------------------------------------------------------------------
##
## Attaching package: 'plyr'
## The following objects are masked from 'package:dplyr':
##
## arrange, count, desc, failwith, id, mutate, rename, summarise,
## summarize
casualties <- ddply(storm_data_subset, .(EVTYPE), summarize,
                    sumFatalities = sum(FATALITIES),
                    sumInjuries = sum(INJURIES))
# Many event types in the dataset have little or no effect, so keep only the top 20 by fatalities
fatal <- head(casualties[order(casualties$sumFatalities, decreasing = TRUE), ], 20)
barplot(fatal$sumFatalities, names.arg = fatal$EVTYPE, horiz = TRUE,
        xlab = "Fatalities", main = "Fatalities by Event", ylab = "Event Types",
        las = 1, col = "blue", cex.names = 0.5, xlim = c(0, 5^5.5))
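The aggregate-then-rank pattern used above can also be written in base R, which avoids the masking conflict that the plyr start-up message warns about. This is only a sketch on invented numbers; `toy`, `totals`, `ranked`, and `top2` are illustrative names and are not part of the original analysis.

```r
# Toy events (values invented for illustration)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  FATALITIES = c(1, 2, 4, 3),
  INJURIES   = c(15, 0, 50, 1)
)

# Sum fatalities and injuries within each event type
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Rank event types by total fatalities and keep the top two
ranked <- totals[order(totals$FATALITIES, decreasing = TRUE), ]
top2 <- head(ranked, 2)
print(top2)
```

The same two steps, with `head(..., 20)`, correspond to the `ddply` and `order` calls in the analysis.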
Now we will do the same for injuries.
# Many event types in the dataset have little or no effect, so keep only the top 10 by injuries
injury <- head(casualties[order(casualties$sumInjuries, decreasing = TRUE), ], 10)
# Scale the x axis to the data so that the longest bar is not clipped
barplot(injury$sumInjuries, names.arg = injury$EVTYPE, horiz = TRUE,
        xlab = "Injuries", main = "Injuries by Event", ylab = "Event Types",
        las = 1, col = "blue", cex.names = 0.5,
        xlim = c(0, 1.1 * max(injury$sumInjuries)))
Before answering the economic question, we must deal with the values of PROPDMGEXP and CROPDMGEXP. Property damage is represented in two fields: a dollar figure in the PROPDMG column and an exponent for that figure in PROPDMGEXP. Similarly, crop damage is represented by the two fields CROPDMG and CROPDMGEXP. For each event, we calculate the property damage cost and the crop damage cost as the figure multiplied by 10 raised to the exponent.
# Credit for this function goes to Ramaji Ganesh
transformExp <- function(e) {
  # h -> hundred, k -> thousand, m -> million, b -> billion
  if (e %in% c('h', 'H'))
    return(2)
  else if (e %in% c('k', 'K'))
    return(3)
  else if (e %in% c('m', 'M'))
    return(6)
  else if (e %in% c('b', 'B'))
    return(9)
  else if (!is.na(as.numeric(e))) # if a digit
    return(as.numeric(e))
  else if (e %in% c('', '-', '?', '+'))
    return(0)
  else {
    stop("Invalid exponent value.")
  }
}
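One point to check when using a mapping like this: `as.numeric()` applied to a factor returns the internal level codes rather than the printed labels, so the digit branch behaves differently depending on whether the column was read as a factor or as character. The sketch below is a vectorized alternative written under two assumptions that differ from the function above: it converts the input to character first, and it maps any unrecognized symbol to an exponent of 0 instead of stopping with an error. `exp_to_power` is a hypothetical name.

```r
# Hypothetical vectorized mapping from exponent codes to powers of ten.
# Assumption: anything that is not H/K/M/B or a single digit maps to 0.
exp_to_power <- function(e) {
  e <- toupper(as.character(e))            # use labels, not factor level codes
  power <- rep(0, length(e))               # default exponent for blanks and symbols
  letters_map <- c(H = 2, K = 3, M = 6, B = 9)
  is_letter <- e %in% names(letters_map)
  power[is_letter] <- unname(letters_map[e[is_letter]])
  is_digit <- grepl("^[0-9]$", e)          # single digits are used as the exponent itself
  power[is_digit] <- as.numeric(e[is_digit])
  power
}

# Toy check on invented values, passed as a factor on purpose
codes <- factor(c("K", "m", "", "5", "?", "B", "h"))
powers <- exp_to_power(codes)
print(powers)

# Damage in dollars = figure * 10^exponent
damage <- c(25, 2.5) * 10^exp_to_power(c("K", "M"))
print(damage)
```

Because the whole column is handled in one call, this form also avoids looping over roughly 900,000 rows with `sapply`.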
Apply the function created above to both exponent columns and compute the damage costs in dollars.
PROPDMGEXPO <- sapply(storm_data_subset$PROPDMGEXP, FUN = transformExp)
storm_data_subset$PROPDMGCOST <- storm_data_subset$PROPDMG * (10^PROPDMGEXPO)
CROPDMGEXPO <- sapply(storm_data_subset$CROPDMGEXP, FUN = transformExp)
storm_data_subset$CROPDMGCOST <- storm_data_subset$CROPDMG * (10^CROPDMGEXPO)
Calculate property damage
loss <- ddply(storm_data_subset, .(EVTYPE), summarize,
              sumPropDamage = sum(PROPDMGCOST),
              sumCropDamage = sum(CROPDMGCOST),
              sumPropAndCropDamage = sum(PROPDMGCOST + CROPDMGCOST))
# Keep the 20 event types with the highest total property damage
cost <- head(loss[order(loss$sumPropDamage, decreasing = TRUE), ], 20)
# Plot in billions of dollars; scale the x axis to the data
barplot(cost$sumPropDamage / 10^9, names.arg = cost$EVTYPE, horiz = TRUE,
        xlab = "Damage (billions of dollars)", main = "Property Damage Cost in Billions",
        ylab = "Events", las = 1, col = "red", cex.names = 0.5,
        xlim = c(0, 1.1 * max(cost$sumPropDamage / 10^9)))
Calculate crop damage
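# Keep the 20 event types with the highest total crop damage
cost_crop <- head(loss[order(loss$sumCropDamage, decreasing = TRUE), ], 20)
# Divide by 10^6 so the axis is in millions, matching the plot title
barplot(cost_crop$sumCropDamage / 10^6, names.arg = cost_crop$EVTYPE, horiz = TRUE,
        xlab = "Damage (millions of dollars)", main = "Crop Damage by event in Millions",
        ylab = "Events", las = 1, col = "red", cex.names = 0.5,
        xlim = c(0, 1.1 * max(cost_crop$sumCropDamage / 10^6)))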
Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health? Tornadoes cause the most fatalities and the most injuries in the United States (see the "Fatalities by Event" and "Injuries by Event" plots above).
Across the United States, which types of events have the greatest economic consequences? Flash floods cause the highest property damage (see the property damage plot above), and droughts have the greatest economic impact on crops (see the crop damage plot above).