Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
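Our analysis aims to answer two basic questions: which types of events are most harmful to population health, and which types of events have the greatest economic consequences?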
# Load the required libraries
library(xtable)
library(data.table)
library(reshape2)
library(ggplot2)
# Read the data
file.con <- bzfile("repdata_data_StormData.csv.bz2", open = "r")
stormData <- read.csv(file.con, stringsAsFactors = FALSE)
close(file.con)
# Convert the names to lowercase
old.names <- names(stormData)
new.names <- tolower(old.names)
setnames(stormData, old.names, new.names)
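As a small illustration of the loading step, the sketch below writes a toy bzip2-compressed CSV and reads it back. The toy data frame and the names `tmp`, `toy`, and `toyRead` are made up for this example and are not part of the storm data. It assumes that read.csv() can open a bzip2-compressed file directly by path, so the explicit bzfile() connection used above is one option rather than a requirement.

```r
# Toy round trip: write a small bzip2-compressed CSV, then read it back.
tmp <- tempfile(fileext = ".csv.bz2")
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL"),
                  FATALITIES = c(2, 0),
                  stringsAsFactors = FALSE)

con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() opens the path itself; the compression is detected on read
toyRead <- read.csv(tmp, stringsAsFactors = FALSE)

# Same lowercase conversion as above, done with base R only
names(toyRead) <- tolower(names(toyRead))
unlink(tmp)
```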
We concentrate on the following variables: evtype, fatalities, injuries, propdmg, and cropdmg.
We group the data by the 'evtype' variable and sum the fatalities, injuries, propdmg, and cropdmg for each event type. We are only interested in the event types where these totals are greater than zero.
# Aggregate the sum of fatalities by event type
fatalities <- sapply(split(stormData$fatalities, stormData$evtype), sum)
# Aggregate the sum of injuries by event type
injuries <- sapply(split(stormData$injuries, stormData$evtype), sum)
# Aggregate the sum of propdmg by event type
propdmg <- sapply(split(stormData$propdmg, stormData$evtype), sum)
# Aggregate the sum of cropdmg by event type
cropdmg <- sapply(split(stormData$cropdmg, stormData$evtype), sum)
# Create a single data frame
eventsDmg <- as.data.frame(cbind(fatalities, injuries, propdmg, cropdmg))
eventsDmg <- cbind(events = rownames(eventsDmg), eventsDmg)
rownames(eventsDmg) <- NULL
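The per-event sums above can also be sketched with aggregate(), which does the split and sum in one call. This is a minimal example on a made-up data frame (`toy` and `toyTotals` are hypothetical names and values), not a replacement for the code above.

```r
# Toy data: five made-up records across three event types
toy <- data.frame(
  evtype     = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
  fatalities = c(3, 1, 2, 0, 0),
  injuries   = c(10, 0, 5, 0, 2),
  stringsAsFactors = FALSE
)

# Sum both columns for each event type in a single call
toyTotals <- aggregate(cbind(fatalities, injuries) ~ evtype,
                       data = toy, FUN = sum)

# The split/sapply form used above gives the same totals for one column
toyFatalities <- sapply(split(toy$fatalities, toy$evtype), sum)
```

Both forms return the groups in alphabetical order of the event type, so the two results line up row by row.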
# Subset where fatalities > 0 | injuries > 0, sorted in decreasing order
dmgHealth <- subset(eventsDmg, fatalities > 0 | injuries > 0)
dmgHealth <- dmgHealth[with(dmgHealth, order(fatalities, injuries, decreasing = TRUE)), ]
# Subset where propdmg > 0 | cropdmg > 0, sorted in decreasing order
dmgEcomcs <- subset(eventsDmg, propdmg > 0 | cropdmg > 0)
dmgEcomcs <- dmgEcomcs[with(dmgEcomcs, order(propdmg, cropdmg, decreasing = TRUE)), ]
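To make the filter and sort explicit, here is a minimal sketch on made-up totals (`toyDmg` and `toyHealth` are hypothetical). It shows that order() with two keys sorts by the first column and uses the second only to break ties, which is how fatalities and injuries interact in the ranking above.

```r
# Toy totals: events A and C tie on fatalities, B has no health impact
toyDmg <- data.frame(
  events     = c("A", "B", "C", "D"),
  fatalities = c(5, 0, 5, 1),
  injuries   = c(2, 0, 9, 0),
  stringsAsFactors = FALSE
)

# Keep only events with at least one fatality or injury
toyHealth <- subset(toyDmg, fatalities > 0 | injuries > 0)

# Sort by fatalities, breaking ties by injuries, both decreasing
toyHealth <- toyHealth[order(toyHealth$fatalities, toyHealth$injuries,
                             decreasing = TRUE), ]
```

Event B is dropped by the filter, and C ranks ahead of A because it has more injuries at the same number of fatalities.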
We want to answer which types of events are most harmful to population health.
For this, we show a bar plot of the five most harmful events, ranked by the number of fatalities and then by the number of injuries.
top5Health <- head(dmgHealth, n = 5)[, c(1, 2, 3)]
top5HealthMelt <- melt(top5Health, id.vars = "events")
levels(top5HealthMelt$variable) <- c("fatalities", "injuries")
ggplot(top5HealthMelt, aes(x = events, y = value)) +
    geom_bar(stat = "identity", position = "stack") +
    facet_grid(variable ~ .) +
    geom_text(aes(label = value, vjust = -0.4), size = 3) +
    labs(x = "Event", y = "Count")
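The melt() call reshapes the table from wide to long so that facet_grid() can split the plot by measure. The sketch below builds the same long layout by hand in base R on made-up numbers (`toyTop` and `toyLong` are hypothetical). It is meant only to show the shape of the data passed to ggplot, under the assumption that the long table has the columns events, variable, and value.

```r
# Toy wide table: one row per event, one column per measure
toyTop <- data.frame(
  events     = c("TORNADO", "FLOOD"),
  fatalities = c(5, 1),
  injuries   = c(15, 2),
  stringsAsFactors = FALSE
)

# Long layout: one row per (event, measure) pair
toyLong <- data.frame(
  events   = rep(toyTop$events, times = 2),
  variable = factor(rep(c("fatalities", "injuries"), each = nrow(toyTop)),
                    levels = c("fatalities", "injuries")),
  value    = c(toyTop$fatalities, toyTop$injuries),
  stringsAsFactors = FALSE
)
```

Each event now appears once per measure, which is why every facet in the plot shows the same set of events on the x axis.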
Answer:
Event: TORNADO
Fatalities: 5633
Injuries: 91346
To find the events with the greatest economic consequences, we show a bar plot of the five costliest events, ranked by the amount of property damage and then by the amount of crop damage.
top5Ecomcs <- head(dmgEcomcs, n = 5)[, c(1, 4, 5)]
top5EcomcsMelt <- melt(top5Ecomcs, id.vars = "events")
levels(top5EcomcsMelt$variable) <- c("propdmg", "cropdmg")
ggplot(top5EcomcsMelt, aes(x = events, y = value)) +
    geom_bar(stat = "identity", position = "dodge") +
    facet_grid(variable ~ .) +
    geom_text(aes(label = value, vjust = -0.4), size = 3) +
    labs(x = "Event", y = "Count")
Answer:
Event: TORNADO
Property Damage: 3.2123 × 10^6
Crop Damage: 1.0002 × 10^5