The dataset contains storm data across the United States of America from 1950 to November 2011. Of its many variables, this analysis uses only those relating to harm to population health and damage to property and crops. The analysis indicates that tornadoes caused the most fatalities and injuries in total between 1950 and 2011, most likely because they are the most common event. Heat waves cause the most injuries on average but are less common. Floods caused the most property damage and droughts the most crop damage.
Download the data from the URL and read it into R
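if (!file.exists("./data")) {dir.create("./data")}
fileURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# The download is bzip2-compressed, so keep the .bz2 extension; skip it if already present
destfile <- "./data/repdata-data-StormData.csv.bz2"
if (!file.exists(destfile)) download.file(fileURL, destfile = destfile, mode = "wb")
# read.csv() decompresses .bz2 files transparently
Storm <- read.csv(destfile)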
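The downloaded file is a bzip2 archive, yet `read.csv()` can read it without a separate decompression step, because it opens the path through `file()`, which detects compressed files when reading. The sketch below illustrates this on a few made-up toy rows (not the NOAA data): it writes a tiny bzip2-compressed CSV to a temporary file with `bzfile()` and reads it back with `read.csv()`.

```r
# Write a tiny bzip2-compressed CSV to a temporary file (toy rows, not the storm data)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,5", "FLOOD,2"), con)
close(con)

# read.csv() opens the path via file(), which decompresses bzip2 on the fly
toy_csv <- read.csv(tmp)
print(toy_csv)
```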
Create a new data frame containing only the population health variables
pop <- data.frame(Storm$EVTYPE, Storm$FATALITIES, Storm$INJURIES)
Create two new data frames for the economic variables: one for property damage and one for crop damage
eco_prop <- data.frame(Storm$EVTYPE, Storm$PROPDMG, Storm$PROPDMGEXP)
eco_crop <- data.frame(Storm$EVTYPE, Storm$CROPDMG, Storm$CROPDMGEXP)
Note: the following analysis is based on total fatalities and injuries rather than averages per event. While this does not show which individual events are the most dangerous, it does show which event types have caused the most harm across the US from 1950 to 2011, and therefore which events most need to be addressed. You are better off protecting yourself against the many 'x' events that each cause a few fatalities than against the rare 'y' event that kills more on average.
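The difference between ranking by totals and ranking by averages can be seen on a small invented example that mirrors the 'x' and 'y' events above: ten 'x' events with 2 fatalities each and a single 'y' event with 8. The sketch uses the same `aggregate()` pattern as the analysis; the toy data frame is hypothetical.

```r
# Hypothetical toy data: ten 'x' events with 2 fatalities each, one 'y' event with 8
toy_events <- data.frame(
  event = c(rep("x", 10), "y"),
  fatalities = c(rep(2, 10), 8)
)

# Same aggregate() pattern as the analysis, once with sum and once with mean
total_by_event <- aggregate(list(fatalities = toy_events$fatalities),
                            list(event = toy_events$event), sum)
mean_by_event <- aggregate(list(fatalities = toy_events$fatalities),
                           list(event = toy_events$event), mean)

total_by_event  # x = 20, y = 8: 'x' ranks first on totals
mean_by_event   # x = 2, y = 8: 'y' ranks first on averages
```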
Calculate total fatalities by event and order by decreasing value
options(scipen = 1, digits = 2)
FbE <- aggregate(list(fatalities=pop$Storm.FATALITIES), list(event=pop$Storm.EVTYPE), sum)
FbE <- FbE[order(FbE$fatalities, decreasing=TRUE),]
Calculate total injuries by event and order by decreasing value
options(scipen = 1, digits = 2)
IbE <- aggregate(list(injuries=pop$Storm.INJURIES), list(event=pop$Storm.EVTYPE), sum)
IbE <- IbE[order(IbE$injuries, decreasing=TRUE),]
Plot the 10 most harmful event types
fbe <- FbE[1:10,]
ibe <- IbE[1:10,]
par(mfrow=c(2,1))
barplot(ibe$injuries, col="blue", main="total injuries (blue) and total fatalities (red) by event", ylab="injuries", names.arg = ibe$event, las=3)
# label the fatalities panel with its own events (fbe), not the injury ranking (ibe)
barplot(fbe$fatalities, col="red", ylab="fatalities", names.arg = fbe$event, las=3)
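One way to keep each panel's bar labels tied to its own data is a small helper that sorts, slices, and labels from the same rows, so heights and names cannot come from different data frames. `plot_top()` below is a hypothetical helper, not part of the original analysis, and is shown on invented toy data; in the report it could be called on `FbE` and `IbE`.

```r
# Hypothetical helper: order by value, keep the top n rows, and take the bar
# heights and labels from those same rows
plot_top <- function(df, value_col, label_col, n = 10, ...) {
  top <- head(df[order(df[[value_col]], decreasing = TRUE), ], n)
  barplot(top[[value_col]], names.arg = top[[label_col]], las = 3, ...)
  invisible(top)
}

# Usage on toy data (extra arguments such as col and ylab pass through to barplot)
toy_harm <- data.frame(event = c("A", "B", "C"), fatalities = c(3, 9, 1))
top2 <- plot_top(toy_harm, "fatalities", "event", n = 2, col = "red", ylab = "fatalities")
print(top2)
```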
The data show that tornadoes have caused the most fatalities and injuries across the US. This is probably because they occur more often than other events, since this analysis looks only at total fatalities and injuries.
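The explanation that tornadoes occur more often can be checked by counting records per event type. The sketch below wraps `table()` and `sort()` in a hypothetical helper, `event_counts()`, and runs it on invented toy labels; against the real data it would be called on `Storm$EVTYPE`, which assumes the download step has already run.

```r
# Hypothetical helper: count records per event type, most frequent first
event_counts <- function(evtype, n = 10) {
  counts <- sort(table(evtype), decreasing = TRUE)
  counts[seq_len(min(n, length(counts)))]
}

# Usage against the full data (requires Storm from the download step):
# event_counts(Storm$EVTYPE)

# Toy labels, invented for illustration
toy_labels <- c("TORNADO", "TORNADO", "HAIL", "TORNADO", "FLOOD", "HAIL")
top_types <- event_counts(toy_labels, n = 2)
print(top_types)
```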
That said, a quick look at the average number of injuries per event shows something interesting.
options(scipen = 1, digits = 2)
IbEAvg <- aggregate(list(injuries=pop$Storm.INJURIES), list(event=pop$Storm.EVTYPE), mean)
IbEAvg <- IbEAvg[order(IbEAvg$injuries, decreasing=TRUE),]
head(IbEAvg)
## event injuries
## 277 Heat Wave 70
## 851 TROPICAL STORM GORDON 43
## 954 WILD FIRES 38
## 821 THUNDERSTORMW 27
## 366 HIGH WIND AND SEAS 20
## 656 SNOW/HIGH WINDS 18
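The "Heat Wave" event type has the highest average, at about 70 injuries per recorded event. Note that event labels are recorded inconsistently (for example "Heat Wave" in mixed case and "THUNDERSTORMW"), so averages for rarely used labels may rest on very few records.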
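Averages computed from very few records can dominate a ranking like the one above. A sketch of one way to account for this is to report the number of records next to each mean and optionally drop event types below a minimum count. `summarise_injuries()` and its `min_n` threshold are hypothetical choices, shown here on invented toy data.

```r
# Hypothetical helper: mean injuries per event type alongside the record count,
# keeping only event types with at least min_n records
summarise_injuries <- function(event, injuries, min_n = 1) {
  n   <- aggregate(list(n = injuries), list(event = event), length)
  avg <- aggregate(list(mean_injuries = injuries), list(event = event), mean)
  out <- merge(n, avg, by = "event")
  out <- out[out$n >= min_n, ]
  out[order(out$mean_injuries, decreasing = TRUE), ]
}

# Toy data: "A" has a single large record, "B" has three moderate ones
toy_event <- c("A", "B", "B", "B", "C", "C")
toy_inj   <- c(40, 10, 20, 30, 0, 2)

all_types    <- summarise_injuries(toy_event, toy_inj)             # "A" ranks first on one record
well_sampled <- summarise_injuries(toy_event, toy_inj, min_n = 2)  # "A" dropped, "B" ranks first
print(all_types)
print(well_sampled)
```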
Convert the crop and property damage values to dollars using the exponent columns (K = thousands, M = millions, B = billions).
# Map exponent codes to multipliers (K/k = 1e3, M/m = 1e6, B = 1e9).
# Any other code, including a blank, keeps a multiplier of 0,
# so those records are excluded from the totals.
exp_to_mult <- function(exp) {
  mult <- rep(0, length(exp))
  mult[exp %in% c("K", "k")] <- 1e3
  mult[exp %in% c("M", "m")] <- 1e6
  mult[exp %in% "B"] <- 1e9
  mult
}
eco_crop$times <- exp_to_mult(eco_crop$Storm.CROPDMGEXP)
eco_crop$Storm.CROPDMG <- eco_crop$Storm.CROPDMG * eco_crop$times
eco_prop$times <- exp_to_mult(eco_prop$Storm.PROPDMGEXP)
eco_prop$Storm.PROPDMG <- eco_prop$Storm.PROPDMG * eco_prop$times
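The same exponent-to-multiplier mapping can also be written as a named lookup vector, which keeps the codes and their values in one place. This is an alternative sketch, not the method used above, shown on invented toy values. One caveat: if the exponent column were read as a factor (the default in R before 4.0), indexing by it directly would use the factor's integer codes, so the sketch converts with `as.character()` first.

```r
# Named lookup vector: exponent code -> multiplier (only the codes used above)
multiplier <- c(K = 1e3, k = 1e3, M = 1e6, m = 1e6, B = 1e9)

# Toy damage values and exponent codes, invented for illustration
toy_dmg <- c(25, 1.5, 3, 2)
toy_exp <- c("K", "M", "", "B")

# as.character() guards against factor columns; unmatched codes such as "" give NA
mult <- unname(multiplier[as.character(toy_exp)])
mult[is.na(mult)] <- 0  # treat unmatched codes as 0, matching the scheme above
dollars <- toy_dmg * mult
print(dollars)
```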
Calculate the total cost for each event type and sort in decreasing order
options(scipen = 1, digits = 2)
eco_crop <- aggregate(list(cost=eco_crop$Storm.CROPDMG), list(event=eco_crop$Storm.EVTYPE), sum)
eco_crop <- eco_crop[order(eco_crop$cost, decreasing=TRUE),]
eco_prop <- aggregate(list(cost=eco_prop$Storm.PROPDMG), list(event=eco_prop$Storm.EVTYPE), sum)
eco_prop <- eco_prop[order(eco_prop$cost, decreasing=TRUE),]
Draw a two-panel plot of crop and property damage for the 10 costliest event types
crops <- eco_crop[1:10,]
props <- eco_prop[1:10,]
par(mfrow=c(2,1))
barplot(crops$cost, col="yellow", main="damage to crops (yellow) and property (green) by event", ylab="cost (USD)", names.arg = crops$event, las=3)
barplot(props$cost, col="green", ylab="cost (USD)", names.arg = props$event, las=3)
Droughts caused the greatest total crop damage and floods the greatest total property damage.