This project examines the impact of severe weather in the United States from 1950 through November 2011. It investigates which types of severe weather events are most harmful to population health, measured by injuries and fatalities. It also analyzes their economic consequences by exploring the financial damage done to both property and agriculture.
## Load libraries into R
library(plyr)
library(ggplot2)
Read the data into R, then subset it to the variables this analysis needs: EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP.
stormyWeather<- read.csv(bzfile("repdata%2Fdata%2FStormData.csv.bz2"))
stormyWeathertoStudy<- stormyWeather[,c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]
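Because the full file is large, one optional alternative is to skip the unneeded columns while reading instead of subsetting afterwards. This is a sketch under stated assumptions, not the method used in this analysis. It uses a tiny hypothetical CSV string in place of the real file, reads only the header to learn the column names, then gives every column outside the keep list the class `"NULL"` so `read.csv` never stores it.

```r
# Hypothetical toy CSV standing in for StormData.csv.bz2 (REMARKS is an extra, unwanted column)
csv_text <- "STATE,EVTYPE,FATALITIES,INJURIES,REMARKS
AL,TORNADO,0,15,some text
TX,HAIL,0,0,more text
"
keep <- c("EVTYPE", "FATALITIES", "INJURIES")

# Read just the header (plus one row) to learn the column names
header <- names(read.csv(text = csv_text, nrows = 1))

# NA = let read.csv guess the type; "NULL" = skip the column entirely
classes <- ifelse(header %in% keep, NA, "NULL")

small <- read.csv(text = csv_text, colClasses = classes)
str(small)
```

With the real data, the same `classes` vector could be passed as `read.csv(bzfile("repdata%2Fdata%2FStormData.csv.bz2"), colClasses = classes)`; that call has not been checked against the actual file here.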
The next commands total fatalities and injuries for each event type, then sort the results in decreasing order of each measure.
hurtbyNature<- ddply(stormyWeathertoStudy, .(EVTYPE), summarize, fatalities=sum(FATALITIES), injuries=sum(INJURIES))
deadly<- hurtbyNature[order(hurtbyNature$fatalities, decreasing =TRUE), ]
injury<- hurtbyNature[order(hurtbyNature$injuries, decreasing =TRUE),]
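For readers without plyr, the same per-event totals can be computed in base R. This is a minimal sketch on hypothetical toy rows (not the storm data); it uses `aggregate()` the same way the total-damage step later in this analysis does, then renames the columns to match `hurtbyNature`. The `_toy` names are introduced here for illustration only.

```r
# Hypothetical toy rows standing in for stormyWeathertoStudy
storms <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(2, 0, 2, 3, 0),
  INJURIES   = c(10, 1, 5, 0, 2),
  stringsAsFactors = FALSE
)

# Sum fatalities and injuries within each event type
harm <- aggregate(storms[c("FATALITIES", "INJURIES")],
                  by = list(EVTYPE = storms$EVTYPE), FUN = sum)
names(harm) <- c("EVTYPE", "fatalities", "injuries")

# Sort each measure in decreasing order, as deadly and injury do above
deadly_toy <- harm[order(harm$fatalities, decreasing = TRUE), ]
injury_toy <- harm[order(harm$injuries, decreasing = TRUE), ]
deadly_toy
injury_toy
```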
## Top 6 event types by number of injuries
head(injury)
## EVTYPE fatalities injuries
## 834 TORNADO 5633 91346
## 856 TSTM WIND 504 6957
## 170 FLOOD 470 6789
## 130 EXCESSIVE HEAT 1903 6525
## 464 LIGHTNING 816 5230
## 275 HEAT 937 2100
## Plot injuries for the top 6 event types
ggplot(injury[1:6, ], aes(EVTYPE, injuries, fill = EVTYPE)) + geom_bar(stat = "identity") +
xlab("Event Type") + ylab("Number of Injuries") + ggtitle("Injuries by Event type") + coord_flip()
Tornadoes cause by far the most injuries: 91,346, about 13 times as many as the next event type, TSTM WIND (6,957).
Here is the plot for deaths:
ggplot(deadly[1:6, ], aes(EVTYPE, fatalities, fill = EVTYPE)) + geom_bar(stat = "identity") +
xlab("Event Type") + ylab("Number of Deaths") + ggtitle("Deaths by Event type") + coord_flip()
Tornadoes also cause the most deaths (5,633), so they top both health measures.
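In both plots the bars follow the alphabetical order of `EVTYPE`, not the size of each bar. A small base-R sketch with hypothetical values shows how `reorder()` changes the factor levels that a discrete ggplot2 axis uses; the data frame and its numbers are invented for illustration.

```r
# Hypothetical summary rows (not the real totals)
top <- data.frame(EVTYPE   = c("FLOOD", "HAIL", "TORNADO"),
                  injuries = c(8, 1, 15),
                  stringsAsFactors = FALSE)

# Default: a discrete axis follows alphabetical factor levels
alphabetical <- levels(factor(top$EVTYPE))

# reorder() sets the levels by the summary value, smallest first
by_value <- levels(reorder(factor(top$EVTYPE), top$injuries))

alphabetical  # "FLOOD"   "HAIL"    "TORNADO"
by_value      # "HAIL"    "FLOOD"   "TORNADO"
```

In the injury plot, writing `aes(reorder(EVTYPE, injuries), injuries, fill = EVTYPE)` would apply this ordering. With `coord_flip()` the first level is drawn at the bottom, so the largest bar would end up on top.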
We examine the structure of the property damage and crop damage exponent variables (PROPDMGEXP and CROPDMGEXP) and see that they need additional cleansing so that all exponents are coded the same way. We convert lowercase letters to uppercase and map blanks and the symbols +, -, and ? to an exponent of 1, which becomes a multiplier of 10 once the exponents are converted below.
unique(stormyWeathertoStudy$PROPDMGEXP)
## [1] K M B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels: - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
## CROPDMGEXP has the same kind of inconsistent coding
stormyWeathertoStudy$PROPDMGEXP <- toupper(stormyWeathertoStudy$PROPDMGEXP)
stormyWeathertoStudy$PROPDMGEXP[stormyWeathertoStudy$PROPDMGEXP %in% c("", "+", "-", "?")] = "1"
stormyWeathertoStudy$CROPDMGEXP <- toupper(stormyWeathertoStudy$CROPDMGEXP)
stormyWeathertoStudy$CROPDMGEXP[stormyWeathertoStudy$CROPDMGEXP %in% c("", "+", "-", "?")] = "1"
Convert the letter codes for the exponents to numbers for both property and crop damage: B (billion) = 9, M (million) = 6, K (thousand) = 3, H (hundred) = 2.
stormyWeathertoStudy$PROPDMGEXP[stormyWeathertoStudy$PROPDMGEXP %in% c("B")] = "9"
stormyWeathertoStudy$PROPDMGEXP[stormyWeathertoStudy$PROPDMGEXP %in% c("M")] = "6"
stormyWeathertoStudy$PROPDMGEXP[stormyWeathertoStudy$PROPDMGEXP %in% c("K")] = "3"
stormyWeathertoStudy$PROPDMGEXP[stormyWeathertoStudy$PROPDMGEXP %in% c("H")] = "2"
stormyWeathertoStudy$CROPDMGEXP[stormyWeathertoStudy$CROPDMGEXP %in% c("B")] = "9"
stormyWeathertoStudy$CROPDMGEXP[stormyWeathertoStudy$CROPDMGEXP %in% c("M")] = "6"
stormyWeathertoStudy$CROPDMGEXP[stormyWeathertoStudy$CROPDMGEXP %in% c("K")] = "3"
stormyWeathertoStudy$CROPDMGEXP[stormyWeathertoStudy$CROPDMGEXP %in% c("H")] = "2"
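The symbol cleanup and the eight recoding lines above could also be collapsed into one lookup. This is a hypothetical helper (`exp_lookup` and `code_to_exponent` are names introduced here, not part of the original analysis) that encodes the same choices: H, K, M, B become 2, 3, 6, 9; blanks and +, -, ? become 1; digit codes pass through unchanged. Any code it does not recognize comes back as `NA` instead of being silently converted.

```r
# Hypothetical lookup mirroring the recoding choices made above
exp_lookup <- c(H = 2, K = 3, M = 6, B = 9, "+" = 1, "-" = 1, "?" = 1)

code_to_exponent <- function(code) {
  code <- toupper(trimws(as.character(code)))  # handles factors and lowercase codes
  out <- suppressWarnings(as.numeric(code))    # digit codes "0".."8" parse directly; others give NA
  hit <- code %in% names(exp_lookup)
  out[hit] <- exp_lookup[code[hit]]            # letters and symbols via the lookup
  out[code == ""] <- 1                         # blank code treated like the symbols
  out                                          # unrecognized codes remain NA
}

codes <- c("K", "m", "", "+", "5", "h", "B", "?")
code_to_exponent(codes)
```

Under these assumptions, `10^code_to_exponent(stormyWeathertoStudy$PROPDMGEXP)` would give the same multipliers as the step-by-step version.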
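Convert each exponent to a multiplier (10 raised to the exponent), set missing damage values to zero, and multiply each damage value by its multiplier. Then total the property damage by event type.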
stormyWeathertoStudy$PROPDMGEXP<-10^(as.numeric(stormyWeathertoStudy$PROPDMGEXP))
stormyWeathertoStudy$CROPDMGEXP<-10^(as.numeric(stormyWeathertoStudy$CROPDMGEXP))
stormyWeathertoStudy[is.na(stormyWeathertoStudy$PROPDMG), "PROPDMG"]<- 0
stormyWeathertoStudy[is.na(stormyWeathertoStudy$CROPDMG), "CROPDMG"]<- 0
damage.property = stormyWeathertoStudy$PROPDMG *stormyWeathertoStudy$PROPDMGEXP
data=as.data.frame(cbind(stormyWeathertoStudy,damage.property))
Damage.property = ddply(data, .(EVTYPE), summarize, damage.property = sum(damage.property, na.rm = TRUE))
Damage.property = Damage.property[order(Damage.property$damage.property, decreasing = T), ]
head(Damage.property)
## EVTYPE damage.property
## 170 FLOOD 144657709870
## 411 HURRICANE/TYPHOON 69305840000
## 834 TORNADO 56947381244
## 670 STORM SURGE 43323536000
## 153 FLASH FLOOD 16822675842
## 244 HAIL 15735268026
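As a worked example of the multiplication step, here is the arithmetic for four hypothetical records whose exponents have already been converted to numbers. Note that a blank code, which was mapped to exponent 1 above, multiplies its value by 10.

```r
# Hypothetical records: damage value and its converted exponent
PROPDMG    <- c(25, 2.5, 5, 1.5)
PROPDMGEXP <- c(3, 6, 1, 9)  # K, M, blank (mapped to 1 above), B

multiplier <- 10^PROPDMGEXP        # 1e3, 1e6, 10, 1e9
damage_usd <- PROPDMG * multiplier
damage_usd                         # 25 thousand, 2.5 million, 50, 1.5 billion
```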
Create a plot for property damage
## Only 3 figures are allowed in this analysis, so the property damage plot is left commented out:
##ggplot(Damage.property[1:6, ], aes(EVTYPE, damage.property, fill = EVTYPE)) + geom_bar(stat = "identity", alpha = 0.5) +
##xlab("Event Type") + ylab("Property damages") + ggtitle("Property damages by Event type") + coord_flip()
Now follow the same procedure to look at crop damage alone.
damage.crop = stormyWeathertoStudy$CROPDMG *stormyWeathertoStudy$CROPDMGEXP
data2=as.data.frame(cbind(stormyWeathertoStudy,damage.crop))
Damage.crop = ddply(data2, .(EVTYPE), summarize, damage.crop = sum(damage.crop, na.rm = TRUE))
Damage.crop = Damage.crop[order(Damage.crop$damage.crop, decreasing = T), ]
head(Damage.crop)
## EVTYPE damage.crop
## 95 DROUGHT 13972566000
## 170 FLOOD 5661968450
## 590 RIVER FLOOD 5029459000
## 427 ICE STORM 5022113500
## 244 HAIL 3025954500
## 402 HURRICANE 2741910000
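The property and crop steps repeat the same three operations: multiply, total by event type, sort. A hypothetical helper such as `damage_by_type()` below (not part of the original analysis) could factor out that repetition. It assumes the exponent columns already hold multipliers (10 raised to the exponent), as they do after the conversion step, and uses the same `aggregate()` call as the total-damage step. The toy data is invented.

```r
# Hypothetical helper: total damage per event type for one value/multiplier pair
damage_by_type <- function(df, value_col, mult_col) {
  dollars <- df[[value_col]] * df[[mult_col]]
  out <- aggregate(dollars, by = list(EVTYPE = df$EVTYPE), FUN = sum, na.rm = TRUE)
  names(out)[2] <- "damage"
  out[order(out$damage, decreasing = TRUE), ]
}

# Hypothetical toy rows; the *EXP columns already hold multipliers
toy <- data.frame(
  EVTYPE     = c("DROUGHT", "FLOOD", "DROUGHT", "HAIL"),
  PROPDMG    = c(0, 50, 0, 10),
  PROPDMGEXP = c(10, 1e3, 10, 1e3),
  CROPDMG    = c(4, 1, 6, 0),
  CROPDMGEXP = c(1e6, 1e6, 1e6, 10),
  stringsAsFactors = FALSE
)

prop <- damage_by_type(toy, "PROPDMG", "PROPDMGEXP")
crop <- damage_by_type(toy, "CROPDMG", "CROPDMGEXP")
prop
crop
```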
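Drought is the most damaging event for agriculture, with about $14.0B in crop damage, roughly 2.5 times the $5.7B caused by floods. The plot code, again commented out because of the figure limit, is: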
##ggplot(Damage.crop[1:6, ], aes(EVTYPE, damage.crop, fill = EVTYPE)) + geom_bar(stat = "identity") +
## xlab("Event Type") + ylab("Crop damages") + ggtitle("Crop damages by Event type") + coord_flip()
Adding property damage and crop damage gives the total damage for each event type:
stormyWeathertoStudy <- within(stormyWeathertoStudy, TOTALDMG <- PROPDMG * PROPDMGEXP + CROPDMG * CROPDMGEXP)
DamageByType <- aggregate(stormyWeathertoStudy$TOTALDMG, by = list(EVTYPE = stormyWeathertoStudy$EVTYPE),
FUN = sum)
DamageByType <- DamageByType[order(DamageByType$x, decreasing = TRUE), ]
## Display the 6 most damaging event types
head(DamageByType)
## EVTYPE x
## 170 FLOOD 150319678320
## 411 HURRICANE/TYPHOON 71913712800
## 834 TORNADO 57362334514
## 670 STORM SURGE 43323541000
## 244 HAIL 18761222526
## 153 FLASH FLOOD 18243992942
Once property and crop damage are combined, flood leads by a wide margin, while drought, the top cause of crop damage, does not even make the top six.
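As a quick consistency check on the combined table, the FLOOD total should equal its property damage plus its crop damage from the two earlier tables. The figures below are copied from that printed output.

```r
# FLOOD figures copied from the printed property, crop, and total tables
flood_property <- 144657709870
flood_crop     <- 5661968450
flood_total    <- 150319678320

flood_property + flood_crop == flood_total  # TRUE
flood_crop / flood_total                    # crop share of flood damage, about 3.8%
```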
TopDamage <- DamageByType[1:5, ]
ggplot(TopDamage, aes(EVTYPE, y = x, fill=EVTYPE)) + geom_bar(stat = "identity") + xlab("Event Type") +
ylab("Damage in Dollars") + ggtitle("Damage by Event type") + coord_flip()
Floods cause about $150B in total damage, roughly twice the $72B caused by the next event type, HURRICANE/TYPHOON.
In summary, this basic analysis shows that tornadoes have the greatest effect on population health, with 5,633 deaths and 91,346 injuries, while floods have the greatest economic impact, with approximately $150B in combined property and crop damage.