Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern. This project explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. According to our analysis, tornadoes have caused the most fatalities and injuries of any weather event type across the U.S., while floods have caused the greatest economic damage.
We need to load two required packages for our analysis.
library(plyr)
library(ggplot2)
The data for this project come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. We can download the file from the course web site:
*Storm Data
We assume that the data file is already in the working directory. Without modifying the file beforehand, we read it into a data.frame called “main”.
main=read.csv("repdata-data-StormData.csv")
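Since `read.csv()` accepts a connection, a bzip2-compressed file can likely be read without unpacking it first. The sketch below shows the idea on a tiny compressed file created in a temporary directory. The file name `toy-storm.csv.bz2` and its contents are made up for illustration and are not part of the storm data.

```r
# Build a tiny bzip2-compressed CSV in a temporary location (illustrative data only).
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(2, 1))
toy_path <- file.path(tempdir(), "toy-storm.csv.bz2")  # hypothetical file name

con <- bzfile(toy_path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# bzfile() gives a decompressing connection that read.csv() can consume directly.
toy_read <- read.csv(bzfile(toy_path), stringsAsFactors = FALSE)
toy_read
```

Reading through a connection avoids keeping a second, uncompressed copy of a large file on disk.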
We check the dimensions and the first few rows of the dataset.
dim(main)
## [1] 902297 37
head(main[1:3,1:15])
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO 0 0
## 2 TORNADO 0 0
## 3 TORNADO 0 0
## COUNTYENDN
## 1 NA
## 2 NA
## 3 NA
To understand the effects of weather events on public health, we create two datasets. The first, called “result1”, sums the fatalities column by event type:
result1=ddply(main,.(EVTYPE),summarise,fatalities=sum(FATALITIES))
The second, called “result2”, sums the injuries column by event type:
result2=ddply(main,.(EVTYPE),summarize,injuries=sum(INJURIES))
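As a rough illustration of what these grouped sums compute, here is a minimal sketch on a toy data frame. It uses base R's `aggregate()` rather than `ddply()`, so it is a swapped-in technique and not the code used in this analysis. The toy values are invented.

```r
# Toy stand-in for the storm data (values are illustrative, not from NOAA).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  FATALITIES = c(3, 1, 2, 4),
  INJURIES   = c(10, 0, 5, 1)
)

# Sum both health columns per event type in one pass.
health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
health
```

Note that the formula interface of `aggregate()` drops rows with missing values by default, which may or may not be desirable for a given dataset.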
To explore the effects of weather events on the economy, we restrict ourselves to the columns related to event type, property damage, and crop damage. We replace “NA” values with 1 and empty values with 0. In addition, we replace the codes m, M, k, K, b, B, h, H, +, ? and - in the PROPDMGEXP and CROPDMGEXP columns with the appropriate numbers: 1000000 for M, 1000 for K, 1000000000 for B, 100 for H, and 0 for the three symbols.
DMG=main[,c(8,25,26,27,28)]
DMG[is.na(DMG)]=1
DMG[DMG==""]=0
DMG$PROPDMGEXP=as.vector(DMG$PROPDMGEXP)
DMG$CROPDMGEXP=as.vector(DMG$CROPDMGEXP)
DMG[DMG=="M"|DMG=="m"]="1000000"
DMG[DMG=="K"|DMG=="k"]="1000"
DMG[DMG=="B"|DMG=="b"]="1000000000"
DMG[DMG=="H"|DMG=="h"]="100"
DMG[DMG=="+"|DMG=="-"|DMG=="?"]="0"
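The same replacement rules can also be expressed as a lookup table. The sketch below is one possible way to write it. The helper name `exp_to_multiplier` is hypothetical, and treating digit codes as plain numbers is an assumption based on the fact that the steps above leave them unchanged.

```r
# Hypothetical helper: map exponent codes to numeric multipliers,
# mirroring the replacement rules described above.
exp_to_multiplier <- function(code) {
  code    <- as.character(code)
  missing <- is.na(code)
  lookup  <- c(H = 100, K = 1e3, M = 1e6, B = 1e9, "+" = 0, "-" = 0, "?" = 0)

  out <- unname(lookup[toupper(code)])        # letters (either case) and symbols
  digit <- !missing & grepl("^[0-9]+$", code)
  out[digit] <- as.numeric(code[digit])       # digit codes are kept as plain numbers
  out[code %in% ""] <- 0                      # empty string -> 0
  out[missing] <- 1                           # NA -> 1
  out
}

multipliers <- exp_to_multiplier(c("K", "m", "B", "h", "", "+", "5", NA))
multipliers
```

A lookup applied to one column at a time leaves the other columns of the data frame untouched, unlike a replacement applied to the whole data frame.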
First we want to know which weather event has caused the most fatalities. To do that, we sort result1 by the fatalities column in descending order and keep the first 20 rows.
res1=head(arrange(result1,desc(fatalities)),20)
head(res1)
## EVTYPE fatalities
## 1 TORNADO 5633
## 2 EXCESSIVE HEAT 1903
## 3 FLASH FLOOD 978
## 4 HEAT 937
## 5 LIGHTNING 816
## 6 TSTM WIND 504
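For readers without plyr, the "sort descending, then keep the top rows" step can be sketched in base R with `order()` and `head()`. The data frame and its values below are invented for illustration.

```r
# Illustrative totals (not the real NOAA figures).
toy_totals <- data.frame(
  EVTYPE     = c("HEAT", "TORNADO", "FLOOD", "LIGHTNING"),
  fatalities = c(40, 90, 25, 60)
)

# order() with decreasing = TRUE sorts largest first; head() keeps the top n rows.
top2 <- head(toy_totals[order(toy_totals$fatalities, decreasing = TRUE), ], 2)
top2
```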
As you can see, tornadoes have caused the most fatalities. Next we want to know which weather event has caused the most injuries. To do that, we sort result2 by the injuries column in descending order and keep the first 20 rows.
res2=head(arrange(result2,desc(injuries)),20)
head(res2)
## EVTYPE injuries
## 1 TORNADO 91346
## 2 TSTM WIND 6957
## 3 FLOOD 6789
## 4 EXCESSIVE HEAT 6525
## 5 LIGHTNING 5230
## 6 HEAT 2100
As you can see, tornadoes have also caused the most injuries. Now we add the number of fatalities and injuries to find out which event has had the greatest effect on public health.
result=data.frame(EVTYPE=result1$EVTYPE,All=result1$fatalities+result2$injuries)
res=head(arrange(result,desc(All)),20)
head(res)
## EVTYPE All
## 1 TORNADO 96979
## 2 EXCESSIVE HEAT 8428
## 3 TSTM WIND 7461
## 4 FLOOD 7259
## 5 LIGHTNING 6046
## 6 HEAT 3037
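Adding the two columns directly works here because both summaries list the event types in the same order. When that cannot be guaranteed, joining on the event type first is a safer pattern. The sketch below uses `merge()` on two small invented tables whose rows are deliberately in different orders.

```r
# Two toy summaries with rows in different orders (illustrative values).
fat <- data.frame(EVTYPE = c("FLOOD", "HEAT", "TORNADO"), fatalities = c(1, 4, 5))
inj <- data.frame(EVTYPE = c("TORNADO", "FLOOD", "HEAT"), injuries = c(15, 0, 1))

# merge() aligns rows on EVTYPE, so the sum is correct even if row order differs.
combined <- merge(fat, inj, by = "EVTYPE")
combined$All <- combined$fatalities + combined$injuries
combined
```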
We want to plot all three measures (fatalities, injuries, and their sum) in a single panel for the 25 event types with the largest combined totals.
mainResult=head(arrange(cbind(result1,injuries=result2$injuries,All=result$All),desc(All)),25)
# Each layer maps its own y and colour; labs() keeps the axis and legend titles.
ggplot(mainResult,aes(x=EVTYPE,group=1))+
  geom_line(aes(y=fatalities,col="fatalities"))+
  geom_line(aes(y=injuries,col="injuries"))+
  geom_line(aes(y=All,col="All"))+
  labs(y="value",colour="variable")+
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
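An alternative way to draw several measures in one panel is to reshape the data to long format, with one row per event type and measure. The sketch below does the reshaping in base R on invented values. The column names `variable` and `value` are chosen here for illustration.

```r
# Toy wide-format summary (illustrative values).
wide <- data.frame(
  EVTYPE     = c("FLOOD", "HEAT", "TORNADO"),
  fatalities = c(1, 4, 5),
  injuries   = c(0, 1, 15)
)
wide$All <- wide$fatalities + wide$injuries

# Stack the three measure columns into one long table.
measures <- c("fatalities", "injuries", "All")
long <- data.frame(
  EVTYPE   = rep(wide$EVTYPE, times = length(measures)),
  variable = rep(measures, each = nrow(wide)),
  value    = unlist(wide[measures], use.names = FALSE)
)
long
```

With the data in this shape, a call such as `ggplot(long, aes(x = EVTYPE, y = value, color = variable, group = variable)) + geom_line()` would draw one line per measure from a single layer.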
To explore the economic impact, we first add three columns to the DMG dataset: propert.damage, crop.damage and TotalDamage. The first is the property damage figure multiplied by its exponent multiplier, the second is the crop damage figure multiplied by its exponent multiplier, and the third is the sum of the two.
DMG$propert.damage=DMG$PROPDMG*as.numeric(DMG$PROPDMGEXP)
DMG$crop.damage=DMG$CROPDMG*as.numeric(DMG$CROPDMGEXP)
DMG$TotalDamage=DMG$propert.damage+DMG$crop.damage
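To make the arithmetic concrete, here is a small worked sketch on three invented rows whose exponent codes have already been replaced by numeric strings. The short column names `property`, `crop` and `total` are used only in this example.

```r
# Toy rows after the exponent codes have been replaced (illustrative values).
toy_dmg <- data.frame(
  PROPDMG    = c(25, 2.5, 0),
  PROPDMGEXP = c("1000", "1000000", "0"),
  CROPDMG    = c(0, 1, 5),
  CROPDMGEXP = c("0", "1000", "1000"),
  stringsAsFactors = FALSE  # keep multipliers as character, not factor
)

# Damage in dollars = reported figure * numeric multiplier.
toy_dmg$property <- toy_dmg$PROPDMG * as.numeric(toy_dmg$PROPDMGEXP)
toy_dmg$crop     <- toy_dmg$CROPDMG * as.numeric(toy_dmg$CROPDMGEXP)
toy_dmg$total    <- toy_dmg$property + toy_dmg$crop
toy_dmg$total
```

The multiplier columns must be character (or numeric) before calling `as.numeric()`, because `as.numeric()` on a factor returns the internal level codes instead of the printed values.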
Now we sum the three damage columns of DMG by event type, sort the result by total damage in descending order, and store the first 25 rows.
EconDamage1=ddply(DMG,.(EVTYPE),summarize,PropertyDamage=sum(propert.damage),CropDamage=sum(crop.damage),Total.Damage=sum(TotalDamage))
EconDamage=head(arrange(EconDamage1,desc(Total.Damage)),25)
Sorting the EconDamage dataset by property damage gives:
head(arrange(EconDamage,desc(PropertyDamage)))
## EVTYPE PropertyDamage CropDamage Total.Damage
## 1 FLOOD 144657709800 5661968450 150319678250
## 2 HURRICANE/TYPHOON 69305840000 2607872800 71913712800
## 3 TORNADO 56937160991 414953110 57352114101
## 4 STORM SURGE 43323536000 5000 43323541000
## 5 FLASH FLOOD 16140812087 1421317100 17562129187
## 6 HAIL 15732267370 3025954450 18758221820
As you can see, floods have caused the greatest property damage. Sorting the EconDamage dataset by crop damage gives:
head(arrange(EconDamage,desc(CropDamage)))
## EVTYPE PropertyDamage CropDamage Total.Damage
## 1 DROUGHT 1046106000 13972566000 15018672000
## 2 FLOOD 144657709800 5661968450 150319678250
## 3 RIVER FLOOD 5118945500 5029459000 10148404500
## 4 ICE STORM 3944927810 5022113500 8967041310
## 5 HAIL 15732267370 3025954450 18758221820
## 6 HURRICANE 11868319010 2741910000 14610229010
As you can see, drought has caused the greatest crop damage. Sorting the EconDamage dataset by total damage gives:
head(arrange(EconDamage,desc(Total.Damage)))
## EVTYPE PropertyDamage CropDamage Total.Damage
## 1 FLOOD 144657709800 5661968450 150319678250
## 2 HURRICANE/TYPHOON 69305840000 2607872800 71913712800
## 3 TORNADO 56937160991 414953110 57352114101
## 4 STORM SURGE 43323536000 5000 43323541000
## 5 HAIL 15732267370 3025954450 18758221820
## 6 FLASH FLOOD 16140812087 1421317100 17562129187
As you can see, floods have caused the greatest economic damage. Now we can plot the property, crop, and total damage in a single panel.
# Each layer maps its own y and colour; labs() keeps the axis and legend titles.
ggplot(EconDamage,aes(x=EVTYPE,group=1))+
  geom_line(aes(y=PropertyDamage,col="PropertyDamage"))+
  geom_line(aes(y=CropDamage,col="CropDamage"))+
  geom_line(aes(y=Total.Damage,col="Total.Damage"))+
  labs(y="value",colour="variable")+
  theme(axis.text.x = element_text(angle = 90, hjust = 1))