Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern. This project explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. According to our analysis, tornadoes have caused the most fatalities and injuries of any weather event type across the U.S., while floods have caused the greatest economic damage.

Loading and Processing the Raw Data

We need to load two required packages for our analysis.

library(plyr)
library(ggplot2)

The data for this project come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. We can download the file from the course web site:
*Storm Data
We assume that the data file is already in the working directory. We read the raw data, without modifying it first, into a data.frame called “main”.

main=read.csv("repdata-data-StormData.csv")
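
The file name above suggests a decompressed copy of the data. As a side note, read.csv can also read a bzip2-compressed file directly, because it opens paths through file(), which detects the compression when reading. A minimal sketch on a tiny made-up file (the toy data frame and the temporary path are illustrative only and are not part of the NOAA data):

```r
# Write a tiny made-up CSV through a bzip2 connection (toy data, not the NOAA file).
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(2, 1))
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv opens the path via file(), which detects bzip2 compression on read.
toy_back <- read.csv(path, stringsAsFactors = FALSE)
print(toy_back)
```

If this holds for the course file as well, the decompression step could be skipped by passing the .bz2 path to read.csv.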

We need to check the dimensions and the first few rows of the dataset.

dim(main)
## [1] 902297     37
head(main[1:3,1:15])
##   STATE__          BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1 4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1 4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1 2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
##   COUNTYENDN
## 1         NA
## 2         NA
## 3         NA

To understand the effects of weather events on public health, we create two datasets. The first, called “result1”, sums the fatalities column by event type:

result1=ddply(main,.(EVTYPE),summarise,fatalities=sum(FATALITIES))

The second, called “result2”, sums the injuries column by event type:

result2=ddply(main,.(EVTYPE),summarize,injuries=sum(INJURIES))
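
The two ddply calls group the rows by EVTYPE and sum one column within each group. The same idea can be sketched without plyr using base R's aggregate. The toy data frame below is made up for illustration:

```r
# Made-up rows standing in for the storm data.
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  FATALITIES = c(3, 1, 2, 0),
  stringsAsFactors = FALSE
)

# One row per event type, with FATALITIES summed within each group.
toy_totals <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)
print(toy_totals)
```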

To explore the effects of weather events on the economy, we restrict ourselves to the columns related to event type, property damage, and crop damage. We replace “NA” values with 1 and empty values with 0. In addition, we replace the codes in the PROPDMGEXP and CROPDMGEXP columns with the appropriate numbers: 1,000,000 for m or M, 1,000 for k or K, 1,000,000,000 for b or B, 100 for h or H, and 0 for +, ? and -.

DMG=main[,c(8,25,26,27,28)]
DMG[is.na(DMG)]=1
DMG[DMG==""]=0
DMG$PROPDMGEXP=as.vector(DMG$PROPDMGEXP)
DMG$CROPDMGEXP=as.vector(DMG$CROPDMGEXP)
DMG[DMG=="M"|DMG=="m"]="1000000"
DMG[DMG=="K"|DMG=="k"]="1000"
DMG[DMG=="B"|DMG=="b"]="1000000000"
DMG[DMG=="H"|DMG=="h"]="100"
DMG[DMG=="+"|DMG=="-"|DMG=="?"]="0"
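
The replacements above can also be expressed as a single lookup applied to one exponent column at a time. The sketch below uses a hypothetical helper, exp_to_multiplier, which is not part of the original analysis. It mirrors the same rules: letters become powers of ten, the symbols and empty strings become 0, NA becomes 1, and digit codes are kept as plain numbers.

```r
# Hypothetical helper (not in the original report): map exponent codes to multipliers.
exp_to_multiplier <- function(code) {
  missing_code <- is.na(code)
  code <- toupper(as.character(code))
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  # Digit codes pass through as plain numbers; everything else is NA for now.
  mult <- suppressWarnings(as.numeric(code))
  is_letter <- code %in% names(lookup)
  mult[is_letter] <- unname(lookup[code[is_letter]])
  # Empty strings and the symbols +, -, ? are treated as 0.
  mult[code %in% c("", "+", "-", "?")] <- 0
  # Missing codes are treated as 1.
  mult[missing_code] <- 1
  mult
}

codes <- c("K", "m", "B", "h", "", "+", "5", NA)
multipliers <- exp_to_multiplier(codes)
print(multipliers)
```

Applying the helper column by column would leave EVTYPE untouched, whereas the data-frame-wide replacements above compare every column against each code.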

Results

Public Health Damages

First, we want to know which weather event has caused the most fatalities. To do that, we sort result1 by the fatalities column in descending order and keep the first 20 rows of the resulting dataset.

res1=head(arrange(result1,desc(fatalities)),20)
head(res1)
##           EVTYPE fatalities
## 1        TORNADO       5633
## 2 EXCESSIVE HEAT       1903
## 3    FLASH FLOOD        978
## 4           HEAT        937
## 5      LIGHTNING        816
## 6      TSTM WIND        504
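
The sort-then-truncate step does not depend on plyr's arrange. A minimal base R sketch of the same pattern, using order and head on made-up totals:

```r
# Made-up per-event totals, not taken from the NOAA data.
toy_totals <- data.frame(
  EVTYPE = c("FLOOD", "HEAT", "TORNADO"),
  fatalities = c(1, 7, 5),
  stringsAsFactors = FALSE
)

# Sort rows by fatalities, largest first, then keep the top two.
ranked <- toy_totals[order(toy_totals$fatalities, decreasing = TRUE), ]
top2 <- head(ranked, 2)
print(top2)
```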

As you can see, tornadoes have caused the most fatalities. Next, we want to know which weather event has caused the most injuries. To do that, we sort result2 by the injuries column in descending order and keep the first 20 rows of the resulting dataset.

res2=head(arrange(result2,desc(injuries)),20)
head(res2)
##           EVTYPE injuries
## 1        TORNADO    91346
## 2      TSTM WIND     6957
## 3          FLOOD     6789
## 4 EXCESSIVE HEAT     6525
## 5      LIGHTNING     5230
## 6           HEAT     2100

As you can see, tornadoes have also caused the most injuries. Now we add the number of fatalities and injuries to find out which event has had the greatest overall effect on public health.

result=data.frame(EVTYPE=result1$EVTYPE,All=result1$fatalities+result2$injuries)
res=head(arrange(result,desc(All)),20)
head(res)
##           EVTYPE   All
## 1        TORNADO 96979
## 2 EXCESSIVE HEAT  8428
## 3      TSTM WIND  7461
## 4          FLOOD  7259
## 5      LIGHTNING  6046
## 6           HEAT  3037
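
The sum above pairs result1 and result2 by row position, which should hold here because both come from ddply over the same EVTYPE grouping. A slightly more defensive variant joins the two tables on the event type before adding. A sketch with made-up tables whose rows are deliberately in different orders:

```r
# Made-up tables with the same event types in different row orders.
deaths <- data.frame(EVTYPE = c("FLOOD", "TORNADO"), fatalities = c(1, 5),
                     stringsAsFactors = FALSE)
hurt <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), injuries = c(20, 4),
                   stringsAsFactors = FALSE)

# Join on EVTYPE so each sum uses matching rows regardless of order.
both <- merge(deaths, hurt, by = "EVTYPE")
both$All <- both$fatalities + both$injuries
print(both)
```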

We want to plot all three series (fatalities, injuries, and their sum) in a single panel for the 25 event types with the highest combined totals.

mainResult=head(arrange(cbind(result1,injuries=result2$injuries,All=result$All),desc(All)),25)
ggplot(mainResult,aes(x=EVTYPE,group=1))+
  geom_line(aes(y=fatalities,col="fatalities"))+
  geom_line(aes(y=injuries,col="injuries"))+
  geom_line(aes(y=All,col="All"))+
  labs(y="value",colour="variable")+
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
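
An alternative to adding one geom_line per column is to reshape the table to long format first, so that a single layer can map the series name to colour. The sketch below does the reshaping with base R's stack on made-up data. The ggplot call is left as a comment and is only an illustration of how the long table might be used:

```r
# Made-up wide table: one column per series.
wide <- data.frame(
  EVTYPE = c("FLOOD", "TORNADO"),
  fatalities = c(1, 5),
  injuries = c(4, 20),
  stringsAsFactors = FALSE
)

# stack() puts the numbers in `values` and the source column name in `ind`.
stacked <- stack(wide[, c("fatalities", "injuries")])
long <- data.frame(EVTYPE = rep(wide$EVTYPE, times = 2), stacked,
                   stringsAsFactors = FALSE)
print(long)

# Possible use with ggplot2 (assumed, not run here):
# ggplot(long, aes(x = EVTYPE, y = values, colour = ind, group = ind)) + geom_line()
```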

Economic Impacts

To explore the economic impact, we first add three columns to the DMG dataset: propert.damage, crop.damage and TotalDamage. The first is the property damage figure multiplied by its exponent multiplier, the second is the crop damage figure multiplied by its exponent multiplier, and the third is the sum of the two.

DMG$propert.damage=DMG$PROPDMG*as.numeric(DMG$PROPDMGEXP)
DMG$crop.damage=DMG$CROPDMG*as.numeric(DMG$CROPDMGEXP)
DMG$TotalDamage=DMG$propert.damage+DMG$crop.damage
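
As a worked example of this arithmetic, consider two made-up records whose exponent columns have already been converted to number strings, as in the processing step. A PROPDMG of 25 with multiplier 1000 gives 25,000 in property damage, and adding 10 × 1000 in crop damage gives a total of 35,000:

```r
# Made-up records; exponent columns already hold number strings.
toy <- data.frame(
  PROPDMG = c(25, 2.5),
  PROPDMGEXP = c("1000", "1000000"),
  CROPDMG = c(10, 0),
  CROPDMGEXP = c("1000", "0"),
  stringsAsFactors = FALSE
)

# Same three derived columns as in the report.
toy$propert.damage <- toy$PROPDMG * as.numeric(toy$PROPDMGEXP)
toy$crop.damage <- toy$CROPDMG * as.numeric(toy$CROPDMGEXP)
toy$TotalDamage <- toy$propert.damage + toy$crop.damage
print(toy[, c("propert.damage", "crop.damage", "TotalDamage")])
```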

Now we sum these three columns by event type and keep the 25 event types with the largest total damage.

EconDamage1=ddply(DMG,.(EVTYPE),summarize,PropertyDamage=sum(propert.damage),CropDamage=sum(crop.damage),Total.Damage=sum(TotalDamage))
EconDamage=head(arrange(EconDamage1,desc(Total.Damage)),25)

If we sort the EconDamage dataset by property damage, we get:

head(arrange(EconDamage,desc(PropertyDamage)))
##              EVTYPE PropertyDamage CropDamage Total.Damage
## 1             FLOOD   144657709800 5661968450 150319678250
## 2 HURRICANE/TYPHOON    69305840000 2607872800  71913712800
## 3           TORNADO    56937160991  414953110  57352114101
## 4       STORM SURGE    43323536000       5000  43323541000
## 5       FLASH FLOOD    16140812087 1421317100  17562129187
## 6              HAIL    15732267370 3025954450  18758221820

As you can see, floods have caused the greatest property damage. If we sort the EconDamage dataset by crop damage, we get:

head(arrange(EconDamage,desc(CropDamage)))
##        EVTYPE PropertyDamage  CropDamage Total.Damage
## 1     DROUGHT     1046106000 13972566000  15018672000
## 2       FLOOD   144657709800  5661968450 150319678250
## 3 RIVER FLOOD     5118945500  5029459000  10148404500
## 4   ICE STORM     3944927810  5022113500   8967041310
## 5        HAIL    15732267370  3025954450  18758221820
## 6   HURRICANE    11868319010  2741910000  14610229010

As you can see, drought has caused the greatest crop damage. If we sort the EconDamage dataset by total damage, we get:

head(arrange(EconDamage,desc(Total.Damage)))
##              EVTYPE PropertyDamage CropDamage Total.Damage
## 1             FLOOD   144657709800 5661968450 150319678250
## 2 HURRICANE/TYPHOON    69305840000 2607872800  71913712800
## 3           TORNADO    56937160991  414953110  57352114101
## 4       STORM SURGE    43323536000       5000  43323541000
## 5              HAIL    15732267370 3025954450  18758221820
## 6       FLASH FLOOD    16140812087 1421317100  17562129187

As you can see, floods have caused the greatest economic damage. Now we can plot the property, crop, and total damage in a single panel.

ggplot(EconDamage,aes(x=EVTYPE,group=1))+
  geom_line(aes(y=PropertyDamage,col="PropertyDamage"))+
  geom_line(aes(y=CropDamage,col="CropDamage"))+
  geom_line(aes(y=Total.Damage,col="Total.Damage"))+
  labs(y="value",colour="variable")+
  theme(axis.text.x = element_text(angle = 90, hjust = 1))