Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. This project explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Questions to be addressed

1-Across the United States, which types of events are most harmful with respect to population health?
2-Across the United States, which types of events have the greatest economic consequences?

Data Processing

Data Source
Besides the event type (‘EVTYPE’), six variables are relevant to the two questions: ‘FATALITIES’, ‘INJURIES’, ‘PROPDMG’, ‘PROPDMGEXP’, ‘CROPDMG’ and ‘CROPDMGEXP’. Their meanings are as follows.
‘FATALITIES’: number of weather-related fatalities; a measure of population health.
‘INJURIES’: number of weather-related injuries; a measure of population health.
‘PROPDMG’: property damage estimate, scaled by ‘PROPDMGEXP’; a measure of economic consequences.
‘PROPDMGEXP’: the property damage exponent (magnitude code, e.g. “K” for thousands).
‘CROPDMG’: crop damage estimate, scaled by ‘CROPDMGEXP’; a measure of economic consequences.
‘CROPDMGEXP’: the crop damage exponent (magnitude code, e.g. “M” for millions).
1- For the 1st question, on the effect of event type on population health, the total fatalities and the total injuries are calculated for each event type, and the top 10 event types are reported.
2- For the 2nd question, on the effect of event type on economic consequences, the total economic damage (property damage plus crop damage) is calculated for each event type, and the top 10 event types are reported.

1- Load and read the storm data file downloaded from the course website

The data frame contains 902297 observations (rows) of 37 variables.

library(plyr)       # ddply()
library(dplyr)      # select(); loaded after plyr so dplyr's verbs are not masked
library(ggplot2)    # ggplot()
library(gridExtra)  # grid.arrange()

# Author's local working directory; change it to the folder that holds the data file
setwd("E:/Cousera-Data Science/Reproducible Research/CourseProject2")
# read.csv() reads the .bz2 file directly. The default quote='"' must be kept:
# with quote="" the quoted multi-line fields are split into extra rows and the
# column names are mangled (e.g. X.EVTYPE. instead of EVTYPE)
data<-read.csv("repdata-data-StormData.csv.bz2",stringsAsFactors = FALSE)
dim(data)
## [1] 902297     37
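How read.csv() treats quotes matters for this file, because its free-text fields are quoted and can span several lines. A minimal sketch on a tiny made-up CSV (the strings are invented, not taken from the storm file) whose quoted REMARKS value contains a line break: with the default quoting the record stays intact, while quote="" splits it into an extra row and turns the quoted header names into the X.NAME. form.

```r
# Made-up two-record CSV; the first REMARKS value has a line break inside quotes
csv_text <- paste0('"EVTYPE","REMARKS"\n',
                   '"TORNADO","Roof damage\nnear town"\n',
                   '"HAIL",""')

# Default quote = "\"": the quoted line break stays inside one field
good <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# quote = "": quote marks become ordinary characters, so the line break ends the row
bad <- read.csv(text = csv_text, stringsAsFactors = FALSE, quote = "")

nrow(good)   # 2 rows, columns EVTYPE and REMARKS
nrow(bad)    # 3 rows, columns X.EVTYPE. and X.REMARKS.
```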

2- Select only the columns for event type, fatalities, injuries, property damage, property damage exponent, crop damage and crop damage exponent

data2<-select(data,EVTYPE,FATALITIES,INJURIES,PROPDMG,PROPDMGEXP,CROPDMG,CROPDMGEXP)
head(data2)
##    EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO          0       15    25.0          K       0           
## 2 TORNADO          0        0     2.5          K       0           
## 3 TORNADO          0        2    25.0          K       0           
## 4 TORNADO          0        2     2.5          K       0           
## 5 TORNADO          0        2     2.5          K       0           
## 6 TORNADO          0        6     2.5          K       0

For reference, the 10 most frequently recorded event types are listed below.

EVtype<-head(sort(table(data2$EVTYPE),decreasing=TRUE),10)
EVtype
## 
##               HAIL          TSTM WIND  THUNDERSTORM WIND 
##             288661             219940              82563 
##            TORNADO        FLASH FLOOD              FLOOD 
##              60652              54277              25326 
## THUNDERSTORM WINDS          HIGH WIND          LIGHTNING 
##              20843              20212              15754 
##         HEAVY SNOW 
##              15708
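The line above counts the rows for each EVTYPE label with table(), sorts the counts in decreasing order and keeps the first ten. A minimal sketch on a made-up vector of labels shows that every distinct spelling is counted as its own event type, which is why TSTM WIND, THUNDERSTORM WIND and THUNDERSTORM WINDS appear as separate entries in the list above.

```r
# Made-up event labels; "TSTM WIND" and "THUNDERSTORM WIND" are spelled differently
ev <- c("HAIL", "TSTM WIND", "HAIL", "THUNDERSTORM WIND", "HAIL", "TSTM WIND")

# Count each distinct label, sort the counts from largest to smallest, keep the top 2
top2 <- head(sort(table(ev), decreasing = TRUE), 2)
top2   # HAIL 3, TSTM WIND 2
```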

Sum the fatalities for each event type, then list the 10 event types with the most fatalities and plot them with ggplot2.

res_F<-ddply(data2,.(EVTYPE),function(x) sum(x$FATALITIES))
ord_F<-head(res_F[with(res_F,order(V1,decreasing=TRUE)),],10)
colnames(ord_F)<-c("EVTYPE","Total_Fatalities")
rownames(ord_F)<-NULL
ord_F$EVTYPE<-factor(ord_F$EVTYPE,levels=unique(ord_F$EVTYPE))
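# Points joined by a line; group=1 lets geom_line() connect points on a discrete x axis
p1<-ggplot(ord_F,aes(x=EVTYPE,y=Total_Fatalities,group=1))+
  geom_point()+geom_line()+
  labs(x="The Type of Events",y="The Total Fatalities for each type")+
  ggtitle("The Effect of top 10 Event Types on Population Fatalities")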
ord_F
##            EVTYPE Total_Fatalities
## 1         TORNADO             5633
## 2  EXCESSIVE HEAT             1903
## 3     FLASH FLOOD              978
## 4            HEAT              937
## 5       LIGHTNING              816
## 6       TSTM WIND              504
## 7           FLOOD              470
## 8     RIP CURRENT              368
## 9       HIGH WIND              248
## 10      AVALANCHE              224
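ddply() from plyr applies the summing function to each EVTYPE group. Since dplyr is already used for select(), the same "sum per event type, keep the largest totals" step could also be written with group_by(), summarise() and arrange(). A minimal sketch on a small made-up data frame, keeping the top 2 instead of the top 10; summarise() and arrange() are called with the dplyr:: prefix because plyr defines functions with the same names.

```r
library(dplyr)

# Made-up records in the same shape as data2
toy <- data.frame(EVTYPE = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT"),
                  FATALITIES = c(5, 2, 3, 1, 4),
                  stringsAsFactors = FALSE)

top_fatal <- toy %>%
  group_by(EVTYPE) %>%                                          # one group per event type
  dplyr::summarise(Total_Fatalities = sum(FATALITIES)) %>%      # total per group
  dplyr::arrange(dplyr::desc(Total_Fatalities)) %>%             # largest totals first
  head(2)                                                       # keep the top 2
top_fatal   # TORNADO 8, HEAT 6
```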

Sum the injuries for each event type, then list the 10 event types with the most injuries and plot them with ggplot2.

res_I<-ddply(data2,.(EVTYPE),function(x) sum(x$INJURIES))
ord_I<-head(res_I[with(res_I,order(V1,decreasing=TRUE)),],10)
colnames(ord_I)<-c("EVTYPE","Total_Injuries")
rownames(ord_I)<-NULL
ord_I$EVTYPE<-factor(ord_I$EVTYPE,levels=unique(ord_I$EVTYPE))
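# Points joined by a line; group=1 lets geom_line() connect points on a discrete x axis
p2<-ggplot(ord_I,aes(x=EVTYPE,y=Total_Injuries,group=1))+
  geom_point()+geom_line()+
  labs(x="The Type of Events",y="The Total Injuries for each type")+
  ggtitle("The Effect of top 10 Event Types on Population Injuries")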
ord_I
##               EVTYPE Total_Injuries
## 1            TORNADO          91346
## 2          TSTM WIND           6957
## 3              FLOOD           6789
## 4     EXCESSIVE HEAT           6525
## 5          LIGHTNING           5230
## 6               HEAT           2100
## 7          ICE STORM           1975
## 8        FLASH FLOOD           1777
## 9  THUNDERSTORM WIND           1488
## 10              HAIL           1361

3- Result 1: The effect of event type on population health

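grid.arrange(p1,p2,nrow=2)

Tornadoes are by far the most harmful event type for population health: they cause the most fatalities (5633) and the most injuries (91346). Excessive heat ranks second for fatalities (1903), and TSTM wind ranks second for injuries (6957).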

4- Data preprocessing before calculating economic consequences

Before calculating the economic consequences, the exponent variables ‘PROPDMGEXP’ and ‘CROPDMGEXP’ need to be converted into numbers.

4.1- Preprocessing the property damage data

The property damage exponent column contains the following codes:

unique(data2$PROPDMGEXP)
##  [1] "K" "M" ""  "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-"
## [18] "1" "8"

Replace the codes with numeric exponents so that the damage can be calculated:
“M” or “m”: Million = 1000000 (exp=6)
“K”: Thousand = 1000 (exp=3)
“B”: Billion = 1000000000 (exp=9)
“h” or “H”: Hundred = 100 (exp=2)
“+”, “”, “0”, “?”, “-”: unknown (exp=0, i.e. no scaling)
“1” to “8”: kept as they are and used directly as the exponent
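The replacement rules for both exponent columns could also be wrapped in one helper. A minimal sketch; the function name exp_code_to_power is hypothetical. It upper-cases each code first, so “k” and “K” are treated alike, keeps single-digit codes as literal exponents (as happens in the main code once as.numeric() is applied), and leaves any code not covered by the rules as NA.

```r
# Hypothetical helper mirroring the exponent rules listed above
exp_code_to_power <- function(code) {
  code <- toupper(code)
  power <- rep(NA_real_, length(code))          # codes not covered by the rules stay NA
  power[code == "B"] <- 9                       # billion
  power[code == "M"] <- 6                       # million
  power[code == "K"] <- 3                       # thousand
  power[code == "H"] <- 2                       # hundred
  power[code %in% c("+", "-", "?", "")] <- 0    # unknown symbols: no scaling
  digits <- grepl("^[0-9]$", code)
  power[digits] <- as.numeric(code[digits])     # digit codes are used as the exponent
  power
}

exp_code_to_power(c("K", "m", "", "?", "5", "h", "B", "0"))   # 3 6 0 0 5 2 9 0
```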

data2$PROPDMGEXP[data2$PROPDMGEXP %in% c("M","m")]<-6
data2$PROPDMGEXP[data2$PROPDMGEXP %in% c("K")]<-3
data2$PROPDMGEXP[data2$PROPDMGEXP %in% c("B")]<-9
data2$PROPDMGEXP[data2$PROPDMGEXP %in% c("h","H")]<-2
data2$PROPDMGEXP[data2$PROPDMGEXP %in% c("+","","0","?","-")]<-0

4.2- Preprocessing the crop damage data

The crop damage exponent column contains the following codes:

unique(data2$CROPDMGEXP)
## [1] ""  "M" "K" "m" "B" "?" "0" "k" "2"

Replace the codes with numeric exponents so that the damage can be calculated:
“M” or “m”: Million = 1000000 (exp=6)
“K” or “k”: Thousand = 1000 (exp=3)
“B”: Billion = 1000000000 (exp=9)
“0”, “?”, “”: unknown (exp=0, i.e. no scaling)
“2”: kept as it is and used directly as the exponent

data2$CROPDMGEXP[data2$CROPDMGEXP %in% c("M","m")]<-6
data2$CROPDMGEXP[data2$CROPDMGEXP %in% c("K","k")]<-3
data2$CROPDMGEXP[data2$CROPDMGEXP %in% c("B")]<-9
data2$CROPDMGEXP[data2$CROPDMGEXP %in% c("0","?","")]<-0

Calculate the total economic damage for each record:

EconomyDamage = PROPDMG * 10^PROPDMGEXP + CROPDMG * 10^CROPDMGEXP
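As a worked check of the formula, take the first record shown by head(data2) (PROPDMG 25.0 with code “K”, CROPDMG 0 with an empty code) together with one made-up record that also has crop damage. The sketch simply evaluates the formula with the exponents already converted to numbers.

```r
# Exponents already converted to numbers
PROPDMG    <- c(25.0, 1.5)   # first record of data2; the second record is made up
PROPDMGEXP <- c(3, 6)        # "K" -> 3, "M" -> 6
CROPDMG    <- c(0, 2)
CROPDMGEXP <- c(0, 3)        # "" -> 0, "K" -> 3

EconomyDamage <- PROPDMG * 10^PROPDMGEXP + CROPDMG * 10^CROPDMGEXP
EconomyDamage   # 25000 and 1502000 dollars
```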

data2$PROPDMGEXP<-as.numeric(data2$PROPDMGEXP)
data2$CROPDMGEXP<-as.numeric(data2$CROPDMGEXP)
Economy_DMG<-data2$PROPDMG*10^data2$PROPDMGEXP+data2$CROPDMG*10^data2$CROPDMGEXP

Sum the economic damage for each event type, keep the 10 event types with the largest totals, and convert the unit to billions of dollars.

data2$EconomyDMG<-Economy_DMG
res_E<-ddply(data2,.(EVTYPE),function(x) sum(x$EconomyDMG))
Eco_DMG_order<-head(res_E[with(res_E,order(V1,decreasing=TRUE)),],10)
Eco_DMG_order$V1<-Eco_DMG_order$V1/10^9
colnames(Eco_DMG_order)<-c("EVTYPE","Eco_DMG")
rownames(Eco_DMG_order)<-NULL
Eco_DMG_order
##               EVTYPE    Eco_DMG
## 1              FLOOD 150.319678
## 2  HURRICANE/TYPHOON  71.913713
## 3            TORNADO  57.362334
## 4        STORM SURGE  43.323541
## 5               HAIL  18.761222
## 6        FLASH FLOOD  18.243991
## 7            DROUGHT  15.018672
## 8          HURRICANE  14.610229
## 9        RIVER FLOOD  10.148404
## 10         ICE STORM   8.967041

5- Result 2: The effect of event type on economic consequences

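The 10 event types with the greatest economic consequences are plotted below. Floods cause by far the largest total damage (about 150.3 billion dollars), followed by hurricanes/typhoons (about 71.9 billion) and tornadoes (about 57.4 billion).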

Eco_DMG_order$EVTYPE<-factor(Eco_DMG_order$EVTYPE,levels=unique(Eco_DMG_order$EVTYPE))
# Points joined by a line; group=1 lets geom_line() connect points on a discrete x axis
ggplot(Eco_DMG_order,aes(x=EVTYPE,y=Eco_DMG,group=1))+
  geom_point()+geom_line()+
  labs(x="The Type of Events",y="The Total Economic Damage (unit: billion $)")+
  ggtitle("The Effect of top 10 Event Types on Economic Consequences across the United States")