Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. This project explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
1-Across the United States, which types of events are most harmful with respect to population health?
2-Across the United States, which types of events have the greatest economic consequences?
Data Source
Besides the event type (‘EVTYPE’), six variables are relevant to the two questions: ‘FATALITIES’, ‘INJURIES’, ‘PROPDMG’, ‘PROPDMGEXP’, ‘CROPDMG’ and ‘CROPDMGEXP’. Their meanings are as follows.
‘FATALITIES’: The number of weather-related fatalities, a measure of population health.
‘INJURIES’: The number of weather-related injuries, a measure of population health.
‘PROPDMG’: The property damage value, a measure of economic consequences.
‘PROPDMGEXP’: The property damage exponent, a symbol giving the magnitude of ‘PROPDMG’ (for example “K” for thousands).
‘CROPDMG’: The crop damage value, a measure of economic consequences.
‘CROPDMGEXP’: The crop damage exponent, a symbol giving the magnitude of ‘CROPDMG’.
1- For the first question, on the effect of event type on population health, the total fatalities and total injuries are calculated per event type and the top 10 event types are reported.
2- For the second question, on the effect of event type on economic consequences, the total economic damage is calculated as the sum of property damage and crop damage, and the top 10 event types are reported.
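The data frame has 902297 observations of 37 variables. The str() listing below reports 1773320 rows, ‘X.’-prefixed column names and quote-wrapped values; these are the symptoms of reading the file with quote="", which disables quote handling so that quoted multi-line remarks are split into extra rows. The analysis that follows uses the 902297-row data.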
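# Packages used below: plyr (ddply), dplyr (select), ggplot2 (qplot), gridExtra (grid.arrange)
library(plyr); library(dplyr); library(ggplot2); library(gridExtra)
setwd("E:/Cousera-Data Science/Reproducible Research/CourseProject2")
# read.csv() reads the .bz2 file directly; default quoting keeps quoted multi-line remarks in one record
data<-read.csv("repdata-data-StormData.csv.bz2",stringsAsFactors = FALSE)
str(data)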
## 'data.frame': 1773320 obs. of 37 variables:
## $ X.STATE__. : chr "1.00" "1.00" "1.00" "1.00" ...
## $ X.BGN_DATE. : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ X.BGN_TIME. : chr "\"0130\"" "\"0145\"" "\"1600\"" "\"0900\"" ...
## $ X.TIME_ZONE. : chr "\"CST\"" "\"CST\"" "\"CST\"" "\"CST\"" ...
## $ X.COUNTY. : chr "97.00" "3.00" "57.00" "89.00" ...
## $ X.COUNTYNAME.: chr "\"MOBILE\"" "\"BALDWIN\"" "\"FAYETTE\"" "\"MADISON\"" ...
## $ X.STATE. : chr "\"AL\"" "\"AL\"" "\"AL\"" "\"AL\"" ...
## $ X.EVTYPE. : chr "\"TORNADO\"" "\"TORNADO\"" "\"TORNADO\"" "\"TORNADO\"" ...
## $ X.BGN_RANGE. : chr "0.00" "0.00" "0.00" "0.00" ...
## $ X.BGN_AZI. : chr "" "" "" "" ...
## $ X.BGN_LOCATI.: chr "" "" "" "" ...
## $ X.END_DATE. : chr "" "" "" "" ...
## $ X.END_TIME. : chr "" "" "" "" ...
## $ X.COUNTY_END.: chr "0.00" "0.00" "0.00" "0.00" ...
## $ X.COUNTYENDN.: chr "" "" "" "" ...
## $ X.END_RANGE. : chr "0.00" "0.00" "0.00" "0.00" ...
## $ X.END_AZI. : chr "" "" "" "" ...
## $ X.END_LOCATI.: chr "" "" "" "" ...
## $ X.LENGTH. : chr "14.00" "2.00" "0.10" "0.00" ...
## $ X.WIDTH. : chr "100.00" "150.00" "123.00" "100.00" ...
## $ X.F. : chr "\"3\"" "\"2\"" "\"2\"" "\"2\"" ...
## $ X.MAG. : chr "0.00" "0.00" "0.00" "0.00" ...
## $ X.FATALITIES.: chr "0.00" "0.00" "0.00" "0.00" ...
## $ X.INJURIES. : chr "15.00" "0.00" "2.00" "2.00" ...
## $ X.PROPDMG. : chr "25.00" "2.50" "25.00" "2.50" ...
## $ X.PROPDMGEXP.: chr "\"K\"" "\"K\"" "\"K\"" "\"K\"" ...
## $ X.CROPDMG. : chr "0.00" "0.00" "0.00" "0.00" ...
## $ X.CROPDMGEXP.: chr "" "" "" "" ...
## $ X.WFO. : chr "" "" "" "" ...
## $ X.STATEOFFIC.: chr "" "" "" "" ...
## $ X.ZONENAMES. : chr "" "" "" "" ...
## $ X.LATITUDE. : chr "3040.00" "3042.00" "3340.00" "3458.00" ...
## $ X.LONGITUDE. : chr "8812.00" "8755.00" "8742.00" "8626.00" ...
## $ X.LATITUDE_E.: chr "3051.00" "0.00" "0.00" "0.00" ...
## $ X.LONGITUDE_.: chr "8806.00" "0.00" "0.00" "0.00" ...
## $ X.REMARKS. : chr "" "" "" "" ...
## $ X.REFNUM. : chr "1.00" "2.00" "3.00" "4.00" ...
data2<-select(data,EVTYPE,FATALITIES,INJURIES,PROPDMG,PROPDMGEXP,CROPDMG,CROPDMGEXP)
head(data2)
## EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO 0 15 25.0 K 0
## 2 TORNADO 0 0 2.5 K 0
## 3 TORNADO 0 2 25.0 K 0
## 4 TORNADO 0 2 2.5 K 0
## 5 TORNADO 0 2 2.5 K 0
## 6 TORNADO 0 6 2.5 K 0
The 10 most frequently occurring event types are listed first.
EVtype<-head(sort(table(data2$EVTYPE),decreasing=TRUE),10)
EVtype
##
## HAIL TSTM WIND THUNDERSTORM WIND
## 288661 219940 82563
## TORNADO FLASH FLOOD FLOOD
## 60652 54277 25326
## THUNDERSTORM WINDS HIGH WIND LIGHTNING
## 20843 20212 15754
## HEAVY SNOW
## 15708
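The counting step above can be sketched on a small, made-up vector of labels (the values in `ev` are hypothetical, not taken from the storm data). It illustrates that `table()` counts labels as exact strings, which is why "TSTM WIND" and "THUNDERSTORM WIND" appear as separate entries in the output above.

```r
# Toy event labels (hypothetical, for illustration only)
ev <- c("HAIL", "TSTM WIND", "HAIL", "TORNADO", "HAIL",
        "TSTM WIND", "THUNDERSTORM WIND")

counts <- table(ev)                        # one count per distinct label
ranked <- sort(counts, decreasing = TRUE)  # most frequent label first
top2   <- head(ranked, 2)                  # keep the two most frequent labels
print(top2)
```

Because the labels are compared verbatim, spelling variants of the same phenomenon are ranked independently.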
The fatalities are summed for each event type, and the 10 event types with the most fatalities are listed and plotted with ggplot2.
res_F<-ddply(data2,.(EVTYPE),function(x) sum(x$FATALITIES))
ord_F<-head(res_F[with(res_F,order(V1,decreasing=TRUE)),],10)
colnames(ord_F)<-c("EVTYPE","Total_Fatalities")
rownames(ord_F)<-NULL
ord_F$EVTYPE<-factor(ord_F$EVTYPE,levels=unique(ord_F$EVTYPE))
p1<-qplot(EVTYPE,Total_Fatalities,data=ord_F)+geom_line(group=1)+labs(x="The Type of Events",y="The Total Fatalities for each type")+ggtitle("The Effect of top 10 Event Type on Population Fatalities")
ord_F
## EVTYPE Total_Fatalities
## 1 TORNADO 5633
## 2 EXCESSIVE HEAT 1903
## 3 FLASH FLOOD 978
## 4 HEAT 937
## 5 LIGHTNING 816
## 6 TSTM WIND 504
## 7 FLOOD 470
## 8 RIP CURRENT 368
## 9 HIGH WIND 248
## 10 AVALANCHE 224
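The grouped sum and ranking used above can also be written in base R. The following is a minimal sketch on a made-up data frame (`toy` and its values are hypothetical); it swaps `ddply()` for `aggregate()` and is not the code that produced the table above.

```r
# Hypothetical records: one row per event, with its fatality count
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT", "TORNADO"),
  FATALITIES = c(3, 5, 2, 1, 0, 4),
  stringsAsFactors = FALSE
)

# Sum FATALITIES within each EVTYPE
totals <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)

# Order by the total, largest first, and keep the top two event types
ranked <- totals[order(totals$FATALITIES, decreasing = TRUE), ]
rownames(ranked) <- NULL
top2 <- head(ranked, 2)
print(top2)
```

The same pattern applies to injuries by replacing the summed column.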
The injuries are summed for each event type, and the 10 event types with the most injuries are listed and plotted with ggplot2.
res_I<-ddply(data2,.(EVTYPE),function(x) sum(x$INJURIES))
ord_I<-head(res_I[with(res_I,order(V1,decreasing=TRUE)),],10)
colnames(ord_I)<-c("EVTYPE","Total_Injuries")
rownames(ord_I)<-NULL
ord_I$EVTYPE<-factor(ord_I$EVTYPE,levels=unique(ord_I$EVTYPE))
p2<-qplot(EVTYPE,Total_Injuries,data=ord_I)+geom_line(group=1)+labs(x="The Type of Events",y="The Total Injuries for each type")+ggtitle("The Effect of top 10 Event Type on Population Injuries")
ord_I
## EVTYPE Total_Injuries
## 1 TORNADO 91346
## 2 TSTM WIND 6957
## 3 FLOOD 6789
## 4 EXCESSIVE HEAT 6525
## 5 LIGHTNING 5230
## 6 HEAT 2100
## 7 ICE STORM 1975
## 8 FLASH FLOOD 1777
## 9 THUNDERSTORM WIND 1488
## 10 HAIL 1361
grid.arrange(p1,p2,nrow=2)
Before the economic consequences can be calculated, the variables ‘PROPDMGEXP’ and ‘CROPDMGEXP’ need to be preprocessed.
The property damage exponent takes the following symbols.
unique(data2$PROPDMGEXP)
## [1] "K" "M" "" "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-"
## [18] "1" "8"
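Replace the symbols with numeric exponents to make the calculation easier.
“M” or “m”: Million=1000000 (exp=6)
“K”: Thousand=1000 (exp=3)
“B”: Billion=1000000000 (exp=9)
“h” or “H”: Hundred=100 (exp=2)
“+”, “”, “0”, “?”, “-”: unknown or no multiplier (exp=0)
“1” to “8”: not replaced; the digit is used directly as the exponent once the column is converted to numeric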
data2$PROPDMGEXP[data2$PROPDMGEXP %in% c("M","m")]<-6
data2$PROPDMGEXP[data2$PROPDMGEXP %in% c("K")]<-3
data2$PROPDMGEXP[data2$PROPDMGEXP %in% c("B")]<-9
data2$PROPDMGEXP[data2$PROPDMGEXP %in% c("h","H")]<-2
data2$PROPDMGEXP[data2$PROPDMGEXP %in% c("+","","0","?","-")]<-0
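The same mapping can be expressed as a single lookup, which keeps the rules in one place. This is a minimal sketch under the rules listed above; the names `exp_map` and `to_exponent` are hypothetical helpers, not part of the original analysis.

```r
# Letter symbols and their exponents (covers both damage columns)
exp_map <- c(K = 3, k = 3, M = 6, m = 6, B = 9, H = 2, h = 2)

# Convert a character vector of exponent symbols to numeric exponents.
# Single digits are used as the exponent itself; anything else
# ("+", "", "?", "-") falls back to 0.
to_exponent <- function(sym) {
  out <- rep(0, length(sym))
  is_digit <- grepl("^[0-9]$", sym)
  out[is_digit] <- as.numeric(sym[is_digit])
  known <- sym %in% names(exp_map)
  out[known] <- unname(exp_map[sym[known]])
  out
}

print(to_exponent(c("K", "m", "B", "h", "", "+", "?", "-", "0", "5")))
```

A function like this returns a numeric vector directly, so no separate `as.numeric()` conversion is needed afterwards.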
The symbols of crop exponents are as follows.
unique(data2$CROPDMGEXP)
## [1] "" "M" "K" "m" "B" "?" "0" "k" "2"
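Replace the symbols with numeric exponents to make the calculation easier.
“M” or “m”: Million=1000000 (exp=6)
“K” or “k”: Thousand=1000 (exp=3)
“B”: Billion=1000000000 (exp=9)
“0”, “?”, “”: unknown or no multiplier (exp=0)
“2”: not replaced; the digit is used directly as the exponent once the column is converted to numeric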
data2$CROPDMGEXP[data2$CROPDMGEXP %in% c("M","m")]<-6
data2$CROPDMGEXP[data2$CROPDMGEXP %in% c("K","k")]<-3
data2$CROPDMGEXP[data2$CROPDMGEXP %in% c("B")]<-9
data2$CROPDMGEXP[data2$CROPDMGEXP %in% c("0","?","")]<-0
Calculate the total economic damage for each record.
EconomyDamage = PropertyDamage * 10^PropertyDamageExponent + CropDamage * 10^CropDamageExponent
data2$PROPDMGEXP<-as.numeric(data2$PROPDMGEXP)
data2$CROPDMGEXP<-as.numeric(data2$CROPDMGEXP)
Economy_DMG<-data2$PROPDMG*10^data2$PROPDMGEXP+data2$CROPDMG*10^data2$CROPDMGEXP
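As a worked example of the formula, the sketch below applies it to three made-up records (the data frame `toy` is hypothetical). An exponent of 0 leaves the value unchanged because 10^0 = 1.

```r
# Hypothetical records with numeric exponents already assigned
toy <- data.frame(
  PROPDMG = c(25, 2.5, 10), PROPDMGEXP = c(3, 6, 0),
  CROPDMG = c(0, 1.5, 4),   CROPDMGEXP = c(0, 3, 3)
)

# Damage in dollars = value * 10^exponent, summed over property and crops
toy$EconomyDMG <- toy$PROPDMG * 10^toy$PROPDMGEXP +
                  toy$CROPDMG * 10^toy$CROPDMGEXP
print(toy)
# Row 1: 25 * 10^3 + 0 * 10^0     = 25000
# Row 2: 2.5 * 10^6 + 1.5 * 10^3  = 2501500
# Row 3: 10 * 10^0 + 4 * 10^3     = 4010
```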
Sum the economic damage by event type, keep the 10 event types with the largest totals, and convert the unit to billions of dollars.
data2$EconomyDMG<-Economy_DMG
res_E<-ddply(data2,.(EVTYPE),function(x) sum(x$EconomyDMG))
Eco_DMG_order<-head(res_E[with(res_E,order(V1,decreasing=TRUE)),],10)
Eco_DMG_order$V1<-Eco_DMG_order$V1/10^9
colnames(Eco_DMG_order)<-c("EVTYPE","Eco_DMG")
rownames(Eco_DMG_order)<-NULL
Eco_DMG_order
## EVTYPE Eco_DMG
## 1 FLOOD 150.319678
## 2 HURRICANE/TYPHOON 71.913713
## 3 TORNADO 57.362334
## 4 STORM SURGE 43.323541
## 5 HAIL 18.761222
## 6 FLASH FLOOD 18.243991
## 7 DROUGHT 15.018672
## 8 HURRICANE 14.610229
## 9 RIVER FLOOD 10.148404
## 10 ICE STORM 8.967041
The 10 event types with the greatest economic consequences are plotted below.
Eco_DMG_order$EVTYPE<-factor(Eco_DMG_order$EVTYPE,levels=unique(Eco_DMG_order$EVTYPE))
qplot(EVTYPE,Eco_DMG,data=Eco_DMG_order)+geom_line(group=1)+labs(x="The Type of Events",y="The Total Economy Damage (unit:billion $)")+ggtitle("The Effect of top 10 Event Type on Economic Consequences across the United States")
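Since the event types are unordered categories, a bar chart is one possible alternative to points joined by a line. The sketch below uses base graphics and only the top three rows of the table above; the variable names `top3` and `mids` are illustrative, and this is not the plot produced by the report.

```r
# Top three rows of the economic-damage table above (billions of dollars)
top3 <- data.frame(
  EVTYPE  = c("FLOOD", "HURRICANE/TYPHOON", "TORNADO"),
  Eco_DMG = c(150.319678, 71.913713, 57.362334)
)

# barplot() draws one bar per value and returns the bar midpoints
mids <- barplot(
  top3$Eco_DMG,
  names.arg = top3$EVTYPE,
  xlab = "The Type of Events",
  ylab = "The Total Economy Damage (unit:billion $)",
  main = "Economic damage by event type"
)
```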