Severe weather events can cause both public health and economic problems for communities and municipalities. This report uses the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database to identify (1) the event types that are most harmful to population health and (2) the event types with the greatest economic consequences.
First, load the required packages and set the global chunk options for the R Markdown file.
library(knitr)
library(dplyr)
library(ggplot2)
library(gridExtra)
knitr::opts_chunk$set(fig.width=12,fig.height=15,echo=TRUE,warning=FALSE,message=FALSE)
Next, load the raw data. Reading and preprocessing the file is time-consuming, so this chunk should be cached with the cache = TRUE chunk option.
dat <- read.csv("repdata-data-StormData.csv", sep = ",") # the working directory must be the folder that contains the data file
dat <- na.omit(dat) # note: na.omit() drops every row that has an NA in any column
head(dat,3)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1.00 4/18/1950 0:00:00 0130 CST 97.00 MOBILE AL
## 2 1.00 4/18/1950 0:00:00 0145 CST 3.00 BALDWIN AL
## 3 1.00 2/20/1951 0:00:00 1600 CST 57.00 FAYETTE AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO 0.00 0.00
## 2 TORNADO 0.00 0.00
## 3 TORNADO 0.00 0.00
## COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1 0.00 14.00 100.00 3 0.00 0.00
## 2 0.00 2.00 150.00 2 0.00 0.00
## 3 0.00 0.10 123.00 2 0.00 0.00
## INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1 15.00 25.00 K 0.00
## 2 0.00 2.50 K 0.00
## 3 2.00 25.00 K 0.00
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3040.00 8812.00 3051.00 8806.00 1.00
## 2 3042.00 8755.00 0.00 0.00 2.00
## 3 3340.00 8742.00 0.00 0.00 3.00
Next, create a data frame called dat1 that holds the total fatalities and injuries for each event type (EVTYPE), plus their sum (HEALTH) as an overall measure of the impact on public health. Both columns are first converted to numeric.
dat$FATALITIES2 <- as.numeric(as.character(dat$FATALITIES))
dat$INJURIES2 <- as.numeric(as.character(dat$INJURIES))
dat1 <- aggregate(cbind(FATALITIES2,INJURIES2)~EVTYPE,data=dat,FUN=sum,na.rm=TRUE)
dat1$HEALTH <- dat1$FATALITIES2+dat1$INJURIES2
topfa <- arrange(dat1,desc(FATALITIES2))[1:20,1:2] # top 20 event types by total fatalities
topfa
## EVTYPE FATALITIES2
## 1 TORNADO 4658
## 2 EXCESSIVE HEAT 1416
## 3 158 homes were declared uninhabitable by building inspectors due to contamination and lack of utilities. Another two homes and one business were condemned.,642867.00\n51.00,6/25/2006 0:00:00,09:43:00 PM,EST,63.00,FLOYD,VA,FLOOD,0.00,,EAST PORTION,6/27/2006 0:00:00,03:30:00 PM,0.00,,0.00,,COUNTYWIDE,0.00,0.00,,0.00,0.00,0.00,155.00,K,0.00,,RNK,VIRGINIA 709
## 4 HEAT 708
## 5 LIGHTNING 562
## 6 FLASH FLOOD 559
## 7 TSTM WIND 471
## 8 FLOOD 258
## 9 RIP CURRENTS 204
## 10 HIGH WIND 194
## 11 HEAT WAVE 172
## 12 WINTER STORM 170
## 13 EXTREME COLD 160
## 14 AVALANCHE 115
## 15 HEAVY SNOW 112
## 16 EXTREME HEAT 96
## 17 BLIZZARD 92
## 18 RIP CURRENT 82
## 19 HEAVY RAIN 76
## 20 ICE STORM 76
topin <- arrange(dat1,desc(INJURIES2))[1:20,c(1,3)] # top 20 event types by total injuries
topin
## EVTYPE INJURIES2
## 1 TORNADO 80084
## 2 FLOOD 6499
## 3 TSTM WIND 6452
## 4 EXCESSIVE HEAT 4354
## 5 LIGHTNING 3628
## 6 ICE STORM 1959
## 7 FLASH FLOOD 1407
## 8 WINTER STORM 1238
## 9 HAIL 1154
## 10 HURRICANE/TYPHOON 1114
## 11 HIGH WIND 919
## 12 THUNDERSTORM WINDS 908
## 13 HEAT 878
## 14 HEAVY SNOW 861
## 15 BLIZZARD 802
## 16 FOG 734
## 17 158 homes were declared uninhabitable by building inspectors due to contamination and lack of utilities. Another two homes and one business were condemned.,642867.00\n51.00,6/25/2006 0:00:00,09:43:00 PM,EST,63.00,FLOYD,VA,FLOOD,0.00,,EAST PORTION,6/27/2006 0:00:00,03:30:00 PM,0.00,,0.00,,COUNTYWIDE,0.00,0.00,,0.00,0.00,0.00,155.00,K,0.00,,RNK,VIRGINIA 610
## 18 WILD/FOREST FIRE 545
## 19 DUST STORM 378
## 20 East, ,3952.00,7536.00,3951.00,7535.00,EPISODE NARRATIVE: An intense low pressure system that moved through the Great Lakes on the 9th produced a strong southerly flow of moist air from the Gulf of Mexico and Atlantic Ocean. Rain 323
tophe <- arrange(dat1,desc(HEALTH))[1:20,c(1,4)] # top 20 event types by the sum of fatalities and injuries (HEALTH)
tophe
## EVTYPE HEALTH
## 1 TORNADO 84742
## 2 TSTM WIND 6923
## 3 FLOOD 6757
## 4 EXCESSIVE HEAT 5770
## 5 LIGHTNING 4190
## 6 ICE STORM 2035
## 7 FLASH FLOOD 1966
## 8 HEAT 1586
## 9 WINTER STORM 1408
## 10 158 homes were declared uninhabitable by building inspectors due to contamination and lack of utilities. Another two homes and one business were condemned.,642867.00\n51.00,6/25/2006 0:00:00,09:43:00 PM,EST,63.00,FLOYD,VA,FLOOD,0.00,,EAST PORTION,6/27/2006 0:00:00,03:30:00 PM,0.00,,0.00,,COUNTYWIDE,0.00,0.00,,0.00,0.00,0.00,155.00,K,0.00,,RNK,VIRGINIA 1319
## 11 HAIL 1166
## 12 HURRICANE/TYPHOON 1144
## 13 HIGH WIND 1113
## 14 HEAVY SNOW 973
## 15 THUNDERSTORM WINDS 972
## 16 BLIZZARD 894
## 17 FOG 796
## 18 WILD/FOREST FIRE 557
## 19 RIP CURRENTS 501
## 20 HEAT WAVE 481
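The aggregation and ranking used to build dat1 and the top-20 tables can be tried on a small data frame first. This is a minimal sketch in base R: the data values and the top_n_rows() helper are made up for illustration, and head() is used instead of [1:20, ] so that asking for more rows than exist does not produce rows of NA.

```r
# Made-up toy data standing in for the storm table.
toy <- data.frame(
  EVTYPE      = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES2 = c(2, 0, 1, 3, 1),
  INJURIES2   = c(10, 4, 5, 0, 2),
  stringsAsFactors = FALSE
)

# Same formula interface as the report: sum both columns within each EVTYPE.
health <- aggregate(cbind(FATALITIES2, INJURIES2) ~ EVTYPE, data = toy, FUN = sum)
health$HEALTH <- health$FATALITIES2 + health$INJURIES2

# Hypothetical helper: sort by one column, largest first, and keep at most n rows
# (a base-R counterpart of arrange(desc(col)) followed by row slicing).
top_n_rows <- function(df, col, n) {
  ranked <- df[order(df[[col]], decreasing = TRUE), ]
  head(ranked, n)
}

top_n_rows(health, "HEALTH", 2)  # TORNADO (HEALTH 18) first, then FLOOD (HEALTH 7)
```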
From dat1 we obtain the top 20 event types with respect to fatalities, injuries, and their sum (health). Two EVTYPE values in these tables are not event names: their strings contain remark text followed by comma-separated values from neighbouring records, which suggests those lines of the CSV file were mis-parsed. For readability, the value beginning “158 homes were declared uninhabitable by building inspectors due to contamination and lack of utilities…” is relabeled “CONTAMINATION”, and the value containing “An intense low pressure system that moved through the Great Lakes on the 9th produced a strong southerly flow of moist air from the Gulf of Mexico and Atlantic Ocean…” is relabeled “LOW PRESSURE SYSTEM”. The bar plots are shown below.
# convert EVTYPE to character (it may be a factor) so individual values can be overwritten
topfa$EVTYPE <- as.character(topfa$EVTYPE)
topin$EVTYPE <- as.character(topin$EVTYPE)
tophe$EVTYPE <- as.character(tophe$EVTYPE)
topfa$EVTYPE[grep("158 homes were declared",topfa$EVTYPE)] <- "CONTAMINATION"
topin$EVTYPE[grep("158 homes were declared",topin$EVTYPE)] <- "CONTAMINATION"
topin$EVTYPE[grep("An intense low pressure system",topin$EVTYPE)] <- "LOW PRESSURE SYSTEM"
tophe$EVTYPE[grep("158 homes were declared",tophe$EVTYPE)] <- "CONTAMINATION"
a=ggplot(topfa,aes(reorder(EVTYPE,FATALITIES2),FATALITIES2))+geom_bar(stat="identity")+coord_flip()+labs(y="Total fatalities",x="Climatic events",title="Top 20 harmful events on fatalities")
b=ggplot(topin,aes(reorder(EVTYPE,INJURIES2),INJURIES2))+geom_bar(stat="identity")+coord_flip()+labs(y="Total injuries",x="Climatic events",title="Top 20 harmful events on injuries")
c=ggplot(tophe,aes(reorder(EVTYPE,HEALTH),HEALTH))+geom_bar(stat="identity")+coord_flip()+labs(y="Total sum of injuries and fatalities",x="Climatic events",title="Top 20 harmful events on sum of injuries and fatalities")
grid.arrange(a,b,c,ncol=1)
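The manual renaming above repeats one grep-and-assign line per table and per label. A hypothetical helper, relabel_evtype(), could apply a single named vector of patterns to every table so that all three plots share the same labels. It is a sketch under one assumption: it matches fixed substrings (grepl with fixed = TRUE) rather than regular expressions, which suits the literal sentences used here.

```r
# Hypothetical helper: replace every EVTYPE value that contains one of the
# fixed substrings in names(labels) with the corresponding short label.
relabel_evtype <- function(df, labels) {
  ev <- as.character(df$EVTYPE)  # factors cannot take new labels directly
  for (pattern in names(labels)) {
    ev[grepl(pattern, ev, fixed = TRUE)] <- labels[[pattern]]
  }
  df$EVTYPE <- ev
  df
}

# The two entries mirror the report's manual renaming.
short_labels <- c(
  "158 homes were declared"        = "CONTAMINATION",
  "An intense low pressure system" = "LOW PRESSURE SYSTEM"
)
```

With this helper, topfa <- relabel_evtype(topfa, short_labels), and likewise for topin and tophe, would replace the six grep lines.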
Next, the damage values in PROPDMG and CROPDMG must be scaled by the magnitude codes in PROPDMGEXP and CROPDMGEXP: “K” for thousands, “M” for millions, and “B” for billions. The code below matches only these upper-case codes and treats any other value as a multiplier of 1.
# multiplier per row: "K" = 1e3, "M" = 1e6, "B" = 1e9, anything else = 1
propmag <- ifelse(dat$PROPDMGEXP=="K",1000,ifelse(dat$PROPDMGEXP=="M",1000000,ifelse(dat$PROPDMGEXP=="B",1000000000,1)))
dat$PROPDMG2 <- as.numeric(as.character(dat$PROPDMG))*propmag
cropmag <- ifelse(dat$CROPDMGEXP=="K",1000,ifelse(dat$CROPDMGEXP=="M",1000000,ifelse(dat$CROPDMGEXP=="B",1000000000,1)))
dat$CROPDMG2 <- as.numeric(as.character(dat$CROPDMG))*cropmag
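As a worked example of this conversion, the first record printed by head(dat, 3) has PROPDMG 25.00 with PROPDMGEXP “K”, that is 25 × 1,000 = $25,000. The nested ifelse() calls can also be written as a lookup table. The sketch below uses a hypothetical function name, to_dollars(), and additionally upper-cases the codes so that a lower-case “k” or “m” counts like “K” or “M”; the report’s code does not do this, so totals computed this way may differ from the tables in this report.

```r
# Hypothetical lookup-table version of the magnitude conversion.
# Assumption: codes are matched case-insensitively ("k" == "K").
# Any code not in the table (including "" and NA) gives a multiplier of 1,
# as in the report.
to_dollars <- function(value, exp_code) {
  multipliers <- c(K = 1e3, M = 1e6, B = 1e9)
  code <- toupper(trimws(as.character(exp_code)))
  mult <- unname(multipliers[code])  # unmatched names index to NA
  mult[is.na(mult)] <- 1
  as.numeric(as.character(value)) * mult
}

# 25000, 2500, 1.5e9 and 7 dollars respectively
to_dollars(c("25.00", "2.50", "1.5", "7"), c("K", "k", "B", ""))
```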
Then create a data frame called dat2 that holds the total property damage (PROPDMG2) and crop damage (CROPDMG2) in dollars for each event type, plus their sum (ECONOMICS) as an overall measure of economic impact.
dat2 <- aggregate(cbind(PROPDMG2,CROPDMG2)~EVTYPE,data=dat,FUN=sum,na.rm=TRUE)
dat2$ECONOMICS <- dat2$PROPDMG2+dat2$CROPDMG2
topprop <- arrange(dat2,desc(PROPDMG2))[1:20,1:2] # top 20 event types by property damage
topprop
## EVTYPE PROPDMG2
## 1 TORNADO 40966206600
## 2 HURRICANE/TYPHOON 19403415000
## 3 FLOOD 12338106477
## 4 HURRICANE 9400719010
## 5 FLASH FLOOD 8100968357
## 6 HAIL 7617562203
## 7 TROPICAL STORM 7127402000
## 8 WINTER STORM 5484019751
## 9 RIVER FLOOD 5118945500
## 10 HIGH WIND 3888506055
## 11 TSTM WIND 3604209365
## 12 HURRICANE OPAL 3152846020
## 13 WILD/FOREST FIRE 3001829500
## 14 ICE STORM 2953061060
## 15 HEAVY RAIN/SEVERE WEATHER 2500000000
## 16 WILDFIRE 2353816030
## 17 THUNDERSTORM WINDS 1733459026
## 18 TORNADOES, TSTM WIND, HAIL 1600000000
## 19 SEVERE THUNDERSTORM 1205360000
## 20 DROUGHT 845298000
topcrop <- arrange(dat2,desc(CROPDMG2))[1:20,c(1,3)] # top 20 event types by crop damage
topcrop
## EVTYPE CROPDMG2
## 1 DROUGHT 9860245000
## 2 RIVER FLOOD 5029459000
## 3 ICE STORM 5021998500
## 4 HURRICANE 2561400000
## 5 FLOOD 2333949550
## 6 HAIL 1945373390
## 7 EXTREME COLD 1292973000
## 8 FLASH FLOOD 642780600
## 9 HURRICANE/TYPHOON 586770800
## 10 TSTM WIND 510661800
## 11 HIGH WIND 503941300
## 12 FREEZE 446225000
## 13 HEAVY RAIN 423217800
## 14 HEAT 401285000
## 15 TROPICAL STORM 286135000
## 16 DAMAGING FREEZE 262100000
## 17 TORNADO 216014370
## 18 THUNDERSTORM WINDS 190650792
## 19 EXCESSIVE WETNESS 142000000
## 20 HURRICANE ERIN 136010000
topeco <- arrange(dat2,desc(ECONOMICS))[1:20,c(1,4)] # top 20 event types by the sum of property and crop damage (ECONOMICS)
topeco
## EVTYPE ECONOMICS
## 1 TORNADO 41182220970
## 2 HURRICANE/TYPHOON 19990185800
## 3 FLOOD 14672056027
## 4 HURRICANE 11962119010
## 5 DROUGHT 10705543000
## 6 RIVER FLOOD 10148404500
## 7 HAIL 9562935593
## 8 FLASH FLOOD 8743748957
## 9 ICE STORM 7975059560
## 10 TROPICAL STORM 7413537000
## 11 WINTER STORM 5510370751
## 12 HIGH WIND 4392447355
## 13 TSTM WIND 4114871165
## 14 HURRICANE OPAL 3161846030
## 15 WILD/FOREST FIRE 3108626330
## 16 HEAVY RAIN/SEVERE WEATHER 2500000000
## 17 WILDFIRE 2385281530
## 18 THUNDERSTORM WINDS 1924109818
## 19 TORNADOES, TSTM WIND, HAIL 1602500000
## 20 EXTREME COLD 1360710400
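Because ECONOMICS is simply PROPDMG2 + CROPDMG2, the tables can be cross-checked by hand. The sketch below copies the TORNADO figures from the property and crop tables and confirms their sum equals the TORNADO row of the ECONOMICS table. Raw dollar totals are hard to read, so it also defines a hypothetical helper, in_billions(), which is only a presentation aid and not part of the original analysis.

```r
# TORNADO figures copied from the PROPDMG2 and CROPDMG2 tables (dollars).
prop  <- 40966206600
crop  <- 216014370
total <- prop + crop
total == 41182220970  # TRUE: matches the ECONOMICS value for TORNADO

# Hypothetical helper for readability: express dollar totals in billions.
in_billions <- function(x, digits = 2) round(x / 1e9, digits)
in_billions(total)    # 41.18
```

For example, transform(topeco, ECONOMICS_BN = in_billions(ECONOMICS)) would add a column in billions of dollars next to the raw totals.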
From dat2 we obtain the top 20 event types by property damage, crop damage, and their sum. The bar plots are shown below.
a2=ggplot(topprop,aes(reorder(EVTYPE,PROPDMG2),PROPDMG2))+geom_bar(stat="identity")+coord_flip()+labs(y="Property damage",x="Climatic events",title="Top 20 harmful events on property damage")
b2=ggplot(topcrop,aes(reorder(EVTYPE,CROPDMG2),CROPDMG2))+geom_bar(stat="identity")+coord_flip()+labs(y="Crop damage",x="Climatic events",title="Top 20 harmful events on crop damage")
c2=ggplot(topeco,aes(reorder(EVTYPE,ECONOMICS),ECONOMICS))+geom_bar(stat="identity")+coord_flip()+labs(y="Total sum of property and crop damage",x="Climatic events",title="Top 20 harmful events on sum of property and crop damage")
grid.arrange(a2,b2,c2,ncol=1)
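The six ggplot() calls in this report differ only in the data frame, the value column, and the labels. A hypothetical helper, plot_top(), could build each chart from those inputs. This sketch assumes a ggplot2 version that supports the .data pronoun inside aes() (3.0.0 or later) and uses geom_col(), which is equivalent to geom_bar(stat = "identity").

```r
library(ggplot2)

# Hypothetical helper: one horizontal bar chart of `value` per EVTYPE,
# bars ordered by value, matching the style of the report's plots.
plot_top <- function(df, value, ylab, title) {
  ggplot(df, aes(x = reorder(EVTYPE, .data[[value]]), y = .data[[value]])) +
    geom_col() +
    coord_flip() +
    labs(y = ylab, x = "Climatic events", title = title)
}
```

For example, a2 <- plot_top(topprop, "PROPDMG2", "Property damage", "Top 20 harmful events on property damage") would reproduce the first economic chart, and grid.arrange() can combine the results as before.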
We examined the impact of various weather events on public health and the economy in the U.S. The following conclusions can be drawn.
As for public health:
1. The top 3 events for fatalities are TORNADO, EXCESSIVE HEAT and CONTAMINATION (a relabeled EVTYPE value that appears to come from a mis-parsed record rather than a genuine event type)
2. The top 3 events for injuries are TORNADO, FLOOD and TSTM WIND
3. The top 3 events for the sum of fatalities and injuries are TORNADO, TSTM WIND and FLOOD
4. TORNADO is by far the most harmful event with respect to fatalities, injuries and their sum (for example, 4658 fatalities versus 1416 for EXCESSIVE HEAT, the next event)
As for economic consequences:
1. The top 3 events for property damage are TORNADO, HURRICANE/TYPHOON and FLOOD
2. The top 3 events for crop damage are DROUGHT, RIVER FLOOD and ICE STORM
3. The top 3 events for the sum of property and crop damage are TORNADO, HURRICANE/TYPHOON and FLOOD
Overall, TORNADO is the most harmful event type in the data as processed here, with respect to both public health and economic damage.