This document describes the analysis used to determine (1) the types of events that are most harmful to population health and (2) the types of events that have the greatest economic consequences. Raw data were obtained from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database and cover 1950 to 2011. For these data, it was found that across the United States, tornadoes are most harmful with respect to population health (both fatalities and injuries), floods are most harmful with respect to property damage, and droughts and floods are the two most harmful weather events with respect to crop damage.
The two research questions are:
(1) Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
(2) Across the United States, which types of events have the greatest economic consequences?
The raw data come as a comma-separated-value file, compressed with the bzip2 algorithm and downloaded from the course website.
knitr::opts_chunk$set(echo = TRUE,cache=TRUE)
df<-read.csv("repdata_data_StormData.csv.bz2",header=TRUE,as.is=TRUE,
na.strings=",")
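As a small illustration of why no separate decompression step is needed: read.csv opens the path through an R file connection, which detects bzip2 compression when reading. The sketch below uses a temporary file and made-up rows (not the NOAA data) purely to show the round trip.

```r
# Hypothetical toy data; the temp file stands in for repdata_data_StormData.csv.bz2
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(1, 0))
path <- tempfile(fileext = ".csv.bz2")

# Write the CSV through a bzip2 connection
con <- bzfile(path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv decompresses transparently when given the compressed file path
back <- read.csv(path, header = TRUE, as.is = TRUE)
print(back)
```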
After reading the data, the next step is to explore the data set.
str(df)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr "" "" "" "" ...
## $ BGN_LOCATI: chr "" "" "" "" ...
## $ END_DATE : chr "" "" "" "" ...
## $ END_TIME : chr "" "" "" "" ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr "" "" "" "" ...
## $ END_LOCATI: chr "" "" "" "" ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## $ WFO : chr "" "" "" "" ...
## $ STATEOFFIC: chr "" "" "" "" ...
## $ ZONENAMES : chr "" "" "" "" ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr "" "" "" "" ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
To answer the two research questions indicated above, the National Weather Service Storm Data Document was consulted. The data set is reduced to include only the relevant columns, namely:
EVTYPE - types of weather events
FATALITIES - number of fatalities
INJURIES - number of injuries
PROPDMG - property damage in USD (should be combined with the next column)
PROPDMGEXP - see section X
CROPDMG - crop damage in USD (should be combined with the next column)
CROPDMGEXP - see section X
The new data set includes 7 relevant columns.
data<-df[,c("EVTYPE","FATALITIES","INJURIES","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]
head(data)
## EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO 0 15 25.0 K 0
## 2 TORNADO 0 0 2.5 K 0
## 3 TORNADO 0 2 25.0 K 0
## 4 TORNADO 0 2 2.5 K 0
## 5 TORNADO 0 2 2.5 K 0
## 6 TORNADO 0 6 2.5 K 0
To estimate which types of events are most harmful with respect to population health, two columns are considered separately: FATALITIES and INJURIES.
First, fatalities are considered. To drop irrelevant data, a new data set is created that keeps only the rows where FATALITIES is not 0.
dataF<-data[data$FATALITIES!=0,]
head(dataF)
## EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 9 TORNADO 1 14 25.0 K 0
## 13 TORNADO 1 26 250.0 K 0
## 16 TORNADO 4 50 25.0 K 0
## 26 TORNADO 1 8 25.0 K 0
## 34 TORNADO 6 195 2.5 M 0
## 36 TORNADO 7 12 250.0 K 0
Next, the data are grouped by weather event type (EVTYPE), and a new data set with the total number of fatalities per event type is obtained. The columns are renamed, and the ten event types with the highest fatality numbers are displayed.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
byEVENT<-group_by(dataF,EVTYPE)
fatal<-as.data.frame(summarize(byEVENT,sum(FATALITIES)))
names(fatal)<-c("eventtype","total")
head(fatal[order(fatal$total,decreasing=TRUE),],10)
## eventtype total
## 141 TORNADO 5633
## 26 EXCESSIVE HEAT 1903
## 35 FLASH FLOOD 978
## 57 HEAT 937
## 97 LIGHTNING 816
## 145 TSTM WIND 504
## 40 FLOOD 470
## 116 RIP CURRENT 368
## 75 HIGH WIND 248
## 2 AVALANCHE 224
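For readers without dplyr, the same group-and-sum step can be sketched in base R with aggregate. This is an alternative shown under assumptions, not the code used in this analysis; the toy rows are made up and only the column names follow the storm data.

```r
# Toy rows (made up for illustration), same column names as the storm data
toy <- data.frame(
  EVTYPE = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT"),
  FATALITIES = c(3, 2, 6, 1, 5)
)

# Sum fatalities per event type, then sort from highest to lowest total
totals <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)
names(totals) <- c("eventtype", "total")
totals <- totals[order(totals$total, decreasing = TRUE), ]
print(head(totals, 10))
```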
A new variable holds the top five event categories, named according to the National Weather Service Storm Data Document.
fatalcat<-c("Tornado","Heat","Flood","Lightning","Thunderstorm Wind")
Since many rows have event types that contain the same keyword (for example, “tornado”), the rows are pulled into the corresponding categories with the grep function.
tornado<-fatal[grep("TORNADO|tornado|torn|Tornado",fatal$eventtype),]
tornado
## eventtype total
## 141 TORNADO 5633
## 142 TORNADOES, TSTM WIND, HAIL 25
## 155 WATERSPOUT/TORNADO 3
tornadoT<-sum(tornado$total)
heat<-fatal[grep("HEAT|heat|Heat",fatal$eventtype),]
heatT<-sum(heat$total)
flood<-fatal[grep("FLOOD|flood|floo|Flood",fatal$eventtype),]
floodT<-sum(flood$total)
lightning<-fatal[grep("LIGHTNING|LIGHTN|Lightn|lightn",fatal$eventtype),]
lightningT<-sum(lightning$total)
tstm<-fatal[grep("THUNDERSTORM|TSTM|Thunderstorm|tstm",fatal$eventtype),]
tstmT<-sum(tstm$total)
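The keyword matching above can also be written as a small helper with case-insensitive matching, so that one pattern covers every capitalization. The function name sum_by_keyword and the "Heat Wave" row are hypothetical; the three tornado rows repeat the totals printed above.

```r
# Hypothetical helper: total of all event types whose name matches a keyword,
# ignoring case so one pattern covers "TORNADO", "Tornado", "tornado", ...
sum_by_keyword <- function(df, pattern) {
  hits <- grepl(pattern, df$eventtype, ignore.case = TRUE)
  sum(df$total[hits])
}

# Toy totals: the three tornado rows come from the output above, the last is made up
toy <- data.frame(
  eventtype = c("TORNADO", "TORNADOES, TSTM WIND, HAIL",
                "WATERSPOUT/TORNADO", "Heat Wave"),
  total = c(5633, 25, 3, 10)
)

tornado_total <- sum_by_keyword(toy, "torn")            # 5633 + 25 + 3
tstm_total <- sum_by_keyword(toy, "tstm|thunderstorm")  # the combined row matches too
print(c(tornado = tornado_total, tstm = tstm_total))
```

Note that a combined event type such as "TORNADOES, TSTM WIND, HAIL" matches more than one pattern, so keyword totals can count the same rows in two categories.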
Aggregated results from the total number of fatalities per event type are saved in a new variable. Data frame fatalities with the event type and total number of fatalities is created. See graph in the result section below (question 1).
totalfatal<-c(tornadoT,heatT,floodT,lightningT,tstmT)
fatalities<-data.frame(fatalcat,totalfatal)
fatalities
## fatalcat totalfatal
## 1 Tornado 5661
## 2 Heat 3138
## 3 Flood 1525
## 4 Lightning 817
## 5 Thunderstorm Wind 754
Now, injuries are considered. To drop irrelevant data, a new data set is created that keeps only the rows where INJURIES is not 0.
dataI<-data[data$INJURIES!=0,]
head(dataI)
## EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO 0 15 25.0 K 0
## 3 TORNADO 0 2 25.0 K 0
## 4 TORNADO 0 2 2.5 K 0
## 5 TORNADO 0 2 2.5 K 0
## 6 TORNADO 0 6 2.5 K 0
## 7 TORNADO 0 1 2.5 K 0
Next, the data are grouped by weather event type (EVTYPE), and a new data set with the total number of injuries per event type is obtained. The columns are renamed, and the ten event types with the highest injury numbers are displayed.
byEVENT2<-group_by(dataI,EVTYPE)
injur<-as.data.frame(summarize(byEVENT2,sum(INJURIES)))
names(injur)<-c("eventtype","total")
head(injur[order(injur$total,decreasing=TRUE),],10)
## eventtype total
## 129 TORNADO 91346
## 135 TSTM WIND 6957
## 30 FLOOD 6789
## 20 EXCESSIVE HEAT 6525
## 85 LIGHTNING 5230
## 47 HEAT 2100
## 79 ICE STORM 1975
## 28 FLASH FLOOD 1777
## 121 THUNDERSTORM WIND 1488
## 45 HAIL 1361
A new variable holds the top five event categories, named according to the National Weather Service Storm Data Document.
injurcat<-c("Tornado","Thunderstorm Wind","Flood","Heat","Lightning")
Since many rows have event types that contain the same keyword, the rows are pulled into the corresponding categories with the grep function.
tornado2<-injur[grep("TORNADO|tornado|torn|Tornado",injur$eventtype),]
tornado2T<-sum(tornado2$total)
tstm2<-injur[grep("TSTM|tstm|thunder|Thunder|THUNDER",injur$eventtype),]
tstm2T<-sum(tstm2$total)
flood2<-injur[grep("FLOOD|flood|floo|Flood",injur$eventtype),]
flood2T<-sum(flood2$total)
heat2<-injur[grep("HEAT|heat|Heat",injur$eventtype),]
heat2T<-sum(heat2$total)
lightning2<-injur[grep("LIGHTNING|LIGHTN|Lightn|lightn",injur$eventtype),]
lightning2T<-sum(lightning2$total)
Aggregated results from the total number of injuries per event type are saved in a new variable. Data frame injuries with the event type and total number of injuries is created. See graph in the result section below (question 1).
totalinjur<-c(tornado2T,tstm2T,flood2T,heat2T,lightning2T)
injuries <-data.frame(injurcat,totalinjur)
injuries
## injurcat totalinjur
## 1 Tornado 91407
## 2 Thunderstorm Wind 9545
## 3 Flood 8604
## 4 Heat 9224
## 5 Lightning 5232
To estimate which types of events have the greatest economic consequences, four columns are considered in related pairs: PROPDMG with PROPDMGEXP, and CROPDMG with CROPDMGEXP.
The meaning of these columns is explained in the post "How To Handle Exponent Value of PROPDMGEXP and CROPDMGEXP", created by Soesilo Wijono on February 9, 2015.
According to this post, the values in the PROPDMGEXP and CROPDMGEXP columns have the following meaning.
H,h - hundreds (a multiplier of 100)
K,k - thousands (a multiplier of 1000)
M,m - millions (a multiplier of 10^6)
B - billions (a multiplier of 10^9)
0..8 - numeric (a multiplier of 10)
+ - plus (a multiplier of 1)
- - minus (a multiplier of 0)
? - question mark (a multiplier of 0)
(blank) - blank (a multiplier of 0)
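The list above can be expressed as a lookup table. The following is a minimal sketch, not the approach taken in this analysis (which replaces letters with gsub and drops the rare codes); the names multiplier and exp_to_multiplier are hypothetical, and treating any unlisted code as 0 is an assumption.

```r
# Multipliers as listed above; names are the codes found in PROPDMGEXP/CROPDMGEXP
multiplier <- c(H = 100, K = 1000, M = 1e6, B = 1e9,
                "+" = 1, "-" = 0, "?" = 0)
multiplier[as.character(0:8)] <- 10   # digits 0..8 all map to 10

# Hypothetical helper: upper-case the code and look it up.
# Blank (and, by assumption, any unlisted code) becomes 0.
exp_to_multiplier <- function(code) {
  out <- unname(multiplier[toupper(code)])
  out[is.na(out)] <- 0
  out
}

dmg <- c(25, 2.5, 2.5, 3)
code <- c("K", "k", "M", "")
print(dmg * exp_to_multiplier(code))
```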
The next steps replace the above values with the corresponding numbers and combine each pair of columns (e.g. PROPDMG*PROPDMGEXP) for further analysis.
First, the rows with 0 (USD) in property damage (column PROPDMG) are removed.
dataP<-data[data$PROPDMG!=0,]
head(dataP)
## EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO 0 15 25.0 K 0
## 2 TORNADO 0 0 2.5 K 0
## 3 TORNADO 0 2 25.0 K 0
## 4 TORNADO 0 2 2.5 K 0
## 5 TORNADO 0 2 2.5 K 0
## 6 TORNADO 0 6 2.5 K 0
Unique values in column PROPDMGEXP are identified.
unique(data$PROPDMGEXP)
## [1] "K" "M" "" "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"
The number of rows that have (+), a multiplier of 1, or a digit, a multiplier of 10, in the PROPDMGEXP column is counted.
nrow(dataP[grep("\\+",dataP$PROPDMGEXP),])
## [1] 5
nrow(dataP[grep("[0-9]",dataP$PROPDMGEXP),])
## [1] 238
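To turn such counts into a share of all rows, one option is to take the mean of the logical vector returned by grepl. The sketch below uses a made-up vector of codes; in the report the input would be dataP$PROPDMGEXP.

```r
# Toy exponent column (made up); in the report this would be dataP$PROPDMGEXP
codes <- c("K", "K", "M", "+", "5", "K", "B", "K", "0", "K")

# grepl() returns TRUE/FALSE per row, so mean() gives the proportion of matches
share_plus <- mean(grepl("\\+", codes))
share_digit <- mean(grepl("[0-9]", codes))

# Multiply by 100 to express the proportions as percentages
print(100 * c(plus = share_plus, digit = share_digit))
```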
Next, rows whose PROPDMGEXP is (-) or (blank), both with a multiplier of 0, are excluded. Rows with (+) or a digit in PROPDMGEXP are excluded as well, since very few rows carry these values (see the counts above).
dataP1<-dataP[!(dataP$PROPDMGEXP %in% c("+","-","",as.character(0:8))),]
Unique values in PROPDMGEXP column are verified - only letters are left.
unique(dataP1$PROPDMGEXP)
## [1] "K" "M" "B" "m" "h" "H"
Replace the letters in column PROPDMGEXP with the corresponding values and convert the column to numeric.
dataP1$PROPDMGEXP<-gsub("[Hh]",100,dataP1$PROPDMGEXP)
dataP1$PROPDMGEXP<-gsub("K",1000,dataP1$PROPDMGEXP)
dataP1$PROPDMGEXP<-gsub("[Mm]",1000000,dataP1$PROPDMGEXP)
dataP1$PROPDMGEXP<-gsub("B",1000000000,dataP1$PROPDMGEXP)
dataP1$PROPDMGEXP<-as.numeric(dataP1$PROPDMGEXP)
To estimate the property damage, a new column is created. It combines the number in PROPDMG and the corresponding multiplier to get the total USD.
library(dplyr)
dataP1<-mutate(dataP1,propdamage=PROPDMG*PROPDMGEXP)
tail(dataP1)
## EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 238849 WINTER STORM 0 0 2.0 1000 0 K
## 238850 WINTER STORM 0 0 5.0 1000 0 K
## 238851 STRONG WIND 0 0 0.6 1000 0 K
## 238852 STRONG WIND 0 0 1.0 1000 0 K
## 238853 DROUGHT 0 0 2.0 1000 0 K
## 238854 HIGH WIND 0 0 7.5 1000 0 K
## propdamage
## 238849 2000
## 238850 5000
## 238851 600
## 238852 1000
## 238853 2000
## 238854 7500
Next, the data are grouped by weather event type, and a new data set with the total cost in USD per event type is obtained. The columns are renamed, and the ten event types with the highest cost are displayed.
byEVENT3<-group_by(dataP1,EVTYPE)
propertydamage<-as.data.frame(summarize(byEVENT3,sum(propdamage)))
names(propertydamage)<-c("eventtype","totalUSD")
head(propertydamage[order(propertydamage$totalUSD,decreasing=TRUE),],10)
## eventtype totalUSD
## 62 FLOOD 144657709800
## 178 HURRICANE/TYPHOON 69305840000
## 330 TORNADO 56937160480
## 278 STORM SURGE 43323536000
## 50 FLASH FLOOD 16140811510
## 103 HAIL 15732267220
## 170 HURRICANE 11868319010
## 338 TROPICAL STORM 7703890550
## 395 WINTER STORM 6688497250
## 155 HIGH WIND 5270046260
A new variable holds the top six event categories, named according to the National Weather Service Storm Data Document. Six categories are used (instead of five, as in the analyses above) because each of these event types has a total cost above 10 billion USD. HURRICANE also exceeds 10 billion USD and is counted in the Hurricane category; the remaining event types are below that threshold.
propertycat<-c("Flood","Hurricane","Tornado","Storm Surge","Flash Flood","Hail")
Since there are many lines with event types that might contain the same key word, data are pulled into the corresponding categories with the grep function.
flood3<-propertydamage[grep("^Flood$|^FLOOD$|^Flood$",propertydamage$eventtype),]
flood3T<-sum(flood3$totalUSD)
hurricane3<-propertydamage[grep("Hurricane|HURRICANE|Hurricane|Typhoon|TYPHOON
|typhoon",propertydamage$eventtype),]
hurricane3T<-sum(hurricane3$totalUSD)
tornado3<-propertydamage[grep("TORNADO|Tornado|tornado",propertydamage$eventtype),]
tornado3T<-sum(tornado3$totalUSD)
stormsurge3<-propertydamage[grep("Storm Surge|Tide|STORM|Storm|storm|TIDE",propertydamage$eventtype),]
stormsurge3T<-sum(stormsurge3$totalUSD)
fflood3<-propertydamage[grep("^Flash Flood$|^FLASH FLOOD$",propertydamage$eventtype),]
fflood3T<-sum(fflood3$totalUSD)
hail3<-propertydamage[grep("HAIL",propertydamage$eventtype),]
hail3T<-sum(hail3$totalUSD)
Aggregated results from the total cost in USD for property damage per event type are saved in a new variable. Data frame property with the event type and total cost in USD is created. See graph in the result section below (question 2).
totalprdamage<-c(flood3T,hurricane3T,tornado3T,stormsurge3T,fflood3T,hail3T)
property<-data.frame(propertycat,totalprdamage)
property
## propertycat totalprdamage
## 1 Flood 144657709800
## 2 Hurricane 84756180010
## 3 Tornado 58593097730
## 4 Storm Surge 73064803400
## 5 Flash Flood 16140811510
## 6 Hail 17619991220
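The patterns above mix two matching styles: anchored patterns such as ^FLOOD$ match only the exact name, while unanchored patterns such as HURRICANE match any name that contains the keyword. A brief illustration on four event names taken from the top-ten list above:

```r
# Event names taken from the top-ten property damage list above
events <- c("FLOOD", "FLASH FLOOD", "HURRICANE", "HURRICANE/TYPHOON")

# Anchored: ^ and $ force the whole name to be exactly "FLOOD"
exact <- grep("^FLOOD$", events, value = TRUE)

# Unanchored: any name containing the keyword matches
containing <- grep("HURRICANE", events, value = TRUE)
loose_flood <- grep("FLOOD", events, value = TRUE)

print(exact)
print(containing)
print(loose_flood)
```

This is why the Flood and Flash Flood totals equal the single FLOOD and FLASH FLOOD rows, while the Hurricane total is larger than the HURRICANE row alone.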
Next, crop damage is considered. The rows with 0 (USD) in crop damage (column CROPDMG) are removed.
dataC<-data[data$CROPDMG!=0,]
head(dataC)
## EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG
## 187566 HURRICANE OPAL/HIGH WINDS 2 0 0.1 B 10
## 187571 THUNDERSTORM WINDS 0 0 5.0 M 500
## 187581 HURRICANE ERIN 0 0 25.0 M 1
## 187583 HURRICANE OPAL 0 0 48.0 M 4
## 187584 HURRICANE OPAL 0 0 20.0 m 10
## 187653 THUNDERSTORM WINDS 0 0 50.0 K 50
## CROPDMGEXP
## 187566 M
## 187571 K
## 187581 M
## 187583 M
## 187584 m
## 187653 K
Column CROPDMGEXP has the following unique values.
unique(dataC$CROPDMGEXP)
## [1] "M" "K" "m" "B" "k" "0" ""
The number of rows that have 0, a multiplier of 10, in the CROPDMGEXP column is counted.
nrow(dataC[grep("0",dataC$CROPDMGEXP),])
## [1] 12
Next, rows with 0 in CROPDMGEXP (a multiplier of 10) are excluded, since very few rows carry this value. Rows with (blank), a multiplier of 0, are excluded as well.
dataC1<-dataC[dataC$CROPDMGEXP!=0&dataC$CROPDMGEXP!="",]
Verify the unique values in the CROPDMGEXP column. Only letters are left.
unique(dataC1$CROPDMGEXP)
## [1] "M" "K" "m" "B" "k"
Replace the letters in column CROPDMGEXP with the corresponding values and convert the column to numeric.
dataC1$CROPDMGEXP<-gsub("[Mm]",1000000,dataC1$CROPDMGEXP)
dataC1$CROPDMGEXP<-gsub("[Kk]",1000,dataC1$CROPDMGEXP)
dataC1$CROPDMGEXP<-gsub("B",1000000000,dataC1$CROPDMGEXP)
dataC1$CROPDMGEXP<-as.numeric(dataC1$CROPDMGEXP)
To estimate the crop damage, a new column is created. It combines the number in CROPDMG and the corresponding multiplier in CROPDMGEXP to get the total USD.
dataC1<-mutate(dataC1,cropdamage=CROPDMG*CROPDMGEXP)
tail(dataC1)
## EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 22079 FLOOD 0 0 1 K 1 1000
## 22080 FLOOD 0 0 1 K 1 1000
## 22081 FLOOD 0 0 1 K 1 1000
## 22082 STRONG WIND 0 0 0 K 20 1000
## 22083 STRONG WIND 0 0 0 K 2 1000
## 22084 STRONG WIND 0 0 0 K 1 1000
## cropdamage
## 22079 1000
## 22080 1000
## 22081 1000
## 22082 20000
## 22083 2000
## 22084 1000
Next, the data are grouped by weather event type, and a new data set with the total cost in USD per event type is obtained. The columns are renamed, and the ten event types with the highest cost are displayed.
byEVENT4<-group_by(dataC1,EVTYPE)
CROPdamage<-as.data.frame(summarize(byEVENT4,sum(cropdamage)))
names(CROPdamage)<-c("eventtype","totalUSD")
head(CROPdamage[order(CROPdamage$totalUSD,decreasing=TRUE),],10)
## eventtype totalUSD
## 10 DROUGHT 13972566000
## 27 FLOOD 5661968450
## 78 RIVER FLOOD 5029459000
## 72 ICE STORM 5022113500
## 42 HAIL 3025954450
## 64 HURRICANE 2741910000
## 69 HURRICANE/TYPHOON 2607872800
## 23 FLASH FLOOD 1421317100
## 19 EXTREME COLD 1292973000
## 37 FROST/FREEZE 1094086000
A new variable holds the top five event categories, named according to the National Weather Service Storm Data Document.
cropcat<-c("Drought","Flood","Ice Storm","Hail","Hurricane")
Next, since many rows have event types that contain the same keyword, the rows are pulled into the corresponding categories with the grep function.
drought4<-CROPdamage[grep("Drought|DROUGHT",CROPdamage$eventtype),]
drought4T<-sum(drought4$totalUSD)
flood4<-CROPdamage[grep("FLOOD|Flood",CROPdamage$eventtype),]
flood4T<-sum(flood4$totalUSD)
ice4<-CROPdamage[grep("Ice Storm|ICE STORM|ICE",CROPdamage$eventtype),]
ice4T<-sum(ice4$totalUSD)
hail4<-CROPdamage[grep("HAIL",CROPdamage$eventtype),]
hail4T<-sum(hail4$totalUSD)
hurricane4<-CROPdamage[grep("HURRICANE|Typhoon",CROPdamage$eventtype),]
hurricane4T<-sum(hurricane4$totalUSD)
Aggregated results from the total cost in USD for crop damage per event type are saved in a new variable. Data frame crop with the event type and total cost in USD is created. See graph in the result section below (question 2).
totalcropdamage<-c(drought4T,flood4T,ice4T,hail4T,hurricane4T)
crop<-data.frame(cropcat,totalcropdamage)
crop
## cropcat totalcropdamage
## 1 Drought 13972621780
## 2 Flood 12380109100
## 3 Ice Storm 5027114300
## 4 Hail 3114212850
## 5 Hurricane 5515292800
Total numbers of fatalities, injuries and USD spent on property and crop damage are presented below.
The following code creates a ggplot bar chart with the total number of weather-related fatalities in the US from 1950 to 2011.
library(ggplot2)
library(stringr)
g<-ggplot(data=fatalities,aes(x=fatalcat,y=totalfatal))+
geom_bar(stat="identity",color="blue",fill="white")+
geom_text(aes(label=totalfatal), vjust=-0.3, size=3.5)+
ggtitle("Weather-related Fatalities\n in the US in 1950-2011")+
scale_x_discrete(labels = function(x) str_wrap(x, width = 10))+
labs(x="Event Type",y="Total fatalities")+
theme(axis.text.x = element_text(angle = 90, hjust = 1,vjust=0.5),
plot.title = element_text(size = 12,hjust=0.5,face="bold"))
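Because fatalcat is a character column, ggplot2 places the bars in alphabetical order rather than by total. One possible way to order the bars from largest to smallest, which is not used in this analysis, is to pass reorder(fatalcat, -totalfatal) as the x aesthetic. The sketch below shows only the resulting level order, using the totals from the fatalities table above.

```r
# Category names and totals copied from the fatalities table above
fatalcat <- c("Tornado", "Heat", "Flood", "Lightning", "Thunderstorm Wind")
totalfatal <- c(5661, 3138, 1525, 817, 754)

# Default factor levels: this is the order a discrete axis would use
default_levels <- levels(factor(fatalcat))
print(default_levels)

# reorder() sets the levels by a score; negating the totals sorts largest first
ordered_cat <- reorder(factor(fatalcat), -totalfatal)
print(levels(ordered_cat))
```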
The following code creates a ggplot bar chart with the total number of weather-related injuries in the US from 1950 to 2011.
g2<-ggplot(data=injuries,aes(x=injurcat,y=totalinjur))+
geom_bar(stat="identity",color="green",fill="white")+
geom_text(aes(label=totalinjur), vjust=-0.3, size=3.5)+
scale_x_discrete(labels = function(x) str_wrap(x, width = 10))+
ggtitle("Weather-related Injuries\n in the US in 1950-2011")+
labs(x="Event Type",y="Total Injuries")+
theme(axis.text.x = element_text(angle = 90, hjust = 1,vjust=0.5),
plot.title = element_text(size = 12,hjust=0.5,face="bold"))
To show the types of events that are most harmful with respect to population health, both graphs (for the total number of fatalities and injuries) are displayed side by side.
require(gridExtra)
## Loading required package: gridExtra
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
grid.arrange(g,g2,ncol=2)
The total cost in USD is very high both for the property and crop damage. Therefore, the variables with the aggregated results are changed to scientific format with two decimal places, so that they can be used as a label on the bars of the graph in a compact form.
newprop<-formatC(totalprdamage,format="e",digits=2)
newcrop<-formatC(totalcropdamage,format="e",digits=2)
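As a quick check of what these labels look like, the sketch below applies the same formatC call to three of the property damage totals reported above.

```r
# Property damage totals (Flood, Hurricane, Flash Flood) copied from the table above
totalprdamage <- c(144657709800, 84756180010, 16140811510)

# format = "e" gives scientific notation; digits = 2 keeps two decimal places
labels <- formatC(totalprdamage, format = "e", digits = 2)
print(labels)
# "1.45e+11" "8.48e+10" "1.61e+10"
```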
The following code creates a ggplot bar chart with the total cost in USD for property damage in the US from 1950 to 2011.
g3<-ggplot(data=property,aes(x=propertycat,y=totalprdamage))+
geom_bar(stat="identity",color="red",fill="white")+
geom_text(aes(label=newprop), vjust=-0.3, size=3)+
ggtitle("Weather-related Property Damage\n in the US in 1950-2011")+
scale_x_discrete(labels = function(x) str_wrap(x, width = 10))+
labs(x="Event Type",y="Total USD")+
theme(axis.text.x = element_text(angle = 90, hjust = 1,vjust=0.5),
plot.title = element_text(size = 12,hjust=0.5,face="bold"))
The following code creates a ggplot bar chart with the total cost in USD for crop damage in the US from 1950 to 2011.
g4<-ggplot(data=crop,aes(x=cropcat,y=totalcropdamage))+
geom_bar(stat="identity",color="yellow",fill="white")+
geom_text(aes(label=newcrop), vjust=-0.3, size=3)+
ggtitle("Weather-related Crop Damage\n in the US in 1950-2011")+
scale_x_discrete(labels = function(x) str_wrap(x, width = 10))+
labs(x="Event Type",y="Total USD")+
theme(axis.text.x = element_text(angle = 90, hjust = 1,vjust=0.5),
plot.title = element_text(size = 12,hjust=0.5,face="bold"))
To show the types of events that have the greatest economic consequences, both graphs (for the total cost in USD for property and crop damage) are displayed side by side.
require(gridExtra)
grid.arrange(g3,g4,ncol=2)