Most harmful severe weather events for public health and economy

By YT

SYNOPSIS

This document describes an analysis that determines (1) the types of weather events that are most harmful to population health and (2) the types of events that have the greatest economic consequences. Raw data were obtained from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, covering 1950 to 2011. Across the United States, tornadoes are the most harmful events with respect to population health (both fatalities and injuries), floods are the most harmful with respect to property damage, and droughts and floods are the two most harmful events with respect to crop damage.

DATA PROCESSING

The two research questions are:
(1) Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
(2) Across the United States, which types of events have the greatest economic consequences?

Load and preprocess the data

The raw data come from a comma-separated-value (CSV) file, compressed with the bzip2 algorithm, that was downloaded from the course website.

knitr::opts_chunk$set(echo = TRUE,cache=TRUE)
df<-read.csv("repdata_data_StormData.csv.bz2",header=TRUE,as.is=TRUE,
             na.strings=",")
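
As a minimal sketch of why the compressed file can be passed straight to read.csv, the following round trip writes a small, made-up data frame (the `toy` values are hypothetical, not taken from the storm data) to a temporary bzip2 file and reads it back without decompressing it by hand.

```r
# Toy data standing in for the storm file (hypothetical values).
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"),
                  FATALITIES = c(1, 0))

# Write the toy data as a bzip2-compressed CSV in a temporary file.
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() opens the path as a file connection, which detects the
# compression and decompresses the content while reading.
back <- read.csv(tmp, header = TRUE, as.is = TRUE)
unlink(tmp)
back
```

The same mechanism is what lets the call above read repdata_data_StormData.csv.bz2 directly.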

After reading the data, the next step is to explore the data set.

str(df)

## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

To answer the two research questions stated above, the National Weather Service Storm Data Document was consulted. The data set has to be reduced to include only the relevant columns, which are:
EVTYPE - types of weather events
FATALITIES - number of fatalities
INJURIES - number of injuries
PROPDMG - property damage in USD (should be combined with the next column)
PROPDMGEXP - see section X
CROPDMG - crop damage in USD (should be combined with the next column)
CROPDMGEXP - see section X

The new data set includes 7 relevant columns.

data<-df[,c("EVTYPE","FATALITIES","INJURIES","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]
head(data)

##    EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO          0       15    25.0          K       0           
## 2 TORNADO          0        0     2.5          K       0           
## 3 TORNADO          0        2    25.0          K       0           
## 4 TORNADO          0        2     2.5          K       0           
## 5 TORNADO          0        2     2.5          K       0           
## 6 TORNADO          0        6     2.5          K       0

RESEARCH QUESTION 1 Data Preparation and Analysis

To estimate which types of events are most harmful with respect to population health, two columns are considered separately: FATALITIES and INJURIES.

QUESTION 1.1 FATALITIES

First, fatalities are considered. To drop irrelevant rows, a new data set is created that excludes the records where FATALITIES equals 0.

dataF<-data[data$FATALITIES!=0,]
head(dataF)

##     EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 9  TORNADO          1       14    25.0          K       0           
## 13 TORNADO          1       26   250.0          K       0           
## 16 TORNADO          4       50    25.0          K       0           
## 26 TORNADO          1        8    25.0          K       0           
## 34 TORNADO          6      195     2.5          M       0           
## 36 TORNADO          7       12   250.0          K       0

Next, the data are grouped by weather event type (EVTYPE), and a new data set with the total number of fatalities per event type is obtained. The columns are renamed, and the ten event types with the highest fatality totals are displayed.

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

byEVENT<-group_by(dataF,EVTYPE)
fatal<-as.data.frame(summarize(byEVENT,sum(FATALITIES)))
names(fatal)<-c("eventtype","total")
head(fatal[order(fatal$total,decreasing=TRUE),],10)

##          eventtype total
## 141        TORNADO  5633
## 26  EXCESSIVE HEAT  1903
## 35     FLASH FLOOD   978
## 57            HEAT   937
## 97       LIGHTNING   816
## 145      TSTM WIND   504
## 40           FLOOD   470
## 116    RIP CURRENT   368
## 75       HIGH WIND   248
## 2        AVALANCHE   224
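
The group-and-sum step can also be cross-checked without dplyr. The sketch below uses base R's aggregate on a small, hypothetical data frame (`toy` and its values are assumptions made for illustration); it is an alternative formulation, not the code used for the results above.

```r
# Hypothetical records: one row per event, with a fatality count.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, 1)
)

# Sum FATALITIES within each EVTYPE, then rename and rank the result.
totals <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)
names(totals) <- c("eventtype", "total")
totals <- totals[order(totals$total, decreasing = TRUE), ]
totals
```

Applied to dataF, the same call should give the same totals as the group_by and summarize pair.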

A new variable is created with the top five event categories, named according to the National Weather Service Storm Data Document.

fatalcat<-c("Tornado","Heat","Flood","Lightning","Thunderstorm Wind")

Since many rows have event types that contain the same keyword (for example, “tornado”), the rows are pulled into the corresponding categories with the grep function.

tornado<-fatal[grep("TORNADO|tornado|torn|Tornado",fatal$eventtype),]
tornado

##                      eventtype total
## 141                    TORNADO  5633
## 142 TORNADOES, TSTM WIND, HAIL    25
## 155         WATERSPOUT/TORNADO     3

tornadoT<-sum(tornado$total)
heat<-fatal[grep("HEAT|heat|Heat",fatal$eventtype),]
heatT<-sum(heat$total)
flood<-fatal[grep("FLOOD|flood|floo|Flood",fatal$eventtype),]
floodT<-sum(flood$total)
lightning<-fatal[grep("LIGHTNING|LIGHTN|Lightn|lightn",fatal$eventtype),]
lightningT<-sum(lightning$total)
tstm<-fatal[grep("THUNDERSTORM|TSTM|Thunderstorm|tstm",fatal$eventtype),]
tstmT<-sum(tstm$total)
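
The repeated grep-then-sum pattern above could be wrapped in a small helper. This is only a sketch: the function name `sum_by_keyword`, the `toy` data frame and its values are hypothetical, and it uses case-insensitive matching in place of listing each spelling, which may match slightly more rows than the patterns above.

```r
# Sum the totals of all event types whose name contains a keyword.
sum_by_keyword <- function(totals, keyword) {
  hits <- grepl(keyword, totals$eventtype, ignore.case = TRUE)
  sum(totals$total[hits])
}

# Hypothetical per-event totals with inconsistent spellings.
toy <- data.frame(
  eventtype = c("TORNADO", "Tornado F2", "WATERSPOUT/TORNADO",
                "FLASH FLOOD", "Flood"),
  total     = c(10, 2, 3, 5, 4)
)

# One keyword per category; the names become the category labels.
keywords <- c(Tornado = "tornado", Flood = "flood")
category_totals <- sapply(keywords, sum_by_keyword, totals = toy)
category_totals
```

Because each keyword is matched independently, an event type such as WATERSPOUT/TORNADO counts toward every category whose keyword it contains, just as in the grep calls above.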

Aggregated results from the total number of fatalities per event type are saved in a new variable. Data frame fatalities with the event type and total number of fatalities is created. See graph in the result section below (question 1).

totalfatal<-c(tornadoT,heatT,floodT,lightningT,tstmT)
fatalities<-data.frame(fatalcat,totalfatal)
fatalities

##            fatalcat totalfatal
## 1           Tornado       5661
## 2              Heat       3138
## 3             Flood       1525
## 4         Lightning        817
## 5 Thunderstorm Wind        754

QUESTION 1.2 INJURIES

Now, injuries are considered. To drop irrelevant rows, a new data set is created that excludes the records where INJURIES equals 0.

dataI<-data[data$INJURIES!=0,]
head(dataI)

##    EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO          0       15    25.0          K       0           
## 3 TORNADO          0        2    25.0          K       0           
## 4 TORNADO          0        2     2.5          K       0           
## 5 TORNADO          0        2     2.5          K       0           
## 6 TORNADO          0        6     2.5          K       0           
## 7 TORNADO          0        1     2.5          K       0

Next, the data are grouped by weather event type (EVTYPE), and a new data set with the total number of injuries per event type is obtained. The columns are renamed, and the ten event types with the highest injury totals are displayed.

byEVENT2<-group_by(dataI,EVTYPE)
injur<-as.data.frame(summarize(byEVENT2,sum(INJURIES)))
names(injur)<-c("eventtype","total")
head(injur[order(injur$total,decreasing=TRUE),],10)

##             eventtype total
## 129           TORNADO 91346
## 135         TSTM WIND  6957
## 30              FLOOD  6789
## 20     EXCESSIVE HEAT  6525
## 85          LIGHTNING  5230
## 47               HEAT  2100
## 79          ICE STORM  1975
## 28        FLASH FLOOD  1777
## 121 THUNDERSTORM WIND  1488
## 45               HAIL  1361

A new variable is created with the top five event categories, named according to the National Weather Service Storm Data Document.

injurcat<-c("Tornado","Thunderstorm Wind","Flood","Heat","Lightning")

Since many rows have event types that contain the same keyword, the rows are pulled into the corresponding categories with the grep function.

tornado2<-injur[grep("TORNADO|tornado|torn|Tornado",injur$eventtype),]
tornado2T<-sum(tornado2$total)
tstm2<-injur[grep("TSTM|tstm|thunder|Thunder|THUNDER",injur$eventtype),]
tstm2T<-sum(tstm2$total)
flood2<-injur[grep("FLOOD|flood|floo|Flood",injur$eventtype),]
flood2T<-sum(flood2$total)
heat2<-injur[grep("HEAT|heat|Heat",injur$eventtype),]
heat2T<-sum(heat2$total)
lightning2<-injur[grep("LIGHTNING|LIGHTN|Lightn|lightn",injur$eventtype),]
lightning2T<-sum(lightning2$total)

Aggregated results from the total number of injuries per event type are saved in a new variable. Data frame injuries with the event type and total number of injuries is created. See graph in the result section below (question 1).

totalinjur<-c(tornado2T,tstm2T,flood2T,heat2T,lightning2T)
injuries <-data.frame(injurcat,totalinjur)
injuries

##            injurcat totalinjur
## 1           Tornado      91407
## 2 Thunderstorm Wind       9545
## 3             Flood       8604
## 4              Heat       9224
## 5         Lightning       5232
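
The rows above follow the order of the category vector rather than the aggregated totals (Heat ends up above Flood once related event types are merged). A short sketch of ranking the table by total, rebuilding the data frame from the printed values so that it runs on its own (`ranked` is a name introduced here):

```r
# Aggregated injury totals, copied from the output above.
injuries <- data.frame(
  injurcat   = c("Tornado", "Thunderstorm Wind", "Flood", "Heat", "Lightning"),
  totalinjur = c(91407, 9545, 8604, 9224, 5232)
)

# Rank the categories from most to fewest injuries.
ranked <- injuries[order(injuries$totalinjur, decreasing = TRUE), ]
rownames(ranked) <- NULL
ranked
```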

RESEARCH QUESTION 2 Data Preparation and Analysis

To estimate which types of events have the greatest economic consequences, four columns are considered in related pairs: PROPDMG with PROPDMGEXP, and CROPDMG with CROPDMGEXP.

The meaning of these columns is explained in the post “How To Handle Exponent Value of PROPDMGEXP and CROPDMGEXP”, published by Soesilo Wijono on February 9, 2015.

From this post, the meaning of values from PROPDMGEXP and CROPDMGEXP columns can be obtained.
H,h - hundreds (a multiplier of 100)
K,k - thousands (a multiplier of 1000)
M,m - millions (a multiplier of 10^6)
B - billions (a multiplier of 10^9)
0..8 - numeric (a multiplier of 10)
+ - plus (a multiplier of 1)
- - minus (a multiplier of 0)
? - question mark (a multiplier of 0)
(blank) - blank (a multiplier of 0)

The next steps replace the above values with the corresponding numbers and combine the pairs of columns (e.g. PROPDMG*PROPDMGEXP) for further analysis.
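
The mapping listed above can be written as a lookup function. The sketch below is one possible formulation under stated assumptions: `exp_multiplier` is a hypothetical name, letters are matched regardless of case, and digits and (+) are converted rather than dropped, which differs from the analysis that follows.

```r
# Translate an exponent code into its numeric multiplier.
# Codes not covered by the list above come back as NA.
exp_multiplier <- function(code) {
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9, "+" = 1, "-" = 0, "?" = 0)
  code <- toupper(code)
  mult <- unname(lookup[code])
  mult[code %in% as.character(0:8)] <- 10  # digits 0..8: multiplier of 10
  mult[code == ""] <- 0                    # blank: multiplier of 0
  mult
}

# Worked example: 2.5 with code "K" is 2.5 * 1000 = 2500 USD.
damage <- 2.5 * exp_multiplier("K")
damage
```

A single vectorized call such as exp_multiplier(dataP$PROPDMGEXP) would then yield the multiplier column in one step.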

QUESTION 2.1 PROPERTY DAMAGE

First, the data with 0 (USD) in property damage (column PROPDMG) are removed.

dataP<-data[data$PROPDMG!=0,]
head(dataP)

##    EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO          0       15    25.0          K       0           
## 2 TORNADO          0        0     2.5          K       0           
## 3 TORNADO          0        2    25.0          K       0           
## 4 TORNADO          0        2     2.5          K       0           
## 5 TORNADO          0        2     2.5          K       0           
## 6 TORNADO          0        6     2.5          K       0

Unique values in column PROPDMGEXP are identified.

unique(data$PROPDMGEXP)

##  [1] "K" "M" ""  "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"

The number of rows whose PROPDMGEXP value is (+), a multiplier of 1, or a digit, a multiplier of 10, is counted to gauge how much of the data they represent.

nrow(dataP[grep("\\+",dataP$PROPDMGEXP),])

## [1] 5

nrow(dataP[grep("[0-9]",dataP$PROPDMGEXP),])

## [1] 238
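
The counts above can be expressed as shares by taking the mean of a logical match vector. A minimal sketch on a hypothetical vector of exponent codes (`toy_exp` and its values are made up for illustration):

```r
# Hypothetical exponent codes for ten records.
toy_exp <- c("K", "K", "M", "+", "5", "", "K", "0", "B", "K")

# mean() of a logical vector is the fraction of TRUE values.
share_plus  <- mean(grepl("\\+", toy_exp))
share_digit <- mean(grepl("[0-9]", toy_exp))

# Express both shares as percentages.
round(100 * c(plus = share_plus, digit = share_digit), 1)
```

On the real data, the same expressions with dataP$PROPDMGEXP in place of toy_exp would give the percentages directly.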

Next, rows whose PROPDMGEXP is (-) or (blank), both multipliers of 0, are excluded. Rows with (+) or a digit in PROPDMGEXP are excluded as well, since they make up only a very small share of the data (see the counts above).

# Drop rows whose exponent code is "+", "-", blank, or a digit from 0 to 8.
dataP1<-dataP[!(dataP$PROPDMGEXP %in% c("+","-","",as.character(0:8))),]

Unique values in PROPDMGEXP column are verified - only letters are left.

unique(dataP1$PROPDMGEXP)

## [1] "K" "M" "B" "m" "h" "H"

Replace the letters in column PROPDMGEXP with the corresponding values and convert the column to numeric.

dataP1$PROPDMGEXP<-gsub("[Hh]",100,dataP1$PROPDMGEXP)
dataP1$PROPDMGEXP<-gsub("K",1000,dataP1$PROPDMGEXP)
dataP1$PROPDMGEXP<-gsub("[Mm]",1000000,dataP1$PROPDMGEXP)
dataP1$PROPDMGEXP<-gsub("B",1000000000,dataP1$PROPDMGEXP)
dataP1$PROPDMGEXP<-as.numeric(dataP1$PROPDMGEXP)

To estimate the property damage, a new column is created. It combines the number in PROPDMG and the corresponding multiplier to get the total USD.

library(dplyr)
dataP1<-mutate(dataP1,propdamage=PROPDMG*PROPDMGEXP)
tail(dataP1)

##              EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 238849 WINTER STORM          0        0     2.0       1000       0          K
## 238850 WINTER STORM          0        0     5.0       1000       0          K
## 238851  STRONG WIND          0        0     0.6       1000       0          K
## 238852  STRONG WIND          0        0     1.0       1000       0          K
## 238853      DROUGHT          0        0     2.0       1000       0          K
## 238854    HIGH WIND          0        0     7.5       1000       0          K
##        propdamage
## 238849       2000
## 238850       5000
## 238851        600
## 238852       1000
## 238853       2000
## 238854       7500

Next, data are grouped by the weather event type, and the new data set with the total cost in USD per weather event type is obtained. Columns are renamed. Top ten weather events with the highest cost are displayed.

byEVENT3<-group_by(dataP1,EVTYPE)
propertydamage<-as.data.frame(summarize(byEVENT3,sum(propdamage)))
names(propertydamage)<-c("eventtype","totalUSD")
head(propertydamage[order(propertydamage$totalUSD,decreasing=TRUE),],10)

##             eventtype     totalUSD
## 62              FLOOD 144657709800
## 178 HURRICANE/TYPHOON  69305840000
## 330           TORNADO  56937160480
## 278       STORM SURGE  43323536000
## 50        FLASH FLOOD  16140811510
## 103              HAIL  15732267220
## 170         HURRICANE  11868319010
## 338    TROPICAL STORM   7703890550
## 395      WINTER STORM   6688497250
## 155         HIGH WIND   5270046260

A new variable is created with the top six event categories, named according to the National Weather Service Storm Data Document. Six categories are used here, rather than five as in the analysis above, because each of their totals exceeds 10 billion USD (11 digits; the others have 10 or fewer).

propertycat<-c("Flood","Hurricane","Tornado","Storm Surge","Flash Flood","Hail")

Since there are many lines with event types that might contain the same key word, data are pulled into the corresponding categories with the grep function.

flood3<-propertydamage[grep("^Flood$|^FLOOD$",propertydamage$eventtype),]
flood3T<-sum(flood3$totalUSD)
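# Note: the line break inside the pattern string is part of the regular
# expression, so the upper-case "TYPHOON" alternative only matches when it is
# followed by that newline and indentation, which no event type contains.
hurricane3<-propertydamage[grep("Hurricane|HURRICANE|Hurricane|Typhoon|TYPHOON
                                |typhoon",propertydamage$eventtype),]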
hurricane3T<-sum(hurricane3$totalUSD)
tornado3<-propertydamage[grep("TORNADO|Tornado|tornado",propertydamage$eventtype),]
tornado3T<-sum(tornado3$totalUSD)
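# Note: "STORM" also matches other event types that contain it,
# such as WINTER STORM and TROPICAL STORM.
stormsurge3<-propertydamage[grep("Storm Surge|Tide|STORM|Storm|storm|TIDE",propertydamage$eventtype),]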
stormsurge3T<-sum(stormsurge3$totalUSD)
fflood3<-propertydamage[grep("^Flash Flood$|^FLASH FLOOD$",propertydamage$eventtype),]
fflood3T<-sum(fflood3$totalUSD)
hail3<-propertydamage[grep("HAIL",propertydamage$eventtype),]
hail3T<-sum(hail3$totalUSD)

Aggregated results from the total cost in USD for property damage per event type are saved in a new variable. Data frame property with the event type and total cost in USD is created. See graph in the result section below (question 2).

totalprdamage<-c(flood3T,hurricane3T,tornado3T,stormsurge3T,fflood3T,hail3T)
property<-data.frame(propertycat,totalprdamage)
property

##   propertycat totalprdamage
## 1       Flood  144657709800
## 2   Hurricane   84756180010
## 3     Tornado   58593097730
## 4 Storm Surge   73064803400
## 5 Flash Flood   16140811510
## 6        Hail   17619991220

QUESTION 2.2 CROP DAMAGE

First, the data with 0 (USD) in crop damage (column CROPDMG) are removed.

dataC<-data[data$CROPDMG!=0,]
head(dataC)

##                           EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG
## 187566 HURRICANE OPAL/HIGH WINDS          2        0     0.1          B      10
## 187571        THUNDERSTORM WINDS          0        0     5.0          M     500
## 187581            HURRICANE ERIN          0        0    25.0          M       1
## 187583            HURRICANE OPAL          0        0    48.0          M       4
## 187584            HURRICANE OPAL          0        0    20.0          m      10
## 187653        THUNDERSTORM WINDS          0        0    50.0          K      50
##        CROPDMGEXP
## 187566          M
## 187571          K
## 187581          M
## 187583          M
## 187584          m
## 187653          K

Column CROPDMGEXP has the following unique values.

unique(dataC$CROPDMGEXP)

## [1] "M" "K" "m" "B" "k" "0" ""

The number of rows that have 0, a multiplier of 10, in the CROPDMGEXP column is counted.

nrow(dataC[grep("0",dataC$CROPDMGEXP),])

## [1] 12

Next, the rows with 0 in CROPDMGEXP (a multiplier of 10) are excluded, since they make up only a very small share of the data. Rows with (blank), a multiplier of 0, are excluded as well.

dataC1<-dataC[dataC$CROPDMGEXP!=0&dataC$CROPDMGEXP!="",]

Verify the unique values in the CROPDMGEXP column. Only letters are left.

unique(dataC1$CROPDMGEXP)

## [1] "M" "K" "m" "B" "k"

Replace the letters in column CROPDMGEXP with the corresponding values and convert the column to numeric.

dataC1$CROPDMGEXP<-gsub("[Mm]",1000000,dataC1$CROPDMGEXP)
dataC1$CROPDMGEXP<-gsub("[Kk]",1000,dataC1$CROPDMGEXP)
dataC1$CROPDMGEXP<-gsub("B",1000000000,dataC1$CROPDMGEXP)
dataC1$CROPDMGEXP<-as.numeric(dataC1$CROPDMGEXP)

To estimate the crop damage, a new column is created. It combines the number in CROPDMG and the corresponding multiplier in CROPDMGEXP to get the total USD.

dataC1<-mutate(dataC1,cropdamage=CROPDMG*CROPDMGEXP)
tail(dataC1)

##            EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 22079       FLOOD          0        0       1          K       1       1000
## 22080       FLOOD          0        0       1          K       1       1000
## 22081       FLOOD          0        0       1          K       1       1000
## 22082 STRONG WIND          0        0       0          K      20       1000
## 22083 STRONG WIND          0        0       0          K       2       1000
## 22084 STRONG WIND          0        0       0          K       1       1000
##       cropdamage
## 22079       1000
## 22080       1000
## 22081       1000
## 22082      20000
## 22083       2000
## 22084       1000

byEVENT4<-group_by(dataC1,EVTYPE)
CROPdamage<-as.data.frame(summarize(byEVENT4,sum(cropdamage)))
names(CROPdamage)<-c("eventtype","totalUSD")
head(CROPdamage[order(CROPdamage$totalUSD,decreasing=TRUE),],10)

##            eventtype    totalUSD
## 10           DROUGHT 13972566000
## 27             FLOOD  5661968450
## 78       RIVER FLOOD  5029459000
## 72         ICE STORM  5022113500
## 42              HAIL  3025954450
## 64         HURRICANE  2741910000
## 69 HURRICANE/TYPHOON  2607872800
## 23       FLASH FLOOD  1421317100
## 19      EXTREME COLD  1292973000
## 37      FROST/FREEZE  1094086000

A new variable is created with the top five event categories, named according to the National Weather Service Storm Data Document.

cropcat<-c("Drought","Flood","Ice Storm","Hail","Hurricane")

Next, since many rows have event types that contain the same keyword, the rows are pulled into the corresponding categories with the grep function.

drought4<-CROPdamage[grep("Drought|DROUGHT",CROPdamage$eventtype),]
drought4T<-sum(drought4$totalUSD)
flood4<-CROPdamage[grep("FLOOD|Flood",CROPdamage$eventtype),]
flood4T<-sum(flood4$totalUSD)
ice4<-CROPdamage[grep("Ice Storm|ICE STORM|ICE",CROPdamage$eventtype),]
ice4T<-sum(ice4$totalUSD)
hail4<-CROPdamage[grep("HAIL",CROPdamage$eventtype),]
hail4T<-sum(hail4$totalUSD)
hurricane4<-CROPdamage[grep("HURRICANE|Typhoon",CROPdamage$eventtype),]
hurricane4T<-sum(hurricane4$totalUSD)

Aggregated results from the total cost in USD for crop damage per event type are saved in a new variable. Data frame crop with the event type and total cost in USD is created. See graph in the result section below (question 2).

totalcropdamage<-c(drought4T,flood4T,ice4T,hail4T,hurricane4T)
crop<-data.frame(cropcat,totalcropdamage)
crop

##     cropcat totalcropdamage
## 1   Drought     13972621780
## 2     Flood     12380109100
## 3 Ice Storm      5027114300
## 4      Hail      3114212850
## 5 Hurricane      5515292800

RESULTS

Total numbers of fatalities, injuries and USD spent on property and crop damage are presented below.

RESEARCH QUESTION 1

The following code creates a ggplot bar chart with the total number of weather-related fatalities in the US from 1950 to 2011.

library(ggplot2)
library(stringr)
g<-ggplot(data=fatalities,aes(x=fatalcat,y=totalfatal))+
    geom_bar(stat="identity",color="blue",fill="white")+
    geom_text(aes(label=totalfatal), vjust=-0.3, size=3.5)+
    ggtitle("Weather-related Fatalities\n in the US in 1950-2011")+
    scale_x_discrete(labels = function(x) str_wrap(x, width = 10))+
    labs(x="Event Type",y="Total fatalities")+
    theme(axis.text.x = element_text(angle = 90, hjust = 1,vjust=0.5),
          plot.title = element_text(size = 12,hjust=0.5,face="bold"))

The following code creates a ggplot bar chart with the total number of weather-related injuries in the US from 1950 to 2011.

g2<-ggplot(data=injuries,aes(x=injurcat,y=totalinjur))+
    geom_bar(stat="identity",color="green",fill="white")+
    geom_text(aes(label=totalinjur), vjust=-0.3, size=3.5)+
    scale_x_discrete(labels = function(x) str_wrap(x, width = 10))+
    ggtitle("Weather-related Injuries\n in the US in 1950-2011")+
    labs(x="Event Type",y="Total Injuries")+
    theme(axis.text.x = element_text(angle = 90, hjust = 1,vjust=0.5),
          plot.title = element_text(size = 12,hjust=0.5,face="bold"))

To show the types of events that are most harmful with respect to population health, both graphs (for the total number of fatalities and injuries) are displayed side by side.

require(gridExtra)

## Loading required package: gridExtra

## 
## Attaching package: 'gridExtra'

## The following object is masked from 'package:dplyr':
## 
##     combine

grid.arrange(g,g2,ncol=2)
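
When the x variable is a character column, ggplot2 places the bars in alphabetical order. A possible refinement, sketched here under the assumption that bars ranked by total are preferred, is to convert the category column to a factor whose levels follow the totals. The data frame is rebuilt from the values reported earlier so that the snippet runs on its own.

```r
# Aggregated fatality totals, copied from the data preparation output.
fatalities <- data.frame(
  fatalcat   = c("Tornado", "Heat", "Flood", "Lightning", "Thunderstorm Wind"),
  totalfatal = c(5661, 3138, 1525, 817, 754)
)

# Set the factor levels in order of decreasing total; a discrete x axis
# then follows the level order instead of the alphabet.
ranking <- order(fatalities$totalfatal, decreasing = TRUE)
fatalities$fatalcat <- factor(fatalities$fatalcat,
                              levels = fatalities$fatalcat[ranking])
levels(fatalities$fatalcat)
```

Running this before building g would leave the plotting code itself unchanged.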

RESEARCH QUESTION 2

The total cost in USD is very high for both property and crop damage. Therefore, the aggregated totals are converted to scientific notation with two decimal places, so that they can be used as compact labels on the bars of the graph.

newprop<-formatC(totalprdamage,format="e",digits=2)
newcrop<-formatC(totalcropdamage,format="e",digits=2)
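
To illustrate what the labels look like, the sketch below applies the same formatC call to the property totals reported earlier, and shows an alternative label in billions of USD built with sprintf. The billions format and the names `sci` and `billions` are assumptions introduced here, not part of the original plots.

```r
# Property damage totals, copied from the data preparation output.
totalprdamage <- c(144657709800, 84756180010, 58593097730,
                   73064803400, 16140811510, 17619991220)

# Scientific notation with two decimal places, as used for the bar labels.
sci <- formatC(totalprdamage, format = "e", digits = 2)

# Alternative: one decimal place in billions of USD.
billions <- sprintf("%.1fB", totalprdamage / 1e9)

data.frame(sci, billions)
```

Either character vector can be passed to geom_text as the label, since both have one entry per bar.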

The following code creates a ggplot bar chart with the total cost in USD for property damage in the US from 1950 to 2011.

g3<-ggplot(data=property,aes(x=propertycat,y=totalprdamage))+
    geom_bar(stat="identity",color="red",fill="white")+ 
    geom_text(aes(label=newprop), vjust=-0.3, size=3)+
    ggtitle("Weather-related Property Damage\n in the US in 1950-2011")+
    scale_x_discrete(labels = function(x) str_wrap(x, width = 10))+
    labs(x="Event Type",y="Total USD")+
    theme(axis.text.x = element_text(angle = 90, hjust = 1,vjust=0.5),
          plot.title = element_text(size = 12,hjust=0.5,face="bold"))

The following code creates a ggplot bar chart with the total cost in USD for crop damage in the US from 1950 to 2011.

g4<-ggplot(data=crop,aes(x=cropcat,y=totalcropdamage))+
    geom_bar(stat="identity",color="yellow",fill="white")+
    geom_text(aes(label=newcrop), vjust=-0.3, size=3)+
    ggtitle("Weather-related Crop Damage\n in the US in 1950-2011")+
    scale_x_discrete(labels = function(x) str_wrap(x, width = 10))+
    labs(x="Event Type",y="Total USD")+
    theme(axis.text.x = element_text(angle = 90, hjust = 1,vjust=0.5),
          plot.title = element_text(size = 12,hjust=0.5,face="bold"))

To show the types of events that have the greatest economic consequences, both graphs (for the total cost in USD for property and crop damage) are displayed side by side.

require(gridExtra)
grid.arrange(g3,g4,ncol=2)