Introduction

Severe weather occurs throughout the United States. It causes injuries and fatalities and damages the economy. The U.S. National Oceanic and Atmospheric Administration (NOAA) storm database, which covers 1950 through 2011, records when and where harmful weather events occurred, along with the fatalities, injuries, and economic costs they caused.

Synopsis

This project answers two questions about the storm data:
1) Across the United States, which types of events (EVTYPE variable) are most harmful with respect to population health?
2) Across the United States, which types of events have the greatest economic consequences?

Data Preprocessing

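library(dplyr)   ## provides %>%, mutate, case_when, group_by, summarise, rename, arrange
library(ggplot2) ## provides ggplot and the plotting layers used below
if (!file.exists("repdata_data_StormData.csv.bz2")) {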
     fileUrl<-"https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
     download.file(fileUrl, destfile="repdata_data_StormData.csv.bz2", method="curl")
}
storm <- read.csv("repdata_data_StormData.csv.bz2")
str(storm)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

We only need a few variables from the original data set to answer the two questions, so we select a subset. Economic cost is measured here by property damage (PROPDMG and its exponent code PROPDMGEXP). The dates can be ignored for this project.

selected_var<-c('EVTYPE','FATALITIES','INJURIES','PROPDMG','PROPDMGEXP')
data<-storm[,selected_var]

Check types of EVTYPE and Transform Data Further

length(unique(storm$EVTYPE)) ## There are 985 distinct event type labels

We need some way to group the data, because there are 985 distinct event types. Many of those 985 labels describe the same kind of event under different names.
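As a small illustration of how the same event can appear under several labels, the sketch below counts distinct labels before and after trimming whitespace and upper-casing. The labels are invented for the example and are not taken from the NOAA file.

```r
# Toy labels, invented for illustration (not taken from the NOAA file)
evtype <- c("TORNADO", "Tornado", " TORNADO", "HAIL", "Hail ", "FLOOD")

# Number of distinct labels exactly as written
n_raw <- length(unique(evtype))

# Number of distinct labels after trimming whitespace and upper-casing
evtype_clean <- toupper(trimws(evtype))
n_clean <- length(unique(evtype_clean))

print(c(raw = n_raw, clean = n_clean))
```

Normalising case and whitespace only removes the most superficial duplicates. Labels that differ in wording still need a keyword-based grouping.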

data$GroupEvent<-'Other'
data$GroupEvent[grep('Snow', data$EVTYPE, ignore.case=TRUE)]<-'Snow'
data$GroupEvent[grep('Rain', data$EVTYPE, ignore.case=TRUE)]<-'Rain'
data$GroupEvent[grep('Hail', data$EVTYPE, ignore.case=TRUE)]<-'Hail'
data$GroupEvent[grep('Wind|WND',data$EVTYPE, ignore.case=TRUE)]<-'Wind'
data$GroupEvent[grep('Light|Thunder',data$EVTYPE,ignore.case=TRUE)]<-'Lightning'
data$GroupEvent[grep('Storm|Stm',data$EVTYPE,ignore.case=TRUE)]<-'Storm'
data$GroupEvent[grep('Blizz',data$EVTYPE,ignore.case=TRUE)]<-'Blizzard'
data$GroupEvent[grep('Flood',data$EVTYPE, ignore.case=TRUE)]<-'Flood'
data$GroupEvent[grep('Heat|Fire', data$EVTYPE, ignore.case=TRUE) ]<-'Heat'
data$GroupEvent[grep('Torn', data$EVTYPE, ignore.case=TRUE)]<-'Tornado'
sort(table(data$GroupEvent), decreasing = TRUE) ## Number of records in each group
## 
##     Storm      Hail     Flood   Tornado     Other      Wind      Snow Lightning 
##    351850    289270     82731     60701     34399     28146     17419     15983 
##      Rain      Heat  Blizzard 
##     12165      6888      2745
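The grouping above depends on the order of the assignments: when a label matches more than one keyword, the last matching assignment wins. The sketch below shows this on a few invented labels, using a subset of the keywords from the code above.

```r
# Toy labels, invented for illustration
evtype <- c("THUNDERSTORM WIND", "HEAVY SNOW", "TORNADO", "FOG")

# Start everything in 'Other', then overwrite by keyword
group <- rep("Other", length(evtype))
group[grep("Snow",  evtype, ignore.case = TRUE)] <- "Snow"
group[grep("Wind",  evtype, ignore.case = TRUE)] <- "Wind"
# "THUNDERSTORM WIND" was 'Wind' a moment ago; this line overwrites it
group[grep("Storm", evtype, ignore.case = TRUE)] <- "Storm"
group[grep("Torn",  evtype, ignore.case = TRUE)] <- "Tornado"

print(data.frame(evtype, group))
```

In this sketch "THUNDERSTORM WIND" ends up in the Storm group rather than the Wind group, which is why the Storm count in the table is so large relative to Wind.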

Transform Data Even Further

## Convert the PROPDMGEXP code into a numeric multiplier
data<-data%>%
    mutate(PROPDMGEXPFACTOR=case_when(
        PROPDMGEXP==''~10^0,
        PROPDMGEXP=='?'~10^0,
        PROPDMGEXP=='-'~10^0,
        PROPDMGEXP=='+'~10^0,
        PROPDMGEXP=='0'~10^0,
        PROPDMGEXP %in% c('H','h')~10^2, ## lowercase matched too, in case it occurs
        PROPDMGEXP=='K'~10^3,
        PROPDMGEXP %in% c('M','m')~10^6, ## lowercase matched too, in case it occurs
        PROPDMGEXP=='B'~10^9,
        PROPDMGEXP=='1'~10^1,
        PROPDMGEXP=='2'~10^2,
        PROPDMGEXP=='3'~10^3,
        PROPDMGEXP=='4'~10^4,
        PROPDMGEXP=='5'~10^5,
        PROPDMGEXP=='6'~10^6,
        PROPDMGEXP=='7'~10^7,
        PROPDMGEXP=='8'~10^8
        )
    )
## Any code not listed above yields NA; case_when already returns a numeric column
data$Econcost<-data$PROPDMG*data$PROPDMGEXPFACTOR
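As an alternative way to read the exponent mapping, the letter codes can be kept in a named lookup vector. This is only a sketch on invented values, not the method used in this report: it covers the letter codes only and treats every unmatched code as a multiplier of 1, which differs from the mapping above for the digit codes.

```r
# Multipliers for the letter codes; the input values below are invented
multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

propdmg    <- c(25, 2.5, 1.2, 3)
propdmgexp <- c("K", "M", "B", "")

# Look up each code by name; codes absent from the table (such as "") give NA
factor_num <- unname(multiplier[propdmgexp])

# Assumption of this sketch: unmatched codes count as a multiplier of 1
factor_num[is.na(factor_num)] <- 1

cost <- propdmg * factor_num
print(cost)
```

A lookup vector keeps the code-to-multiplier table in one place, at the cost of needing an explicit rule for codes that are not in the table.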

Let us answer the first question

This requires data on fatalities and injuries.

## Total fatalities per event group, largest first
datah<-data%>%
    group_by(GroupEvent)%>%
    summarise(FATALITIES=sum(FATALITIES))
## `summarise()` ungrouping output (override with `.groups` argument)
datah<-data.frame(datah)
datah<-arrange(datah, -FATALITIES)
gplot<-ggplot(datah[1:5,],aes(x=reorder(GroupEvent,-FATALITIES),y=FATALITIES,colour=GroupEvent))
gplot<-gplot+geom_bar(stat="identity", fill='grey')
gplot<-gplot + theme(plot.background = element_rect(fill = "#BFD5E3"),
                     panel.background = element_rect(fill='white'))+
    xlab('Types of Event') + ylab('FATALITIES') + 
    ggtitle('Top 5 Weather Event Groups by Fatalities') +
    theme(plot.title = element_text(hjust = 0.5))
gplot

We find that tornadoes cause the most deaths.

## Total injuries per event group, largest first
datah2<-data%>%
    group_by(GroupEvent)%>%
    summarise(injury=sum(INJURIES))
## `summarise()` ungrouping output (override with `.groups` argument)
datah2<-data.frame(datah2)
datah2<-datah2[order(-datah2$injury),]
gplot<-ggplot(datah2[1:5,],aes(x=reorder(GroupEvent,-injury),y=injury,colour=GroupEvent))
gplot<-gplot+geom_bar(stat="identity", fill='grey')
gplot<-gplot + theme(plot.background = element_rect(fill = "#BFD5E3"),
                     panel.background = element_rect(fill='white'))+
    xlab('Types of Event') + ylab('INJURIES') + 
    ggtitle('Top 5 Weather Event Groups by Injuries') +
    theme(plot.title = element_text(hjust = 0.5))
gplot

We find that tornadoes cause the most injuries.
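The two health summaries above are computed separately. As a sketch of a more compact variant, both totals can be produced in one call with base R's aggregate. The rows below are invented for illustration.

```r
# Toy rows, invented for illustration
toy <- data.frame(
  GroupEvent = c("Tornado", "Tornado", "Flood", "Heat", "Flood"),
  FATALITIES = c(1, 0, 2, 4, 0),
  INJURIES   = c(15, 2, 0, 3, 1)
)

# Sum both columns per group in a single call
health <- aggregate(cbind(FATALITIES, INJURIES) ~ GroupEvent, data = toy, FUN = sum)

# Rank the groups by fatalities, largest first
health <- health[order(-health$FATALITIES), ]
print(health)
```

Keeping both totals in one table makes it easy to see whether the ranking by fatalities and the ranking by injuries agree.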


Let us answer question two

str(data)
## 'data.frame':    902297 obs. of  8 variables:
##  $ EVTYPE          : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ FATALITIES      : num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES        : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG         : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP      : chr  "K" "K" "K" "K" ...
##  $ GroupEvent      : chr  "Tornado" "Tornado" "Tornado" "Tornado" ...
##  $ PROPDMGEXPFACTOR: num  1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 ...
##  $ Econcost        : num  25000 2500 25000 2500 2500 2500 2500 2500 25000 25000 ...
## Total property damage per event group, largest first
dataE<-data%>%
    group_by(GroupEvent)%>%
    summarise(Econ_Cost=sum(Econcost))
## `summarise()` ungrouping output (override with `.groups` argument)
dataE<-data.frame(dataE)
dataE<-dataE[order(-dataE$Econ_Cost),]
## Human-readable dollar amount, e.g. "$1,234,567"
dataE$Cost<-paste0('$',formatC(dataE$Econ_Cost, big.mark=',', format='f', digits=0))
gplot<-ggplot(dataE[1:5,],aes(x=reorder(GroupEvent,-Econ_Cost),y=Econ_Cost,colour=GroupEvent))
gplot<-gplot+geom_bar(stat="identity", fill='grey')
gplot<-gplot + theme(plot.background = element_rect(fill = "#BFD5E3"),
                     panel.background = element_rect(fill='white'))+
    xlab('Types of Event') + ylab('Economic Cost') + 
    ggtitle('Top 5 Weather Event Groups by Economic Cost') +
    theme(plot.title = element_text(hjust = 0.5))
gplot

We find that floods cause the most economic damage, measured here as property damage.
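The group totals above are plain sums, so a single missing Econcost value (for example, from a row whose exponent code has no multiplier) would make that group's total NA. The sketch below, on invented values, shows how to count missing costs and how the strict and NA-tolerant sums differ.

```r
# Toy costs, invented for illustration; NA stands for a row whose
# exponent code had no multiplier
toy <- data.frame(
  GroupEvent = c("Flood", "Flood", "Hail", "Hail"),
  Econcost   = c(5e6, 2e6, 1e3, NA)
)

# How many rows have no cost at all?
n_missing <- sum(is.na(toy$Econcost))

# Plain sum: any NA makes the whole group total NA
total_strict <- tapply(toy$Econcost, toy$GroupEvent, sum)

# na.rm = TRUE: rows with a missing cost are skipped instead
total_lenient <- tapply(toy$Econcost, toy$GroupEvent, sum, na.rm = TRUE)

print(n_missing)
print(total_strict)
print(total_lenient)
```

Checking the number of missing values before ranking helps confirm that no group was pushed out of the top five only because of an unmapped code.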


Results

We found that tornadoes caused the most fatalities and injuries, while floods caused the most economic damage.