Severe weather events occur throughout the United States. They cause injuries, damage the economy, and often lead to fatalities. The U.S. National Oceanic and Atmospheric Administration (NOAA) storm data, covering 1950 through 2011, record when and where such events occurred, along with the fatalities, injuries, and economic cost they caused.
This project answers two questions about the storm data:
1) Across the United States, which types of events (EVTYPE variable) are most harmful with respect to population health?
2) Across the United States, which types of events have the greatest economic consequences?
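library(dplyr)    # %>%, mutate(), group_by(), summarise(), arrange()
library(ggplot2)  # bar charts of the results
# Download the compressed data set only if it is not already present
if (!file.exists("repdata_data_StormData.csv.bz2")) {
  fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
  download.file(fileUrl, destfile = "repdata_data_StormData.csv.bz2", method = "curl")
}
# read.csv() can read the .bz2 file directly
storm <- read.csv("repdata_data_StormData.csv.bz2")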
str(storm)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr "" "" "" "" ...
## $ BGN_LOCATI: chr "" "" "" "" ...
## $ END_DATE : chr "" "" "" "" ...
## $ END_TIME : chr "" "" "" "" ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr "" "" "" "" ...
## $ END_LOCATI: chr "" "" "" "" ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## $ WFO : chr "" "" "" "" ...
## $ STATEOFFIC: chr "" "" "" "" ...
## $ ZONENAMES : chr "" "" "" "" ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr "" "" "" "" ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
Only a few variables from the original data set are needed to answer the two questions: the event type, the fatality and injury counts, and the property damage with its exponent code. The date variables can be ignored for this project.
selected_var <- c('EVTYPE', 'FATALITIES', 'INJURIES', 'PROPDMG', 'PROPDMGEXP')
data <- storm[, selected_var]
length(unique(storm$EVTYPE))  # there are 985 different event types
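We need some way to group the data, because there are 985 event types and some of them describe the same kind of event under different names. Each event is assigned to a broad group by pattern matching on EVTYPE. The assignments run in order, so a later match overwrites an earlier one.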
data$GroupEvent<-'Other'
data$GroupEvent[grep('Snow', data$EVTYPE, ignore.case=TRUE)]<-'Snow'
data$GroupEvent[grep('Rain', data$EVTYPE, ignore.case=TRUE)]<-'Rain'
data$GroupEvent[grep('Hail', data$EVTYPE, ignore.case=TRUE)]<-'Hail'
data$GroupEvent[grep('Wind|WND',data$EVTYPE, ignore.case=TRUE)]<-'Wind'
data$GroupEvent[grep('Light|Thunder',data$EVTYPE,ignore.case=TRUE)]<-'Lighting'
data$GroupEvent[grep('Storm|Stm',data$EVTYPE,ignore.case=TRUE)]<-'Storm'
data$GroupEvent[grep('Blizz',data$EVTYPE,ignore.case=TRUE)]<-'Blizzard'
data$GroupEvent[grep('Flood',data$EVTYPE, ignore.case=TRUE)]<-'Flood'
data$GroupEvent[grep('Heat|Fire', data$EVTYPE, ignore.case=TRUE) ]<-'Heat'
data$GroupEvent[grep('Torn', data$EVTYPE, ignore.case=TRUE)]<-'Tornado'
sort(table(data$GroupEvent), decreasing = TRUE)  # number of events in each group
##
##    Storm     Hail    Flood  Tornado    Other     Wind     Snow Lighting
##   351850   289270    82731    60701    34399    28146    17419    15983
##     Rain     Heat Blizzard
##    12165     6888     2745
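Because each `grep()` assignment overwrites the previous one, the order of the patterns decides the final group of any event name that matches more than one pattern. The following is a minimal sketch of that behaviour using the same patterns and labels as above. The event names in `evtype` are a hypothetical sample chosen for illustration, and the loop is only a compact restatement of the assignments, not the code used in the analysis.

```r
# Hypothetical sample of event names (for illustration only)
evtype <- c("TSTM WIND", "THUNDERSTORM WINDS", "HIGH WIND",
            "HEAVY SNOW", "FLASH FLOOD", "LIGHT SNOW", "FOG")

# Same patterns, labels, and order as in the analysis; later entries win
patterns <- c(Snow = "Snow", Rain = "Rain", Hail = "Hail", Wind = "Wind|WND",
              Lighting = "Light|Thunder", Storm = "Storm|Stm",
              Blizzard = "Blizz", Flood = "Flood", Heat = "Heat|Fire",
              Tornado = "Torn")

group <- rep("Other", length(evtype))
for (label in names(patterns)) {
  hit <- grepl(patterns[[label]], evtype, ignore.case = TRUE)
  group[hit] <- label  # overwrites any earlier assignment
}

print(data.frame(evtype, group))
```

In this sample, "TSTM WIND" first matches the Wind pattern and is then moved to Storm because "Stm" is matched later. "LIGHT SNOW" first matches Snow and is then moved to Lighting, since the pattern "Light" also matches the word "LIGHT". Checking a few names this way is a quick test of whether the pattern order gives the intended groups.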
# Convert the damage exponent code (PROPDMGEXP) into a numeric multiplier
data <- data %>%
  mutate(PROPDMGEXPFACTOR = case_when(
    PROPDMGEXP == ''  ~ 10^0,
    PROPDMGEXP == '?' ~ 10^0,
    PROPDMGEXP == '-' ~ 10^0,
    PROPDMGEXP == '+' ~ 10^0,
    PROPDMGEXP == '0' ~ 10^0,
    # toupper() so that lower-case letter codes, if present, are not left as NA
    toupper(PROPDMGEXP) == 'H' ~ 10^2,
    toupper(PROPDMGEXP) == 'K' ~ 10^3,
    toupper(PROPDMGEXP) == 'M' ~ 10^6,
    toupper(PROPDMGEXP) == 'B' ~ 10^9,
    PROPDMGEXP == '1' ~ 10^1,
    PROPDMGEXP == '2' ~ 10^2,
    PROPDMGEXP == '3' ~ 10^3,
    PROPDMGEXP == '4' ~ 10^4,
    PROPDMGEXP == '5' ~ 10^5,
    PROPDMGEXP == '6' ~ 10^6,
    PROPDMGEXP == '7' ~ 10^7,
    PROPDMGEXP == '8' ~ 10^8
  ))
# Property damage in dollars: reported amount times its multiplier
data$Econcost <- data$PROPDMG * data$PROPDMGEXPFACTOR
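To make the conversion concrete, here is a small worked example in base R. It is a sketch under stated assumptions: the vectors `propdmg` and `propdmgexp` hold made-up values, the lookup table `exp_factor` is a hypothetical helper that covers only the letter codes, and codes without an entry are treated as a multiplier of 1.

```r
# Hypothetical lookup table for the letter codes only
exp_factor <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Made-up damage amounts and exponent codes
propdmg    <- c(25, 2.5, 1.2, 5)
propdmgexp <- c("K", "M", "B", "")

# Look up each code; a code with no entry (here the empty string) gives NA
factor_used <- unname(exp_factor[propdmgexp])
factor_used[is.na(factor_used)] <- 1  # assumption: unknown codes mean "as reported"

cost <- propdmg * factor_used
print(data.frame(propdmg, propdmgexp, factor_used, cost))
```

A value of 25 with code "K" becomes 25,000 dollars, and 2.5 with code "M" becomes 2.5 million dollars. Any code that falls through to `NA` would propagate into the group totals, so it is worth checking the multiplier column for missing values before summing.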
Answering the first question requires the total fatalities and injuries for each event group.
# Total fatalities per event group; `=` (not `<-`) names the summary column
datah <- data %>%
  group_by(GroupEvent) %>%
  summarise(FATALITIES = sum(FATALITIES))
## `summarise()` ungrouping output (override with `.groups` argument)
datah <- as.data.frame(datah)
datah <- arrange(datah, -FATALITIES)
# Bar chart of the five groups with the most fatalities
gplot <- ggplot(datah[1:5, ], aes(x = reorder(GroupEvent, -FATALITIES), y = FATALITIES, colour = GroupEvent))
gplot <- gplot + geom_bar(stat = "identity", fill = 'grey')
gplot <- gplot +
  theme(plot.background = element_rect(fill = "#BFD5E3"),
        panel.background = element_rect(fill = 'white')) +
  xlab('Types of Event') + ylab('FATALITIES') +
  ggtitle('Top 5 Most Harmful Weather Events by Fatalities') +
  theme(plot.title = element_text(hjust = 0.5))
gplot
# Total injuries per event group
datah2 <- data %>%
  group_by(GroupEvent) %>%
  summarise(injury = sum(INJURIES))
## `summarise()` ungrouping output (override with `.groups` argument)
datah2 <- as.data.frame(datah2)
datah2 <- datah2[order(-datah2$injury), ]
# Bar chart of the five groups with the most injuries
gplot <- ggplot(datah2[1:5, ], aes(x = reorder(GroupEvent, -injury), y = injury, colour = GroupEvent))
gplot <- gplot + geom_bar(stat = "identity", fill = 'grey')
gplot <- gplot +
  theme(plot.background = element_rect(fill = "#BFD5E3"),
        panel.background = element_rect(fill = 'white')) +
  xlab('Types of Event') + ylab('INJURIES') +
  ggtitle('Top 5 Most Harmful Weather Events by Injuries') +
  theme(plot.title = element_text(hjust = 0.5))
gplot
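The two health measures can also be totalled side by side, which makes it easy to compare the rankings. The following is a minimal base R sketch on a made-up data frame named `toy`. It uses `aggregate()` rather than the dplyr pipeline above, purely so that the example runs without extra packages.

```r
# Made-up records with the same column names as the analysis data
toy <- data.frame(
  GroupEvent = c("Tornado", "Tornado", "Flood", "Heat", "Flood"),
  FATALITIES = c(1, 0, 2, 3, 0),
  INJURIES   = c(15, 2, 0, 4, 1)
)

# Sum both measures per event group in one step
health <- aggregate(cbind(FATALITIES, INJURIES) ~ GroupEvent, data = toy, FUN = sum)

# Rank the groups by fatalities, highest first
health <- health[order(-health$FATALITIES), ]
print(health)
```

In this toy data the group with the most fatalities (Heat) is not the group with the most injuries (Tornado), which is why the analysis looks at the two measures separately.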
str(data)
## 'data.frame': 902297 obs. of 8 variables:
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ FATALITIES : num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP : chr "K" "K" "K" "K" ...
## $ GroupEvent : chr "Tornado" "Tornado" "Tornado" "Tornado" ...
## $ PROPDMGEXPFACTOR: num 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 ...
## $ Econcost : num 25000 2500 25000 2500 2500 2500 2500 2500 25000 25000 ...
# Total property damage (in dollars) per event group
dataE <- data %>%
  group_by(GroupEvent) %>%
  summarise(Econ_Cost = sum(Econcost))
## `summarise()` ungrouping output (override with `.groups` argument)
dataE <- as.data.frame(dataE)
dataE <- dataE[order(-dataE$Econ_Cost), ]
# Human-readable dollar label for each total
dataE$Cost <- paste('$', formatC(dataE$Econ_Cost, big.mark = ',', format = 'f'))
# Bar chart of the five groups with the highest economic cost
gplot <- ggplot(dataE[1:5, ], aes(x = reorder(GroupEvent, -Econ_Cost), y = Econ_Cost, colour = GroupEvent))
gplot <- gplot + geom_bar(stat = "identity", fill = 'grey')
gplot <- gplot +
  theme(plot.background = element_rect(fill = "#BFD5E3"),
        panel.background = element_rect(fill = 'white')) +
  xlab('Types of Event') + ylab('Economic Cost') +
  ggtitle('Top 5 Weather Events by Economic Cost') +
  theme(plot.title = element_text(hjust = 0.5))
gplot
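Large dollar totals are hard to read on their own, which is the reason for a formatted label such as the `Cost` column. As a small sketch, assuming whole-dollar labels are wanted, `formatC()` with `digits = 0` and `paste0()` gives compact strings. The totals in `econ_cost` are made-up values, not results from the storm data.

```r
# Made-up cost totals in dollars
econ_cost <- c(1234567, 2500, 1.5e9)

# Fixed notation, no decimals, comma as thousands separator, "$" prefix
cost_label <- paste0("$", formatC(econ_cost, format = "f", digits = 0, big.mark = ","))
print(cost_label)
```

Using `paste0()` instead of `paste()` avoids the space that `paste()` would put between the dollar sign and the number.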
We found that tornadoes caused the most fatalities and injuries, while floods caused the most economic damage, measured here as property damage.