Storms and other severe weather events can cause serious public health and economic problems for communities and municipalities. The purpose of this report is to analyse which types of severe weather event have the most serious consequences for population health, using the numbers of fatalities and injuries as the measures of severity. The underlying database tracks the characteristics of major storms and weather events in the United States, including when and where they occurred and estimates of any casualties and property damage. The report addresses two questions:
Question 1: Did the most harmful types of events with respect to fatalities change between 2008 and 2011?
Question 2: Did the most harmful types of events with respect to injuries change between 2008 and 2011?
knitr::opts_chunk$set(echo = TRUE)
First, we load the original data set and inspect some of its properties. The output below shows the main variables and the structure of each one. We then create two analytic data sets for this research: one containing the events from 2008 and one containing the events from 2011.
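# stringsAsFactors = TRUE keeps text columns such as EVTYPE as factors, matching
# the str() output below (this was the read.csv() default before R 4.0.0)
StormData = read.csv("repdata_data_StormData.csv.bz2", stringsAsFactors = TRUE)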
head(StormData)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE EVTYPE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL TORNADO
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL TORNADO
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL TORNADO
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL TORNADO
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL TORNADO
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL TORNADO
## BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1 0 0 NA
## 2 0 0 NA
## 3 0 0 NA
## 4 0 0 NA
## 5 0 0 NA
## 6 0 0 NA
## END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1 0 14.0 100 3 0 0 15 25.0
## 2 0 2.0 150 2 0 0 0 2.5
## 3 0 0.1 123 2 0 0 2 25.0
## 4 0 0.0 100 2 0 0 2 2.5
## 5 0 0.0 150 2 0 0 2 2.5
## 6 0 1.5 177 2 0 0 6 2.5
## PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1 K 0 3040 8812
## 2 K 0 3042 8755
## 3 K 0 3340 8742
## 4 K 0 3458 8626
## 5 K 0 3412 8642
## 6 K 0 3450 8748
## LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3051 8806 1
## 2 0 0 2
## 3 0 0 3
## 4 0 0 4
## 5 0 0 5
## 6 0 0 6
dim(StormData)
## [1] 902297 37
str(StormData)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
## $ BGN_TIME : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
## $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
## $ STATE : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : Factor w/ 35 levels ""," N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_LOCATI: Factor w/ 54429 levels ""," Christiansburg",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_DATE : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_TIME : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_LOCATI: Factor w/ 34506 levels ""," CANTON"," TULIA",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ WFO : Factor w/ 542 levels ""," CI","%SD",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ ZONENAMES : Factor w/ 25112 levels ""," "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : Factor w/ 436781 levels "","\t","\t\t",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
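The `str()` output above shows the text columns (EVTYPE, STATE and so on) stored as factors. That was the `read.csv()` default before R 4.0.0; from R 4.0.0 onward the default is `stringsAsFactors = FALSE`, so factor columns have to be requested explicitly. The minimal sketch below uses a made-up three-row CSV (not the storm file) to show both settings. The distinction matters for plotting: factor codes can be used directly as colour indices, whereas a character value such as `"TORNADO"` passed to `col` is read as a colour name and fails.

```r
# Made-up CSV text standing in for the storm file (illustrative values only)
csv_text = "EVTYPE,FATALITIES\nTORNADO,0\nHAIL,1\nTORNADO,2"

# Text columns read as character vectors (the default since R 4.0.0)
as_chr = read.csv(text = csv_text, stringsAsFactors = FALSE)

# Text columns read as factors, as in the str() output of the storm data
as_fct = read.csv(text = csv_text, stringsAsFactors = TRUE)

class(as_chr$EVTYPE)       # "character"
class(as_fct$EVTYPE)       # "factor"
levels(as_fct$EVTYPE)      # "HAIL" "TORNADO" (levels are sorted)
as.integer(as_fct$EVTYPE)  # 2 1 2: integer codes usable as colour indices
```

Passing the argument explicitly, rather than relying on the default, makes the result the same on every R version.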
Because this research compares 2008 with 2011, we subset the original data set to form the analytic data sets, starting from the BGN_DATE variable.
# Convert the "m/d/yyyy h:mm:ss" strings to Date; the time of day is dropped
StormData$BGN_DATE = as.Date(as.character(StormData$BGN_DATE), format = "%m/%d/%Y %H:%M:%S")
library(lubridate)
StormData$Year = year(StormData$BGN_DATE)
# Keep only the two study years, then split them into one data set per year
da = subset(StormData, Year %in% c(2008, 2011))
da2008 = subset(da, Year == 2008)
da2011 = subset(da, Year == 2011)
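The date conversion and year filter above can be checked on a few strings in the same layout as BGN_DATE. This sketch uses base R only: `format(dates, "%Y")` stands in for `lubridate::year()`. The first string is the first row of the `head()` output; the other two are illustrative values, not records from the file.

```r
# BGN_DATE-style strings: the first comes from head() above,
# the other two are illustrative values for the two study years
bgn = c("4/18/1950 0:00:00", "11/15/2008 0:00:00", "6/8/2011 0:00:00")

# Leading zeros are optional on input for %m, %d and %H;
# as.Date() keeps the calendar date and discards the time of day
dates = as.Date(bgn, format = "%m/%d/%Y %H:%M:%S")

# Base-R stand-in for lubridate::year()
years = as.integer(format(dates, "%Y"))

# Same filter as the subset() calls: keep 2008 and 2011 only
keep = years %in% c(2008, 2011)
dates[keep]  # "2008-11-15" "2011-06-08"
```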
library(dplyr)
# Total casualties per event type for each year.
# Note: sum_facility holds FATALITIES and sum_jury holds INJURIES;
# sum_crop (crop damage) is computed but not used below.
Facility_Injury_2008 = da2008 %>%
  group_by(EVTYPE) %>%
  summarise(sum_facility = sum(FATALITIES, na.rm = TRUE),
            sum_jury = sum(INJURIES, na.rm = TRUE),
            sum_crop = sum(CROPDMG, na.rm = TRUE))
Facility_Injury_2011 = da2011 %>%
  group_by(EVTYPE) %>%
  summarise(sum_facility = sum(FATALITIES, na.rm = TRUE),
            sum_jury = sum(INJURIES, na.rm = TRUE),
            sum_crop = sum(CROPDMG, na.rm = TRUE))
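For readers without dplyr, the same per-event-type totals can be sketched with base R's `aggregate()`. The records below are made up for illustration. The `na.action = na.pass` argument is the detail that keeps this equivalent to `sum(..., na.rm = TRUE)` in the pipeline above.

```r
# Made-up event records for illustration only (not from the storm data)
toy = data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(3, 0, 5, 1, NA),
  INJURIES   = c(10, 2, 7, 0, 1),
  stringsAsFactors = FALSE
)

# Sum both casualty columns within each event type.
# na.action = na.pass keeps the row with a missing FATALITIES value, and
# na.rm = TRUE (forwarded to sum) drops that NA from that column only.
totals = aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy,
                   FUN = sum, na.rm = TRUE, na.action = na.pass)
print(totals)
#    EVTYPE FATALITIES INJURIES
#     FLOOD          1        0
#      HAIL          0        3
#   TORNADO          8       17
```

With the formula method's default `na.action = na.omit`, the whole HAIL row containing the NA would be discarded, and its one injury would be lost from the total.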
After creating the analytic data sets, we rank the event types in each year by total fatalities (column `sum_facility`) and by total injuries (column `sum_jury`), and keep the top 15 of each ranking.
Facility_2008 = Facility_Injury_2008[order(-Facility_Injury_2008$sum_facility),c(1,2)][1:15,]
Injury_2008 = Facility_Injury_2008[order(-Facility_Injury_2008$sum_jury),c(1,3)][1:15,]
Facility_2011 = Facility_Injury_2011[order(-Facility_Injury_2011$sum_facility),c(1,2)][1:15,]
Injury_2011 = Facility_Injury_2011[order(-Facility_Injury_2011$sum_jury),c(1,3)][1:15,]
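The four ranking lines above repeat the same pattern. A minimal sketch of it as one helper follows; `top_n_by` is a hypothetical name, not part of the original analysis. It uses `head()` instead of `[1:15, ]` because indexing a data frame past its last row pads the result with NA rows, which would matter in a year with fewer than 15 event types.

```r
# Hypothetical helper (not part of the original analysis): return the n rows
# of df with the largest values in column `col`, keeping EVTYPE and that column
top_n_by = function(df, col, n = 15) {
  ranked = df[order(-df[[col]]), c("EVTYPE", col), drop = FALSE]
  # head() returns at most n rows, so no NA padding when fewer rows exist
  head(ranked, n)
}

# Made-up totals for illustration
toy = data.frame(EVTYPE = c("HAIL", "TORNADO", "FLOOD"),
                 sum_facility = c(2, 9, 5), stringsAsFactors = FALSE)
top_n_by(toy, "sum_facility", n = 2)  # TORNADO (9), then FLOOD (5)
nrow(top_n_by(toy, "sum_facility"))   # 3, not 15
```

In the analysis this would read, for example, `Facility_2008 = top_n_by(Facility_Injury_2008, "sum_facility")`. dplyr 1.0.0 and later also provide `slice_max()` for the same task.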
Facility_2008
## # A tibble: 15 x 2
## EVTYPE sum_facility
## <fct> <dbl>
## 1 TORNADO 129
## 2 FLASH FLOOD 55
## 3 RIP CURRENT 44
## 4 COLD/WIND CHILL 31
## 5 HEAT 27
## 6 AVALANCHE 25
## 7 LIGHTNING 25
## 8 THUNDERSTORM WIND 25
## 9 FLOOD 22
## 10 STRONG WIND 17
## 11 EXCESSIVE HEAT 15
## 12 HIGH WIND 11
## 13 STORM SURGE/TIDE 11
## 14 WINTER WEATHER 11
## 15 EXTREME COLD/WIND CHILL 8
name_2008 = Facility_2008$EVTYPE
Facility_2011
## # A tibble: 15 x 2
## EVTYPE sum_facility
## <fct> <dbl>
## 1 TORNADO 587
## 2 FLASH FLOOD 68
## 3 HEAT 63
## 4 FLOOD 58
## 5 THUNDERSTORM WIND 54
## 6 EXCESSIVE HEAT 36
## 7 RIP CURRENT 29
## 8 LIGHTNING 26
## 9 COLD/WIND CHILL 21
## 10 HIGH SURF 11
## 11 STRONG WIND 10
## 12 AVALANCHE 9
## 13 WILDFIRE 6
## 14 HIGH WIND 4
## 15 TROPICAL STORM 4
name_2011 = Facility_2011$EVTYPE
Injury_2008
## # A tibble: 15 x 2
## EVTYPE sum_jury
## <fct> <dbl>
## 1 TORNADO 1690
## 2 THUNDERSTORM WIND 253
## 3 LIGHTNING 207
## 4 EXCESSIVE HEAT 131
## 5 WINTER WEATHER 77
## 6 HIGH SURF 58
## 7 HIGH WIND 56
## 8 WILDFIRE 32
## 9 FLASH FLOOD 30
## 10 STRONG WIND 26
## 11 RIP CURRENT 21
## 12 HEAVY SNOW 18
## 13 FLOOD 16
## 14 HEAVY RAIN 16
## 15 AVALANCHE 14
name_2008_1 = Injury_2008$EVTYPE
Injury_2011
## # A tibble: 15 x 2
## EVTYPE sum_jury
## <fct> <dbl>
## 1 TORNADO 6163
## 2 HEAT 611
## 3 THUNDERSTORM WIND 373
## 4 LIGHTNING 194
## 5 EXCESSIVE HEAT 138
## 6 WILDFIRE 116
## 7 STRONG WIND 33
## 8 HAIL 31
## 9 FLASH FLOOD 30
## 10 RIP CURRENT 27
## 11 MARINE THUNDERSTORM WIND 14
## 12 HIGH SURF 11
## 13 HIGH WIND 11
## 14 FLOOD 10
## 15 AVALANCHE 8
name_2011_1 = Injury_2011$EVTYPE
According to these tables, tornadoes caused the most fatalities in both 2008 and 2011, although the totals differ considerably (129 in 2008 versus 587 in 2011). The top 15 event types by fatalities in 2008 are TORNADO, FLASH FLOOD, RIP CURRENT, COLD/WIND CHILL, HEAT, AVALANCHE, LIGHTNING, THUNDERSTORM WIND, FLOOD, STRONG WIND, EXCESSIVE HEAT, HIGH WIND, STORM SURGE/TIDE, WINTER WEATHER and EXTREME COLD/WIND CHILL. The top 15 event types by fatalities in 2011 are TORNADO, FLASH FLOOD, HEAT, FLOOD, THUNDERSTORM WIND, EXCESSIVE HEAT, RIP CURRENT, LIGHTNING, COLD/WIND CHILL, HIGH SURF, STRONG WIND, AVALANCHE, WILDFIRE, HIGH WIND and TROPICAL STORM.
The top 15 event types by injuries in 2008 are TORNADO, THUNDERSTORM WIND, LIGHTNING, EXCESSIVE HEAT, WINTER WEATHER, HIGH SURF, HIGH WIND, WILDFIRE, FLASH FLOOD, STRONG WIND, RIP CURRENT, HEAVY SNOW, FLOOD, HEAVY RAIN and AVALANCHE. The top 15 event types by injuries in 2011 are TORNADO, HEAT, THUNDERSTORM WIND, LIGHTNING, EXCESSIVE HEAT, WILDFIRE, STRONG WIND, HAIL, FLASH FLOOD, RIP CURRENT, MARINE THUNDERSTORM WIND, HIGH SURF, HIGH WIND, FLOOD and AVALANCHE.
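Which event types entered or left the top 15 between the two years can be computed rather than read off by eye. The sketch below copies the four top-15 lists from the tables above so that it runs on its own; in the analysis session, `as.character(name_2008)`, `as.character(name_2011)` and so on could be passed instead. `compare_top` is a hypothetical helper, and the `base::` prefix avoids the versions of `setdiff()` and `intersect()` that dplyr masks.

```r
# Top-15 event types, copied from the four tables above (in rank order)
fat_2008 = c("TORNADO", "FLASH FLOOD", "RIP CURRENT", "COLD/WIND CHILL", "HEAT",
             "AVALANCHE", "LIGHTNING", "THUNDERSTORM WIND", "FLOOD", "STRONG WIND",
             "EXCESSIVE HEAT", "HIGH WIND", "STORM SURGE/TIDE", "WINTER WEATHER",
             "EXTREME COLD/WIND CHILL")
fat_2011 = c("TORNADO", "FLASH FLOOD", "HEAT", "FLOOD", "THUNDERSTORM WIND",
             "EXCESSIVE HEAT", "RIP CURRENT", "LIGHTNING", "COLD/WIND CHILL",
             "HIGH SURF", "STRONG WIND", "AVALANCHE", "WILDFIRE", "HIGH WIND",
             "TROPICAL STORM")
inj_2008 = c("TORNADO", "THUNDERSTORM WIND", "LIGHTNING", "EXCESSIVE HEAT",
             "WINTER WEATHER", "HIGH SURF", "HIGH WIND", "WILDFIRE", "FLASH FLOOD",
             "STRONG WIND", "RIP CURRENT", "HEAVY SNOW", "FLOOD", "HEAVY RAIN",
             "AVALANCHE")
inj_2011 = c("TORNADO", "HEAT", "THUNDERSTORM WIND", "LIGHTNING", "EXCESSIVE HEAT",
             "WILDFIRE", "STRONG WIND", "HAIL", "FLASH FLOOD", "RIP CURRENT",
             "MARINE THUNDERSTORM WIND", "HIGH SURF", "HIGH WIND", "FLOOD",
             "AVALANCHE")

# Hypothetical helper: which types left, entered, or stayed in the top list
compare_top = function(earlier, later) {
  list(dropped = base::setdiff(earlier, later),    # only in the earlier list
       added   = base::setdiff(later, earlier),    # only in the later list
       kept    = base::intersect(earlier, later))  # in both lists
}

fat = compare_top(fat_2008, fat_2011)
inj = compare_top(inj_2008, inj_2011)
fat$dropped  # "STORM SURGE/TIDE" "WINTER WEATHER" "EXTREME COLD/WIND CHILL"
fat$added    # "HIGH SURF" "WILDFIRE" "TROPICAL STORM"
inj$dropped  # "WINTER WEATHER" "HEAVY SNOW" "HEAVY RAIN"
inj$added    # "HEAT" "HAIL" "MARINE THUNDERSTORM WIND"
```

In both rankings 12 of the 15 event types are shared between the years. Note that this compares membership of the top-15 lists only; a type that drops out may still have caused casualties in 2011.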
Now we can use bar charts to visualise the results directly:
par(mfrow = c(1, 2), mar = c(10, 4, 2, 2), las = 2, cex = 0.7, cex.main = 1.4, cex.lab = 1)
# Shared y-axis so the two years are directly comparable
ymax = max(Facility_2008$sum_facility, Facility_2011$sum_facility)
barplot(height = Facility_2008$sum_facility, names.arg = Facility_2008$EVTYPE,
        col = heat.colors(15, alpha = 1),
        ylab = "Fatalities",
        main = "Top 15 Event Types by Fatalities, 2008",
        ylim = c(0, ymax))
barplot(height = Facility_2011$sum_facility, names.arg = Facility_2011$EVTYPE,
        col = heat.colors(15, alpha = 1),
        ylab = "Fatalities",
        main = "Top 15 Event Types by Fatalities, 2011",
        ylim = c(0, ymax))
As a result, for Question 1 we conclude that the most harmful event type was the same in 2008 and 2011: tornadoes. Tornado fatalities were, however, much higher in 2011 (587) than in 2008 (129). Heat-related events also caused substantial loss of life: HEAT ranks fifth in 2008 (27 fatalities) and third in 2011 (63), and EXCESSIVE HEAT adds a further 15 and 36 fatalities respectively.
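The comparison above can be made explicit with a few lines of arithmetic. The sketch copies the fatality counts of the 12 event types that appear in both top-15 fatality tables (values taken from the tables above) and computes the absolute change and the ratio 2011/2008. It works on these copied numbers rather than on `Facility_2008` and `Facility_2011` so that it runs on its own.

```r
# Event types present in both top-15 fatality tables, with their counts
types = c("TORNADO", "FLASH FLOOD", "RIP CURRENT", "COLD/WIND CHILL", "HEAT",
          "AVALANCHE", "LIGHTNING", "THUNDERSTORM WIND", "FLOOD", "STRONG WIND",
          "EXCESSIVE HEAT", "HIGH WIND")
f2008 = c(129, 55, 44, 31, 27, 25, 25, 25, 22, 17, 15, 11)
f2011 = c(587, 68, 29, 21, 63,  9, 26, 54, 58, 10, 36,  4)

change = data.frame(EVTYPE = types, y2008 = f2008, y2011 = f2011,
                    stringsAsFactors = FALSE)
change$diff  = change$y2011 - change$y2008   # absolute change in fatalities
change$ratio = change$y2011 / change$y2008   # 2011 count relative to 2008

# Largest increases first
ranked = change[order(-change$diff), ]
print(ranked, row.names = FALSE, digits = 3)
```

Tornadoes account for an increase of 458 fatalities, about 4.55 times the 2008 count. HEAT and FLOOD each rise by 36, while RIP CURRENT and AVALANCHE fall by 15 and 16.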
par(mfrow = c(1, 2), mar = c(10, 4, 2, 2), las = 2, cex = 0.7, cex.main = 1.4, cex.lab = 1)
# Shared y-axis tall enough for the largest bar (tornado injuries)
ymax = max(Injury_2008$sum_jury, Injury_2011$sum_jury)
barplot(height = Injury_2008$sum_jury, names.arg = Injury_2008$EVTYPE,
        col = heat.colors(15, alpha = 1),
        ylab = "Injuries",
        main = "Top 15 Event Types by Injuries, 2008",
        ylim = c(0, ymax))
barplot(height = Injury_2011$sum_jury, names.arg = Injury_2011$EVTYPE,
        col = heat.colors(15, alpha = 1),
        ylab = "Injuries",
        main = "Top 15 Event Types by Injuries, 2011",
        ylim = c(0, ymax))
par(mfrow = c(1, 1))
For Question 2, the most harmful event type was again the same in both years: tornadoes, which caused 1,690 injuries in 2008 and 6,163 in 2011. HEAT ranks second in 2011 with 611 injuries but does not appear in the 2008 top 15, where EXCESSIVE HEAT (131 injuries) ranks fourth.
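How strongly tornadoes dominate the injury counts can be quantified as their share of the top-15 total in each year. The counts below are copied from the two injury tables above; `tornado_share` is a hypothetical helper. Because only the top 15 types are included, this is a share of the top-15 total, not of all injuries in the year.

```r
# Injury counts from the top-15 tables above, in rank order
inj_2008 = c(1690, 253, 207, 131, 77, 58, 56, 32, 30, 26, 21, 18, 16, 16, 14)
inj_2011 = c(6163, 611, 373, 194, 138, 116, 33, 31, 30, 27, 14, 11, 11, 10, 8)

# Hypothetical helper: TORNADO is the first entry in both tables
tornado_share = function(x) x[1] / sum(x)

sum(inj_2008)  # 2645
sum(inj_2011)  # 7770
round(c(y2008 = tornado_share(inj_2008), y2011 = tornado_share(inj_2011)), 3)
# y2008 0.639, y2011 0.793
```

Tornadoes thus account for roughly 64% of the top-15 injuries in 2008 and 79% in 2011.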
par(mfrow = c(1, 2), mar = c(4, 4, 2, 2), cex = 0.7, cex.main = 1.4, cex.lab = 1)
# Inner join: keeps only event types recorded in both years
mrgg = merge(Facility_Injury_2008, Facility_Injury_2011, by = "EVTYPE")
n = nrow(mrgg)
# One colour per event type (factor codes index the palette, which recycles)
cols = as.integer(factor(mrgg$EVTYPE))
# Fatalities: one point per event type in each year, joined by a line
plot(rep(2008, n), mrgg$sum_facility.x, col = cols,
     ylim = c(0, max(mrgg$sum_facility.x, mrgg$sum_facility.y)), xlim = c(2007, 2012),
     main = "Change in Fatalities, 2008 to 2011", xlab = "Year", ylab = "Fatalities")
points(rep(2011, n), mrgg$sum_facility.y, col = cols)
segments(rep(2008, n), mrgg$sum_facility.x, rep(2011, n), mrgg$sum_facility.y, col = cols)
# Injuries: same layout, with the y-axis sized to the injury counts
plot(rep(2008, n), mrgg$sum_jury.x, col = cols,
     ylim = c(0, max(mrgg$sum_jury.x, mrgg$sum_jury.y)), xlim = c(2007, 2012),
     main = "Change in Injuries, 2008 to 2011", xlab = "Year", ylab = "Injuries")
points(rep(2011, n), mrgg$sum_jury.y, col = cols)
segments(rep(2008, n), mrgg$sum_jury.x, rep(2011, n), mrgg$sum_jury.y, col = cols)
par(mfrow = c(1, 1))
These graphs summarise the comparison. Several event types caused many more casualties in 2011 than in 2008, especially tornadoes and heat. Since the events themselves cannot be prevented, efforts should focus on reducing people's exposure to them, for example through better warning and preparedness for tornadoes and heat.
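The change plots rely on `merge()` with its default settings, which performs an inner join: an event type recorded in only one of the two years is dropped from both panels. A minimal sketch with made-up totals (not from the storm data) contrasts this with a full outer join via `all = TRUE`. Filling the resulting NA values with 0 is an assumption: it treats "no record of that type in that year" as "no reported casualties".

```r
# Made-up per-year injury totals for illustration only
y2008 = data.frame(EVTYPE = c("TORNADO", "HEAT", "HEAVY SNOW"),
                   sum_jury = c(10, 4, 2), stringsAsFactors = FALSE)
y2011 = data.frame(EVTYPE = c("TORNADO", "HEAT", "HAIL"),
                   sum_jury = c(30, 8, 1), stringsAsFactors = FALSE)

# Default merge(): inner join, only types present in both years survive
inner_res = merge(y2008, y2011, by = "EVTYPE")

# all = TRUE: full outer join, missing years come back as NA
outer_res = merge(y2008, y2011, by = "EVTYPE", all = TRUE)
# Assumption: an absent type had no reported casualties that year
outer_res[is.na(outer_res)] = 0

nrow(inner_res)  # 2 (HEAT, TORNADO)
nrow(outer_res)  # 4 (HAIL and HEAVY SNOW are kept)
```

Whether to keep these one-year-only types depends on the question; for spotting event types that newly became harmful (such as HAIL in the injury tables), the outer join is the safer choice.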