Synopsis

Storms and other severe weather events can cause public health and economic problems of varying severity for communities and municipalities. This report analyses which types of severe weather events have the most serious health and economic consequences for the population. Fatality and injury counts are used to measure the severity of the population's health outcomes. The database tracks the characteristics of major storms and weather events in the United States, including when and where they occurred, and estimates of any casualties and property damage.

Question


Profile Setup

knitr::opts_chunk$set(echo = TRUE)

Data Processing

First, we load the original data set and show some of its properties. The output lists the main variables and a summary of each one. We then create two analytic data sets for this research: one with the 2008 records and one with the 2011 records.

StormData = read.csv("repdata_data_StormData.csv.bz2")
head(StormData)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE  EVTYPE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL TORNADO
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL TORNADO
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL TORNADO
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL TORNADO
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL TORNADO
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL TORNADO
##   BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1         0                                               0         NA
## 2         0                                               0         NA
## 3         0                                               0         NA
## 4         0                                               0         NA
## 5         0                                               0         NA
## 6         0                                               0         NA
##   END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1         0                      14.0   100 3   0          0       15    25.0
## 2         0                       2.0   150 2   0          0        0     2.5
## 3         0                       0.1   123 2   0          0        2    25.0
## 4         0                       0.0   100 2   0          0        2     2.5
## 5         0                       0.0   150 2   0          0        2     2.5
## 6         0                       1.5   177 2   0          0        6     2.5
##   PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1          K       0                                         3040      8812
## 2          K       0                                         3042      8755
## 3          K       0                                         3340      8742
## 4          K       0                                         3458      8626
## 5          K       0                                         3412      8642
## 6          K       0                                         3450      8748
##   LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1       3051       8806              1
## 2          0          0              2
## 3          0          0              3
## 4          0          0              4
## 5          0          0              5
## 6          0          0              6
dim(StormData)
## [1] 902297     37
str(StormData)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
##  $ BGN_TIME  : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
##  $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : Factor w/ 35 levels "","  N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_LOCATI: Factor w/ 54429 levels ""," Christiansburg",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_DATE  : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_TIME  : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_LOCATI: Factor w/ 34506 levels ""," CANTON"," TULIA",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ WFO       : Factor w/ 542 levels ""," CI","%SD",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ ZONENAMES : Factor w/ 25112 levels "","                                                                                                               "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : Factor w/ 436781 levels "","\t","\t\t",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

Because this research compares 2008 with 2011, we subset the original data set to form the analytic data, starting from the BGN_DATE variable.

# Convert the factor to text, then parse it as a date (%d also accepts single-digit days)
StormData$BGN_DATE = as.character(StormData$BGN_DATE)
StormData$BGN_DATE = as.Date(StormData$BGN_DATE, format = "%m/%d/%Y %H:%M:%S")
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
## 
##     date
StormData$Year = year(StormData$BGN_DATE)
da = subset(StormData, Year %in% c(2008, 2011))
da2008 = subset(da, Year == 2008)
da2011 = subset(da, Year == 2011)
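
The date handling above can be checked on a few literal values. The sketch below is a minimal base-R example, assuming the same month/day/year text layout seen in the `head(StormData)` output; the vector `raw_dates` is made up for illustration, and the year is extracted with `format()` rather than lubridate.

```r
# Illustrative values in the same layout as BGN_DATE (not taken from the full data set)
raw_dates <- c("4/18/1950 0:00:00", "6/8/2008 0:00:00", "11/15/2011 0:00:00")

# Parse the text into Date objects; the time part is matched by the format and then dropped
parsed <- as.Date(raw_dates, format = "%m/%d/%Y %H:%M:%S")

# Extract the calendar year as an integer without extra packages
yr <- as.integer(format(parsed, "%Y"))

# Keep only the two years compared in this report
keep <- yr %in% c(2008, 2011)
raw_dates[keep]
```

Checking a handful of values this way is a cheap guard against a format string that silently yields `NA` dates.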

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:lubridate':
## 
##     intersect, setdiff, union
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
Facility_Injury_2008 = da2008 %>%
        group_by(EVTYPE) %>%
        summarise(sum_facility = sum(FATALITIES,na.rm = TRUE),
                  sum_jury = sum(INJURIES,na.rm = TRUE),
                  sum_crop = sum(CROPDMG,na.rm = TRUE))

Facility_Injury_2011 = da2011 %>%
        group_by(EVTYPE) %>%
        summarise(sum_facility = sum(FATALITIES,na.rm = TRUE),
                  sum_jury = sum(INJURIES,na.rm = TRUE),
                  sum_crop = sum(CROPDMG,na.rm = TRUE))
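
The per-event totals can be cross-checked without dplyr. The sketch below is a base-R illustration on a small invented data frame (`toy` and its values are assumptions, only the column names mirror the storm data); it is not the method used in this report, just an equivalent aggregation.

```r
# Toy records; the column names mirror the storm data, the values are invented
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "HEAT", "FLOOD", "HEAT"),
  FATALITIES = c(2, 1, 3, 0, NA),
  INJURIES   = c(10, 5, 0, 1, 4)
)

# Sum each outcome per event type; na.rm = TRUE is forwarded to sum(),
# so a missing count is skipped instead of turning the whole total into NA
totals <- aggregate(toy[c("FATALITIES", "INJURIES")],
                    by = list(EVTYPE = toy$EVTYPE),
                    FUN = sum, na.rm = TRUE)
totals
```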

Exploratory Data Analysis

After creating the analytic data sets, we sort each one in descending order and display the top 15 event types.

Facility_2008 = Facility_Injury_2008[order(-Facility_Injury_2008$sum_facility),c(1,2)][1:15,]
Injury_2008 = Facility_Injury_2008[order(-Facility_Injury_2008$sum_jury),c(1,3)][1:15,]
Facility_2011 = Facility_Injury_2011[order(-Facility_Injury_2011$sum_facility),c(1,2)][1:15,]
Injury_2011 = Facility_Injury_2011[order(-Facility_Injury_2011$sum_jury),c(1,3)][1:15,]
Facility_2008
## # A tibble: 15 x 2
##    EVTYPE                  sum_facility
##    <fct>                          <dbl>
##  1 TORNADO                          129
##  2 FLASH FLOOD                       55
##  3 RIP CURRENT                       44
##  4 COLD/WIND CHILL                   31
##  5 HEAT                              27
##  6 AVALANCHE                         25
##  7 LIGHTNING                         25
##  8 THUNDERSTORM WIND                 25
##  9 FLOOD                             22
## 10 STRONG WIND                       17
## 11 EXCESSIVE HEAT                    15
## 12 HIGH WIND                         11
## 13 STORM SURGE/TIDE                  11
## 14 WINTER WEATHER                    11
## 15 EXTREME COLD/WIND CHILL            8
name_2008 = Facility_2008$EVTYPE
Facility_2011
## # A tibble: 15 x 2
##    EVTYPE            sum_facility
##    <fct>                    <dbl>
##  1 TORNADO                    587
##  2 FLASH FLOOD                 68
##  3 HEAT                        63
##  4 FLOOD                       58
##  5 THUNDERSTORM WIND           54
##  6 EXCESSIVE HEAT              36
##  7 RIP CURRENT                 29
##  8 LIGHTNING                   26
##  9 COLD/WIND CHILL             21
## 10 HIGH SURF                   11
## 11 STRONG WIND                 10
## 12 AVALANCHE                    9
## 13 WILDFIRE                     6
## 14 HIGH WIND                    4
## 15 TROPICAL STORM               4
name_2011 = Facility_2011$EVTYPE
Injury_2008
## # A tibble: 15 x 2
##    EVTYPE            sum_jury
##    <fct>                <dbl>
##  1 TORNADO               1690
##  2 THUNDERSTORM WIND      253
##  3 LIGHTNING              207
##  4 EXCESSIVE HEAT         131
##  5 WINTER WEATHER          77
##  6 HIGH SURF               58
##  7 HIGH WIND               56
##  8 WILDFIRE                32
##  9 FLASH FLOOD             30
## 10 STRONG WIND             26
## 11 RIP CURRENT             21
## 12 HEAVY SNOW              18
## 13 FLOOD                   16
## 14 HEAVY RAIN              16
## 15 AVALANCHE               14
name_2008_1 = Injury_2008$EVTYPE
Injury_2011
## # A tibble: 15 x 2
##    EVTYPE                   sum_jury
##    <fct>                       <dbl>
##  1 TORNADO                      6163
##  2 HEAT                          611
##  3 THUNDERSTORM WIND             373
##  4 LIGHTNING                     194
##  5 EXCESSIVE HEAT                138
##  6 WILDFIRE                      116
##  7 STRONG WIND                    33
##  8 HAIL                           31
##  9 FLASH FLOOD                    30
## 10 RIP CURRENT                    27
## 11 MARINE THUNDERSTORM WIND       14
## 12 HIGH SURF                      11
## 13 HIGH WIND                      11
## 14 FLOOD                          10
## 15 AVALANCHE                       8
name_2011_1 = Injury_2011$EVTYPE
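
The four ranking steps above follow one pattern: sort by a column in descending order and keep the first rows. As a minimal sketch, that pattern could be wrapped in a small function. The name `top_events` is hypothetical, and the toy values are copied from the 2008 tables above for three event types only.

```r
# Hypothetical helper: return the n rows with the largest values in `column`
top_events <- function(totals, column, n = 15) {
  ranked <- totals[order(-totals[[column]]), c("EVTYPE", column)]
  head(ranked, n)  # head() returns fewer rows if the table is shorter than n
}

# Toy totals for three event types (2008 fatalities and injuries)
toy_totals <- data.frame(
  EVTYPE       = c("RIP CURRENT", "TORNADO", "FLASH FLOOD"),
  sum_facility = c(44, 129, 55),
  sum_jury     = c(21, 1690, 30)
)

top_events(toy_totals, "sum_facility", n = 2)
top_events(toy_totals, "sum_jury", n = 2)
```

Selecting columns by name instead of by position keeps the result correct if the column order of the summary table changes.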

According to these tables, the most harmful event in terms of fatalities is the tornado in both 2008 and 2011, but the totals differ: 129 deaths in 2008 versus 587 in 2011. The top 15 causes of fatalities in 2008 are TORNADO, FLASH FLOOD, RIP CURRENT, COLD/WIND CHILL, HEAT, AVALANCHE, LIGHTNING, THUNDERSTORM WIND, FLOOD, STRONG WIND, EXCESSIVE HEAT, HIGH WIND, STORM SURGE/TIDE, WINTER WEATHER and EXTREME COLD/WIND CHILL. The top 15 causes of fatalities in 2011 are TORNADO, FLASH FLOOD, HEAT, FLOOD, THUNDERSTORM WIND, EXCESSIVE HEAT, RIP CURRENT, LIGHTNING, COLD/WIND CHILL, HIGH SURF, STRONG WIND, AVALANCHE, WILDFIRE, HIGH WIND and TROPICAL STORM.

The top 15 causes of injuries in 2008 are TORNADO, THUNDERSTORM WIND, LIGHTNING, EXCESSIVE HEAT, WINTER WEATHER, HIGH SURF, HIGH WIND, WILDFIRE, FLASH FLOOD, STRONG WIND, RIP CURRENT, HEAVY SNOW, FLOOD, HEAVY RAIN and AVALANCHE. The top 15 causes of injuries in 2011 are TORNADO, HEAT, THUNDERSTORM WIND, LIGHTNING, EXCESSIVE HEAT, WILDFIRE, STRONG WIND, HAIL, FLASH FLOOD, RIP CURRENT, MARINE THUNDERSTORM WIND, HIGH SURF, HIGH WIND, FLOOD and AVALANCHE.


Results

Q1

Now we can use graphs to visualise the results directly:

par(mfrow = c(1, 2), mar = c(10, 4, 2, 2), las = 2, cex = 0.7, cex.main = 1.4, cex.lab = 1)
# The y-axis shows the number of fatalities; the event types are the bar labels
barplot(height = Facility_2008$sum_facility,names.arg = Facility_2008$EVTYPE,
        col = heat.colors(15,alpha=1),
        ylab = "Fatalities",
        main = "Top 15 Events by Fatalities in 2008",
        ylim = c(0,600))
barplot(height = Facility_2011$sum_facility,names.arg = Facility_2011$EVTYPE,
        col = heat.colors(15,alpha=1),
        ylab = "Fatalities",
        main = "Top 15 Events by Fatalities in 2011",
        ylim = c(0,600))

For question 1, the most harmful event in both 2008 and 2011 is the tornado. Tornadoes were far more serious in 2011 than in 2008 and caused many more fatalities (587 versus 129). Comparing the top three events in each year, HEAT also causes substantial loss of life: it rose from fifth place in 2008 (27 deaths) to third place in 2011 (63 deaths).
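
The year-to-year comparison can be quantified with a short calculation. The sketch below copies three fatality totals from the tables in the previous section; the vector names `fatal_2008`, `fatal_2011` and `ratio` are illustrative only.

```r
# Fatality totals copied from the Facility_2008 and Facility_2011 tables
fatal_2008 <- c(TORNADO = 129, "FLASH FLOOD" = 55, HEAT = 27)
fatal_2011 <- c(TORNADO = 587, "FLASH FLOOD" = 68, HEAT = 63)

# Ratio of 2011 to 2008 totals; a value above 1 means more deaths in 2011
ratio <- fatal_2011 / fatal_2008
round(ratio, 2)
```

A ratio puts event types with very different totals on a common scale, which the two bar charts do not show directly.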

Q2

par(mfrow = c(1, 2), mar = c(10, 4, 2, 2), las = 2, cex = 0.7, cex.main = 1.4, cex.lab = 1)
# The y-axis shows the number of injuries. Note that ylim stops at 1000, so the
# TORNADO bars (1690 and 6163 injuries) run past the top of the axis.
barplot(height = Injury_2008$sum_jury,names.arg = Injury_2008$EVTYPE,
        col = heat.colors(15,alpha=1),
        ylab = "Injuries",
        main = "Top 15 Events by Injuries in 2008",
        ylim = c(0,1000))
barplot(height = Injury_2011$sum_jury,names.arg = Injury_2011$EVTYPE,
        col = heat.colors(15,alpha=1),
        ylab = "Injuries",
        main = "Top 15 Events by Injuries in 2011",
        ylim = c(0,1000))

par(mfrow = c(1, 1))

For question 2, the most harmful event in both 2008 and 2011 is again the tornado, which caused by far the most injuries in each year (1690 in 2008 and 6163 in 2011). Comparing the top three events in each year, HEAT also causes a large number of injuries: it ranks second in 2011 with 611 injuries, although it does not appear in the 2008 top 15.
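
The change in ranking between the two years can be tabulated directly. The sketch below is a small illustration that copies only the top five injury causes per year from the tables above; the names `inj_2008`, `inj_2011` and `rank_shift` are made up for this example.

```r
# Top five injury causes per year, in rank order, copied from the injury tables
inj_2008 <- c("TORNADO", "THUNDERSTORM WIND", "LIGHTNING", "EXCESSIVE HEAT", "WINTER WEATHER")
inj_2011 <- c("TORNADO", "HEAT", "THUNDERSTORM WIND", "LIGHTNING", "EXCESSIVE HEAT")

# match() gives each 2011 event's position in the 2008 list (NA if it is not in that list)
rank_shift <- data.frame(
  EVTYPE    = inj_2011,
  rank_2011 = seq_along(inj_2011),
  rank_2008 = match(inj_2011, inj_2008)
)
rank_shift
```

An `NA` in `rank_2008` only means the event is missing from the five-item list used here, not that it caused no injuries in 2008.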

Conclusion

par(mfrow = c(1, 2), mar = c(4, 4, 2, 2),cex = 0.7, cex.main = 1.4, cex.lab = 1)
mrgg = merge(Facility_Injury_2008,Facility_Injury_2011,by = "EVTYPE")
n = nrow(mrgg)  # number of event types present in both years (replaces the hard-coded 44)
plot(rep(2008, n),mrgg$sum_facility.x,col = mrgg$EVTYPE,
     ylim = c(0,600),xlim = c(2007,2012),main = "Change in Fatalities between 2008 and 2011",xlab = "Year",ylab = "Fatalities")
points(rep(2011, n),mrgg$sum_facility.y,col = mrgg$EVTYPE)
segments(rep(2008, n),mrgg$sum_facility.x,
         rep(2011, n),mrgg$sum_facility.y,col = mrgg$EVTYPE)

# Injury totals are much larger than fatality totals, so the y-axis limit is taken from the data
inj_max = max(mrgg$sum_jury.x, mrgg$sum_jury.y)
plot(rep(2008, n),mrgg$sum_jury.x,col = mrgg$EVTYPE,
     ylim = c(0,inj_max),xlim = c(2007,2012),main = "Change in Injuries between 2008 and 2011",xlab = "Year",ylab = "Injuries")
points(rep(2011, n),mrgg$sum_jury.y,col = mrgg$EVTYPE)
segments(rep(2008, n),mrgg$sum_jury.x,
         rep(2011, n),mrgg$sum_jury.y,col = mrgg$EVTYPE)

These graphs serve as a summary. Some events were much more serious in 2011 than in 2008, especially tornadoes and heat. People need to find ways to reduce the harm from these types of events and to avoid them.
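
The change shown by the line segments can also be expressed as a number per event type. The sketch below is a minimal illustration on three event types, with fatality totals copied from the tables above; the data frames `y2008`, `y2011` and `both` are hypothetical stand-ins for the full summaries.

```r
# Toy per-year totals; fatality values copied from the 2008 and 2011 tables
y2008 <- data.frame(EVTYPE = c("TORNADO", "HEAT", "RIP CURRENT"),
                    sum_facility = c(129, 27, 44))
y2011 <- data.frame(EVTYPE = c("TORNADO", "HEAT", "RIP CURRENT"),
                    sum_facility = c(587, 63, 29))

# merge() keeps event types present in both years and adds .x/.y suffixes to the shared column
both <- merge(y2008, y2011, by = "EVTYPE")

# Absolute change from 2008 to 2011; positive means more fatalities in 2011
both$change <- both$sum_facility.y - both$sum_facility.x
both[order(-both$change), ]
```

Sorting by the change column makes it easy to see which event types worsened and which improved between the two years.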