1. Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and governments. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. The database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

This analysis addresses two main questions:

  1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

  2. Across the United States, which types of events have the greatest economic consequences?

The author hopes this analysis can help governments allocate resources more effectively in the future.

2. Data Processing

2.1 First Question

Load the data from a local copy of the storm data CSV file:

# Read the NOAA storm data; stringsAsFactors = FALSE keeps text columns as character
Data_frame_1 <- read.csv("~/Desktop/R language/repdata_data_StormData.csv", stringsAsFactors = FALSE)
head(Data_frame_1)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE  EVTYPE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL TORNADO
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL TORNADO
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL TORNADO
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL TORNADO
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL TORNADO
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL TORNADO
##   BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1         0                                               0         NA
## 2         0                                               0         NA
## 3         0                                               0         NA
## 4         0                                               0         NA
## 5         0                                               0         NA
## 6         0                                               0         NA
##   END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1         0                      14.0   100 3   0          0       15    25.0
## 2         0                       2.0   150 2   0          0        0     2.5
## 3         0                       0.1   123 2   0          0        2    25.0
## 4         0                       0.0   100 2   0          0        2     2.5
## 5         0                       0.0   150 2   0          0        2     2.5
## 6         0                       1.5   177 2   0          0        6     2.5
##   PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1          K       0                                         3040      8812
## 2          K       0                                         3042      8755
## 3          K       0                                         3340      8742
## 4          K       0                                         3458      8626
## 5          K       0                                         3412      8642
## 6          K       0                                         3450      8748
##   LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1       3051       8806              1
## 2          0          0              2
## 3          0          0              3
## 4          0          0              4
## 5          0          0              5
## 6          0          0              6
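The analysis below uses only a handful of these columns. As a sketch, assuming memory use is a concern, read.csv can skip the unused ones while reading: a colClasses entry of "NULL" drops that column, and NA lets read.csv guess the type as usual. The helper name read_selected is hypothetical and not part of the original code.

```r
# Hypothetical helper (not part of the original analysis): read only the
# columns named in `keep`, skipping the rest via colClasses = "NULL"
read_selected <- function(path, keep) {
  # Read just the first row to learn the column names and their order
  header <- names(read.csv(path, nrows = 1))
  # NA lets read.csv guess the column type; "NULL" skips the column entirely
  classes <- ifelse(header %in% keep, NA_character_, "NULL")
  read.csv(path, colClasses = classes, stringsAsFactors = FALSE)
}

# Usage sketch with the file path used above (commented out here)
# storm <- read_selected("~/Desktop/R language/repdata_data_StormData.csv",
#                        c("EVTYPE", "FATALITIES", "INJURIES",
#                          "PROPDMG", "PROPDMGEXP"))
```

The header is read separately because colClasses is matched to columns by position here, so the helper needs the column order before the full read.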

From the columns shown by head(), the variables that describe “harmful” effects on population health are FATALITIES and INJURIES.

We therefore create a new data frame containing these two variables and the event type, and total them by event type.

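library(dplyr)
library(ggplot2)
# Keep only the event type and the two health-related columns
harm_data <- Data_frame_1[, c("EVTYPE", "FATALITIES", "INJURIES")]
# Total injuries and fatalities for each event type
harm_data_gp <- group_by(harm_data, EVTYPE)
sum_harm_data <- summarize(harm_data_gp, sum_injury = sum(INJURIES), sum_fatal = sum(FATALITIES))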
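For readers who prefer dplyr's pipe operator, the same group-then-sum step can be sketched as a single pipeline. The rows below are made up purely for illustration, and the names toy_harm and toy_totals are hypothetical.

```r
library(dplyr)

# Made-up rows standing in for harm_data (illustration only)
toy_harm <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  FATALITIES = c(2, 0, 1, 3),
  INJURIES   = c(10, 1, 4, 0),
  stringsAsFactors = FALSE
)

# Same aggregation as above: group by event type, then sum each column
toy_totals <- toy_harm %>%
  group_by(EVTYPE) %>%
  summarize(sum_injury = sum(INJURIES), sum_fatal = sum(FATALITIES))

toy_totals
```

On the real data, replacing toy_harm with harm_data should give the same totals as sum_harm_data.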

Brief summary:

So far, we have computed the total numbers of injuries and fatalities caused by each event type. Let’s look at the first few rows.

head(sum_harm_data)
## # A tibble: 6 x 3
##   EVTYPE                  sum_injury sum_fatal
##   <chr>                        <dbl>     <dbl>
## 1 "   HIGH SURF ADVISORY"          0         0
## 2 " COASTAL FLOOD"                 0         0
## 3 " FLASH FLOOD"                   0         0
## 4 " LIGHTNING"                     0         0
## 5 " TSTM WIND"                     0         0
## 6 " TSTM WIND (G45)"               0         0

Next, we find the event type with the highest total fatalities and the event type with the highest total injuries.

sum_harm_data[which.max(sum_harm_data$sum_fatal),]
## # A tibble: 1 x 3
##   EVTYPE  sum_injury sum_fatal
##   <chr>        <dbl>     <dbl>
## 1 TORNADO      91346      5633
sum_harm_data[which.max(sum_harm_data$sum_injury),]
## # A tibble: 1 x 3
##   EVTYPE  sum_injury sum_fatal
##   <chr>        <dbl>     <dbl>
## 1 TORNADO      91346      5633

Both are TORNADO, with 5,633 fatalities and 91,346 injuries in total, which answers the first question.
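which.max() returns only the single top event type. To see a ranked list instead, a small helper can sort the totals in descending order and keep the first n rows. The helper top_events() and the toy totals below are hypothetical; on the real data one would pass sum_harm_data.

```r
library(dplyr)

# Hypothetical helper: the n rows of `totals` with the largest values in `col`
top_events <- function(totals, col, n = 5) {
  totals %>%
    arrange(desc(.data[[col]])) %>%
    head(n)
}

# Made-up totals standing in for sum_harm_data (illustration only)
toy_totals <- data.frame(
  EVTYPE     = c("FLOOD", "HEAT", "TORNADO"),
  sum_injury = c(1, 0, 14),
  sum_fatal  = c(0, 5, 3),
  stringsAsFactors = FALSE
)

top_events(toy_totals, "sum_fatal", n = 2)
```

For example, top_events(sum_harm_data, "sum_injury", n = 10) would list the ten event types with the most injuries.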

2.2 Second Question

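Property damage is recorded in ‘PROPDMG’, with ‘PROPDMGEXP’ giving its unit (for example, ‘K’ for thousands, ‘M’ for millions and ‘B’ for billions). This analysis considers property damage only; crop damage (‘CROPDMG’, ‘CROPDMGEXP’) is not included.

Following the same approach as above, the analysis proceeds as follows: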
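Because PROPDMGEXP is a unit code rather than a number, damages reported in different units have to be converted to dollars before they can be compared. The sketch below maps only ‘K’, ‘M’ and ‘B’ to 1e3, 1e6 and 1e9, matching the units discussed in this report; treating every other code (including a blank) as a multiplier of 1 is an assumption, and the helper names are hypothetical.

```r
# Hypothetical helpers (not from the original analysis). Only the K, M and B
# codes are mapped; any other code, including blank or NA, is assumed to mean 1.
exp_multiplier <- function(code) {
  mult <- c(K = 1e3, M = 1e6, B = 1e9)
  # Look up each code by name; unknown codes give NA
  out <- unname(mult[toupper(trimws(code))])
  out[is.na(out)] <- 1
  out
}

# Convert PROPDMG values and their PROPDMGEXP codes to US dollars
damage_dollars <- function(value, code) value * exp_multiplier(code)

# The first two values come from the head() output; 115 B is the flood record
damage_dollars(c(25, 2.5, 115), c("K", "K", "B"))
```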

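# Keep the event type, the damage value and its unit code
eco_data <- Data_frame_1[, c("EVTYPE", "PROPDMG", "PROPDMGEXP")]
eco_gp <- group_by(eco_data, EVTYPE)
# Keep only records reported in billions ("B"), then show the largest one
exp_data <- subset(eco_gp, PROPDMGEXP == "B")
exp_data[which.max(exp_data$PROPDMG), ]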
## # A tibble: 1 x 3
## # Groups:   EVTYPE [1]
##   EVTYPE PROPDMG PROPDMGEXP
##   <chr>    <dbl> <chr>     
## 1 FLOOD      115 B

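Because ‘B’ (billions) is a larger unit than ‘K’ (thousands) or ‘M’ (millions), we first keep only the records reported in billions and then take the one with the largest PROPDMG value. That record is a FLOOD with 115 billion dollars of property damage, which suggests that floods cause the greatest economic damage. Note that this compares individual records, not total damage per event type.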
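The approach above looks at the largest single record among those reported in billions. An alternative, sketched here under the same assumption that only ‘K’, ‘M’ and ‘B’ carry a multiplier, converts every record to dollars and then totals damage per event type. In this made-up example the largest single record is a FLOOD, yet TORNADO has the larger total, so the two methods can rank events differently; whether they do so on the NOAA data is not checked here. The data frame and column names toy_eco, multiplier and total_dollars are hypothetical.

```r
library(dplyr)

# Made-up rows standing in for eco_data (illustration only)
toy_eco <- data.frame(
  EVTYPE     = c("FLOOD", "TORNADO", "TORNADO", "HAIL"),
  PROPDMG    = c(1.5, 900, 800, 20),
  PROPDMGEXP = c("B", "M", "M", "K"),
  stringsAsFactors = FALSE
)

# Assumed multipliers: K, M and B only; any other code counts as 1
mult <- c(K = 1e3, M = 1e6, B = 1e9)

toy_eco_totals <- toy_eco %>%
  mutate(multiplier = unname(mult[toupper(PROPDMGEXP)]),
         multiplier = ifelse(is.na(multiplier), 1, multiplier),
         dollars    = PROPDMG * multiplier) %>%
  group_by(EVTYPE) %>%
  summarize(total_dollars = sum(dollars)) %>%
  arrange(desc(total_dollars))

toy_eco_totals
```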

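# Sum the billion-dollar property damage records for each event type and plot them
sum_last <- summarize(group_by(exp_data, EVTYPE), sum = sum(PROPDMG))
ggplot(sum_last, aes(EVTYPE, sum)) +
  geom_bar(stat = "identity") +
  labs(x = "Event type", y = "Property damage (billions of USD)") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 5))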

This figure shows, for each event type, the total property damage (in billions of dollars) from its records reported in billions.

3. Results

From the analysis above, we can now answer the questions posed at the beginning of this report.

Tornadoes are the most harmful event type for population health: they cause more fatalities and more injuries than any other event type.

For economic consequences, floods appear to be the most significant, since the largest single property damage record (115 billion dollars) is a flood.

The bar plot above compares the total property damage from billion-dollar records across event types.

In summary, these results may help governments with further analysis and with allocating resources for natural disasters.