R Markdown

1. Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

2. Data Processing

2.1 Data

The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. You can download the file from the course web site:

Storm Data: https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2 [47 MB]

There is also some documentation of the database available. Here you will find how some of the variables are constructed/defined.

National Weather Service Storm Data Documentation https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf

National Climatic Data Center Storm Events FAQ https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2FNCDC%20Storm%20Events-FAQ%20Page.pdf

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
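
The thinner early record can be checked by counting events per year. The following is a minimal sketch, not part of the original analysis: it assumes the BGN_DATE strings have the month/day/year form that appears in the data preview in section 2.3.1, and it uses only the six dates shown there rather than the full file. The name bgn_date is a stand-in for storm$BGN_DATE.

```r
# The six BGN_DATE values visible in the data preview (stand-in for storm$BGN_DATE)
bgn_date <- c("4/18/1950 0:00:00", "4/18/1950 0:00:00", "2/20/1951 0:00:00",
              "6/8/1951 0:00:00", "11/15/1951 0:00:00", "11/15/1951 0:00:00")

# Parse the date part; the trailing time is ignored because the format stops at %Y
event_date <- as.Date(bgn_date, format = "%m/%d/%Y")

# Extract the year and count events per year
event_year <- as.integer(format(event_date, "%Y"))
events_per_year <- table(event_year)
print(events_per_year)
```

Applied to the full column, a bar plot of events_per_year would show how the number of recorded events changes over time.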

2.2 Assignment

The basic goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events.

1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

2. Across the United States, which types of events have the greatest economic consequences?

2.3 Process

2.3.1 Loading the data

The data was downloaded from the website above and saved to the local computer as StormData.csv. It was then loaded into R using the following code.

library(ggplot2)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
storm<-read.csv("D:/profile/documents/GitHub/StormData.csv")
head(storm)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
## 4        2     2.5          K       0                                    
## 5        2     2.5          K       0                                    
## 6        6     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6
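
read.csv() can also read a bzip2-compressed file directly, so the download does not have to be decompressed by hand. The sketch below demonstrates this on a tiny made-up file written to a temporary path; the file contents and the names tmp and demo are illustrative only. For the real data, the same call would take the path of the downloaded StormData.csv.bz2.

```r
# Write a tiny made-up CSV through a bzip2 connection (stand-in for StormData.csv.bz2)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(3, 1)),
          con, row.names = FALSE)
close(con)

# read.csv() opens the path with file(), which detects the compression when reading
demo <- read.csv(tmp, stringsAsFactors = FALSE)
print(demo)
```

Reading the compressed file directly keeps the analysis reproducible from the original download without a manual extraction step.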

2.4 Data Processing

2.4.1 Health Impact

To evaluate the health impact, the total fatalities and the total injuries for each event type (EVTYPE) are calculated.

storm_fatalities <- storm %>%
  select(EVTYPE, FATALITIES) %>%
  group_by(EVTYPE) %>%
  summarise(TOTAL_FATALITIES = sum(FATALITIES)) %>%
  arrange(desc(TOTAL_FATALITIES))

head(storm_fatalities,n=10)
## # A tibble: 10 x 2
##    EVTYPE         TOTAL_FATALITIES
##    <fct>                     <dbl>
##  1 TORNADO                    5633
##  2 EXCESSIVE HEAT             1903
##  3 FLASH FLOOD                 978
##  4 HEAT                        937
##  5 LIGHTNING                   816
##  6 TSTM WIND                   504
##  7 FLOOD                       470
##  8 RIP CURRENT                 368
##  9 HIGH WIND                   248
## 10 AVALANCHE                   224
storm_injuries <- storm %>%
  select(EVTYPE, INJURIES) %>%
  group_by(EVTYPE) %>%
  summarise(TOTAL_INJURIES = sum(INJURIES)) %>%
  arrange(desc(TOTAL_INJURIES))

head(storm_injuries, n=10)
## # A tibble: 10 x 2
##    EVTYPE            TOTAL_INJURIES
##    <fct>                      <dbl>
##  1 TORNADO                    91346
##  2 TSTM WIND                   6957
##  3 FLOOD                       6789
##  4 EXCESSIVE HEAT              6525
##  5 LIGHTNING                   5230
##  6 HEAT                        2100
##  7 ICE STORM                   1975
##  8 FLASH FLOOD                 1777
##  9 THUNDERSTORM WIND           1488
## 10 HAIL                        1361
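
The injuries table lists TSTM WIND and THUNDERSTORM WIND as separate event types, so totals for closely related labels are split across rows. The analysis in this report keeps EVTYPE as recorded. As an illustration only, the sketch below shows one way label variants could be merged before summing; the helper normalise_evtype, the toy data, and the rule that expands TSTM are assumptions, not part of the original processing.

```r
# Toy rows standing in for the storm data (values are made up for illustration)
toy <- data.frame(
  EVTYPE = c("TSTM WIND", "THUNDERSTORM WIND", " tstm wind", "HAIL"),
  INJURIES = c(5, 3, 2, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: trim whitespace, upper-case, and expand the TSTM abbreviation
normalise_evtype <- function(x) {
  x <- toupper(trimws(as.character(x)))
  sub("^TSTM ", "THUNDERSTORM ", x)
}

toy$EVTYPE_CLEAN <- normalise_evtype(toy$EVTYPE)

# Sum injuries per cleaned label, largest first
totals <- sort(tapply(toy$INJURIES, toy$EVTYPE_CLEAN, sum), decreasing = TRUE)
print(totals)
```

Whether such merging is appropriate depends on how the event types are defined in the Storm Data documentation, so any recoding rule should be checked against it.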

2.4.2 Economic Impact

The data provides two measures of economic impact: property damage (PROPDMG) and crop damage (CROPDMG). Each is recorded as a number together with an exponent code (PROPDMGEXP and CROPDMGEXP, respectively), and the damage in USD is the number multiplied by the factor that the code stands for. The codes are interpreted as follows:

H, h -> hundreds = x100

K, k -> thousands = x1,000

M, m -> millions = x1,000,000

B, b -> billions = x1,000,000,000

0 to 8 -> x10

(+) -> x1

(-) -> x0

(?) -> x0

blank -> x0

The total damage caused by each event type is calculated with the following code:

storm_damage<-storm[,c("EVTYPE", "PROPDMG", "PROPDMGEXP", "CROPDMG","CROPDMGEXP")]

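# Distinct exponent codes, sorted; the resulting order depends on the locale's collation
code <- sort(unique(as.character(storm_damage$PROPDMGEXP)))

# Multipliers paired with code by position. Assumes the 19 codes sort as:
# blank, -, ?, +, 0 to 8, B, h, H, K, m, M
value <- c(0,0,0,1,10,10,10,10,10,10,10,10,10,10^9,10^2,10^2,10^3,10^6,10^6)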

code_value<- data.frame(code, value)

storm_damage$prop.value<-code_value$value[match(storm_damage$PROPDMGEXP,code_value$code)]

storm_damage$crop.value<-code_value$value[match(storm_damage$CROPDMGEXP,code_value$code)]

storm_damage <- storm_damage %>%
  mutate(PROPDMG = PROPDMG * prop.value) %>%
  mutate(CROPDMG = CROPDMG * crop.value) %>%
  mutate(TOTALDMG = PROPDMG + CROPDMG)

storm_damage_total <- storm_damage %>%
  group_by(EVTYPE) %>%
  summarise(TOTALDMG.EVTYPE = sum(TOTALDMG)) %>%
  arrange(desc(TOTALDMG.EVTYPE))

head(storm_damage_total,n=10)
## # A tibble: 10 x 2
##    EVTYPE            TOTALDMG.EVTYPE
##    <fct>                       <dbl>
##  1 FLOOD                150319678250
##  2 HURRICANE/TYPHOON     71913712800
##  3 TORNADO               57352117607
##  4 STORM SURGE           43323541000
##  5 FLASH FLOOD           17562132111
##  6 DROUGHT               15018672000
##  7 HURRICANE             14610229010
##  8 RIVER FLOOD           10148404500
##  9 ICE STORM              8967041810
## 10 TROPICAL STORM         8382236550
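
The lookup in the code above pairs the value vector with the codes by position, so it depends on the order that sort() returns, which can differ between locales. In addition, match() returns NA for any code that occurs in CROPDMGEXP but not in PROPDMGEXP. A minimal sketch of an alternative follows: a named vector keyed by the code itself, using the multipliers listed in this section. The names exp_multiplier and to_multiplier are hypothetical, and treating lower-case and upper-case letters alike is an assumption.

```r
# Explicit multiplier table; names are the exponent codes after toupper()
exp_multiplier <- c(
  "H" = 1e2, "K" = 1e3, "M" = 1e6, "B" = 1e9,
  "+" = 1, "-" = 0, "?" = 0
)
exp_multiplier[as.character(0:8)] <- 10   # digits 0 to 8 -> x10

# Hypothetical helper: map a vector of exponent codes to numeric multipliers.
# Codes that are not in the table come back as NA so they stay visible.
to_multiplier <- function(exp_code) {
  code <- toupper(as.character(exp_code))
  mult <- unname(exp_multiplier[code])
  mult[code %in% ""] <- 0                 # blank -> x0
  mult
}

# Toy example (made-up amounts): damage in USD = amount * multiplier
amount <- c(25, 2.5, 1)
exp_code <- c("K", "m", "")
damage_usd <- amount * to_multiplier(exp_code)
print(damage_usd)
```

Because the table is keyed by name, the same helper can be applied to both PROPDMGEXP and CROPDMGEXP without depending on which codes happen to occur in either column.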

2.5 Results

2.5.1 Health Impact

The top 10 events with the highest total fatalities and injuries are shown graphically.

plot1 <- ggplot(storm_fatalities[1:10, ], aes(x = EVTYPE, y = TOTAL_FATALITIES)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  ggtitle("Top 10 Events with Highest Total Fatalities") +
  labs(x = "EVENT TYPE", y = "Total Fatalities")

plot1

plot2 <- ggplot(storm_injuries[1:10, ], aes(x = EVTYPE, y = TOTAL_INJURIES)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  ggtitle("Top 10 Events with Highest Total Injuries") +
  labs(x = "EVENT TYPE", y = "TOTAL INJURIES")

plot2

The plots above show that tornadoes caused by far the most fatalities (5,633) and the most injuries (91,346) of any event type.
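
ggplot2 draws a discrete x axis in factor-level order, which is alphabetical by default, so the bars in these plots are not necessarily sorted by height. One option, shown here as a sketch rather than as the code that produced the plots, is to reorder the factor levels by the plotted value with reorder(). The data frame top copies the first five rows of the fatalities table from section 2.4.1; the names top and plot_sorted are illustrative.

```r
# First five rows of the fatalities table from section 2.4.1
top <- data.frame(
  EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD", "HEAT", "LIGHTNING"),
  TOTAL_FATALITIES = c(5633, 1903, 978, 937, 816),
  stringsAsFactors = FALSE
)

# reorder() sorts the levels by increasing value, so negate to put the largest first
top$EVTYPE <- reorder(top$EVTYPE, -top$TOTAL_FATALITIES)
print(levels(top$EVTYPE))

# Build the plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  plot_sorted <- ggplot(top, aes(x = EVTYPE, y = TOTAL_FATALITIES)) +
    geom_bar(stat = "identity") +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
    labs(x = "EVENT TYPE", y = "Total Fatalities")
}
```

With the levels reordered, the bars run from the most harmful event type on the left to the least harmful on the right, which makes the ranking readable directly from the plot.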

2.5.2 Economic Impact

The top 10 events with the highest total economic damages (property and crop combined) are shown graphically.

plot3 <- ggplot(storm_damage_total[1:10, ], aes(x = EVTYPE, y = TOTALDMG.EVTYPE)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  ggtitle("Top 10 Events with Highest Economic Impact") +
  labs(x = "EVENT TYPE", y = "TOTAL IMPACT (USD)")

plot3

As shown in plot3, floods have the greatest economic impact, with about $150 billion in combined property and crop damage.