THE IMPACT OF WEATHER EVENTS IN THE US

Introduction

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Data

The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. You can download the file from the course web site:

Storm Data (47MB)

There is also some documentation of the database available. Here you will find how some of the variables are constructed/defined.

National Weather Service Storm Data Documentation

National Climatic Data Center Storm Events FAQ

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.

Data Processing

Load the packages needed for aggregating the data and visualizing the results.

library(plyr)
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.0.3

Set the working directory and read the data.

setwd("C:/Users/Inspiron 5537pro/Desktop/Project/Reproducible_Research_P")
dat <- read.csv("repdata_data_StormData.csv")
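
The Data section describes the download as a bzip2-compressed CSV, while the call above reads an already decompressed file. As a hedged sketch, base R can usually read the compressed file directly, because `read.csv()` opens the path through `file()`, which detects bzip2 compression. The toy data frame and temporary file below are illustrative stand-ins, not the real storm data.

```r
# Hypothetical toy file standing in for the compressed storm data.
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(1, 0))
path <- tempfile(fileext = ".csv.bz2")

# Write a bzip2-compressed CSV through a bzfile() connection.
con <- bzfile(path, open = "wt")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() opens the path with file(), which detects bzip2 compression.
toy_back <- read.csv(path, stringsAsFactors = FALSE)
print(toy_back)
```

Reading the compressed file directly avoids a manual decompression step, which helps keep the analysis reproducible from the original download.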

Inspect the dataset.

str(dat)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
summary(dat)
##     STATE__       BGN_DATE           BGN_TIME          TIME_ZONE        
##  Min.   : 1.0   Length:902297      Length:902297      Length:902297     
##  1st Qu.:19.0   Class :character   Class :character   Class :character  
##  Median :30.0   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :31.2                                                           
##  3rd Qu.:45.0                                                           
##  Max.   :95.0                                                           
##                                                                         
##      COUNTY       COUNTYNAME           STATE              EVTYPE         
##  Min.   :  0.0   Length:902297      Length:902297      Length:902297     
##  1st Qu.: 31.0   Class :character   Class :character   Class :character  
##  Median : 75.0   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :100.6                                                           
##  3rd Qu.:131.0                                                           
##  Max.   :873.0                                                           
##                                                                          
##    BGN_RANGE          BGN_AZI           BGN_LOCATI          END_DATE        
##  Min.   :   0.000   Length:902297      Length:902297      Length:902297     
##  1st Qu.:   0.000   Class :character   Class :character   Class :character  
##  Median :   0.000   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :   1.484                                                           
##  3rd Qu.:   1.000                                                           
##  Max.   :3749.000                                                           
##                                                                             
##    END_TIME           COUNTY_END COUNTYENDN       END_RANGE       
##  Length:902297      Min.   :0    Mode:logical   Min.   :  0.0000  
##  Class :character   1st Qu.:0    NA's:902297    1st Qu.:  0.0000  
##  Mode  :character   Median :0                   Median :  0.0000  
##                     Mean   :0                   Mean   :  0.9862  
##                     3rd Qu.:0                   3rd Qu.:  0.0000  
##                     Max.   :0                   Max.   :925.0000  
##                                                                   
##    END_AZI           END_LOCATI            LENGTH              WIDTH         
##  Length:902297      Length:902297      Min.   :   0.0000   Min.   :   0.000  
##  Class :character   Class :character   1st Qu.:   0.0000   1st Qu.:   0.000  
##  Mode  :character   Mode  :character   Median :   0.0000   Median :   0.000  
##                                        Mean   :   0.2301   Mean   :   7.503  
##                                        3rd Qu.:   0.0000   3rd Qu.:   0.000  
##                                        Max.   :2315.0000   Max.   :4400.000  
##                                                                              
##        F               MAG            FATALITIES          INJURIES        
##  Min.   :0.0      Min.   :    0.0   Min.   :  0.0000   Min.   :   0.0000  
##  1st Qu.:0.0      1st Qu.:    0.0   1st Qu.:  0.0000   1st Qu.:   0.0000  
##  Median :1.0      Median :   50.0   Median :  0.0000   Median :   0.0000  
##  Mean   :0.9      Mean   :   46.9   Mean   :  0.0168   Mean   :   0.1557  
##  3rd Qu.:1.0      3rd Qu.:   75.0   3rd Qu.:  0.0000   3rd Qu.:   0.0000  
##  Max.   :5.0      Max.   :22000.0   Max.   :583.0000   Max.   :1700.0000  
##  NA's   :843563                                                           
##     PROPDMG         PROPDMGEXP           CROPDMG         CROPDMGEXP       
##  Min.   :   0.00   Length:902297      Min.   :  0.000   Length:902297     
##  1st Qu.:   0.00   Class :character   1st Qu.:  0.000   Class :character  
##  Median :   0.00   Mode  :character   Median :  0.000   Mode  :character  
##  Mean   :  12.06                      Mean   :  1.527                     
##  3rd Qu.:   0.50                      3rd Qu.:  0.000                     
##  Max.   :5000.00                      Max.   :990.000                     
##                                                                           
##      WFO             STATEOFFIC         ZONENAMES            LATITUDE   
##  Length:902297      Length:902297      Length:902297      Min.   :   0  
##  Class :character   Class :character   Class :character   1st Qu.:2802  
##  Mode  :character   Mode  :character   Mode  :character   Median :3540  
##                                                           Mean   :2875  
##                                                           3rd Qu.:4019  
##                                                           Max.   :9706  
##                                                           NA's   :47    
##    LONGITUDE        LATITUDE_E     LONGITUDE_       REMARKS         
##  Min.   :-14451   Min.   :   0   Min.   :-14455   Length:902297     
##  1st Qu.:  7247   1st Qu.:   0   1st Qu.:     0   Class :character  
##  Median :  8707   Median :   0   Median :     0   Mode  :character  
##  Mean   :  6940   Mean   :1452   Mean   :  3509                     
##  3rd Qu.:  9605   3rd Qu.:3549   3rd Qu.:  8735                     
##  Max.   : 17124   Max.   :9706   Max.   :106220                     
##                   NA's   :40                                        
##      REFNUM      
##  Min.   :     1  
##  1st Qu.:225575  
##  Median :451149  
##  Mean   :451149  
##  3rd Qu.:676723  
##  Max.   :902297  
## 
dat[1:10,1:10]
##    STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1        1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2        1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3        1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4        1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5        1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6        1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
## 7        1 11/16/1951 0:00:00     0100       CST      9     BLOUNT    AL
## 8        1  1/22/1952 0:00:00     0900       CST    123 TALLAPOOSA    AL
## 9        1  2/13/1952 0:00:00     2000       CST    125 TUSCALOOSA    AL
## 10       1  2/13/1952 0:00:00     2000       CST     57    FAYETTE    AL
##     EVTYPE BGN_RANGE BGN_AZI
## 1  TORNADO         0        
## 2  TORNADO         0        
## 3  TORNADO         0        
## 4  TORNADO         0        
## 5  TORNADO         0        
## 6  TORNADO         0        
## 7  TORNADO         0        
## 8  TORNADO         0        
## 9  TORNADO         0        
## 10 TORNADO         0
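
The documentation notes that fewer events are recorded in the earlier years. One way to check this is to extract the year from BGN_DATE and count events per year. The sketch below uses a few made-up dates in the same text format as BGN_DATE; the vector names are illustrative only.

```r
# Toy dates in the same "m/d/Y H:M:S" text format as BGN_DATE.
bgn <- c("4/18/1950 0:00:00", "2/20/1951 0:00:00",
         "6/8/1951 0:00:00", "11/28/2011 0:00:00")

# as.Date() parses only what the format describes; the trailing time is ignored.
yr <- as.integer(format(as.Date(bgn, format = "%m/%d/%Y"), "%Y"))

# Count events per year.
events_per_year <- table(yr)
print(events_per_year)
```

Applied to the full dataset, the same steps would use `dat$BGN_DATE` in place of the toy vector.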

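Property and crop damage are each stored as a number (PROPDMG, CROPDMG) and an exponent code (PROPDMGEXP, CROPDMGEXP). We recode each exponent code into a numeric multiplier and multiply it by the number to get the damage value in dollars.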

# Recode the PROPDMGEXP into appropriate 'multipliers'
dat$PROPEXP[dat$PROPDMGEXP == "H"] <- 100          #H-> Hundreds
dat$PROPEXP[dat$PROPDMGEXP == "h"] <- 100
dat$PROPEXP[dat$PROPDMGEXP == "K"] <- 1000         #K-> Thousands
dat$PROPEXP[dat$PROPDMGEXP == "M"] <- 1e+06        #M-> Millions
dat$PROPEXP[dat$PROPDMGEXP == "m"] <- 1e+06
dat$PROPEXP[dat$PROPDMGEXP == "B"] <- 1e+09        #B-> Billions
dat$PROPEXP[dat$PROPDMGEXP == ""] <- 1
dat$PROPEXP[dat$PROPDMGEXP == "0"] <- 1
dat$PROPEXP[dat$PROPDMGEXP == "1"] <- 10
dat$PROPEXP[dat$PROPDMGEXP == "2"] <- 100
dat$PROPEXP[dat$PROPDMGEXP == "3"] <- 1000
dat$PROPEXP[dat$PROPDMGEXP == "4"] <- 10000
dat$PROPEXP[dat$PROPDMGEXP == "5"] <- 1e+05
dat$PROPEXP[dat$PROPDMGEXP == "6"] <- 1e+06
dat$PROPEXP[dat$PROPDMGEXP == "7"] <- 1e+07
dat$PROPEXP[dat$PROPDMGEXP == "8"] <- 1e+08
# Invalid values
dat$PROPEXP[dat$PROPDMGEXP == "+"] <- 0
dat$PROPEXP[dat$PROPDMGEXP == "-"] <- 0
dat$PROPEXP[dat$PROPDMGEXP == "?"] <- 0

# Calculate the property damage value: number x multiplier
dat$propvalue <- dat$PROPDMG * dat$PROPEXP

# Recode the CROPDMGEXP into appropriate 'multipliers'
dat$CROPEXP[dat$CROPDMGEXP == "K"] <- 1000
dat$CROPEXP[dat$CROPDMGEXP == "k"] <- 1000
dat$CROPEXP[dat$CROPDMGEXP == "M"] <- 1e+06
dat$CROPEXP[dat$CROPDMGEXP == "m"] <- 1e+06
dat$CROPEXP[dat$CROPDMGEXP == "B"] <- 1e+09
dat$CROPEXP[dat$CROPDMGEXP == "0"] <- 1
dat$CROPEXP[dat$CROPDMGEXP == "2"] <- 100
dat$CROPEXP[dat$CROPDMGEXP == ""] <- 1
# Invalid values
dat$CROPEXP[dat$CROPDMGEXP == "?"] <- 0
# Calculate the crop damage value: number x multiplier
dat$cropvalue <- dat$CROPDMG * dat$CROPEXP
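
The line-by-line recoding above can also be written as a single lookup. The following is a minimal sketch, not the method used in this report: `exp_lookup` and `exp_to_multiplier` are hypothetical names, the multipliers are copied from the recoding above, and treating all letter codes case-insensitively is a simplifying assumption.

```r
# Named lookup vector: exponent code -> multiplier.
# Values mirror the recoding above; "+", "-", "?" are treated as invalid (0).
exp_lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9,
                "0" = 1, "1" = 1e1, "2" = 1e2, "3" = 1e3, "4" = 1e4,
                "5" = 1e5, "6" = 1e6, "7" = 1e7, "8" = 1e8,
                "+" = 0, "-" = 0, "?" = 0)

# Hypothetical helper: convert a vector of exponent codes to multipliers.
exp_to_multiplier <- function(code) {
  code <- toupper(code)              # assumption: "h", "k", "m" mean "H", "K", "M"
  mult <- unname(exp_lookup[code])   # NA for "" and for any unknown code
  mult[code == ""] <- 1              # an empty code means no multiplier
  mult
}

# Toy rows, not taken from the storm data.
toy <- data.frame(PROPDMG = c(25, 2.5, 1, 3, 7),
                  PROPDMGEXP = c("K", "m", "", "B", "?"),
                  stringsAsFactors = FALSE)
toy$propvalue <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP)
print(toy)
```

A lookup keeps all code-to-multiplier pairs in one place, and unknown codes surface as NA instead of being silently skipped.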

Aggregating the data.

Get the total (sum) of fatalities, injuries, property damage, and crop damage by event type.

fatal<-aggregate(FATALITIES ~ EVTYPE, data=dat, sum)
injur<-aggregate(INJURIES ~ EVTYPE, data=dat, sum)
propv<-aggregate(propvalue ~ EVTYPE, data=dat, sum)
cropv<-aggregate(cropvalue ~ EVTYPE, data=dat, sum)

Sort the totals in decreasing order, keep the top 10 event types, and express the damage values in billions of dollars.

# Sort each total from largest to smallest
fatalsort <- fatal[order(fatal$FATALITIES, decreasing = TRUE), ]
injursort <- injur[order(injur$INJURIES, decreasing = TRUE), ]
propvsort <- propv[order(propv$propvalue, decreasing = TRUE), ]
cropsort <- cropv[order(cropv$cropvalue, decreasing = TRUE), ]

# Keep the top 10 event types for each measure
forG1 <- fatalsort[1:10, ]
forG2 <- injursort[1:10, ]
forG3 <- propvsort[1:10, ]
forG4 <- cropsort[1:10, ]
# Convert damage values to billions of dollars for plotting
forG3$propvalue2 <- forG3$propvalue / 1e9
forG4$cropvalue2 <- forG4$cropvalue / 1e9
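
The aggregate, sort, and top-10 steps repeat the same pattern for each measure. As an illustrative sketch only, they could be wrapped in one function; `top_events` is a hypothetical helper and the toy data below is made up to show the behaviour.

```r
# Hypothetical helper: total `value` by event type and return the n largest.
top_events <- function(data, value, n = 10) {
  totals <- aggregate(data[[value]], by = list(EVTYPE = data$EVTYPE), FUN = sum)
  names(totals)[2] <- value                                  # rename the summed column
  totals <- totals[order(totals[[value]], decreasing = TRUE), ]
  head(totals, n)                                            # keep the n largest totals
}

# Toy rows, not taken from the storm data.
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "TORNADO"),
  FATALITIES = c(3, 1, 2, 4, 0, 1),
  stringsAsFactors = FALSE
)
top2 <- top_events(toy, "FATALITIES", n = 2)
print(top2)
```

With the real data, a call such as `top_events(dat, "FATALITIES")` would play the role of `forG1`, under the assumption that no values are missing.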

RESULT

The data show that tornadoes caused the most fatalities and injuries, more than rip currents, heat, or avalanches. Floods caused the greatest economic loss through property damage, more than winter storms and other storm events. Drought caused the greatest crop loss, more than cold or frost. These results suggest that strategies for protecting population health, property, and agriculture should be prioritized according to the different effects of each type of weather event.

THE IMPACT ON HEALTH

Top 10 events in terms of number of fatalities and injuries.

G1 <- ggplot(data = forG1, aes(x = reorder(EVTYPE, FATALITIES), y = FATALITIES)) +
  coord_flip() +
  geom_bar(fill = "violet", stat = "identity") +
  labs(title = "Top 10 Fatality causing Events in US",
       x = "Weather Event", y = "Total Number of Fatalities")
G2 <- ggplot(data = forG2, aes(x = reorder(EVTYPE, INJURIES), y = INJURIES)) +
  coord_flip() +
  geom_bar(fill = "green", stat = "identity") +
  labs(title = "Top 10 Injury causing Events in US",
       x = "Weather Event", y = "Total Number of Injuries")
G1

G2

THE IMPACT ON THE ECONOMY

Top 10 events in terms of damage to properties and crops.

G3 <- ggplot(data = forG3, aes(x = reorder(EVTYPE, propvalue2), y = propvalue2)) +
  coord_flip() +
  geom_bar(fill = "blue", stat = "identity") +
  labs(title = "Top 10 Property Damage causing Events in US",
       x = "Weather Event", y = "Total Property Damage (billions of USD)")

G4 <- ggplot(data = forG4, aes(x = reorder(EVTYPE, cropvalue2), y = cropvalue2)) +
  coord_flip() +
  geom_bar(fill = "red", stat = "identity") +
  labs(title = "Top 10 Crop Damage causing Events in US",
       x = "Weather Event", y = "Total Crop Damage (billions of USD)")
  
G3

G4

CONCLUSION

  1. TORNADO causes the most harm to people’s health (fatalities and injuries).
  2. FLOOD has the highest property damage cost.
  3. DROUGHT has the highest crop damage cost.