Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
Data
The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. You can download the file from the course web site:
Storm Data (47MB)
There is also some documentation of the database available. Here you will find how some of the variables are constructed/defined.
National Weather Service Storm Data Documentation
National Climatic Data Center Storm Events FAQ
The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
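As a minimal sketch of working with the compressed download described above: base R's read.csv() can read a bzip2-compressed file directly, because the file() connection it opens detects the compression. The toy data frame and the temporary path below are illustrative stand-ins, not the real storm file.

```r
# Hypothetical two-row stand-in for the storm data (not the real file)
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(1, 0))

# Write it to a temporary bzip2-compressed CSV
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# Read it back without decompressing it first
back <- read.csv(path)
```

With the real download, the same call would take the path of the .bz2 file in place of the temporary one.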
Load the packages needed for aggregating the data and visualizing the results.
library(plyr)
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.0.3
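# Point R at the folder holding the data (this path is specific to the author's machine)
setwd("C:/Users/Inspiron 5537pro/Desktop/Project/Reproducible_Research_P")
# Read the storm data from the CSV file in that folder
dat <- read.csv("repdata_data_StormData.csv")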
str(dat)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr "" "" "" "" ...
## $ BGN_LOCATI: chr "" "" "" "" ...
## $ END_DATE : chr "" "" "" "" ...
## $ END_TIME : chr "" "" "" "" ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr "" "" "" "" ...
## $ END_LOCATI: chr "" "" "" "" ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## $ WFO : chr "" "" "" "" ...
## $ STATEOFFIC: chr "" "" "" "" ...
## $ ZONENAMES : chr "" "" "" "" ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr "" "" "" "" ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
summary(dat)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE
## Min. : 1.0 Length:902297 Length:902297 Length:902297
## 1st Qu.:19.0 Class :character Class :character Class :character
## Median :30.0 Mode :character Mode :character Mode :character
## Mean :31.2
## 3rd Qu.:45.0
## Max. :95.0
##
## COUNTY COUNTYNAME STATE EVTYPE
## Min. : 0.0 Length:902297 Length:902297 Length:902297
## 1st Qu.: 31.0 Class :character Class :character Class :character
## Median : 75.0 Mode :character Mode :character Mode :character
## Mean :100.6
## 3rd Qu.:131.0
## Max. :873.0
##
## BGN_RANGE BGN_AZI BGN_LOCATI END_DATE
## Min. : 0.000 Length:902297 Length:902297 Length:902297
## 1st Qu.: 0.000 Class :character Class :character Class :character
## Median : 0.000 Mode :character Mode :character Mode :character
## Mean : 1.484
## 3rd Qu.: 1.000
## Max. :3749.000
##
## END_TIME COUNTY_END COUNTYENDN END_RANGE
## Length:902297 Min. :0 Mode:logical Min. : 0.0000
## Class :character 1st Qu.:0 NA's:902297 1st Qu.: 0.0000
## Mode :character Median :0 Median : 0.0000
## Mean :0 Mean : 0.9862
## 3rd Qu.:0 3rd Qu.: 0.0000
## Max. :0 Max. :925.0000
##
## END_AZI END_LOCATI LENGTH WIDTH
## Length:902297 Length:902297 Min. : 0.0000 Min. : 0.000
## Class :character Class :character 1st Qu.: 0.0000 1st Qu.: 0.000
## Mode :character Mode :character Median : 0.0000 Median : 0.000
## Mean : 0.2301 Mean : 7.503
## 3rd Qu.: 0.0000 3rd Qu.: 0.000
## Max. :2315.0000 Max. :4400.000
##
## F MAG FATALITIES INJURIES
## Min. :0.0 Min. : 0.0 Min. : 0.0000 Min. : 0.0000
## 1st Qu.:0.0 1st Qu.: 0.0 1st Qu.: 0.0000 1st Qu.: 0.0000
## Median :1.0 Median : 50.0 Median : 0.0000 Median : 0.0000
## Mean :0.9 Mean : 46.9 Mean : 0.0168 Mean : 0.1557
## 3rd Qu.:1.0 3rd Qu.: 75.0 3rd Qu.: 0.0000 3rd Qu.: 0.0000
## Max. :5.0 Max. :22000.0 Max. :583.0000 Max. :1700.0000
## NA's :843563
## PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## Min. : 0.00 Length:902297 Min. : 0.000 Length:902297
## 1st Qu.: 0.00 Class :character 1st Qu.: 0.000 Class :character
## Median : 0.00 Mode :character Median : 0.000 Mode :character
## Mean : 12.06 Mean : 1.527
## 3rd Qu.: 0.50 3rd Qu.: 0.000
## Max. :5000.00 Max. :990.000
##
## WFO STATEOFFIC ZONENAMES LATITUDE
## Length:902297 Length:902297 Length:902297 Min. : 0
## Class :character Class :character Class :character 1st Qu.:2802
## Mode :character Mode :character Mode :character Median :3540
## Mean :2875
## 3rd Qu.:4019
## Max. :9706
## NA's :47
## LONGITUDE LATITUDE_E LONGITUDE_ REMARKS
## Min. :-14451 Min. : 0 Min. :-14455 Length:902297
## 1st Qu.: 7247 1st Qu.: 0 1st Qu.: 0 Class :character
## Median : 8707 Median : 0 Median : 0 Mode :character
## Mean : 6940 Mean :1452 Mean : 3509
## 3rd Qu.: 9605 3rd Qu.:3549 3rd Qu.: 8735
## Max. : 17124 Max. :9706 Max. :106220
## NA's :40
## REFNUM
## Min. : 1
## 1st Qu.:225575
## Median :451149
## Mean :451149
## 3rd Qu.:676723
## Max. :902297
##
dat[1:10,1:10]
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL
## 7 1 11/16/1951 0:00:00 0100 CST 9 BLOUNT AL
## 8 1 1/22/1952 0:00:00 0900 CST 123 TALLAPOOSA AL
## 9 1 2/13/1952 0:00:00 2000 CST 125 TUSCALOOSA AL
## 10 1 2/13/1952 0:00:00 2000 CST 57 FAYETTE AL
## EVTYPE BGN_RANGE BGN_AZI
## 1 TORNADO 0
## 2 TORNADO 0
## 3 TORNADO 0
## 4 TORNADO 0
## 5 TORNADO 0
## 6 TORNADO 0
## 7 TORNADO 0
## 8 TORNADO 0
## 9 TORNADO 0
## 10 TORNADO 0
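The Data section notes that earlier years contain fewer recorded events. One way to check this, sketched here under the assumption that BGN_DATE keeps the month/day/year format seen in the str() output, is to extract the year and count events per year. The four dates are copied from the preview above; bgn_year and events_per_year are illustrative names.

```r
# First four BGN_DATE values from the str() output above
bgn <- c("4/18/1950 0:00:00", "4/18/1950 0:00:00",
         "2/20/1951 0:00:00", "6/8/1951 0:00:00")

# Parse the date part (the trailing time is ignored) and pull out the year
bgn_year <- as.integer(format(as.Date(bgn, format = "%m/%d/%Y"), "%Y"))

# Count events per year
events_per_year <- table(bgn_year)
```

On the full data set, the same steps applied to dat$BGN_DATE would give a per-year count that can be inspected or plotted.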
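Property and crop damage are each stored as a pair of columns: a number (PROPDMG, CROPDMG) and an exponent code (PROPDMGEXP, CROPDMGEXP). To obtain damage values, we recode each exponent code into a numeric multiplier and multiply it by the corresponding number.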
# Recode the PROPDMGEXP into appropriate 'multipliers'
dat$PROPEXP[dat$PROPDMGEXP == "H"] <- 100 #H-> Hundreds
dat$PROPEXP[dat$PROPDMGEXP == "h"] <- 100
dat$PROPEXP[dat$PROPDMGEXP == "K"] <- 1000 #K-> Thousands
dat$PROPEXP[dat$PROPDMGEXP == "M"] <- 1e+06 #M-> Millions
dat$PROPEXP[dat$PROPDMGEXP == "m"] <- 1e+06
dat$PROPEXP[dat$PROPDMGEXP == "B"] <- 1e+09 #B-> Billions
dat$PROPEXP[dat$PROPDMGEXP == ""] <- 1
dat$PROPEXP[dat$PROPDMGEXP == "0"] <- 1
dat$PROPEXP[dat$PROPDMGEXP == "1"] <- 10
dat$PROPEXP[dat$PROPDMGEXP == "2"] <- 100
dat$PROPEXP[dat$PROPDMGEXP == "3"] <- 1000
dat$PROPEXP[dat$PROPDMGEXP == "4"] <- 10000
dat$PROPEXP[dat$PROPDMGEXP == "5"] <- 1e+05
dat$PROPEXP[dat$PROPDMGEXP == "6"] <- 1e+06
dat$PROPEXP[dat$PROPDMGEXP == "7"] <- 1e+07
dat$PROPEXP[dat$PROPDMGEXP == "8"] <- 1e+08
# Invalid values
dat$PROPEXP[dat$PROPDMGEXP == "+"] <- 0
dat$PROPEXP[dat$PROPDMGEXP == "-"] <- 0
dat$PROPEXP[dat$PROPDMGEXP == "?"] <- 0
# Calculate the property damage value: number x multiplier
dat$propvalue <- dat$PROPDMG * dat$PROPEXP
# Recode the CROPDMGEXP into appropriate 'multipliers'
dat$CROPEXP[dat$CROPDMGEXP == "K"] <- 1000
dat$CROPEXP[dat$CROPDMGEXP == "k"] <- 1000
dat$CROPEXP[dat$CROPDMGEXP == "M"] <- 1e+06
dat$CROPEXP[dat$CROPDMGEXP == "m"] <- 1e+06
dat$CROPEXP[dat$CROPDMGEXP == "B"] <- 1e+09
dat$CROPEXP[dat$CROPDMGEXP == "0"] <- 1
dat$CROPEXP[dat$CROPDMGEXP == "2"] <- 100
dat$CROPEXP[dat$CROPDMGEXP == ""] <- 1
# Invalid values
dat$CROPEXP[dat$CROPDMGEXP == "?"] <- 0
# Calculate the crop damage value: number x multiplier
dat$cropvalue <- dat$CROPDMG * dat$CROPEXP
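The recoding above uses one assignment per exponent code. The same mapping can be sketched as a single named lookup vector. This is an alternative formulation, not the code used in this report: exp_lookup and exp_multiplier are hypothetical names, and the table merges the property and crop codes into one set, which is an assumption.

```r
# Multiplier for each exponent code (union of the property and crop codes above)
exp_lookup <- c(
  H = 1e2, h = 1e2, K = 1e3, k = 1e3, M = 1e6, m = 1e6, B = 1e9,
  setNames(10^(0:8), as.character(0:8)),  # "0" -> 1, "1" -> 10, ..., "8" -> 1e8
  "+" = 0, "-" = 0, "?" = 0               # invalid codes
)

exp_multiplier <- function(code) {
  code <- as.character(code)
  mult <- unname(exp_lookup[code])
  # An empty code means the number is taken as-is
  mult[!is.na(code) & code == ""] <- 1
  mult  # codes missing from the table stay NA
}

# Worked example: 25 with code "K" and 2.5 with code "M"
damage <- c(25, 2.5) * exp_multiplier(c("K", "M"))
```

Applied to the data, this would read dat$propvalue <- dat$PROPDMG * exp_multiplier(dat$PROPDMGEXP).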
Get the totals (sums) of fatalities, injuries, property damage, and crop damage by event type, then keep the top 10 event types for each.
# Total each outcome by event type
fatal <- aggregate(FATALITIES ~ EVTYPE, data = dat, sum)
injur <- aggregate(INJURIES ~ EVTYPE, data = dat, sum)
propv <- aggregate(propvalue ~ EVTYPE, data = dat, sum)
cropv <- aggregate(cropvalue ~ EVTYPE, data = dat, sum)
# Sort each total from largest to smallest
fatalsort <- fatal[order(fatal$FATALITIES, decreasing = TRUE), ]
injursort <- injur[order(injur$INJURIES, decreasing = TRUE), ]
propvsort <- propv[order(propv$propvalue, decreasing = TRUE), ]
cropsort <- cropv[order(cropv$cropvalue, decreasing = TRUE), ]
# Keep the top 10 event types for each outcome
forG1 <- fatalsort[1:10, ]
forG2 <- injursort[1:10, ]
forG3 <- propvsort[1:10, ]
forG4 <- cropsort[1:10, ]
# Express damage values in billions for plotting
forG3$propvalue2 <- forG3$propvalue / (10^9)
forG4$cropvalue2 <- forG4$cropvalue / (10^9)
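The aggregate, sort, and truncate steps above are repeated for each of the four outcomes. As a rough sketch, they could be wrapped in one helper. top_events is a hypothetical name, and the toy data frame is made up for illustration. Unlike the formula interface used above, this version does not drop rows with missing values.

```r
# Sum one numeric column by EVTYPE and return the n largest totals
top_events <- function(data, value_col, n = 10) {
  totals <- aggregate(data[[value_col]],
                      by = list(EVTYPE = data$EVTYPE), FUN = sum)
  names(totals)[2] <- value_col
  totals <- totals[order(totals[[value_col]], decreasing = TRUE), ]
  head(totals, n)
}

# Toy data, not taken from the storm database
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
                  FATALITIES = c(3, 1, 2, 4))
top2 <- top_events(toy, "FATALITIES", n = 2)
```

Here top2 holds TORNADO (total 5) followed by HEAT (total 4). On the real data, top_events(dat, "FATALITIES") would play the role of forG1.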
The data show that tornadoes caused the most fatalities and injuries, ahead of events such as rip currents, heat, and avalanches. The largest property losses, however, came from floods, ahead of winter storms and high-wind events. Drought caused the greatest crop losses, more than cold or frost. These results suggest that strategies for protecting population health, property, and agriculture should be prioritized according to the different effects of each type of weather event.
Top 10 events in terms of number of fatalities and injuries.
G1 <- ggplot(data = forG1, aes(x = reorder(EVTYPE, FATALITIES), y = FATALITIES)) + coord_flip() + geom_bar(fill = "violet", stat = "identity") + labs(title = "Top 10 Fatality causing Events in US", x = "Weather Event", y = "Total Number of Fatalities")
G2 <- ggplot(data = forG2, aes(x = reorder(EVTYPE, INJURIES), y = INJURIES)) + coord_flip() + geom_bar(fill = "green", stat = "identity") + labs(title = "Top 10 Injury causing Events in US", x = "Weather Event", y = "Total Number of Injuries")
G1
G2
Top 10 events in terms of damage to properties and crops.
G3 <- ggplot(data = forG3, aes(x = reorder(EVTYPE, propvalue2), y = propvalue2)) + coord_flip() + geom_bar(fill = "blue", stat = "identity") + labs(title = "Top 10 Property Damage causing Events in US", x = "Weather Event", y = "Total Property Damage (Billions)")
G4 <- ggplot(data = forG4, aes(x = reorder(EVTYPE, cropvalue2), y = cropvalue2)) + coord_flip() + geom_bar(fill = "red", stat = "identity") + labs(title = "Top 10 Crop Damage causing Events in US", x = "Weather Event", y = "Total Crop Damage (Billions)")
G3
G4