The analysis of the NOAA storm event database shows that tornadoes are the weather events most harmful to population health, causing by far the most fatalities and injuries, and that flash floods cause the greatest property damage.
This project explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. The assignment is described at https://www.coursera.org/learn/reproducible-research/peer/OMZ37/course-project-2.
First we need to load the data into R.
## assume the CSV file is in the current working directory
df <- read.csv("repdata%2Fdata%2FStormData.csv")
head(df)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO 0 0
## 2 TORNADO 0 0
## 3 TORNADO 0 0
## 4 TORNADO 0 0
## 5 TORNADO 0 0
## 6 TORNADO 0 0
## COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1 NA 0 14.0 100 3 0 0
## 2 NA 0 2.0 150 2 0 0
## 3 NA 0 0.1 123 2 0 0
## 4 NA 0 0.0 100 2 0 0
## 5 NA 0 0.0 150 2 0 0
## 6 NA 0 1.5 177 2 0 0
## INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1 15 25.0 K 0
## 2 0 2.5 K 0
## 3 2 25.0 K 0
## 4 2 2.5 K 0
## 5 2 2.5 K 0
## 6 6 2.5 K 0
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3040 8812 3051 8806 1
## 2 3042 8755 0 0 2
## 3 3340 8742 0 0 3
## 4 3458 8626 0 0 4
## 5 3412 8642 0 0 5
## 6 3450 8748 0 0 6
names (df)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
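If the file was downloaded in its bzip2-compressed (.bz2) form, `read.csv()` can read it without unpacking, because `file()` recognises bzip2 compression when it opens a file for reading. Only seven of the 37 columns listed above are used in the rest of the analysis, so dropping the others also saves memory. The sketch below is self-contained and only illustrative: `toy`, `path`, `storm` and `keep` are hypothetical names, and the three rows are made up to stand in for the real file.

```r
## Hypothetical stand-in for the storm data: three made-up rows
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "TORNADO"),
  FATALITIES = c(1, 0, 2),
  INJURIES   = c(15, 0, 3),
  PROPDMG    = c(25, 2.5, 0),
  PROPDMGEXP = c("K", "M", ""),
  CROPDMG    = c(0, 1, 0),
  CROPDMGEXP = c("", "K", ""),
  REMARKS    = c("", "", ""),
  stringsAsFactors = FALSE
)

## Write it as a bzip2-compressed CSV in a temporary file
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, "w")
write.csv(toy, con, row.names = FALSE)
close(con)

## read.csv() decompresses the file on the fly; text columns stay character
storm <- read.csv(path, stringsAsFactors = FALSE)

## Keep only the columns used by the health and damage summaries
keep <- c("EVTYPE", "FATALITIES", "INJURIES",
          "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
storm <- storm[, keep]
str(storm)
```

Passing `stringsAsFactors = FALSE` matters for the exponent columns: it keeps codes such as "K" as plain strings (the default since R 4.0.0) instead of factors, whose underlying integer codes are easy to pick up by accident with `as.numeric()`.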
Since we want to know which types of events are most harmful with respect to population health, we calculate the total numbers of “FATALITIES” and “INJURIES” for each event type.
First, we sum “FATALITIES” for each value of “EVTYPE” and sort the totals in decreasing order.
fatalByEvt <- aggregate(FATALITIES ~ EVTYPE, data = df, FUN = sum)
fatalByEvt <- fatalByEvt[order(-fatalByEvt$FATALITIES),]
The 10 event types that caused the most “FATALITIES” are:
head(fatalByEvt, 10)
## EVTYPE FATALITIES
## 834 TORNADO 5633
## 130 EXCESSIVE HEAT 1903
## 153 FLASH FLOOD 978
## 275 HEAT 937
## 464 LIGHTNING 816
## 856 TSTM WIND 504
## 170 FLOOD 470
## 585 RIP CURRENT 368
## 359 HIGH WIND 248
## 19 AVALANCHE 224
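To make the `aggregate()` and `order()` step concrete, here it is on a few made-up rows; `toy` and `totals` are hypothetical names and the numbers are not NOAA data.

```r
## Made-up rows standing in for the EVTYPE and FATALITIES columns
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT"),
  FATALITIES = c(3, 1, 4, 0, 5),
  stringsAsFactors = FALSE
)

## One row per EVTYPE holding the summed FATALITIES
totals <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)

## Negating the numeric column sorts the totals in decreasing order
totals <- totals[order(-totals$FATALITIES), ]
totals
```

The result lists TORNADO (7), then HEAT (6), then FLOOD (0). Two details are worth knowing: the formula interface of `aggregate()` drops rows with a missing value in the summed column (its default `na.action` is `na.omit`), and `order(totals$FATALITIES, decreasing = TRUE)` is an equivalent way to sort.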
Then we calculate the total number of “INJURIES” for each event type in the same way.
injByEvt <- aggregate(INJURIES ~ EVTYPE, data = df, FUN = sum)
injByEvt <- injByEvt[order(-injByEvt$INJURIES),]
The 10 event types that caused the most “INJURIES” are:
head (injByEvt, 10)
## EVTYPE INJURIES
## 834 TORNADO 91346
## 856 TSTM WIND 6957
## 170 FLOOD 6789
## 130 EXCESSIVE HEAT 6525
## 464 LIGHTNING 5230
## 275 HEAT 2100
## 427 ICE STORM 1975
## 153 FLASH FLOOD 1777
## 760 THUNDERSTORM WIND 1488
## 244 HAIL 1361
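The two totals can also be computed in a single call by putting `cbind()` on the left-hand side of the formula, which makes it easy to see both counts for an event type side by side. This is a sketch on made-up rows; `toy` and `health` are hypothetical names, and ranking by fatalities with injuries as the tie-breaker is an illustrative choice, not part of the original analysis.

```r
## Made-up rows standing in for the storm data
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "LIGHTNING"),
  FATALITIES = c(2, 5, 1, 0),
  INJURIES   = c(30, 4, 10, 6),
  stringsAsFactors = FALSE
)

## cbind() on the left-hand side sums both columns for each EVTYPE
health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

## Rank by fatalities, breaking ties by injuries
health <- health[order(-health$FATALITIES, -health$INJURIES), ]
health
```

Here HEAT comes first on fatalities (5) even though TORNADO has far more injuries (40), which is why the two measures are reported separately above.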
Now let’s find out which types of events have the greatest economic consequences.
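We noticed a column called PROPDMG, which holds the property damage estimate; its magnitude is stored separately in PROPDMGEXP (K for thousands, M for millions, B for billions). Crop damage is stored the same way in CROPDMG and CROPDMGEXP. The function below converts each magnitude code into a power of ten (H = 2, K = 3, M = 6, B = 9; a digit is used as the exponent itself, and the symbols -, + and ? count as 0), so that we can compute the damage in dollars, total it for each event type and compare which type has the largest impact.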
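Written as a formula, the conversion multiplies the stored amount by a power of ten, where the exponent e comes from the magnitude code (K for thousands, M for millions, B for billions). As a check, the first record printed by head(df), a tornado with PROPDMG 25.0 and PROPDMGEXP K, works out to 25,000 dollars:

$$
\text{damage in dollars} = \text{PROPDMG} \times 10^{e}, \qquad e = 3 \text{ for K},\; 6 \text{ for M},\; 9 \text{ for B}
$$

$$
25.0 \times 10^{3} = 25{,}000
$$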
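## Convert a magnitude code from PROPDMGEXP/CROPDMGEXP into a power of ten:
## H = hundreds, K = thousands, M = millions, B = billions, a digit is the
## exponent itself, and blanks or the symbols -, + and ? count as 10^0.
exp_unit <- function (e) {
  e <- toupper(trimws(as.character(e)))  # factor -> text, ignore case and spaces
  if (e %in% 'H')
    return (2)
  else if (e %in% 'K')
    return (3)
  else if (e %in% 'M')
    return (6)
  else if (e %in% 'B')
    return (9)
  else if (!is.na(suppressWarnings(as.numeric(e))))
    return (as.numeric(e))
  else if (e %in% c('', '-', '+', '?'))
    return (0)
  else {
    stop ("invalid exponent value: ", e)
  }
}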
## decode the exponent codes and compute the damage in dollars
prog_dm_exp <- sapply(df$PROPDMGEXP, FUN = exp_unit, USE.NAMES = FALSE)
df$prog_dmg <- df$PROPDMG * 10^prog_dm_exp
crop_dm_exp <- sapply(df$CROPDMGEXP, FUN = exp_unit, USE.NAMES = FALSE)
df$crop_dmg <- df$CROPDMG * 10^crop_dm_exp
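`sapply()` calls `exp_unit()` once per row, which can be slow on a table of this size. A vectorised alternative is a named lookup vector indexed by the codes themselves. This is a sketch, not the method used above: `exp_table` and `decode_damage` are hypothetical names, and only the K, M and B mappings come from the NOAA convention; the H, digit and symbol mappings copy the choices made in `exp_unit()` and are assumptions.

```r
## Hypothetical lookup table: magnitude code -> power of ten.
## K, M and B follow the NOAA convention; H, the digits and the symbols
## mirror the choices made in exp_unit() and are assumptions.
exp_table <- c(H = 2, K = 3, M = 6, B = 9,
               setNames(0:9, 0:9),
               "-" = 0, "+" = 0, "?" = 0)

## Vectorised conversion: amount * 10^power, one table lookup for all rows
decode_damage <- function(amount, code) {
  code <- toupper(trimws(as.character(code)))  # factors -> text, ignore case
  power <- unname(exp_table[code])             # NA for blank or unknown codes
  power[code %in% ""] <- 0                     # blank code counts as 10^0
  if (anyNA(power)) {
    stop("unknown exponent code(s): ",
         paste(unique(code[is.na(power)]), collapse = ", "))
  }
  amount * 10^power
}

decode_damage(c(25, 2.5, 1.55, 3), c("K", "m", "B", ""))
```

The call returns 25,000, 2.5 million, about 1.55 billion and 3. Blank codes need the explicit assignment because an empty string never matches a vector name in R. With the storm data loaded, `decode_damage(df$PROPDMG, df$PROPDMGEXP)` should give the same values as the `sapply()` line above, as long as every numeric code in the data is a single digit.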
The 10 event types with the greatest property damage (in dollars) are:
PDMG <- aggregate(prog_dmg ~ EVTYPE, data = df, FUN = sum)
PDMG <- PDMG[order(-PDMG$prog_dmg),]
head (PDMG, 10)
## EVTYPE prog_dmg
## 153 FLASH FLOOD 6.820237e+13
## 786 THUNDERSTORM WINDS 2.086532e+13
## 834 TORNADO 1.078951e+12
## 244 HAIL 3.157558e+11
## 464 LIGHTNING 1.729433e+11
## 170 FLOOD 1.446577e+11
## 411 HURRICANE/TYPHOON 6.930584e+10
## 185 FLOODING 5.920826e+10
## 670 STORM SURGE 4.332354e+10
## 310 HEAVY SNOW 1.793259e+10
The 10 event types with the greatest crop damage (in dollars) are:
CDMG <- aggregate(crop_dmg ~ EVTYPE, data = df, FUN = sum)
CDMG <- CDMG[order(-CDMG$crop_dmg),]
head (CDMG, 10)
## EVTYPE crop_dmg
## 95 DROUGHT 13972566000
## 170 FLOOD 5661968450
## 590 RIVER FLOOD 5029459000
## 427 ICE STORM 5022113500
## 244 HAIL 3025974480
## 402 HURRICANE 2741910000
## 411 HURRICANE/TYPHOON 2607872800
## 153 FLASH FLOOD 1421317100
## 140 EXTREME COLD 1292973000
## 212 FROST/FREEZE 1094086000
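The same aggregate, sort and `head()` pattern is repeated four times above. It could be wrapped in a small helper; `top_by_event` is a hypothetical name, and the sketch builds the `var ~ EVTYPE` formula from a column name with `reformulate()`.

```r
## Hypothetical helper: total `var` per EVTYPE and return the n largest
top_by_event <- function(data, var, n = 10) {
  f <- reformulate("EVTYPE", response = var)   # builds var ~ EVTYPE
  totals <- aggregate(f, data = data, FUN = sum)
  head(totals[order(totals[[var]], decreasing = TRUE), ], n)
}

## Made-up rows standing in for the storm data
toy <- data.frame(
  EVTYPE   = c("DROUGHT", "HAIL", "DROUGHT", "FLOOD"),
  crop_dmg = c(5e6, 2e6, 1e6, 4e6),
  stringsAsFactors = FALSE
)
top_by_event(toy, "crop_dmg", n = 2)
```

On the toy rows this keeps DROUGHT (6 million) and FLOOD (4 million). With the real data, `top_by_event(df, "prog_dmg")` should return the same ten rows as `head(PDMG, 10)`.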
Next, we plot the results calculated above, starting with the 10 event types that caused the most fatalities.
library(ggplot2)
library(scales)
topFatal <- head(fatalByEvt, 10)
topInjur <- head(injByEvt, 10)
topPDMG <- head(PDMG, 10)
topCDMG <- head(CDMG, 10)
## horizontal bar chart of the 10 deadliest event types
fatalPlot <- ggplot(data = topFatal, aes(x = reorder(EVTYPE, FATALITIES),
                                         y = FATALITIES))
fatalPlot + geom_col(width = .5, fill = "darkblue") + coord_flip() +
  xlab("Event type") + ggtitle("Top 10 Fatalities")
Below is the plot of the 10 event types that caused the most injuries.
injurPlot <- ggplot(data = topInjur, aes(x = reorder(EVTYPE, INJURIES),
                                         y = INJURIES))
injurPlot + geom_col(width = .5, fill = "darkblue") + coord_flip() +
  xlab("Event type") + ggtitle("Top 10 Injuries")
Below is the plot of the 10 event types with the greatest property damage. The damage is shown on a log10 scale because the totals differ by several orders of magnitude.
## plot log10 of the totals, since they span several orders of magnitude
propPlot <- ggplot(data = topPDMG, aes(x = reorder(EVTYPE, prog_dmg),
                                       y = log10(prog_dmg)))
propPlot + geom_col(width = .5, fill = "darkblue") + coord_flip() +
  xlab("Event type") + ggtitle("Top 10 property damage") +
  ylab("log10(property damage in dollars)")
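The crop damage table `topCDMG` is computed above but not plotted. A possible sketch uses `scale_y_log10()` instead of plotting `log10()` values directly: the bars are drawn on the same log scale, but the tick labels stay in dollars (via `label_dollar()` from the scales package loaded above). The three-row `topCDMG` below is a made-up stand-in so the block runs on its own; `cropPlot` is a hypothetical name.

```r
library(ggplot2)
library(scales)

## Made-up stand-in for topCDMG (the real table is head(CDMG, 10))
topCDMG <- data.frame(
  EVTYPE   = c("DROUGHT", "FLOOD", "HAIL"),
  crop_dmg = c(1e10, 1e9, 1e8),
  stringsAsFactors = FALSE
)

cropPlot <- ggplot(topCDMG, aes(x = reorder(EVTYPE, crop_dmg), y = crop_dmg)) +
  geom_col(width = .5, fill = "darkblue") +
  ## log10 axis, but the tick labels are printed as dollar amounts
  scale_y_log10(labels = label_dollar()) +
  coord_flip() +
  xlab("Event type") +
  ylab("Crop damage in dollars (log scale)") +
  ggtitle("Top 10 crop damage")
print(cropPlot)
```

As with the `log10()` version, the bars start at the bottom of the log axis rather than at zero, so their lengths compare orders of magnitude rather than raw amounts.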