Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This analysis involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
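The goal of this analysis is to answer the following two questions:
1. Which types of events are most harmful with respect to population health?
2. Which types of events have the greatest economic consequences?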
The data for this assignment come as a bzip2-compressed CSV file; the download URL is given in the loading code below. The documentation for the file can be found in the links below:
Loading libraries:
library(dplyr)
library(ggplot2)
library(gridExtra)
library(grid)
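Downloading the data file and loading it into R:
fileURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
destFile <- "stormData.csv.bz2"
# Download once; mode = "wb" keeps the compressed (binary) file intact on Windows
if (!file.exists(destFile)) download.file(fileURL, destFile, mode = "wb")
# read.csv() decompresses bzip2 files transparently
data <- read.csv(destFile)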
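The loading step needs no explicit decompression because R's file connections detect bzip2 compression when reading. The following is a minimal sketch on a throwaway temporary file; the toy data frame and its values are illustrative and are not taken from the storm database.

```r
# Toy data standing in for the storm file (hypothetical values)
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(3, 1))

# Write it out as a bzip2-compressed CSV in a temporary location
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, open = "wt")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() is given only the path; the compression is detected on read
toy_back <- read.csv(path)
print(toy_back)
```

The same behaviour is what lets the full storm file be read in a single `read.csv()` call.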
Use the head and summary functions to take a look at the data:
head(data)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE EVTYPE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL TORNADO
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL TORNADO
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL TORNADO
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL TORNADO
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL TORNADO
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL TORNADO
## BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1 0 0 NA
## 2 0 0 NA
## 3 0 0 NA
## 4 0 0 NA
## 5 0 0 NA
## 6 0 0 NA
## END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1 0 14.0 100 3 0 0 15 25.0
## 2 0 2.0 150 2 0 0 0 2.5
## 3 0 0.1 123 2 0 0 2 25.0
## 4 0 0.0 100 2 0 0 2 2.5
## 5 0 0.0 150 2 0 0 2 2.5
## 6 0 1.5 177 2 0 0 6 2.5
## PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1 K 0 3040 8812
## 2 K 0 3042 8755
## 3 K 0 3340 8742
## 4 K 0 3458 8626
## 5 K 0 3412 8642
## 6 K 0 3450 8748
## LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3051 8806 1
## 2 0 0 2
## 3 0 0 3
## 4 0 0 4
## 5 0 0 5
## 6 0 0 6
summary(data)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE
## Min. : 1.0 Length:902297 Length:902297 Length:902297
## 1st Qu.:19.0 Class :character Class :character Class :character
## Median :30.0 Mode :character Mode :character Mode :character
## Mean :31.2
## 3rd Qu.:45.0
## Max. :95.0
##
## COUNTY COUNTYNAME STATE EVTYPE
## Min. : 0.0 Length:902297 Length:902297 Length:902297
## 1st Qu.: 31.0 Class :character Class :character Class :character
## Median : 75.0 Mode :character Mode :character Mode :character
## Mean :100.6
## 3rd Qu.:131.0
## Max. :873.0
##
## BGN_RANGE BGN_AZI BGN_LOCATI END_DATE
## Min. : 0.000 Length:902297 Length:902297 Length:902297
## 1st Qu.: 0.000 Class :character Class :character Class :character
## Median : 0.000 Mode :character Mode :character Mode :character
## Mean : 1.484
## 3rd Qu.: 1.000
## Max. :3749.000
##
## END_TIME COUNTY_END COUNTYENDN END_RANGE
## Length:902297 Min. :0 Mode:logical Min. : 0.0000
## Class :character 1st Qu.:0 NA's:902297 1st Qu.: 0.0000
## Mode :character Median :0 Median : 0.0000
## Mean :0 Mean : 0.9862
## 3rd Qu.:0 3rd Qu.: 0.0000
## Max. :0 Max. :925.0000
##
## END_AZI END_LOCATI LENGTH WIDTH
## Length:902297 Length:902297 Min. : 0.0000 Min. : 0.000
## Class :character Class :character 1st Qu.: 0.0000 1st Qu.: 0.000
## Mode :character Mode :character Median : 0.0000 Median : 0.000
## Mean : 0.2301 Mean : 7.503
## 3rd Qu.: 0.0000 3rd Qu.: 0.000
## Max. :2315.0000 Max. :4400.000
##
## F MAG FATALITIES INJURIES
## Min. :0.0 Min. : 0.0 Min. : 0.0000 Min. : 0.0000
## 1st Qu.:0.0 1st Qu.: 0.0 1st Qu.: 0.0000 1st Qu.: 0.0000
## Median :1.0 Median : 50.0 Median : 0.0000 Median : 0.0000
## Mean :0.9 Mean : 46.9 Mean : 0.0168 Mean : 0.1557
## 3rd Qu.:1.0 3rd Qu.: 75.0 3rd Qu.: 0.0000 3rd Qu.: 0.0000
## Max. :5.0 Max. :22000.0 Max. :583.0000 Max. :1700.0000
## NA's :843563
## PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## Min. : 0.00 Length:902297 Min. : 0.000 Length:902297
## 1st Qu.: 0.00 Class :character 1st Qu.: 0.000 Class :character
## Median : 0.00 Mode :character Median : 0.000 Mode :character
## Mean : 12.06 Mean : 1.527
## 3rd Qu.: 0.50 3rd Qu.: 0.000
## Max. :5000.00 Max. :990.000
##
## WFO STATEOFFIC ZONENAMES LATITUDE
## Length:902297 Length:902297 Length:902297 Min. : 0
## Class :character Class :character Class :character 1st Qu.:2802
## Mode :character Mode :character Mode :character Median :3540
## Mean :2875
## 3rd Qu.:4019
## Max. :9706
## NA's :47
## LONGITUDE LATITUDE_E LONGITUDE_ REMARKS
## Min. :-14451 Min. : 0 Min. :-14455 Length:902297
## 1st Qu.: 7247 1st Qu.: 0 1st Qu.: 0 Class :character
## Median : 8707 Median : 0 Median : 0 Mode :character
## Mean : 6940 Mean :1452 Mean : 3509
## 3rd Qu.: 9605 3rd Qu.:3549 3rd Qu.: 8735
## Max. : 17124 Max. :9706 Max. :106220
## NA's :40
## REFNUM
## Min. : 1
## 1st Qu.:225575
## Median :451149
## Mean :451149
## 3rd Qu.:676723
## Max. :902297
##
Here, we measure harm to population health by the FATALITIES and INJURIES variables, and economic consequences by crop damage (CROPDMG) and property damage (PROPDMG). Each damage variable has a companion exponent variable, CROPDMGEXP and PROPDMGEXP respectively, that encodes the magnitude of the reported figure (for example, "K" for thousands, "M" for millions, and "B" for billions). The total damage in dollars is therefore the damage value multiplied by the multiplier that its exponent represents.
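As a small worked example of that multiplication, the sketch below converts three toy rows to dollars. The first two rows reuse the PROPDMG values shown in the head(data) output above; the third row and the names `toy`, `multiplier`, and `dollars` are made up for illustration.

```r
# Toy rows: two from head(data) above, plus one made-up "M" row
toy <- data.frame(
  PROPDMG    = c(25.0, 2.5, 1.2),
  PROPDMGEXP = c("K", "K", "M"),
  stringsAsFactors = FALSE
)

# Multiplier for each letter: K = thousand, M = million, B = billion
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Dollar value = reported figure x multiplier of its letter
toy$dollars <- toy$PROPDMG * unname(multiplier[toy$PROPDMGEXP])
print(toy)
```

So a PROPDMG of 25.0 with exponent "K" stands for 25,000 dollars.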
# Keep only the columns needed for the health and economic analyses
var <- c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
storm <- data[var]
dim(storm)
## [1] 902297 7
Fatalities and injuries are summed by event type, and the ten event types with the largest totals are kept:
fatal <- aggregate(FATALITIES ~ EVTYPE, data = storm, FUN = sum)
fatal10 <- fatal[order(-fatal$FATALITIES), ][1:10, ]
fatal10
## EVTYPE FATALITIES
## 834 TORNADO 5633
## 130 EXCESSIVE HEAT 1903
## 153 FLASH FLOOD 978
## 275 HEAT 937
## 464 LIGHTNING 816
## 856 TSTM WIND 504
## 170 FLOOD 470
## 585 RIP CURRENT 368
## 359 HIGH WIND 248
## 19 AVALANCHE 224
injury <- aggregate(INJURIES ~ EVTYPE, data = storm, FUN = sum)
injury10 <- injury[order(-injury$INJURIES), ][1:10, ]
injury10
## EVTYPE INJURIES
## 834 TORNADO 91346
## 856 TSTM WIND 6957
## 170 FLOOD 6789
## 130 EXCESSIVE HEAT 6525
## 464 LIGHTNING 5230
## 275 HEAT 2100
## 427 ICE STORM 1975
## 153 FLASH FLOOD 1777
## 760 THUNDERSTORM WIND 1488
## 244 HAIL 1361
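The aggregate-then-sort pattern is used twice above and again for the damage totals, so it could be wrapped in a small helper. The sketch below is one possible way to do that in base R; the function name `top_events` and the toy data are hypothetical and are not part of the original analysis.

```r
# Hypothetical helper: sum one column per event type and keep the top n rows
top_events <- function(df, value_col, n = 10) {
  totals <- aggregate(df[[value_col]], by = list(EVTYPE = df$EVTYPE), FUN = sum)
  names(totals)[2] <- value_col
  totals <- totals[order(-totals[[value_col]]), ]
  head(totals, n)
}

# Toy data with made-up counts
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, 0),
  INJURIES   = c(10, 2, 5, 0, 1)
)

top2 <- top_events(toy, "FATALITIES", n = 2)
print(top2)
```

On the real data, `top_events(storm, "FATALITIES")` would be expected to give the same ranking as `fatal10`, assuming FATALITIES contains no missing values.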
Plots for the fatalities and injuries for the top 10 events:
fatalplot <- ggplot(fatal10, aes(x = reorder(EVTYPE, -FATALITIES), y = FATALITIES)) +
  geom_bar(stat = "identity", fill = "pink") +
  theme(axis.text.x = element_text(angle = 90, vjust = .5, hjust = 1)) +
  xlab("Event") + ylab("Fatalities")
injuryplot <- ggplot(injury10, aes(x = reorder(EVTYPE, -INJURIES), y = INJURIES)) +
  geom_bar(stat = "identity", fill = "pink") +
  theme(axis.text.x = element_text(angle = 90, vjust = .5, hjust = 1)) +
  xlab("Event") + ylab("Injuries")
# Place the two bar charts side by side under a shared title
grid.arrange(fatalplot, injuryplot, ncol = 2, nrow = 1,
             top = textGrob("Most Harmful Events w.r.t. Public Health",
                            gp = gpar(fontsize = 14, font = 3)))
Fig. 1
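The plots above show that tornadoes cause by far the most fatalities (5,633) and injuries (91,346) of any event type. The next-highest totals are 1,903 fatalities for excessive heat and 6,957 injuries for thunderstorm wind (TSTM WIND).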
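The injury table lists TSTM WIND and THUNDERSTORM WIND as separate event types, which suggests the EVTYPE labels are not fully standardized. Merging labels is not part of the original analysis. The sketch below only illustrates, on a subset of the injury totals printed above, how one assumed cleaning rule (expanding the "TSTM" abbreviation) would combine those two rows.

```r
# Injury totals copied from the table above (subset of four rows)
injury_toy <- data.frame(
  EVTYPE   = c("TORNADO", "TSTM WIND", "FLOOD", "THUNDERSTORM WIND"),
  INJURIES = c(91346, 6957, 6789, 1488)
)

# Assumed cleaning rule: trim, upper-case, expand the "TSTM" abbreviation
clean <- toupper(trimws(injury_toy$EVTYPE))
clean <- sub("^TSTM", "THUNDERSTORM", clean)
injury_toy$EVTYPE <- clean

# Re-aggregate so that rows sharing a cleaned label are summed
merged <- aggregate(INJURIES ~ EVTYPE, data = injury_toy, FUN = sum)
merged <- merged[order(-merged$INJURIES), ]
print(merged)
```

In this subset the merged thunderstorm wind total is 8,445 injuries, which still leaves tornadoes far ahead.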
unique(storm$PROPDMGEXP)
## [1] "K" "M" "" "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"
unique(storm$CROPDMGEXP)
## [1] "" "M" "K" "m" "B" "?" "0" "k" "2"
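The exponent values are inconsistent: letters appear in both upper and lower case, and digits and symbols such as "+", "-", and "?" also occur. We therefore map each value to a numeric multiplier. Letters and digits are converted to the corresponding power of ten (for example, "K" to 1,000 and "5" to 100,000), blank and "0" entries are given a multiplier of 1, and the invalid symbols "+", "-", and "?" are set to 0.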
storm$corrPROPEXP[storm$PROPDMGEXP == "K"] <- 1000
storm$corrPROPEXP[storm$PROPDMGEXP == "M"] <- 1e+06
storm$corrPROPEXP[storm$PROPDMGEXP == ""] <- 1
storm$corrPROPEXP[storm$PROPDMGEXP == "B"] <- 1e+09
storm$corrPROPEXP[storm$PROPDMGEXP == "m"] <- 1e+06
storm$corrPROPEXP[storm$PROPDMGEXP == "0"] <- 1
storm$corrPROPEXP[storm$PROPDMGEXP == "5"] <- 1e+05
storm$corrPROPEXP[storm$PROPDMGEXP == "6"] <- 1e+06
storm$corrPROPEXP[storm$PROPDMGEXP == "4"] <- 10000
storm$corrPROPEXP[storm$PROPDMGEXP == "2"] <- 100
storm$corrPROPEXP[storm$PROPDMGEXP == "3"] <- 1000
storm$corrPROPEXP[storm$PROPDMGEXP == "h"] <- 100
storm$corrPROPEXP[storm$PROPDMGEXP == "7"] <- 1e+07
storm$corrPROPEXP[storm$PROPDMGEXP == "H"] <- 100
storm$corrPROPEXP[storm$PROPDMGEXP == "1"] <- 10
storm$corrPROPEXP[storm$PROPDMGEXP == "8"] <- 1e+08
storm$corrPROPEXP[storm$PROPDMGEXP == "+"] <- 0
storm$corrPROPEXP[storm$PROPDMGEXP == "-"] <- 0
storm$corrPROPEXP[storm$PROPDMGEXP == "?"] <- 0
storm$corrCROPEXP[storm$CROPDMGEXP == "M"] <- 1e+06
storm$corrCROPEXP[storm$CROPDMGEXP == "K"] <- 1000
storm$corrCROPEXP[storm$CROPDMGEXP == "m"] <- 1e+06
storm$corrCROPEXP[storm$CROPDMGEXP == "B"] <- 1e+09
storm$corrCROPEXP[storm$CROPDMGEXP == "0"] <- 1
storm$corrCROPEXP[storm$CROPDMGEXP == "k"] <- 1000
storm$corrCROPEXP[storm$CROPDMGEXP == "2"] <- 100
storm$corrCROPEXP[storm$CROPDMGEXP == ""] <- 1
storm$corrCROPEXP[storm$CROPDMGEXP == "?"] <- 0
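The same mapping can be written more compactly as a named lookup vector. The sketch below is an alternative formulation, not the code used above; the names `exp_lookup` and `exp_to_multiplier` are hypothetical. One difference to be aware of: any symbol not in the table becomes 0 here, whereas the assignments above would leave an unlisted symbol as NA.

```r
# Named lookup: symbol -> multiplier (same values as the assignments above)
exp_lookup <- c(
  "K" = 1e3, "M" = 1e6, "B" = 1e9, "H" = 1e2,
  "0" = 1,   "1" = 1e1, "2" = 1e2, "3" = 1e3, "4" = 1e4,
  "5" = 1e5, "6" = 1e6, "7" = 1e7, "8" = 1e8
)

# Hypothetical helper: upper-case the symbol, look it up,
# treat "" as 1 and anything unrecognised ("+", "-", "?") as 0
exp_to_multiplier <- function(sym) {
  sym <- toupper(sym)
  out <- unname(exp_lookup[sym])
  out[is.na(out)] <- 0
  out[sym == ""] <- 1
  out
}

mult <- exp_to_multiplier(c("K", "m", "", "5", "h", "?", "+"))
print(mult)
```

With this helper, the two corrected columns could be produced with `exp_to_multiplier(storm$PROPDMGEXP)` and `exp_to_multiplier(storm$CROPDMGEXP)`.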
The total damage in dollars is the reported value multiplied by its corrected multiplier:
storm$TOTPROPDMG <- storm$PROPDMG * storm$corrPROPEXP
storm$TOTCROPDMG <- storm$CROPDMG * storm$corrCROPEXP
Aggregating and sorting the property and crop damage by event type (top 10):
prop <- aggregate(TOTPROPDMG ~ EVTYPE, data = storm, FUN = sum, na.rm = TRUE)
prop <- prop[with(prop, order(-TOTPROPDMG)),]
prop <- head(prop, 10)
print(prop)
## EVTYPE TOTPROPDMG
## 170 FLOOD 144657709807
## 411 HURRICANE/TYPHOON 69305840000
## 834 TORNADO 56947380617
## 670 STORM SURGE 43323536000
## 153 FLASH FLOOD 16822673979
## 244 HAIL 15735267513
## 402 HURRICANE 11868319010
## 848 TROPICAL STORM 7703890550
## 972 WINTER STORM 6688497251
## 359 HIGH WIND 5270046260
crop <- aggregate(TOTCROPDMG ~ EVTYPE, data = storm, FUN = sum, na.rm = TRUE)
crop <- crop[with(crop, order(-TOTCROPDMG)),]
crop <- head(crop, 10)
print(crop)
## EVTYPE TOTCROPDMG
## 95 DROUGHT 13972566000
## 170 FLOOD 5661968450
## 590 RIVER FLOOD 5029459000
## 427 ICE STORM 5022113500
## 244 HAIL 3025954473
## 402 HURRICANE 2741910000
## 411 HURRICANE/TYPHOON 2607872800
## 153 FLASH FLOOD 1421317100
## 140 EXTREME COLD 1292973000
## 212 FROST/FREEZE 1094086000
Plots for the property and crop damage for the top 10 events:
propplot <- ggplot(prop, aes(x = reorder(EVTYPE, -TOTPROPDMG), y = TOTPROPDMG)) +
  geom_bar(stat = "identity", fill = "pink") +
  theme(axis.text.x = element_text(angle = 90, vjust = .5, hjust = 1)) +
  xlab("Event") + ylab("Property Damage (in $)")
cropplot <- ggplot(crop, aes(x = reorder(EVTYPE, -TOTCROPDMG), y = TOTCROPDMG)) +
  geom_bar(stat = "identity", fill = "pink") +
  theme(axis.text.x = element_text(angle = 90, vjust = .5, hjust = 1)) +
  xlab("Event") + ylab("Crop Damage (in $)")
# Place the two bar charts side by side under a shared title
grid.arrange(propplot, cropplot, ncol = 2, nrow = 1,
             top = textGrob("Events with Greatest Economic Consequences",
                            gp = gpar(fontsize = 14, font = 3)))
Fig. 2
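The plots above show that floods cause the greatest property damage (about $144.7 billion) while droughts cause the greatest crop damage (about $14.0 billion). Property damage is roughly an order of magnitude larger than crop damage, and floods also rank second for crop damage (about $5.7 billion). Floods therefore have the greatest combined economic consequences of any single event type, at about $150.3 billion.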
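The combined figures can be checked by adding the two printed tables together. The sketch below does this only for the five event types that appear in both top-10 lists, using the totals copied from the output above. Event types that appear in just one list (such as TORNADO or DROUGHT) are left out because their other component is not printed. The names `both`, `TOTAL`, and `TOTAL_BILLIONS` are illustrative.

```r
# Totals copied from the two printed top-10 tables (event types present in both)
both <- data.frame(
  EVTYPE     = c("FLOOD", "HURRICANE/TYPHOON", "FLASH FLOOD", "HAIL", "HURRICANE"),
  TOTPROPDMG = c(144657709807, 69305840000, 16822673979, 15735267513, 11868319010),
  TOTCROPDMG = c(5661968450, 2607872800, 1421317100, 3025954473, 2741910000)
)

# Combined damage per event type, largest first, rounded to billions of dollars
both$TOTAL <- both$TOTPROPDMG + both$TOTCROPDMG
both <- both[order(-both$TOTAL), ]
both$TOTAL_BILLIONS <- round(both$TOTAL / 1e9, 1)
print(both[, c("EVTYPE", "TOTAL_BILLIONS")])
```

Within this subset, FLOOD leads at about 150.3 billion dollars, followed by HURRICANE/TYPHOON at about 71.9 billion.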
The following statements can be confirmed from our analysis: