This project analyzed the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database to determine the effects of weather events on the U.S. population and economy. Impact on the population, measured in fatalities and injuries, followed a similar pattern of weather events for both measures, with tornadoes inflicting the heaviest toll. Economic impact, measured in property and crop damage, followed a very different pattern of weather events, with floods causing the largest total damage.
Download the data set (if not already present) and load it into R.
stormData <- read.csv("repdata_data_StormData.csv")
# Preview the first rows of the data frame
head(stormData)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO 0 0
## 2 TORNADO 0 0
## 3 TORNADO 0 0
## 4 TORNADO 0 0
## 5 TORNADO 0 0
## 6 TORNADO 0 0
## COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1 NA 0 14.0 100 3 0 0
## 2 NA 0 2.0 150 2 0 0
## 3 NA 0 0.1 123 2 0 0
## 4 NA 0 0.0 100 2 0 0
## 5 NA 0 0.0 150 2 0 0
## 6 NA 0 1.5 177 2 0 0
## INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1 15 25.0 K 0
## 2 0 2.5 K 0
## 3 2 25.0 K 0
## 4 2 2.5 K 0
## 5 2 2.5 K 0
## 6 6 2.5 K 0
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3040 8812 3051 8806 1
## 2 3042 8755 0 0 2
## 3 3340 8742 0 0 3
## 4 3458 8626 0 0 4
## 5 3412 8642 0 0 5
## 6 3450 8748 0 0 6
Note: The dataset is available at: https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2
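The loading step reads a local file and does not itself perform the download. A minimal sketch of a download-if-missing step is given here, assuming the compressed file is saved under the local name `repdata_data_StormData.csv.bz2`. That file name and the helper `loadStormData` are illustrative assumptions and are not part of the original analysis.

```r
# Hypothetical helper (not from the original report): download the file
# only when it is not already present, then read it.
loadStormData <- function(dataFile, dataUrl) {
  if (!file.exists(dataFile)) {
    # mode = "wb" keeps the compressed file intact on Windows
    download.file(dataUrl, destfile = dataFile, mode = "wb")
  }
  # read.csv() can read a bzip2-compressed file directly
  read.csv(dataFile)
}

# Example call (assumed local file name):
# stormData <- loadStormData(
#   "repdata_data_StormData.csv.bz2",
#   "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# )
```

Reading the `.bz2` file directly avoids a separate decompression step, at the cost of a slower load.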
The dataset contains many variables, most of which are not needed for the present study. The following code extracts only the columns required to analyze the health and economic impact of weather events.
# Keep only the columns needed for the health and economic impact analysis
eventType <- c("EVTYPE","FATALITIES","INJURIES","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")
workingData <- stormData[eventType]
head(workingData)
## EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO 0 15 25.0 K 0
## 2 TORNADO 0 0 2.5 K 0
## 3 TORNADO 0 2 25.0 K 0
## 4 TORNADO 0 2 2.5 K 0
## 5 TORNADO 0 2 2.5 K 0
## 6 TORNADO 0 6 2.5 K 0
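The distinct property damage exponent codes were listed, and each was assigned a numeric multiplier: H or h = 100, K = 1,000, M or m = 1,000,000, B = 1,000,000,000, a digit n = 10^n, and a blank = 1. Invalid codes (+, -, ?) were assigned a multiplier of 0, which excludes those records from the totals. The property damage value was then calculated by multiplying the property damage figure by its multiplier. Crop damage was processed in the same way. The code for this process is listed below.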
# Finding property damage levels and exponents
unique(workingData$PROPDMGEXP)
## [1] K M B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels: - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
workingData$PROPEXP[workingData$PROPDMGEXP == "K"] <- 1000
workingData$PROPEXP[workingData$PROPDMGEXP == "M"] <- 1e+06
workingData$PROPEXP[workingData$PROPDMGEXP == ""] <- 1
workingData$PROPEXP[workingData$PROPDMGEXP == "B"] <- 1e+09
workingData$PROPEXP[workingData$PROPDMGEXP == "m"] <- 1e+06
workingData$PROPEXP[workingData$PROPDMGEXP == "0"] <- 1
workingData$PROPEXP[workingData$PROPDMGEXP == "5"] <- 1e+05
workingData$PROPEXP[workingData$PROPDMGEXP == "6"] <- 1e+06
workingData$PROPEXP[workingData$PROPDMGEXP == "4"] <- 10000
workingData$PROPEXP[workingData$PROPDMGEXP == "2"] <- 100
workingData$PROPEXP[workingData$PROPDMGEXP == "3"] <- 1000
workingData$PROPEXP[workingData$PROPDMGEXP == "h"] <- 100
workingData$PROPEXP[workingData$PROPDMGEXP == "7"] <- 1e+07
workingData$PROPEXP[workingData$PROPDMGEXP == "H"] <- 100
workingData$PROPEXP[workingData$PROPDMGEXP == "1"] <- 10
workingData$PROPEXP[workingData$PROPDMGEXP == "8"] <- 1e+08
workingData$PROPEXP[workingData$PROPDMGEXP == "+"] <- 0
workingData$PROPEXP[workingData$PROPDMGEXP == "-"] <- 0
workingData$PROPEXP[workingData$PROPDMGEXP == "?"] <- 0
workingData$PROPDMGVAL <- workingData$PROPDMG * workingData$PROPEXP
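The line-by-line assignments above can also be written as a single lookup function. The sketch below is one possible way to do it, not the method used in this report. The helper name `expToMultiplier` is hypothetical, and it is meant to reproduce the same mapping: letters are case-insensitive, a digit n means 10^n, a blank means 1, and any other code means 0.

```r
# Hypothetical helper (not from the original report): convert damage
# exponent codes to numeric multipliers.
expToMultiplier <- function(code) {
  code <- toupper(as.character(code))
  letters_lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

  # Start at 0 so unrecognised codes such as "+", "-" and "?" stay at 0
  mult <- rep(0, length(code))

  isLetter <- code %in% names(letters_lookup)
  mult[isLetter] <- unname(letters_lookup[code[isLetter]])

  isDigit <- code %in% as.character(0:8)
  mult[isDigit] <- 10^as.numeric(code[isDigit])

  mult[code %in% ""] <- 1  # blank exponent: value is used as recorded
  mult
}

# Example on a few codes
expToMultiplier(c("K", "m", "", "5", "?"))
```

Under these assumptions the same function would serve both `PROPDMGEXP` and `CROPDMGEXP`, since the crop codes are a subset of the property codes.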
# Finding crop damage levels and exponents
unique(workingData$CROPDMGEXP)
## [1] M K m B ? 0 k 2
## Levels: ? 0 2 B k K m M
workingData$CROPEXP[workingData$CROPDMGEXP == "M"] <- 1e+06
workingData$CROPEXP[workingData$CROPDMGEXP == "K"] <- 1000
workingData$CROPEXP[workingData$CROPDMGEXP == "m"] <- 1e+06
workingData$CROPEXP[workingData$CROPDMGEXP == "B"] <- 1e+09
workingData$CROPEXP[workingData$CROPDMGEXP == "0"] <- 1
workingData$CROPEXP[workingData$CROPDMGEXP == "k"] <- 1000
workingData$CROPEXP[workingData$CROPDMGEXP == "2"] <- 100
workingData$CROPEXP[workingData$CROPDMGEXP == ""] <- 1
workingData$CROPEXP[workingData$CROPDMGEXP == "?"] <- 0
workingData$CROPDMGVAL <- workingData$CROPDMG * workingData$CROPEXP
Harm to population health was measured by fatalities and injuries, so only those two measures were selected for the health analysis.
Economic consequences were measured by property and crop damage, so only those two measures were selected for the economic analysis.
For each measure (fatalities, injuries, property damage, and crop damage), the total per event type was then computed with the following code.
# Totalling the data by event
fatal <- aggregate(FATALITIES ~ EVTYPE, workingData, FUN = sum)
injury <- aggregate(INJURIES ~ EVTYPE, workingData, FUN = sum)
propdmg <- aggregate(PROPDMGVAL ~ EVTYPE, workingData, FUN = sum)
cropdmg <- aggregate(CROPDMGVAL ~ EVTYPE, workingData, FUN = sum)
The ten event types with the highest fatalities and the ten with the highest injuries were then identified and plotted side by side for comparison.
# Listing events with highest fatalities
fatal10 <- fatal[order(-fatal$FATALITIES), ][1:10, ]
# Listing events with highest injuries
injury10 <- injury[order(-injury$INJURIES), ][1:10, ]
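The aggregate-then-rank pattern used here can be checked on a small made-up data frame. The values below are invented for illustration and are not taken from the NOAA data. The names `toy`, `toyTotals` and `toyTop2` are likewise hypothetical.

```r
# Made-up records for illustration only
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "HAIL"),
  FATALITIES = c(3, 1, 2, 4, 0, 0)
)

# Sum fatalities per event type
toyTotals <- aggregate(FATALITIES ~ EVTYPE, toy, FUN = sum)

# Sort in decreasing order and keep the top two rows
toyTop2 <- head(toyTotals[order(-toyTotals$FATALITIES), ], 2)
print(toyTop2)
```

Here `head(x, n)` is used in place of `[1:n, ]`. The two give the same rows when at least n rows exist, but `head` does not pad the result with `NA` rows when there are fewer.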
par(mfrow = c(1, 2), mar = c(12, 4, 3, 2), mgp = c(3, 1, 0), cex = 0.8)
barplot(fatal10$FATALITIES, las = 3, names.arg = fatal10$EVTYPE, main = "Events with Highest Fatalities",
ylab = "Number of fatalities", col = "lightblue")
barplot(injury10$INJURIES, las = 3, names.arg = injury10$EVTYPE, main = "Events with Highest Injuries",
ylab = "Number of injuries", col = "lightblue")
The ten event types with the highest property damage and the ten with the highest crop damage were then identified and plotted side by side for comparison.
# Finding events with highest property damage
propdmg10 <- propdmg[order(-propdmg$PROPDMGVAL), ][1:10, ]
# Finding events with highest crop damage
cropdmg10 <- cropdmg[order(-cropdmg$CROPDMGVAL), ][1:10, ]
par(mfrow = c(1, 2), mar = c(12, 4, 3, 2), mgp = c(3, 1, 0), cex = 0.8)
barplot(propdmg10$PROPDMGVAL/(10^9), las = 3, names.arg = propdmg10$EVTYPE,
main = "Events with Highest Property Damages", ylab = "Damage Cost ($ billions)",
col = "light green")
barplot(cropdmg10$CROPDMGVAL/(10^9), las = 3, names.arg = cropdmg10$EVTYPE,
main = "Events With Highest Crop Damages", ylab = "Damage Cost ($ billions)",
col = "light green")
Fatalities and injuries:
Property damages and crop damages: