Storms and other severe weather events can cause serious public health and economic problems for communities and municipalities. Many severe events result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This report explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database to answer the following questions:
1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
2. Across the United States, which types of events have the greatest economic consequences?
The results show that, over the roughly 60 years covered by the database, tornadoes are the most harmful events with respect to population health, causing 5633 deaths and 91346 injuries, while floods have the greatest economic consequences, causing over 150 billion dollars in economic losses.
First, load the data, which is in .bz2 format, into a data frame named Data.
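A minimal sketch of the loading step, assuming the storm data is a comma-separated file compressed with bzip2: base R's `read.csv()` can read such a file directly, because the `file()` connection it opens detects bzip2 compression when reading. The tiny file below is a made-up stand-in written with `bzfile()`, and the file name in the final comment is hypothetical.

```r
# Build a small bzip2-compressed CSV as a stand-in for the NOAA file (made-up rows)
bz_path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(bz_path, open = "w")
writeLines(c("EVTYPE,FATALITIES,INJURIES",
             "TORNADO,0,15",
             "FLOOD,1,0"), con)
close(con)

# read.csv() opens the path with file(), which decompresses bzip2 input transparently
Demo <- read.csv(bz_path, stringsAsFactors = FALSE)
print(dim(Demo))

# For the real data the call would look like this (hypothetical file name):
# Data <- read.csv("StormData.csv.bz2")
```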
We use the following package:
library(dplyr)
Check the number of rows and columns in the full dataset:
dim(Data)
## [1] 902297 37
View the first 10 rows of the dataset:
head(Data, 10)
##    STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
##  1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
##  2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
##  3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
##  4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
##  5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
##  6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##  7       1 11/16/1951 0:00:00     0100       CST      9     BLOUNT    AL
##  8       1  1/22/1952 0:00:00     0900       CST    123 TALLAPOOSA    AL
##  9       1  2/13/1952 0:00:00     2000       CST    125 TUSCALOOSA    AL
## 10       1  2/13/1952 0:00:00     2000       CST     57    FAYETTE    AL
##     EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
##  1 TORNADO         0                                               0
##  2 TORNADO         0                                               0
##  3 TORNADO         0                                               0
##  4 TORNADO         0                                               0
##  5 TORNADO         0                                               0
##  6 TORNADO         0                                               0
##  7 TORNADO         0                                               0
##  8 TORNADO         0                                               0
##  9 TORNADO         0                                               0
## 10 TORNADO         0                                               0
##    COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
##  1         NA         0                      14.0   100 3   0          0
##  2         NA         0                       2.0   150 2   0          0
##  3         NA         0                       0.1   123 2   0          0
##  4         NA         0                       0.0   100 2   0          0
##  5         NA         0                       0.0   150 2   0          0
##  6         NA         0                       1.5   177 2   0          0
##  7         NA         0                       1.5    33 2   0          0
##  8         NA         0                       0.0    33 1   0          0
##  9         NA         0                       3.3   100 3   0          1
## 10         NA         0                       2.3   100 3   0          0
##    INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
##  1       15    25.0          K       0
##  2        0     2.5          K       0
##  3        2    25.0          K       0
##  4        2     2.5          K       0
##  5        2     2.5          K       0
##  6        6     2.5          K       0
##  7        1     2.5          K       0
##  8        0     2.5          K       0
##  9       14    25.0          K       0
## 10        0    25.0          K       0
##    LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
##  1     3040      8812       3051       8806              1
##  2     3042      8755          0          0              2
##  3     3340      8742          0          0              3
##  4     3458      8626          0          0              4
##  5     3412      8642          0          0              5
##  6     3450      8748          0          0              6
##  7     3405      8631          0          0              7
##  8     3255      8558          0          0              8
##  9     3334      8740       3336       8738              9
## 10     3336      8738       3337       8737             10
Extract the unique values for EVTYPE (event types)
EventTypes_Unique <- unique(Data$EVTYPE)
The dataset records harm to population health in two variables, INJURIES and FATALITIES. We add the two for each record and sum the result by event type to find the most harmful type of event; note that this counts a fatality and an injury equally, as one casualty each.
TotalCasualties <- with(Data, aggregate(INJURIES + FATALITIES ~ EVTYPE, FUN = "sum"))
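To make the aggregation concrete, here is a base-R sketch on a small made-up data frame (the name `Toy` and its values are illustrative only). The left-hand side `INJURIES + FATALITIES` is evaluated row by row, and `aggregate()` then sums those per-row totals within each EVTYPE, which is the same computation as the call above.

```r
# Made-up rows in the same layout as the NOAA casualty columns
Toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL"),
  FATALITIES = c(2, 1, 0, 0),
  INJURIES   = c(10, 0, 5, 1),
  stringsAsFactors = FALSE
)

# Per-row INJURIES + FATALITIES, summed within each event type
ToyCasualties <- aggregate(INJURIES + FATALITIES ~ EVTYPE, data = Toy, FUN = sum)
print(ToyCasualties)
# TORNADO totals 17: (10 + 2) + (5 + 0)
```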
Give the columns descriptive names:
names(TotalCasualties) <- c('Event Type', 'Total Casualties')
Sort the data frame by Total Casualties in descending order
TotalCasualties_Ordered <- TotalCasualties[ order(TotalCasualties$`Total Casualties`,decreasing = TRUE), ]
Because there are many distinct event types, we show only the ten with the most casualties:
TotalCasualties_Top10 <- head(TotalCasualties_Ordered, 10)
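The sort-and-truncate step can be checked on a small made-up table of totals. In this sketch (the names `Totals`, `Ranked`, and `Top2` are illustrative), `order(..., decreasing = TRUE)` returns the row indices from largest to smallest total, and `head()` keeps the first n rows; n is 2 here instead of the report's 10.

```r
# Made-up aggregated totals: one row per event type
Totals <- data.frame(
  EventType = c("FLOOD", "HAIL", "TORNADO", "LIGHTNING"),
  Total     = c(1, 1, 17, 4),
  stringsAsFactors = FALSE
)

# Reorder rows from the largest total to the smallest
Ranked <- Totals[order(Totals$Total, decreasing = TRUE), ]

# Keep only the top rows
Top2 <- head(Ranked, 2)
print(Top2)
```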
with(
  TotalCasualties_Top10,
  barplot(
    `Total Casualties`,
    names.arg = `Event Type`,
    main = "Top 10 events most harmful to population health",
    xlab = "Event Type",
    ylab = "Total number of casualties",
    cex.axis = 0.7, cex.names = 0.75,
    col = heat.colors(12),
    legend.text = `Event Type`,
    args.legend = list(x = "topright")
  )
)
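Long event-type names can crowd the x-axis of a bar chart like the one above. One option, sketched here with made-up values, is `las = 2`, which draws the axis labels perpendicular to the axis, combined with a wider bottom margin set through `par(mar = ...)`. The plot is written to a temporary PDF only so the sketch runs without a display; the names `totals` and `out_file` are illustrative.

```r
# Made-up totals standing in for the top-10 casualty counts
totals <- c(TORNADO = 17, LIGHTNING = 4, "EXCESSIVE HEAT" = 3)

out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 7, height = 5)
old_par <- par(mar = c(9, 4, 3, 1))   # extra bottom margin for rotated labels
mids <- barplot(
  totals,                             # names of the vector become the bar labels
  las = 2,                            # labels perpendicular to the axis
  cex.names = 0.75,
  col = heat.colors(length(totals)),
  main = "Top events by casualties (illustrative data)",
  ylab = "Total number of casualties"
)
par(old_par)                          # restore the previous margins
invisible(dev.off())
```

With readable axis labels, the separate legend becomes optional.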
This analysis shows that tornadoes are the most harmful event type for population health in the United States.
Next, create a data frame with the event type and the property and crop damage columns, keeping only rows where at least one damage exponent is H, K, M, or B:
Data_EconomicExpenses <- filter(
  Data[, c('EVTYPE', 'PROPDMG', 'PROPDMGEXP', 'CROPDMG', 'CROPDMGEXP')],
  Data$CROPDMGEXP == "H" |
    Data$CROPDMGEXP == "M" |
    Data$CROPDMGEXP == "K" |
    Data$CROPDMGEXP == "B" |
    Data$PROPDMGEXP == "H" |
    Data$PROPDMGEXP == "M" |
    Data$PROPDMGEXP == "K" |
    Data$PROPDMGEXP == "B"
)
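The chain of eight `==` comparisons is equivalent to two `%in%` tests. The base-R sketch below uses made-up rows (the data frame `Toy` and the vector `valid_exp` are illustrative) and checks that both forms select the same rows.

```r
# Recognized exponent codes, as listed in the report's filter
valid_exp <- c("H", "K", "M", "B")

# Made-up rows in the same layout as the NOAA damage columns
Toy <- data.frame(
  EVTYPE     = c("FLOOD", "HAIL", "TORNADO", "WIND"),
  PROPDMG    = c(2.5, 10, 25, 1),
  PROPDMGEXP = c("B", "", "K", "?"),
  CROPDMG    = c(0, 5, 0, 0),
  CROPDMGEXP = c("", "M", "", "k"),
  stringsAsFactors = FALSE
)

# Compact form: keep a row if either exponent is a recognized code
keep <- Toy$PROPDMGEXP %in% valid_exp | Toy$CROPDMGEXP %in% valid_exp

# Long form, written the same way as in the report
chained <- Toy$CROPDMGEXP == "H" | Toy$CROPDMGEXP == "M" |
  Toy$CROPDMGEXP == "K" | Toy$CROPDMGEXP == "B" |
  Toy$PROPDMGEXP == "H" | Toy$PROPDMGEXP == "M" |
  Toy$PROPDMGEXP == "K" | Toy$PROPDMGEXP == "B"

stopifnot(identical(keep, chained))
print(Toy$EVTYPE[keep])
```

Both forms are case-sensitive, so a lowercase code such as `"k"` (the WIND row above) is not kept by either.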
Because each damage amount is stored with a separate multiplier code (H = hundreds, K = thousands, M = millions, B = billions), we convert every figure to dollars. Amounts whose code is not one of these four are kept as recorded.
Data_EconomicExpenses <- mutate(
  Data_EconomicExpenses,
  CropDamage = ifelse(CROPDMGEXP == "M", CROPDMG * 1e6,
               ifelse(CROPDMGEXP == "B", CROPDMG * 1e9,
               ifelse(CROPDMGEXP == "K", CROPDMG * 1e3,
               ifelse(CROPDMGEXP == "H", CROPDMG * 1e2, CROPDMG)))),
  PropDamage = ifelse(PROPDMGEXP == "M", PROPDMG * 1e6,
               ifelse(PROPDMGEXP == "B", PROPDMG * 1e9,
               ifelse(PROPDMGEXP == "K", PROPDMG * 1e3,
               ifelse(PROPDMGEXP == "H", PROPDMG * 1e2, PROPDMG))))
)
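The nested `ifelse()` calls can also be expressed as a lookup table. The sketch below is an alternative, not the method used in this report: the `multiplier` vector and the helper `to_dollars()` are hypothetical names, and `as.character()` guards against the exponent columns having been read as factors (indexing a named vector by a factor would use its integer codes, not its labels). It checks on made-up values that both approaches agree.

```r
# Dollar multiplier for each recognized exponent code
multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Convert amounts to dollars; codes missing from the table leave the amount
# unchanged, matching the final fallback of the nested ifelse() version
to_dollars <- function(amount, exp_code) {
  m <- unname(multiplier[as.character(exp_code)])  # NA when the code is not in the table
  amount * ifelse(is.na(m), 1, m)
}

# Made-up amounts and codes
amount   <- c(2.5, 10, 25, 1)
exp_code <- c("B", "M", "K", "")
converted <- to_dollars(amount, exp_code)
print(converted)

# The nested ifelse() form gives the same result
nested <- ifelse(exp_code == "M", amount * 1e6,
          ifelse(exp_code == "B", amount * 1e9,
          ifelse(exp_code == "K", amount * 1e3,
          ifelse(exp_code == "H", amount * 1e2, amount))))
stopifnot(isTRUE(all.equal(converted, nested)))
```

Supporting another code then means adding one entry to `multiplier` rather than another nested branch.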
Add a TotalDamage column holding the sum of crop damage and property damage:
Data_EconomicExpenses <- mutate(Data_EconomicExpenses, TotalDamage = CropDamage + PropDamage)
Aggregate Total Damage by Event Type
TotalEconomicConsequences <- with(Data_EconomicExpenses, aggregate(TotalDamage ~ EVTYPE, FUN = "sum"))
Give the columns descriptive names:
names(TotalEconomicConsequences) <- c('Event Type', 'Total Economic Damage')
Sort the data frame by Total Economic Damage in descending order
TotalEconomicConsequences_Ordered <- TotalEconomicConsequences[ order(TotalEconomicConsequences$`Total Economic Damage`, decreasing = TRUE), ]
As before, we show only the ten event types with the largest total damage:
TotalEconomicConsequences_Top10 <- head(TotalEconomicConsequences_Ordered, 10)
with(
  TotalEconomicConsequences_Top10,
  barplot(
    `Total Economic Damage` / 1e9,
    names.arg = `Event Type`,
    main = "Top 10 events most harmful to the economy",
    xlab = "Event Type",
    ylab = "Total Damages (in Billions of Dollars)",
    cex.axis = 0.7, cex.names = 0.75,
    col = heat.colors(12),
    legend.text = `Event Type`,
    args.legend = list(x = "topright")
  )
)
This analysis shows that floods cause the greatest economic damage in the United States.