In this analysis, I used the NOAA Storm Database to determine which weather events are most harmful. I first prepared and summarized the data so that I could rank event types by their impact on population health and then by their economic consequences. Population health impact is measured as the combined number of injuries and deaths caused by an event type. Economic consequences are measured as the combined property damage and crop damage. Tornadoes were the most harmful event type for population health, and floods were the most harmful event type in terms of economic consequences.
This section shows how I loaded and pre-processed the data to get it into a summarized form suitable for plotting and analysis. First I load the data, preview it, and subset it so that I only have the columns of interest.
setwd("~/Desktop/R files/Reproducible Research/Week 4")
stormdata <- read.csv("repdata-data-StormData.csv", header = TRUE)
head(stormdata)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE  EVTYPE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL TORNADO
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL TORNADO
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL TORNADO
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL TORNADO
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL TORNADO
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL TORNADO
##   BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1         0                                               0         NA
## 2         0                                               0         NA
## 3         0                                               0         NA
## 4         0                                               0         NA
## 5         0                                               0         NA
## 6         0                                               0         NA
##   END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1         0                      14.0   100 3   0          0       15    25.0
## 2         0                       2.0   150 2   0          0        0     2.5
## 3         0                       0.1   123 2   0          0        2    25.0
## 4         0                       0.0   100 2   0          0        2     2.5
## 5         0                       0.0   150 2   0          0        2     2.5
## 6         0                       1.5   177 2   0          0        6     2.5
##   PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1          K       0                                         3040      8812
## 2          K       0                                         3042      8755
## 3          K       0                                         3340      8742
## 4          K       0                                         3458      8626
## 5          K       0                                         3412      8642
## 6          K       0                                         3450      8748
##   LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1       3051       8806              1
## 2          0          0              2
## 3          0          0              3
## 4          0          0              4
## 5          0          0              5
## 6          0          0              6
# Keep only the columns needed for the health and economic analysis
stormdata_wanted <- stormdata[, c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
head(stormdata_wanted)
##    EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO          0       15    25.0          K       0
## 2 TORNADO          0        0     2.5          K       0
## 3 TORNADO          0        2    25.0          K       0
## 4 TORNADO          0        2     2.5          K       0
## 5 TORNADO          0        2     2.5          K       0
## 6 TORNADO          0        6     2.5          K       0
Next, I pre-process the data. First, I create a new column in my data frame that gives the total number of people harmed by an event, through either injury or death.
stormdata_wanted$TOTALHARM <- stormdata_wanted$FATALITIES + stormdata_wanted$INJURIES
Next, I look at the property damage and crop damage columns. Each damage value is stored as a number together with an exponent code, so I want to convert it to the full amount by using the exponent column. First I view the values of the exponent column.
unique(stormdata_wanted$PROPDMGEXP)
## [1] "K" "M" "" "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"
The values in this column are not all numbers, so I need to replace the letters with the power of ten they stand for. H stands for hundreds (10^2), so I replace it with 2; K stands for thousands (10^3), so I replace it with 3; M stands for millions (10^6), so I replace it with 6; and B stands for billions (10^9), so I replace it with 9. Digits are kept as they are and used directly as the exponent. The remaining symbols (+, - and ?) are replaced with 0, which gives a multiplier of 10^0, or 1.
# Replace letter codes with their power of ten (case-insensitive, so "m" and "M" both become 6)
stormdata_wanted$PROPDMGEXP <- gsub("K", "3", stormdata_wanted$PROPDMGEXP, ignore.case = TRUE)
stormdata_wanted$PROPDMGEXP <- gsub("M", "6", stormdata_wanted$PROPDMGEXP, ignore.case = TRUE)
stormdata_wanted$PROPDMGEXP <- gsub("B", "9", stormdata_wanted$PROPDMGEXP, ignore.case = TRUE)
stormdata_wanted$PROPDMGEXP <- gsub("H", "2", stormdata_wanted$PROPDMGEXP, ignore.case = TRUE)
# Empty entries and the symbols "+", "-" and "?" become 0, i.e. a multiplier of 10^0 = 1
stormdata_wanted$PROPDMGEXP[stormdata_wanted$PROPDMGEXP == ""] <- "0"
stormdata_wanted$PROPDMGEXP <- gsub("[+?-]", "0", stormdata_wanted$PROPDMGEXP)
Next, I will make sure this column is numeric, change all of the NAs to 0, and then create my new column in the data frame with the total property damage cost.
stormdata_wanted$PROPDMGEXP <- as.numeric(stormdata_wanted$PROPDMGEXP)
stormdata_wanted$PROPDMGEXP[is.na(stormdata_wanted$PROPDMGEXP)] <- 0
stormdata_wanted$PROPMONEY <- stormdata_wanted$PROPDMG * 10 ^ stormdata_wanted$PROPDMGEXP
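The replacement steps above can also be written as a single lookup table. The following is only a minimal sketch on toy values, not the code used in this analysis; the names exp_lookup, dmg, code, power and money are made up for illustration.

```r
# Hypothetical lookup table: letter codes map to powers of ten,
# mirroring the replacements described above.
exp_lookup <- c(H = 2, K = 3, M = 6, B = 9)

# Toy values standing in for PROPDMG and PROPDMGEXP (not real records).
dmg  <- c(25, 2.5, 1, 4, 7)
code <- c("K", "m", "B", "", "5")

# Letters go through the lookup (upper-cased first); unknown codes give NA.
power <- unname(exp_lookup[toupper(code)])

# Single digits are used directly as the exponent.
is_digit <- grepl("^[0-9]$", code)
power[is_digit] <- as.numeric(code[is_digit])

# Everything else ("", "+", "-", "?") falls back to 10^0 = 1.
power[is.na(power)] <- 0

money <- dmg * 10^power
print(money)
```

The first toy value reproduces the arithmetic for row 1 of the data: a PROPDMG of 25 with the code K gives 25 * 10^3 = 25000.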
Next, I need to repeat this whole process for the crop damage data and create a new total crop damage cost column. The replacements for the exponent column will be the same as property damage.
unique(stormdata_wanted$CROPDMGEXP)
## [1] "" "M" "K" "m" "B" "?" "0" "k" "2"
# Replace letter codes with their power of ten (case-insensitive)
stormdata_wanted$CROPDMGEXP <- gsub("K", "3", stormdata_wanted$CROPDMGEXP, ignore.case = TRUE)
stormdata_wanted$CROPDMGEXP <- gsub("M", "6", stormdata_wanted$CROPDMGEXP, ignore.case = TRUE)
stormdata_wanted$CROPDMGEXP <- gsub("B", "9", stormdata_wanted$CROPDMGEXP, ignore.case = TRUE)
# Empty entries and "?" become 0, i.e. a multiplier of 10^0 = 1
stormdata_wanted$CROPDMGEXP[stormdata_wanted$CROPDMGEXP == ""] <- "0"
stormdata_wanted$CROPDMGEXP <- gsub("\\?", "0", stormdata_wanted$CROPDMGEXP)
stormdata_wanted$CROPDMGEXP <- as.numeric(stormdata_wanted$CROPDMGEXP)
stormdata_wanted$CROPDMGEXP[is.na(stormdata_wanted$CROPDMGEXP)] <- 0
stormdata_wanted$CROPMONEY <- stormdata_wanted$CROPDMG * 10 ^ stormdata_wanted$CROPDMGEXP
Next I will combine the total crop damage money and total property damage money to see the total economic consequence. I will create a new column in the data frame with these values.
stormdata_wanted$TOTALMONEY <- stormdata_wanted$PROPMONEY + stormdata_wanted$CROPMONEY
head(stormdata_wanted)
## EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP TOTALHARM
## 1 TORNADO 0 15 25.0 3 0 0 15
## 2 TORNADO 0 0 2.5 3 0 0 0
## 3 TORNADO 0 2 25.0 3 0 0 2
## 4 TORNADO 0 2 2.5 3 0 0 2
## 5 TORNADO 0 2 2.5 3 0 0 2
## 6 TORNADO 0 6 2.5 3 0 0 6
## PROPMONEY CROPMONEY TOTALMONEY
## 1 25000 0 25000
## 2 2500 0 2500
## 3 25000 0 25000
## 4 2500 0 2500
## 5 2500 0 2500
## 6 2500 0 2500
Now, in order to view the most harmful weather events, I need to summarize the data according to total harm (injuries and death) and total economic damage (property and crop).
total_harm_data <- aggregate(TOTALHARM ~ EVTYPE, stormdata_wanted, sum)
total_money_data <- aggregate(TOTALMONEY ~ EVTYPE, stormdata_wanted, sum)
I will then order my new data frames in decreasing order and view the beginning of the new data frame.
total_harm_data_ordered <- total_harm_data[order(total_harm_data$TOTALHARM, decreasing = TRUE), ]
total_money_data_ordered <- total_money_data[order(total_money_data$TOTALMONEY, decreasing = TRUE), ]
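To make the summarize-then-sort step concrete, here is a small sketch of the same aggregate() and order() calls on a toy data frame. The values are made up and the names toy, toy_sum and toy_ordered are illustrative only.

```r
# Toy data frame with made-up values (not taken from the storm database).
toy <- data.frame(
  EVTYPE    = c("FLOOD", "TORNADO", "FLOOD", "HAIL", "TORNADO"),
  TOTALHARM = c(3, 10, 4, 1, 5)
)

# Sum TOTALHARM within each EVTYPE: one row per event type.
toy_sum <- aggregate(TOTALHARM ~ EVTYPE, toy, sum)

# Sort so that the largest total comes first.
toy_ordered <- toy_sum[order(toy_sum$TOTALHARM, decreasing = TRUE), ]
print(toy_ordered)
```

In the sorted result the row names still refer to positions in the unsorted summary, which is why the output of the real data shows row names such as 834 and 130 rather than 1 and 2.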
head(total_harm_data_ordered)
## EVTYPE TOTALHARM
## 834 TORNADO 96979
## 130 EXCESSIVE HEAT 8428
## 856 TSTM WIND 7461
## 170 FLOOD 7259
## 464 LIGHTNING 6046
## 275 HEAT 3037
head(total_money_data_ordered)
## EVTYPE TOTALMONEY
## 170 FLOOD 150319678257
## 411 HURRICANE/TYPHOON 71913712800
## 834 TORNADO 57362333946
## 670 STORM SURGE 43323541000
## 244 HAIL 18761221986
## 153 FLASH FLOOD 18243991078
Now that I have the ordered data frames, I subset each one to keep only the top 20 most harmful weather events, first for population health and then for economic consequences.
top20_harm <- total_harm_data_ordered[1:20,]
top20_money <- total_money_data_ordered[1:20,]
top20_harm
## EVTYPE TOTALHARM
## 834 TORNADO 96979
## 130 EXCESSIVE HEAT 8428
## 856 TSTM WIND 7461
## 170 FLOOD 7259
## 464 LIGHTNING 6046
## 275 HEAT 3037
## 153 FLASH FLOOD 2755
## 427 ICE STORM 2064
## 760 THUNDERSTORM WIND 1621
## 972 WINTER STORM 1527
## 359 HIGH WIND 1385
## 244 HAIL 1376
## 411 HURRICANE/TYPHOON 1339
## 310 HEAVY SNOW 1148
## 957 WILDFIRE 986
## 786 THUNDERSTORM WINDS 972
## 30 BLIZZARD 906
## 188 FOG 796
## 585 RIP CURRENT 600
## 955 WILD/FOREST FIRE 557
top20_money
## EVTYPE TOTALMONEY
## 170 FLOOD 150319678257
## 411 HURRICANE/TYPHOON 71913712800
## 834 TORNADO 57362333946
## 670 STORM SURGE 43323541000
## 244 HAIL 18761221986
## 153 FLASH FLOOD 18243991078
## 95 DROUGHT 15018672000
## 402 HURRICANE 14610229010
## 590 RIVER FLOOD 10148404500
## 427 ICE STORM 8967041360
## 848 TROPICAL STORM 8382236550
## 972 WINTER STORM 6715441251
## 359 HIGH WIND 5908617595
## 957 WILDFIRE 5060586800
## 856 TSTM WIND 5038935845
## 671 STORM SURGE/TIDE 4642038000
## 760 THUNDERSTORM WIND 3897965522
## 408 HURRICANE OPAL 3191846000
## 955 WILD/FOREST FIRE 3108626330
## 299 HEAVY RAIN/SEVERE WEATHER 2500000000
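The population health table above lists TSTM WIND, THUNDERSTORM WIND and THUNDERSTORM WINDS as separate event types, because this analysis keeps EVTYPE exactly as recorded. If such variants were to be merged before aggregating, one possible approach is sketched below on toy numbers. The cleanup rules, the EVTYPE_CLEAN column and the values are assumptions for illustration, not part of the analysis above, and a real cleanup would need many more rules.

```r
# Toy totals for three label variants seen in the output above;
# the numbers are made up for illustration.
toy <- data.frame(
  EVTYPE    = c("TSTM WIND", "THUNDERSTORM WIND", "THUNDERSTORM WINDS", "HAIL"),
  TOTALHARM = c(5, 3, 2, 4)
)

# Hypothetical cleanup rules: normalize case and whitespace,
# expand the "TSTM" abbreviation, and turn a trailing "WINDS" into "WIND".
clean <- toupper(trimws(toy$EVTYPE))
clean <- sub("^TSTM", "THUNDERSTORM", clean)
clean <- sub("WINDS$", "WIND", clean)
toy$EVTYPE_CLEAN <- clean

# Aggregate on the cleaned label instead of the raw one.
merged <- aggregate(TOTALHARM ~ EVTYPE_CLEAN, toy, sum)
print(merged)
```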
To visualize the results, I use barplots of the top 20 most harmful weather events. This first barplot shows the top 20 most harmful weather events according to population health (total of injuries and deaths).
par(mar = c(9,6,4,2))
barplot(top20_harm$TOTALHARM, col = "red", names.arg = top20_harm$EVTYPE, las = 2, cex.names = 0.6, main = "Top 20 Most Harmful Weather Events According \n to Population Health")
mtext(text = "Weather Event Type", side = 1, line = 8)
mtext(text = "Total Number of Harmed Individuals", side = 2, line = 5)
This next barplot shows the top 20 most harmful weather events according to economic consequences (total of property and crop damages).
par(mar = c(9,6,4,2))
barplot(top20_money$TOTALMONEY, col = "red", names.arg = top20_money$EVTYPE, las = 2, cex.names = 0.6, main = "Top 20 Most Harmful Weather Events According \n to Economic Consequences - Property and Crop Damage")
mtext(text = "Weather Event Type", side = 1, line = 8)
mtext(text = "Total Property and Crop Damage (US dollars)", side = 2, line = 5)
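Because the damage totals run into the hundreds of billions, the default axis labels are likely to appear in scientific notation. One possible refinement, sketched here with the top three totals copied from the table above, is to rescale the values to billions before plotting. The names top3 and billions are illustrative and not part of the analysis.

```r
# Top three economic totals, copied from the top20_money output above.
top3 <- data.frame(
  EVTYPE     = c("FLOOD", "HURRICANE/TYPHOON", "TORNADO"),
  TOTALMONEY = c(150319678257, 71913712800, 57362333946)
)

# Rescale to billions so the axis shows plain numbers.
billions <- top3$TOTALMONEY / 1e9

par(mar = c(9, 6, 4, 2))
barplot(billions, col = "red", names.arg = top3$EVTYPE, las = 2, cex.names = 0.6,
        main = "Property and Crop Damage by Event Type")
mtext(text = "Total Damage (billions)", side = 2, line = 4)
```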
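As we can see from both of these barplots, the most harmful weather event according to population health is tornadoes and the most harmful weather event according to economic consequences is floods. Tornadoes account for 96,979 injuries and deaths, more than ten times the total for the next event type (excessive heat, 8,428). Floods account for about 150 billion dollars in property and crop damage, roughly twice the total for the next event type (hurricane/typhoon, about 72 billion dollars).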