In this analysis, I used the NOAA Storm Database to determine which weather events are most harmful. I prepared and summarized the data so that I could rank event types first by their effect on population health and then by their economic consequences. Population health is measured as the combined number of injuries and deaths attributed to an event type. Economic consequences are measured as the combined property damage and crop damage. By these measures, tornadoes were the most harmful event type for population health, and floods were the most harmful for economic consequences.

Data Processing

In this section I show how I loaded the data and pre-processed it into a summarized form suitable for plotting and analysis. First I load the data, preview it, and keep only the columns of interest.

setwd("~/Desktop/R files/Reproducible Research/Week 4")
stormdata <- read.csv("repdata-data-StormData.csv", header = TRUE)
head(stormdata)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE  EVTYPE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL TORNADO
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL TORNADO
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL TORNADO
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL TORNADO
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL TORNADO
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL TORNADO
##   BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1         0                                               0         NA
## 2         0                                               0         NA
## 3         0                                               0         NA
## 4         0                                               0         NA
## 5         0                                               0         NA
## 6         0                                               0         NA
##   END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1         0                      14.0   100 3   0          0       15    25.0
## 2         0                       2.0   150 2   0          0        0     2.5
## 3         0                       0.1   123 2   0          0        2    25.0
## 4         0                       0.0   100 2   0          0        2     2.5
## 5         0                       0.0   150 2   0          0        2     2.5
## 6         0                       1.5   177 2   0          0        6     2.5
##   PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1          K       0                                         3040      8812
## 2          K       0                                         3042      8755
## 3          K       0                                         3340      8742
## 4          K       0                                         3458      8626
## 5          K       0                                         3412      8642
## 6          K       0                                         3450      8748
##   LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1       3051       8806              1
## 2          0          0              2
## 3          0          0              3
## 4          0          0              4
## 5          0          0              5
## 6          0          0              6
# Keep only the event type, health, and damage columns
stormdata_wanted <- stormdata[, c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
head(stormdata_wanted)
##    EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO          0       15    25.0          K       0           
## 2 TORNADO          0        0     2.5          K       0           
## 3 TORNADO          0        2    25.0          K       0           
## 4 TORNADO          0        2     2.5          K       0           
## 5 TORNADO          0        2     2.5          K       0           
## 6 TORNADO          0        6     2.5          K       0

Next, I pre-process the data. First, I create a new column in my data frame that gives the total number of people harmed by an event, through either injury or death.

stormdata_wanted$TOTALHARM <- stormdata_wanted$FATALITIES + stormdata_wanted$INJURIES

Next, I will look at the property damage and crop damage columns. I want to convert the damage to the actual numbers by using the exponent column. First I view the values of the exponent column.

unique(stormdata_wanted$PROPDMGEXP)
##  [1] "K" "M" ""  "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"

Looking at the values in this column, I see that they are not all numbers, so I need to replace the letters with numeric exponents. H stands for hundreds (10^2), so I replace it with 2; K stands for thousands (10^3), so I replace it with 3; M stands for millions (10^6), so I replace it with 6; and B stands for billions (10^9), so I replace it with 9. The matching ignores case, so "h", "k", and "m" are converted as well. The extra symbols (+, -, ?) I replace with 0, so that the multiplier becomes 10^0, or 1. Values that are already digits are left as they are and used directly as exponents.

# Letters become powers of ten; ignore.case = TRUE also covers "h", "k", and "m"
stormdata_wanted$PROPDMGEXP <- gsub("K","3",stormdata_wanted$PROPDMGEXP, ignore.case = TRUE)
stormdata_wanted$PROPDMGEXP <- gsub("M","6",stormdata_wanted$PROPDMGEXP, ignore.case = TRUE)
stormdata_wanted$PROPDMGEXP <- gsub("B","9",stormdata_wanted$PROPDMGEXP, ignore.case = TRUE)
stormdata_wanted$PROPDMGEXP <- gsub("H","2",stormdata_wanted$PROPDMGEXP, ignore.case = TRUE)
# Blanks in this column are empty strings, not spaces, so this line changes nothing;
# the empty strings are set to 0 by the NA step that follows
stormdata_wanted$PROPDMGEXP <- gsub("\\ ","0",stormdata_wanted$PROPDMGEXP, ignore.case = TRUE)
# Remaining symbols become 0, giving a multiplier of 10^0 = 1
stormdata_wanted$PROPDMGEXP <- gsub("\\+","0",stormdata_wanted$PROPDMGEXP, ignore.case = TRUE)
stormdata_wanted$PROPDMGEXP <- gsub("\\-","0",stormdata_wanted$PROPDMGEXP, ignore.case = TRUE)
stormdata_wanted$PROPDMGEXP <- gsub("\\?","0",stormdata_wanted$PROPDMGEXP, ignore.case = TRUE)

Next, I convert this column to numeric. The empty strings become NA in that conversion, so I set all of the NAs to 0, and then create a new column in the data frame with the total property damage cost.

stormdata_wanted$PROPDMGEXP <- as.numeric(stormdata_wanted$PROPDMGEXP)
stormdata_wanted$PROPDMGEXP[is.na(stormdata_wanted$PROPDMGEXP)] <- 0
stormdata_wanted$PROPMONEY <- stormdata_wanted$PROPDMG * 10 ^ stormdata_wanted$PROPDMGEXP
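
As a quick sanity check on this conversion, the sketch below applies the same formula to a few made-up rows. The data frame `toy` and its values are assumptions for illustration only and are not taken from the database.

```r
# Toy rows, assumed for illustration only (not from the storm database).
toy <- data.frame(PROPDMG    = c(25, 2.5, 1.2),
                  PROPDMGEXP = c(3, 6, 9))   # numeric exponents for K, M, B

# Same formula as above: damage figure times ten to the power of the exponent.
toy$PROPMONEY <- toy$PROPDMG * 10 ^ toy$PROPDMGEXP
toy$PROPMONEY
```

A figure of 25 with exponent 3 (K) becomes 25,000, which matches the PROPMONEY value shown for the first tornado record later in this section.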

Next, I need to repeat this whole process for the crop damage data and create a new total crop damage cost column. The replacements for the exponent column will be the same as property damage.

unique(stormdata_wanted$CROPDMGEXP)
## [1] ""  "M" "K" "m" "B" "?" "0" "k" "2"
stormdata_wanted$CROPDMGEXP <- gsub("K","3",stormdata_wanted$CROPDMGEXP, ignore.case = TRUE)
stormdata_wanted$CROPDMGEXP <- gsub("M","6",stormdata_wanted$CROPDMGEXP, ignore.case = TRUE)
stormdata_wanted$CROPDMGEXP <- gsub("B","9",stormdata_wanted$CROPDMGEXP, ignore.case = TRUE)
stormdata_wanted$CROPDMGEXP <- gsub("\\ ","0",stormdata_wanted$CROPDMGEXP, ignore.case = TRUE)
stormdata_wanted$CROPDMGEXP <- gsub("\\?","0",stormdata_wanted$CROPDMGEXP, ignore.case = TRUE)

stormdata_wanted$CROPDMGEXP <- as.numeric(stormdata_wanted$CROPDMGEXP)
stormdata_wanted$CROPDMGEXP[is.na(stormdata_wanted$CROPDMGEXP)] <- 0
stormdata_wanted$CROPMONEY <- stormdata_wanted$CROPDMG * 10 ^ stormdata_wanted$CROPDMGEXP
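
Because the property and crop columns go through the same replacements, the repeated gsub calls could be folded into one function. The following is only a sketch of that idea: `exp_to_power` is a hypothetical helper that is not part of the analysis above, and it assumes it is given the raw exponent codes before any replacement has been made.

```r
# Hypothetical helper (not used in the analysis above): map raw exponent
# codes to numeric powers of ten with a named lookup vector.
exp_to_power <- function(code) {
  lookup <- c(H = 2, K = 3, M = 6, B = 9)
  code <- toupper(as.character(code))      # treat "k" like "K", and so on
  power <- unname(lookup[code])            # letters -> 2, 3, 6, 9; anything else -> NA
  is_digit <- grepl("^[0-9]$", code)
  power[is_digit] <- as.numeric(code[is_digit])  # digits are used as exponents
  power[is.na(power)] <- 0                 # blanks and symbols -> 10^0 = 1
  power
}

exp_to_power(c("K", "m", "B", "", "+", "5", "h"))
```

With such a helper, each damage column would be computed in one step as the damage figure multiplied by `10 ^ exp_to_power(...)` of its raw exponent column. The mapping is intended to match the replacements described above, including the choice to treat blanks and symbols as 10^0.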

Next I will combine the total crop damage money and total property damage money to see the total economic consequence. I will create a new column in the data frame with these values.

stormdata_wanted$TOTALMONEY <- stormdata_wanted$PROPMONEY + stormdata_wanted$CROPMONEY
head(stormdata_wanted)
##    EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP TOTALHARM
## 1 TORNADO          0       15    25.0          3       0          0        15
## 2 TORNADO          0        0     2.5          3       0          0         0
## 3 TORNADO          0        2    25.0          3       0          0         2
## 4 TORNADO          0        2     2.5          3       0          0         2
## 5 TORNADO          0        2     2.5          3       0          0         2
## 6 TORNADO          0        6     2.5          3       0          0         6
##   PROPMONEY CROPMONEY TOTALMONEY
## 1     25000         0      25000
## 2      2500         0       2500
## 3     25000         0      25000
## 4      2500         0       2500
## 5      2500         0       2500
## 6      2500         0       2500

Now, in order to view the most harmful weather events, I need to summarize the data by event type according to total harm (injuries and deaths) and total economic damage (property and crop).

total_harm_data <- aggregate(TOTALHARM ~ EVTYPE, stormdata_wanted, sum)
total_money_data <- aggregate(TOTALMONEY ~ EVTYPE, stormdata_wanted, sum)
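
To make the grouping step concrete, here is a minimal sketch of what the formula interface of aggregate does, using a small made-up data frame. The names `toy` and `toy_sum` and all of the values are assumptions for illustration.

```r
# Made-up records, for illustration only.
toy <- data.frame(EVTYPE    = c("TORNADO", "FLOOD", "TORNADO", "FLOOD", "HAIL"),
                  TOTALHARM = c(15, 3, 2, 4, 0))

# TOTALHARM ~ EVTYPE means: sum TOTALHARM within each distinct EVTYPE.
toy_sum <- aggregate(TOTALHARM ~ EVTYPE, toy, sum)
toy_sum
```

The result has one row per event type, so the five toy records collapse to three rows, with the two TORNADO records summed into a single total.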

I then sort each new data frame in decreasing order of its total and view the first rows of each.

total_harm_data_ordered <- total_harm_data[order(total_harm_data$TOTALHARM, decreasing = TRUE), ]
total_money_data_ordered <- total_money_data[order(total_money_data$TOTALMONEY, decreasing = TRUE), ]

head(total_harm_data_ordered)
##             EVTYPE TOTALHARM
## 834        TORNADO     96979
## 130 EXCESSIVE HEAT      8428
## 856      TSTM WIND      7461
## 170          FLOOD      7259
## 464      LIGHTNING      6046
## 275           HEAT      3037
head(total_money_data_ordered)
##                EVTYPE   TOTALMONEY
## 170             FLOOD 150319678257
## 411 HURRICANE/TYPHOON  71913712800
## 834           TORNADO  57362333946
## 670       STORM SURGE  43323541000
## 244              HAIL  18761221986
## 153       FLASH FLOOD  18243991078

Results

Now that I have the ordered data frames, I answer the question of which weather events are most harmful by keeping only the top 20 event types for population health and the top 20 for economic consequences.

top20_harm <- total_harm_data_ordered[1:20,]
top20_money <- total_money_data_ordered[1:20,]

top20_harm
##                 EVTYPE TOTALHARM
## 834            TORNADO     96979
## 130     EXCESSIVE HEAT      8428
## 856          TSTM WIND      7461
## 170              FLOOD      7259
## 464          LIGHTNING      6046
## 275               HEAT      3037
## 153        FLASH FLOOD      2755
## 427          ICE STORM      2064
## 760  THUNDERSTORM WIND      1621
## 972       WINTER STORM      1527
## 359          HIGH WIND      1385
## 244               HAIL      1376
## 411  HURRICANE/TYPHOON      1339
## 310         HEAVY SNOW      1148
## 957           WILDFIRE       986
## 786 THUNDERSTORM WINDS       972
## 30            BLIZZARD       906
## 188                FOG       796
## 585        RIP CURRENT       600
## 955   WILD/FOREST FIRE       557
top20_money
##                        EVTYPE   TOTALMONEY
## 170                     FLOOD 150319678257
## 411         HURRICANE/TYPHOON  71913712800
## 834                   TORNADO  57362333946
## 670               STORM SURGE  43323541000
## 244                      HAIL  18761221986
## 153               FLASH FLOOD  18243991078
## 95                    DROUGHT  15018672000
## 402                 HURRICANE  14610229010
## 590               RIVER FLOOD  10148404500
## 427                 ICE STORM   8967041360
## 848            TROPICAL STORM   8382236550
## 972              WINTER STORM   6715441251
## 359                 HIGH WIND   5908617595
## 957                  WILDFIRE   5060586800
## 856                 TSTM WIND   5038935845
## 671          STORM SURGE/TIDE   4642038000
## 760         THUNDERSTORM WIND   3897965522
## 408            HURRICANE OPAL   3191846000
## 955          WILD/FOREST FIRE   3108626330
## 299 HEAVY RAIN/SEVERE WEATHER   2500000000
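
The tables above list some closely related labels as separate event types, for example TSTM WIND, THUNDERSTORM WIND, and THUNDERSTORM WINDS. The totals reported in this analysis were computed without merging them. The sketch below shows one way such labels could be merged before aggregating; `normalize_evtype` is a hypothetical helper, the two rules it applies are assumptions, and the toy values are made up.

```r
# Hypothetical clean-up step, not part of the analysis above.
normalize_evtype <- function(x) {
  x <- toupper(trimws(x))                                   # unify case and spacing
  x <- sub("^TSTM", "THUNDERSTORM", x)                      # expand the abbreviation
  x <- sub("^THUNDERSTORM WINDS$", "THUNDERSTORM WIND", x)  # merge the plural form
  x
}

# Made-up totals, for illustration only.
toy <- data.frame(EVTYPE    = c("TSTM WIND", "THUNDERSTORM WIND",
                                "THUNDERSTORM WINDS", "FLOOD"),
                  TOTALHARM = c(5, 3, 2, 4))
toy$EVTYPE <- normalize_evtype(toy$EVTYPE)
toy_sum <- aggregate(TOTALHARM ~ EVTYPE, toy, sum)
toy_sum
```

In the toy data the three wind labels collapse into a single THUNDERSTORM WIND row. Which labels should count as the same event type is a judgment call, so a rule set like this would need to be reviewed against the full list of EVTYPE values.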

To visualize the results, I use bar plots of the top 20 most harmful weather events. The first bar plot shows the top 20 most harmful weather events according to population health (total of injuries and deaths).

par(mar = c(9,6,4,2))
barplot(top20_harm$TOTALHARM, col = "red", names.arg = top20_harm$EVTYPE, las = 2, cex.names = 0.6, main = "Top 20 Most Harmful Weather Events According\nto Population Health")
mtext(text = "Weather Event Type", side = 1, line = 8)
mtext(text = "Total Number of Harmed Individuals", side = 2, line = 5)

This next barplot shows the top 20 most harmful weather events according to economic consequences (total of property and crop damages).

par(mar = c(9,6,4,2))
barplot(top20_money$TOTALMONEY, col = "red", names.arg = top20_money$EVTYPE, las = 2, cex.names = 0.6, main = "Top 20 Most Harmful Weather Events According\nto Economic Consequences - Property and Crop Damage")
mtext(text = "Weather Event Type", side = 1, line = 8)
mtext(text = "Total Property and Crop Damage (US dollars)", side = 2, line = 5)
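
Damage totals on this scale are easier to read when the axis is expressed in billions of dollars rather than raw dollars. The sketch below shows the rescaling on made-up totals; `toy_money` and its values are assumptions for illustration, and the same division could be applied to the real totals before plotting.

```r
# Made-up totals, for illustration only.
toy_money <- data.frame(EVTYPE     = c("FLOOD", "TORNADO", "HAIL"),
                        TOTALMONEY = c(1.5e11, 5.7e10, 1.9e10))

# Rescale from dollars to billions of dollars so the axis labels stay short.
toy_money$BILLIONS <- toy_money$TOTALMONEY / 1e9

barplot(toy_money$BILLIONS, col = "red", names.arg = toy_money$EVTYPE,
        las = 2, cex.names = 0.6,
        ylab = "Damage (billions of US dollars)")
```

Only the units of the axis change. The ordering of the bars and their relative heights stay the same.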

As both bar plots show, tornadoes are the most harmful weather event type for population health, and floods are the most harmful for economic consequences.