Synopsis

  Severe weather events in the United States have caused substantial economic damage and many deaths and injuries. Overall, the most harmful events have been tornados, which caused by far the most property damage and human casualties (injuries + deaths). Tornados caused more than 10 times the casualties of the runner-up, excessive heat. In total economic damage (property + crop), tornados caused roughly twice the loss of the runner-up, flash floods. Hail caused the most crop damage. We transform the Storm Data dataset into several summarised datasets, then use figures to visualize the data.
  

Data Processing

Load the necessary packages.

library(ggplot2)
library(dplyr)
library(grid)

First we read in the data (repdata-data-StormData.csv.bz2 is saved in the working directory).

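# stringsAsFactors = TRUE keeps text columns as factors (no longer the default since R 4.0.0)
stormData <- read.csv("repdata-data-StormData.csv.bz2", stringsAsFactors = TRUE)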
stormData <- tbl_df(stormData)
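
The read step relies on read.csv() decompressing a .bz2 file on the fly. As a minimal sketch of that behaviour, the example below writes a tiny compressed CSV to a temporary file and reads it back. The data frame toy, the path and toyRead are hypothetical stand-ins, not part of Storm Data.

```r
# Toy stand-in for the Storm Data file (the real file is not bundled here).
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL"),
                  FATALITIES = c(2, 0),
                  stringsAsFactors = FALSE)

# Write it as a bzip2-compressed CSV in a temporary location.
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() detects the compression and decompresses transparently.
# stringsAsFactors = TRUE turns text columns into factors, matching the
# (fctr) columns printed in this report.
toyRead <- read.csv(path, stringsAsFactors = TRUE)
str(toyRead)
```

Because the text columns come back as factors, later calls such as levels() behave as shown in this report.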

Look at the dimensions and first few rows of data.

dim(stormData)
## [1] 902297     37
print(stormData)
## Source: local data frame [902,297 x 37]
## 
##    STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME  STATE
##      (dbl)             (fctr)   (fctr)    (fctr)  (dbl)     (fctr) (fctr)
## 1        1  4/18/1950 0:00:00     0130       CST     97     MOBILE     AL
## 2        1  4/18/1950 0:00:00     0145       CST      3    BALDWIN     AL
## 3        1  2/20/1951 0:00:00     1600       CST     57    FAYETTE     AL
## 4        1   6/8/1951 0:00:00     0900       CST     89    MADISON     AL
## 5        1 11/15/1951 0:00:00     1500       CST     43    CULLMAN     AL
## 6        1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE     AL
## 7        1 11/16/1951 0:00:00     0100       CST      9     BLOUNT     AL
## 8        1  1/22/1952 0:00:00     0900       CST    123 TALLAPOOSA     AL
## 9        1  2/13/1952 0:00:00     2000       CST    125 TUSCALOOSA     AL
## 10       1  2/13/1952 0:00:00     2000       CST     57    FAYETTE     AL
## ..     ...                ...      ...       ...    ...        ...    ...
## Variables not shown: EVTYPE (fctr), BGN_RANGE (dbl), BGN_AZI (fctr),
##   BGN_LOCATI (fctr), END_DATE (fctr), END_TIME (fctr), COUNTY_END (dbl),
##   COUNTYENDN (lgl), END_RANGE (dbl), END_AZI (fctr), END_LOCATI (fctr),
##   LENGTH (dbl), WIDTH (dbl), F (int), MAG (dbl), FATALITIES (dbl),
##   INJURIES (dbl), PROPDMG (dbl), PROPDMGEXP (fctr), CROPDMG (dbl),
##   CROPDMGEXP (fctr), WFO (fctr), STATEOFFIC (fctr), ZONENAMES (fctr),
##   LATITUDE (dbl), LONGITUDE (dbl), LATITUDE_E (dbl), LONGITUDE_ (dbl),
##   REMARKS (fctr), REFNUM (dbl)

The transformations and their justifications will accompany the results in the following section.

Results

1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

We create a new variable CASUALTIES = FATALITIES + INJURIES, then consider the cumulative CASUALTIES for each EVTYPE.

stormData1 <- stormData %>%
      mutate(CASUALTIES = FATALITIES + INJURIES) %>%
      group_by(EVTYPE) %>%
      summarise(TOTAL_CASUALTIES = sum(CASUALTIES, na.rm = TRUE)) %>%
      arrange(desc(TOTAL_CASUALTIES)) %>%
      slice(1:20) %>%
      print()
## Source: local data frame [20 x 2]
## 
##                EVTYPE TOTAL_CASUALTIES
##                (fctr)            (dbl)
## 1             TORNADO            96979
## 2      EXCESSIVE HEAT             8428
## 3           TSTM WIND             7461
## 4               FLOOD             7259
## 5           LIGHTNING             6046
## 6                HEAT             3037
## 7         FLASH FLOOD             2755
## 8           ICE STORM             2064
## 9   THUNDERSTORM WIND             1621
## 10       WINTER STORM             1527
## 11          HIGH WIND             1385
## 12               HAIL             1376
## 13  HURRICANE/TYPHOON             1339
## 14         HEAVY SNOW             1148
## 15           WILDFIRE              986
## 16 THUNDERSTORM WINDS              972
## 17           BLIZZARD              906
## 18                FOG              796
## 19        RIP CURRENT              600
## 20   WILD/FOREST FIRE              557
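
The table lists TSTM WIND, THUNDERSTORM WIND and THUNDERSTORM WINDS as separate event types. The analysis here keeps EVTYPE as recorded, but as a rough sketch of how such spelling variants could be pooled, the example below applies a hypothetical helper, normaliseEvtype, to three rows copied from the table. Its two rules are illustrative assumptions and cover only these rows.

```r
# Casualty totals copied from three rows of the table above.
toy <- data.frame(
  EVTYPE = c("TSTM WIND", "THUNDERSTORM WIND", "THUNDERSTORM WINDS"),
  TOTAL_CASUALTIES = c(7461, 1621, 972),
  stringsAsFactors = FALSE
)

# Hypothetical helper: collapse spelling variants into one label.
normaliseEvtype <- function(x) {
  x <- toupper(trimws(x))
  x <- sub("^TSTM", "THUNDERSTORM", x)   # expand the abbreviation
  x <- sub("WINDS$", "WIND", x)          # drop the plural
  x
}

toy$EVTYPE_CLEAN <- normaliseEvtype(toy$EVTYPE)

# Sum casualties per cleaned label.
merged <- aggregate(TOTAL_CASUALTIES ~ EVTYPE_CLEAN, data = toy, FUN = sum)
print(merged)
```

Pooled this way, the three rows form a single THUNDERSTORM WIND entry. Any real consolidation would need rules checked against the full list of event types.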

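We visualize the top 20 events most hazardous to human population health in the following plot. The y-axis is clipped at 10% of the maximum so that the smaller bars stay readable; the tornado bar therefore extends past the top of the plot.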

p <- ggplot(stormData1, aes(x = EVTYPE, y = TOTAL_CASUALTIES, fill = TOTAL_CASUALTIES)) +
      geom_bar(stat = "identity") + 
      theme(axis.text.x = element_text(angle = 90, hjust = 1)) 

ymin <- min(stormData1$TOTAL_CASUALTIES)
ymax <- max(stormData1$TOTAL_CASUALTIES)

p <- p + coord_cartesian(ylim=c(ymin, 0.1*ymax)) + 
      scale_fill_gradientn(colours=rainbow(5)) + 
      labs(x = "", y = "Total Casualties") +
      ggtitle("Top 20 Weather Events Harmful to Human Health")
print(p)
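
The bars in this plot follow the alphabetical order of the EVTYPE factor levels. As an optional variation, not what the plot above does, the sketch below uses reorder() to sort the levels by casualty count so that bars would appear from largest to smallest. The data frame top4 is a hypothetical subset built from the first four rows of the table.

```r
# First four rows of the casualty table, with EVTYPE as a factor.
top4 <- data.frame(
  EVTYPE = factor(c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND", "FLOOD")),
  TOTAL_CASUALTIES = c(96979, 8428, 7461, 7259)
)
levelsBefore <- levels(top4$EVTYPE)   # alphabetical order

# reorder() sorts levels by increasing score, so negate for descending bars.
top4$EVTYPE <- reorder(top4$EVTYPE, -top4$TOTAL_CASUALTIES)
levelsAfter <- levels(top4$EVTYPE)

# Build the plot only if ggplot2 is installed; print(p) would draw it.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(top4, ggplot2::aes(x = EVTYPE, y = TOTAL_CASUALTIES)) +
    ggplot2::geom_bar(stat = "identity") +
    ggplot2::labs(x = "", y = "Total Casualties")
}
```

ggplot2 places discrete x values in factor-level order, which is why changing the levels is enough to change the bar order.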

The figure indicates that tornados cause by far the highest number of casualties. In fact, tornados account for about 97,000 casualties, more than 10 times as many as any other event in the top 20.
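
The "more than 10 times" comparison can be checked directly from the table. The short calculation below uses the two largest totals as printed; the variable names are ours.

```r
# Totals copied from rows 1 and 2 of the casualty table.
tornado <- 96979
excessiveHeat <- 8428

# How many times larger is the tornado total than the runner-up?
ratio <- tornado / excessiveHeat
print(round(ratio, 1))
```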

2. Across the United States, which types of events have the greatest economic consequences?

Note that the factor levels of PROPDMGEXP and CROPDMGEXP, which encode the magnitude of each damage figure, are not self-explanatory:

levels(stormData$PROPDMGEXP)
##  [1] ""  "-" "?" "+" "0" "1" "2" "3" "4" "5" "6" "7" "8" "B" "h" "H" "K"
## [18] "m" "M"
levels(stormData$CROPDMGEXP)
## [1] ""  "?" "0" "2" "B" "k" "K" "m" "M"

We rename the factor levels to numeric multipliers, following the convention adopted at https://rstudio-pubs-static.s3.amazonaws.com/58957_37b6723ee52b455990e149edde45e5b6.html, then convert PROPDMGEXP and CROPDMGEXP to type numeric in stormData. We form new variables representing the total property and crop damage, named totalPropDmg and totalCropDmg, respectively.

levels(stormData$PROPDMGEXP) <- c(0, 0, 0, 1, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10^9, 10^2, 10^2, 10^3, 10^6, 10^6)
levels(stormData$CROPDMGEXP) <- c(0, 0, 10, 10, 10^9, 10^3, 10^3, 10^6, 10^6)
stormData <- stormData %>%
      mutate(PROPDMGEXP = as.numeric(PROPDMGEXP), 
             CROPDMGEXP = as.numeric(CROPDMGEXP),
             totalPropDmg = PROPDMG*PROPDMGEXP,
             totalCropDmg = CROPDMG*CROPDMGEXP,
             totalDamage = totalPropDmg + totalCropDmg)
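
One detail worth checking in a conversion like the one above: in R, as.numeric() applied directly to a factor returns the integer level codes, not the level labels. The toy example below uses a hypothetical three-value factor f (not rows from Storm Data) to show the difference and the usual as.character() route. It illustrates a general property of R factors only; the totals printed in this report are left as originally computed.

```r
# Toy factor standing in for PROPDMGEXP (hypothetical values).
f <- factor(c("K", "M", "B", "K"))
print(levels(f))   # levels are sorted alphabetically: B, K, M

# Relabel the levels with multipliers, given in level order (B, K, M).
levels(f) <- c(10^9, 10^3, 10^6)

# as.numeric() on a factor returns the internal level codes ...
codes <- as.numeric(f)

# ... while going through as.character() recovers the labels as numbers.
values <- as.numeric(as.character(f))

print(codes)
print(values)
```

With the as.character() step, a "K" entry maps to 1000 rather than to its position among the levels.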

The following summarised dataset, with total property and crop damage for every event type, is used later to fill in missing values before plotting.

stormDataSummarised <- stormData %>%
      group_by(EVTYPE) %>%
      summarise(propDamageSum = sum(totalPropDmg), cropDamageSum = sum(totalCropDmg)) %>%
      print()
## Source: local data frame [985 x 3]
## 
##                   EVTYPE propDamageSum cropDamageSum
##                   (fctr)         (dbl)         (dbl)
## 1     HIGH SURF ADVISORY          1200             0
## 2          COASTAL FLOOD             0             0
## 3            FLASH FLOOD           300             0
## 4              LIGHTNING             0             0
## 5              TSTM WIND           656             0
## 6        TSTM WIND (G45)            48             0
## 7             WATERSPOUT             0             0
## 8                   WIND             0             0
## 9                      ?            30             0
## 10       ABNORMAL WARMTH             0             0
## ..                   ...           ...           ...

Let us process stormData by summarising the total economic damages inflicted by each type of event. Here we create a summary data frame for each type of damage, then merge the two.

stormDataProp <- stormData %>%
      group_by(EVTYPE) %>%
      summarise(propDamageSum = sum(totalPropDmg)) %>%
      arrange(desc(propDamageSum)) %>%
      slice(1:20)

stormDataCrop <- stormData %>%
      group_by(EVTYPE) %>%
      summarise(cropDamageSum = sum(totalCropDmg)) %>%
      arrange(desc(cropDamageSum)) %>%
      slice(1:20)

stormDataCropProp <- merge(x = stormDataProp, y = stormDataCrop, by = "EVTYPE", all = TRUE)
print(stormDataCropProp)
##                  EVTYPE propDamageSum cropDamageSum
## 1               DROUGHT            NA     148044.10
## 2          EXTREME COLD            NA      25772.70
## 3           FLASH FLOOD     8532395.2     718045.20
## 4        FLASH FLOODING      171242.5      20514.20
## 5                 FLOOD     5420630.1     677650.95
## 6          FROST/FREEZE            NA      29224.70
## 7                  HAIL     4144326.7    2320784.95
## 8            HEAVY RAIN      305696.9      45214.20
## 9            HEAVY SNOW      734318.0            NA
## 10            HIGH WIND     1951880.8      69754.75
## 11           HIGH WINDS      333988.3            NA
## 12            HURRICANE            NA      24096.55
## 13    HURRICANE/TYPHOON            NA      20286.58
## 14            ICE STORM      399736.9            NA
## 15            LIGHTNING     3619930.4      14330.96
## 16          STRONG WIND      378075.2            NA
## 17    THUNDERSTORM WIND     5263238.0     267514.20
## 18   THUNDERSTORM WINDS     2660174.6      74727.95
## 19              TORNADO    19321049.9     400069.49
## 20       TROPICAL STORM      293039.8      24269.60
## 21            TSTM WIND     8018780.8     437255.65
## 22       TSTM WIND/HAIL            NA      17487.00
## 23 URBAN/SML STREAM FLD      156343.9            NA
## 24     WILD/FOREST FIRE      237530.6      16860.87
## 25             WILDFIRE      510398.3      17748.20
## 26         WINTER STORM      797867.9            NA

Before visualizing the data, we fill in the missing values. Since we chose the top 20 events by property damage and by crop damage separately, the merged table has 26 event types, and an event that made only one of the two lists has NA in the other column. We replace the second and third columns with the values found above in stormDataSummarised; this relies on both tables listing the events in the same order.

events <- stormDataCropProp$EVTYPE 

newLogical <- stormDataSummarised$EVTYPE %in% events
newProp <- stormDataSummarised$propDamageSum[newLogical]
newCrop <- stormDataSummarised$cropDamageSum[newLogical]

stormDataCropProp$propDamageSum <- newProp
stormDataCropProp$cropDamageSum <- newCrop

print(stormDataCropProp)
##                  EVTYPE propDamageSum cropDamageSum
## 1               DROUGHT      25637.35     148044.10
## 2          EXTREME COLD      46005.38      25772.70
## 3           FLASH FLOOD    8532395.22     718045.20
## 4        FLASH FLOODING     171242.45      20514.20
## 5                 FLOOD    5420630.06     677650.95
## 6          FROST/FREEZE       5819.64      29224.70
## 7                  HAIL    4144326.74    2320784.95
## 8            HEAVY RAIN     305696.89      45214.20
## 9            HEAVY SNOW     734317.99       8795.50
## 10            HIGH WIND    1951880.76      69754.75
## 11           HIGH WINDS     333988.35       7077.40
## 12            HURRICANE      99229.65      24096.55
## 13    HURRICANE/TYPHOON      38709.09      20286.58
## 14            ICE STORM     399736.88       6771.25
## 15            LIGHTNING    3619930.38      14330.96
## 16          STRONG WIND     378075.22       6531.00
## 17    THUNDERSTORM WIND    5263238.05     267514.20
## 18   THUNDERSTORM WINDS    2660174.61      74727.95
## 19              TORNADO   19321049.94     400069.49
## 20       TROPICAL STORM     293039.76      24269.60
## 21            TSTM WIND    8018780.83     437255.65
## 22       TSTM WIND/HAIL      50265.00      17487.00
## 23 URBAN/SML STREAM FLD     156343.93      11180.90
## 24     WILD/FOREST FIRE     237530.65      16860.87
## 25             WILDFIRE     510398.26      17748.20
## 26         WINTER STORM     797867.88       7940.95
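
The replacement above works because both tables are sorted by EVTYPE. As a hedged alternative that matches rows by name instead of by position, the sketch below subsets a full summary by the union of the two top lists. The objects fullSummary, topProp and topCrop are hypothetical stand-ins, filled with a few rows copied from the tables in this report.

```r
# A few rows copied from the tables above (values as printed in this report).
fullSummary <- data.frame(
  EVTYPE = c("ABNORMAL WARMTH", "DROUGHT", "HAIL", "HEAVY SNOW"),
  propDamageSum = c(0, 25637.35, 4144326.74, 734317.99),
  cropDamageSum = c(0, 148044.10, 2320784.95, 8795.50),
  stringsAsFactors = FALSE
)

# Stand-ins for the event names in the two top-20 lists.
topProp <- c("HAIL", "HEAVY SNOW")
topCrop <- c("DROUGHT", "HAIL")

# Keep every event that appears in either list. Matching is by name,
# so the result does not depend on how either table is sorted.
keep <- union(topProp, topCrop)
combined <- fullSummary[fullSummary$EVTYPE %in% keep, ]
print(combined)
```

Every kept row carries both damage columns from the full summary, so no NA values arise and no fill-in step is needed.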

Now we visualize the events that caused the most property and crop damage, along with those that caused the most total damage (sum of property and crop damages).

p4 <- ggplot(stormDataCropProp, aes(x = EVTYPE, y = propDamageSum/(10^6))) +
      geom_point(aes(color = "Property Damage", shape = "Property Damage")) +
      geom_point(aes(x = EVTYPE, y = cropDamageSum/(10^6), color = "Crop Damage", shape = "Crop Damage")) +
      theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
      xlab("") +
      ylab("Economic Loss ($millions)") +
      ggtitle("Economic Consequences of Weather Events") 
print(p4)

The figure above shows the economic losses caused by weather events. It includes every event that ranked in the top 20 for either crop damage or property damage, 26 events in total. It is evident that tornados are the biggest culprit for property damage, while hail is the biggest culprit for crop damage.

Let’s look at the 20 weather events that have caused the most total damage (property + crop).

top20 <- stormData %>%
      select(EVTYPE, totalDamage) %>%
      group_by(EVTYPE) %>%
      summarise(totalDamageSum = sum(totalDamage)) %>%
      arrange(desc(totalDamageSum)) %>%
      slice(1:20)
print(top20)
## Source: local data frame [20 x 2]
## 
##                EVTYPE totalDamageSum
##                (fctr)          (dbl)
## 1             TORNADO     19721119.4
## 2         FLASH FLOOD      9250440.4
## 3           TSTM WIND      8456036.5
## 4                HAIL      6465111.7
## 5               FLOOD      6098281.0
## 6   THUNDERSTORM WIND      5530752.2
## 7           LIGHTNING      3634261.3
## 8  THUNDERSTORM WINDS      2734902.6
## 9           HIGH WIND      2021635.5
## 10       WINTER STORM       805808.8
## 11         HEAVY SNOW       743113.5
## 12           WILDFIRE       528146.5
## 13          ICE STORM       406508.1
## 14        STRONG WIND       384606.2
## 15         HEAVY RAIN       350911.1
## 16         HIGH WINDS       341065.8
## 17     TROPICAL STORM       317309.4
## 18   WILD/FOREST FIRE       254391.5
## 19     FLASH FLOODING       191756.6
## 20            DROUGHT       173681.5

We plot these totals as a bar chart.

p3 <- ggplot(top20, aes(x = EVTYPE, y = totalDamageSum/(10^6))) +
       geom_bar(stat = "identity") + 
       theme(axis.text.x = element_text(angle = 90, hjust = 1)) 
p3 <- p3 + 
      ylab("Total Damage ($millions)") +
      xlab("") +
      ggtitle("Weather Events and Damage")
print(p3)

The figure above shows the 20 events that caused the highest total economic damage (property + crop). Unsurprisingly, property damage accounts for the bulk of the total damage, which is why the pattern in this figure looks so similar to that of the blue triangles (property damage) in the previous figure.
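
The claim that property damage dominates the total can be checked from the printed tables. The sketch below computes the property share for three events, using values copied from the output above; the data frame check is a hypothetical name.

```r
# Property and total damage sums as printed in this report.
check <- data.frame(
  EVTYPE = c("TORNADO", "FLASH FLOOD", "HAIL"),
  propDamageSum = c(19321049.94, 8532395.22, 4144326.74),
  totalDamageSum = c(19721119.4, 9250440.4, 6465111.7),
  stringsAsFactors = FALSE
)

# Fraction of each event's total damage that is property damage.
check$propShare <- check$propDamageSum / check$totalDamageSum
print(check[, c("EVTYPE", "propShare")])
```

For all three events the share exceeds one half; hail, the leading source of crop damage, has the lowest share of the three.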