Title: National Weather Service Storm Data Analysis on different event types harming population health and causing economic damage

Synopsis

An analysis of National Weather Service storm data, which records almost a thousand distinct event types, found that the event types that lead to the most population health damage (those ranked in the top 10 by both mean and median of a weighted fatality and injury score) are: heat wave; tropical storm gordon; "tornadoes, tstm wind, hail" (a single event type); cold and snow; thunderstormw; high wind and seas; heat wave drought; snow/high winds; winter storm high winds. The top 10 event types that lead to the most economic damage are: tropical storm gordon; coastal erosion; heavy rain and flood; river and stream flood; landslump; dust storm/high winds; high winds/cold; forest fires; blizzard/winter storm; flash flood. In addition, the event types that cause the most fatalities overlap very little with those that cause the most injuries.

Data Processing

Loading and preprocessing the data

setwd("~/Desktop/JHU/Reproducible_Research/PA2")
df <- read.csv("repdata-data-StormData.csv")
head(df)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
## 4        2     2.5          K       0                                    
## 5        2     2.5          K       0                                    
## 6        6     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6
# extract data useful for this task
df <- df[, c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", 
    "CROPDMG", "CROPDMGEXP")]
summary(df)
##                EVTYPE         FATALITIES     INJURIES         PROPDMG    
##  HAIL             :288661   Min.   :  0   Min.   :   0.0   Min.   :   0  
##  TSTM WIND        :219940   1st Qu.:  0   1st Qu.:   0.0   1st Qu.:   0  
##  THUNDERSTORM WIND: 82563   Median :  0   Median :   0.0   Median :   0  
##  TORNADO          : 60652   Mean   :  0   Mean   :   0.2   Mean   :  12  
##  FLASH FLOOD      : 54277   3rd Qu.:  0   3rd Qu.:   0.0   3rd Qu.:   0  
##  FLOOD            : 25326   Max.   :583   Max.   :1700.0   Max.   :5000  
##  (Other)          :170878                                                
##    PROPDMGEXP        CROPDMG        CROPDMGEXP    
##         :465934   Min.   :  0.0          :618413  
##  K      :424665   1st Qu.:  0.0   K      :281832  
##  M      : 11330   Median :  0.0   M      :  1994  
##  0      :   216   Mean   :  1.5   k      :    21  
##  B      :    40   3rd Qu.:  0.0   0      :    19  
##  5      :    28   Max.   :990.0   B      :     9  
##  (Other):    84                   (Other):     9
str(df)
## 'data.frame':    902297 obs. of  7 variables:
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
names(df)
## [1] "EVTYPE"     "FATALITIES" "INJURIES"   "PROPDMG"    "PROPDMGEXP"
## [6] "CROPDMG"    "CROPDMGEXP"

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

average fatalities for different event types:

event_fatality_mean <- tapply(df$FATALITIES, df$EVTYPE, mean, na.rm = TRUE)
sorted_event_fatality_mean <- sort(event_fatality_mean, decreasing = TRUE)
sorted_event_fatality_mean[1:10]
## TORNADOES, TSTM WIND, HAIL              COLD AND SNOW 
##                     25.000                     14.000 
##      TROPICAL STORM GORDON      RECORD/EXCESSIVE HEAT 
##                      8.000                      5.667 
##               EXTREME HEAT          HEAT WAVE DROUGHT 
##                      4.364                      4.000 
##             HIGH WIND/SEAS              MARINE MISHAP 
##                      4.000                      3.500 
##              WINTER STORMS        Heavy surf and wind 
##                      3.333                      3.000

So, the top 10 event types with the highest average fatalities are "TORNADOES, TSTM WIND, HAIL" (a single event type); COLD AND SNOW; TROPICAL STORM GORDON; RECORD/EXCESSIVE HEAT; EXTREME HEAT; HEAT WAVE DROUGHT; HIGH WIND/SEAS; MARINE MISHAP; WINTER STORMS; Heavy surf and wind, with average fatalities of 25, 14, 8, 5.67, 4.36, 4, 4, 3.5, 3.33, and 3, respectively.

median fatalities for different event types:

event_fatality_median <- tapply(df$FATALITIES, df$EVTYPE, median, na.rm = TRUE)
sorted_event_fatality_median <- sort(event_fatality_median, decreasing = TRUE)
sorted_event_fatality_median[1:10]
## TORNADOES, TSTM WIND, HAIL              COLD AND SNOW 
##                       25.0                       14.0 
##      TROPICAL STORM GORDON          HEAT WAVE DROUGHT 
##                        8.0                        4.0 
##             HIGH WIND/SEAS              MARINE MISHAP 
##                        4.0                        3.5 
##        Heavy surf and wind         HIGH WIND AND SEAS 
##                        3.0                        3.0 
##                 HEAT WAVES    RIP CURRENTS/HEAVY SURF 
##                        2.5                        2.5

So, the top 10 event types with the highest median fatalities are "TORNADOES, TSTM WIND, HAIL" (a single event type); COLD AND SNOW; TROPICAL STORM GORDON; HEAT WAVE DROUGHT; HIGH WIND/SEAS; MARINE MISHAP; Heavy surf and wind; HIGH WIND AND SEAS; HEAT WAVES; RIP CURRENTS/HEAVY SURF, with median fatalities of 25, 14, 8, 4, 4, 3.5, 3, 3, 2.5, and 2.5, respectively.
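For the top three event types, the mean and the median are identical (25, 14, and 8). That would be consistent with those event types having very few records, but the output above does not show the record counts, so this is an assumption to check. A minimal sketch of such a check on made-up data (the `toy` data frame and its values are hypothetical, not taken from the storm data):

```r
# Hypothetical toy data: event types "A" and "C" each have one record.
toy <- data.frame(
  EVTYPE     = c("A", "B", "B", "B", "C"),
  FATALITIES = c(25, 0, 1, 2, 14)
)

# Mean fatalities and number of records per event type.
m <- tapply(toy$FATALITIES, toy$EVTYPE, mean, na.rm = TRUE)
n <- table(toy$EVTYPE)

# Show the record count next to the mean for the top-ranked event types.
top <- names(sort(m, decreasing = TRUE))[1:2]
summary_tbl <- data.frame(
  mean    = as.vector(m[top]),
  records = as.vector(n[top]),
  row.names = top
)
print(summary_tbl)
```

On the real data, the same idea would be `table(df$EVTYPE)[names(sorted_event_fatality_mean[1:10])]`.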

average injuries for different event types:

event_injuries_mean <- tapply(df$INJURIES, df$EVTYPE, mean, na.rm = TRUE)
sorted_event_injuries_mean <- sort(event_injuries_mean, decreasing = TRUE)
sorted_event_injuries_mean[1:10]
##               Heat Wave   TROPICAL STORM GORDON              WILD FIRES 
##                   70.00                   43.00                   37.50 
##           THUNDERSTORMW      HIGH WIND AND SEAS         SNOW/HIGH WINDS 
##                   27.00                   20.00                   18.00 
##         GLAZE/ICE STORM       HEAT WAVE DROUGHT WINTER STORM HIGH WINDS 
##                   15.00                   15.00                   15.00 
##       HURRICANE/TYPHOON 
##                   14.49

So, the top 10 event types with the highest average injuries are Heat Wave, TROPICAL STORM GORDON, WILD FIRES, THUNDERSTORMW, HIGH WIND AND SEAS, SNOW/HIGH WINDS, GLAZE/ICE STORM, HEAT WAVE DROUGHT, WINTER STORM HIGH WINDS, HURRICANE/TYPHOON, with average injuries of 70, 43, 37.5, 27, 20, 18, 15, 15, 15, and 14.49, respectively.

median injuries for different event types:

event_injuries_median <- tapply(df$INJURIES, df$EVTYPE, median, na.rm = TRUE)
sorted_event_injuries_median <- sort(event_injuries_median, decreasing = TRUE)
sorted_event_injuries_median[1:10]
##               Heat Wave   TROPICAL STORM GORDON           THUNDERSTORMW 
##                      70                      43                      27 
##      HIGH WIND AND SEAS         SNOW/HIGH WINDS         GLAZE/ICE STORM 
##                      20                      18                      15 
##       HEAT WAVE DROUGHT WINTER STORM HIGH WINDS  NON-SEVERE WIND DAMAGE 
##                      15                      15                       7 
##              TORNADO F2 
##                       4

So, the top 10 event types with the highest median injuries are Heat Wave, TROPICAL STORM GORDON, THUNDERSTORMW, HIGH WIND AND SEAS, SNOW/HIGH WINDS, GLAZE/ICE STORM, HEAT WAVE DROUGHT, WINTER STORM HIGH WINDS, NON-SEVERE WIND DAMAGE, TORNADO F2, with median injuries of 70, 43, 27, 20, 18, 15, 15, 15, 7, and 4, respectively.

intersect(names(sorted_event_fatality_median[1:10]), names(sorted_event_fatality_mean[1:10]))
## [1] "TORNADOES, TSTM WIND, HAIL" "COLD AND SNOW"             
## [3] "TROPICAL STORM GORDON"      "HEAT WAVE DROUGHT"         
## [5] "HIGH WIND/SEAS"             "MARINE MISHAP"             
## [7] "Heavy surf and wind"
intersect(names(sorted_event_injuries_median[1:10]), names(sorted_event_injuries_mean[1:10]))
## [1] "Heat Wave"               "TROPICAL STORM GORDON"  
## [3] "THUNDERSTORMW"           "HIGH WIND AND SEAS"     
## [5] "SNOW/HIGH WINDS"         "GLAZE/ICE STORM"        
## [7] "HEAT WAVE DROUGHT"       "WINTER STORM HIGH WINDS"
intersect(names(sorted_event_fatality_mean[1:10]), names(sorted_event_injuries_mean[1:10]))
## [1] "TROPICAL STORM GORDON" "HEAT WAVE DROUGHT"
intersect(names(sorted_event_fatality_median[1:10]), names(sorted_event_injuries_median[1:10]))
## [1] "TROPICAL STORM GORDON" "HEAT WAVE DROUGHT"     "HIGH WIND AND SEAS"
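The four `intersect()` calls above all follow the same pattern: take the top 10 names from two per-event summaries and intersect them. One way to sketch that pattern as a helper is shown here; `top_overlap` and the vectors `a` and `b` are hypothetical names and made-up values used only for illustration.

```r
# Hypothetical helper: names shared by the top-n entries of two named
# numeric vectors (for example, two tapply() results).
top_overlap <- function(a, b, n = 10) {
  top_a <- names(sort(a, decreasing = TRUE))[seq_len(min(n, length(a)))]
  top_b <- names(sort(b, decreasing = TRUE))[seq_len(min(n, length(b)))]
  intersect(top_a, top_b)
}

# Made-up example values.
a <- c(x = 5, y = 3, z = 1, w = 0)
b <- c(x = 1, y = 9, z = 0, w = 4)

shared <- top_overlap(a, b, n = 2)
print(shared)          # event types in both top-2 lists
print(length(shared))  # size of the overlap
```

With the objects defined earlier, `length(top_overlap(event_fatality_mean, event_injuries_mean))` would give the overlap count directly.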

From the analysis above and Figure 1 (panels 1 and 2), for either fatalities or injuries, the top 10 by mean differs little from the top 10 by median (7 or 8 of 10 event types in common). But for either mean or median, the top 10 by fatalities differs greatly from the top 10 by injuries (only 2 or 3 of 10 event types in common). Therefore, a weighted score that combines both is used below (weight 1 on fatalities, 0.5 on injuries):

average and median fatality+injury for different event types:

df$FATA_INJU <- df$FATALITIES + 0.5 * df$INJURIES
event_fatality_injury_median <- tapply(df$FATA_INJU, df$EVTYPE, median, na.rm = TRUE)
event_fatality_injury_mean <- tapply(df$FATA_INJU, df$EVTYPE, mean, na.rm = TRUE)
sorted_event_fatality_injury_median <- sort(event_fatality_injury_median, decreasing = TRUE)
sorted_event_fatality_injury_mean <- sort(event_fatality_injury_mean, decreasing = TRUE)
sorted_event_fatality_injury_median[1:10]
##                  Heat Wave      TROPICAL STORM GORDON 
##                       35.0                       29.5 
## TORNADOES, TSTM WIND, HAIL              COLD AND SNOW 
##                       25.0                       14.0 
##              THUNDERSTORMW         HIGH WIND AND SEAS 
##                       13.5                       13.0 
##          HEAT WAVE DROUGHT            SNOW/HIGH WINDS 
##                       11.5                        9.0 
##    WINTER STORM HIGH WINDS            GLAZE/ICE STORM 
##                        8.5                        7.5
sorted_event_fatality_injury_mean[1:10]
##                  Heat Wave      TROPICAL STORM GORDON 
##                       35.0                       29.5 
## TORNADOES, TSTM WIND, HAIL                 WILD FIRES 
##                       25.0                       19.5 
##              COLD AND SNOW              THUNDERSTORMW 
##                       14.0                       13.5 
##         HIGH WIND AND SEAS          HEAT WAVE DROUGHT 
##                       13.0                       11.5 
##            SNOW/HIGH WINDS    WINTER STORM HIGH WINDS 
##                        9.0                        8.5
x <- intersect(names(sorted_event_fatality_injury_median[1:10]), names(sorted_event_fatality_injury_mean[1:10]))

So, nine event types appear in the top 10 by both median and mean of the weighted fatality and injury score: Heat Wave; TROPICAL STORM GORDON; "TORNADOES, TSTM WIND, HAIL" (a single event type); COLD AND SNOW; THUNDERSTORMW; HIGH WIND AND SEAS; HEAT WAVE DROUGHT; SNOW/HIGH WINDS; WINTER STORM HIGH WINDS. The two lists differ only in GLAZE/ICE STORM (median list) and WILD FIRES (mean list).
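As a consistency check on the weighted score, the mean is linear, so the mean of FATALITIES + 0.5 * INJURIES should equal the mean fatalities plus half the mean injuries. For TROPICAL STORM GORDON, the values printed earlier (8 and 43) recombine to the 29.5 reported above. The sketch below repeats that arithmetic and then illustrates the linearity on made-up records (`f` and `i` are hypothetical values, not storm data):

```r
# Values copied from the printed output above for TROPICAL STORM GORDON.
fatal_mean  <- c("TROPICAL STORM GORDON" = 8)
injury_mean <- c("TROPICAL STORM GORDON" = 43)
weighted    <- fatal_mean + 0.5 * injury_mean
print(weighted)

# Linearity of the mean on made-up records (hypothetical values).
f <- c(0, 2, 1)   # fatalities per record
i <- c(4, 0, 6)   # injuries per record
lhs <- mean(f + 0.5 * i)
rhs <- mean(f) + 0.5 * mean(i)
print(isTRUE(all.equal(lhs, rhs)))
```

The same identity does not hold for the median in general, which is one reason the median and mean rankings can differ.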


#### Here are the line plots

```r
par(mfrow = c(3, 1))
x <- 1:length(event_injuries_median)
plot(x, event_fatality_median, type = "l", col = "green", lwd = 2, xlab = "", 
    ylab = "fatalities", cex.axis = 1.5, cex.lab = 1.5)
points(x, event_fatality_mean, lty = 3, type = "l", col = "red", lwd = 2)
legend("topleft", legend = c("mean", "median"), lty = c(3, 1), col = c("red", 
    "green"), lwd = 3, cex = 1.5)
title("Figure 1. number of fatalities and/or injuries under different event types", 
    cex.main = 2)

plot(x, event_injuries_median, type = "l", col = "green", lwd = 2, xlab = "", 
    ylab = "injuries", cex.axis = 1.5, cex.lab = 1.5)
points(x, event_injuries_mean, lty = 3, type = "l", col = "red", lwd = 2)
legend("topleft", legend = c("mean", "median"), lty = c(3, 1), col = c("red", 
    "green"), lwd = 3, cex = 1.5)

plot(x, event_fatality_injury_median, type = "l", col = "green", lwd = 2, xlab = "event type", 
    ylab = "fatalities and injuries", cex.axis = 1.5, cex.lab = 1.5)
points(x, event_fatality_injury_mean, lty = 3, type = "l", col = "red", lwd = 2)
legend("topleft", legend = c("mean", "median"), lty = c(3, 1), col = c("red", 
    "green"), lwd = 3, cex = 1.5)
```

Figure 1. Mean and median fatalities, injuries, and weighted fatalities and injuries by event type.


Across the United States, which types of events have the greatest economic consequences?

df$EconDMG <- df$PROPDMG + df$CROPDMG
event_EconDMG_mean <- tapply(df$EconDMG, df$EVTYPE, mean, na.rm = TRUE)
event_EconDMG_median <- tapply(df$EconDMG, df$EVTYPE, median, na.rm = TRUE)
sorted_event_EconDMG_mean <- sort(event_EconDMG_mean, decreasing = TRUE)
sorted_event_EconDMG_median <- sort(event_EconDMG_median, decreasing = TRUE)
intersect(names(sorted_event_EconDMG_mean[1:10]), names(sorted_event_EconDMG_median[1:10]))
##  [1] "TROPICAL STORM GORDON"  "COASTAL EROSION"       
##  [3] "HEAVY RAIN AND FLOOD"   "RIVER AND STREAM FLOOD"
##  [5] "Landslump"              "DUST STORM/HIGH WINDS" 
##  [7] "HIGH WINDS/COLD"        "FOREST FIRES"          
##  [9] "BLIZZARD/WINTER STORM"  "FLASH FLOOD/"


x <- 1:length(event_EconDMG_median)
plot(x, event_EconDMG_median, type = "l", col = "green", lwd = 2, xlab = "", 
    ylab = "economic damage", cex.axis = 1.5, cex.lab = 1.5)
points(x, event_EconDMG_mean, lty = 3, type = "l", col = "red", lwd = 2)
legend("topleft", legend = c("mean", "median"), lty = c(3, 1), col = c("red", 
    "green"), lwd = 3, cex = 1.5)
title("Figure 2. number of economic damages under different event types", cex.main = 1)

Figure 2. Mean and median economic damage by event type.

Therefore, the top 10 by mean and the top 10 by median contain the same ten event types, and the top 10 event types that cause the most serious economic damage are TROPICAL STORM GORDON, COASTAL EROSION, HEAVY RAIN AND FLOOD, RIVER AND STREAM FLOOD, Landslump, DUST STORM/HIGH WINDS, HIGH WINDS/COLD, FOREST FIRES, BLIZZARD/WINTER STORM, and FLASH FLOOD/.
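One caveat: EconDMG above adds the raw PROPDMG and CROPDMG values. In the storm data, PROPDMGEXP and CROPDMGEXP carry magnitude codes (K for thousands, M for millions, B for billions), so the raw sum mixes different units. The sketch below shows one possible way to apply those codes before summing. It is not the method used in this analysis, and the results above were not recomputed with it. The helper `exp_multiplier`, the `toy` data, and the decision to treat every other code (blank, digits, "?", "+", "-") as a multiplier of 1 are all assumptions.

```r
# Hypothetical helper: map an exponent code to a numeric multiplier.
# Assumption: only K, M, B (any case) are scaled; all other codes get 1.
exp_multiplier <- function(code) {
  code <- toupper(as.character(code))
  mult <- c(K = 1e3, M = 1e6, B = 1e9)[code]
  mult[is.na(mult)] <- 1
  unname(mult)
}

# Made-up records for illustration only.
toy <- data.frame(
  EVTYPE     = c("FLOOD", "FLOOD", "HAIL"),
  PROPDMG    = c(25, 2.5, 10),
  PROPDMGEXP = c("K", "M", ""),
  CROPDMG    = c(0, 1, 5),
  CROPDMGEXP = c("", "K", "k")
)

# Economic damage in dollars: scale each amount by its own multiplier.
toy$EconDMG_USD <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP) +
  toy$CROPDMG * exp_multiplier(toy$CROPDMGEXP)

total_by_event <- tapply(toy$EconDMG_USD, toy$EVTYPE, sum)
print(total_by_event)
```

Applying a conversion like this to the full data set would likely change the economic damage ranking, so the list above should be read as a ranking of the unscaled values.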

Results

So, from the analysis above, the event types that lead to the most population health damage (those ranked in the top 10 by both mean and median of the weighted fatality and injury score) are: heat wave; tropical storm gordon; "tornadoes, tstm wind, hail" (a single event type); cold and snow; thunderstormw; high wind and seas; heat wave drought; snow/high winds; winter storm high winds. The top 10 event types that lead to the most economic damage are: tropical storm gordon; coastal erosion; heavy rain and flood; river and stream flood; landslump; dust storm/high winds; high winds/cold; forest fires; blizzard/winter storm; flash flood.