Synopsis

We take storm data from NOAA and make minor changes to the event categories. We then aggregate, order, and subset the data to find the most destructive weather events by INJURIES to humans, FATALITIES to humans, PROPDMG (property damage), and CROPDMG (crop damage). We plot the ordered data to determine which weather events are most deleterious to human health and which cause the most economic damage.

Reading the data

Read the storm data obtained from NOAA weather. Storm data documentation is also available with an FAQ.

df <- read.csv("repdata-data-StormData.csv")
head(df)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
## 4        2     2.5          K       0                                    
## 5        2     2.5          K       0                                    
## 6        6     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6
rows <- nrow(df) ## number of observations in the raw data

There are 902297 rows of data.

Data Processing

Looking at EVTYPE, we see that some categories have duplicative names (the full list is omitted due to length). For this analysis, nuances in the naming of EVTYPE categories are preserved as much as possible. Some categories are combined below because of obvious naming errors or duplications.

df$EVTYPE <- tolower(df$EVTYPE) ## removes duplicates, from levels 985 to 898
df$EVTYPE <- gsub(" ", "", df$EVTYPE) ## removes all spaces, from levels 898 to 863
df$EVTYPE <- gsub("/", "", df$EVTYPE) ## removes all "/", from levels 863 to 843
df$EVTYPE <- gsub("thunderstormwinds", "thunderstormwind", df$EVTYPE) ## this category is duplicative
df$EVTYPE <- gsub("tstmwind", "thunderstormwind", df$EVTYPE) ## this category is duplicative
df$EVTYPE <- gsub("highwinds", "highwind", df$EVTYPE) ## this category is duplicative
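The effect of these substitutions on the number of distinct categories can be checked on a small vector. The following is a minimal sketch: the helper `clean_evtype` and the vector `ev` are hypothetical and only mirror the substitutions used above, in the same order.

```r
# Hypothetical helper that applies the same substitutions as above.
clean_evtype <- function(x) {
  x <- tolower(x)
  x <- gsub(" ", "", x, fixed = TRUE)
  x <- gsub("/", "", x, fixed = TRUE)
  x <- gsub("thunderstormwinds", "thunderstormwind", x, fixed = TRUE)
  x <- gsub("tstmwind", "thunderstormwind", x, fixed = TRUE)
  x <- gsub("highwinds", "highwind", x, fixed = TRUE)
  x
}

# Toy labels written for illustration, not copied from the data set.
ev <- c("TSTM WIND", "Thunderstorm Winds", "THUNDERSTORM WIND",
        "High Winds", "HIGH WIND", "Flash Flood", "FLASH/FLOOD")
cleaned <- clean_evtype(ev)

length(unique(ev))       # distinct labels before cleaning
length(unique(cleaned))  # distinct labels after cleaning
```

Running the same two `length(unique(...))` calls on `df$EVTYPE` before and after each step is one way to confirm the level counts quoted in the comments.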

The first question we are attempting to answer is which events are most harmful to human health. The “INJURIES” and “FATALITIES” columns provide the most useful information on human health. Therefore, we will make dataframes reflecting that data.

We create a dataframe of injuries based on EVTYPE.

injuries <- aggregate(df$INJURIES, list(df$EVTYPE), sum)
names(injuries) <- c("EVENT", "INJURIES")
injuries <- injuries[order(injuries$INJURIES, decreasing = TRUE), ]
injuries <- subset(injuries, injuries$INJURIES > 100)

We create a dataframe of fatalities based on each EVTYPE.

fatalities <- aggregate(df$FATALITIES, list(df$EVTYPE), sum)
names(fatalities) <- c("EVENT", "FATALITIES")
fatalities <- fatalities[order(fatalities$FATALITIES, decreasing = TRUE), ]
fatalities <- subset(fatalities, fatalities$FATALITIES > 20)
fatalities$EVENT <- as.factor(fatalities$EVENT)

We are also answering the question of which EVTYPE has the greatest economic consequences. We will look at the “PROPDMG” and “CROPDMG” columns to answer this question.

We create a dataframe of propdmg based on EVTYPE.

propdmg <- aggregate(df$PROPDMG, list(df$EVTYPE), sum)
names(propdmg) <- c("EVENT", "PROPDMG")
propdmg <- propdmg[order(propdmg$PROPDMG, decreasing = TRUE), ]
propdmg <- subset(propdmg, propdmg$PROPDMG > 5000)
propdmg$PROPDMG <- propdmg$PROPDMG/1000
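The sums above add the raw PROPDMG values, while the data set stores the unit of each value in the separate PROPDMGEXP column (for example "K" in the rows printed earlier). The following sketch shows one way to convert to dollars before aggregating. It assumes K, M, and B mean thousands, millions, and billions, and it treats every other code as a multiplier of 1; the helper `exp_multiplier`, the column `PROPDOLLARS`, and the toy data are hypothetical.

```r
# Hypothetical lookup: dollar multiplier for each exponent code.
# Assumption: K = thousands, M = millions, B = billions; any other
# code (including blank) is treated as a multiplier of 1.
exp_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(as.character(code))]
  mult[is.na(mult)] <- 1
  unname(mult)
}

# Toy rows standing in for df; the values are made up.
toy <- data.frame(
  EVTYPE     = c("tornado", "tornado", "flood", "hail"),
  PROPDMG    = c(25, 2.5, 1.2, 300),
  PROPDMGEXP = c("K", "M", "B", "")
)
toy$PROPDOLLARS <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP)

# Total damage in dollars per event type.
aggregate(PROPDOLLARS ~ EVTYPE, data = toy, FUN = sum)
```

How to treat the remaining exponent codes in the real data is a judgment call that this sketch does not settle.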

We create a dataframe of cropdmg based on each EVTYPE.

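cropdmg <- aggregate(df$CROPDMG, list(df$EVTYPE), sum) ## sum CROPDMG; the original call summed FATALITIES, which is why the printed crop table matches the fatalities table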
names(cropdmg) <- c("EVENT", "CROPDMG")
cropdmg <- cropdmg[order(cropdmg$CROPDMG, decreasing = TRUE), ]
cropdmg <- subset(cropdmg, cropdmg$CROPDMG > 0)
cropdmg$EVENT <- as.factor(cropdmg$EVENT)
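The four data frames above are built with the same aggregate, rename, order, and subset steps. Wrapping those steps in one function keeps the column name in a single place and lowers the risk of copy-paste mistakes. The following is a minimal sketch; the helper `top_events` and the toy data are hypothetical and not part of the original analysis.

```r
# Hypothetical helper: total one numeric column per event type,
# sort in decreasing order, and keep totals above a threshold.
top_events <- function(data, column, threshold = 0) {
  totals <- aggregate(data[[column]], list(data$EVTYPE), sum)
  names(totals) <- c("EVENT", column)
  totals <- totals[order(totals[[column]], decreasing = TRUE), ]
  totals[totals[[column]] > threshold, ]
}

# Toy data frame standing in for df; the values are made up.
toy <- data.frame(
  EVTYPE   = c("tornado", "flood", "tornado", "hail", "flood"),
  INJURIES = c(10, 2, 5, 0, 1),
  CROPDMG  = c(0, 50, 0, 20, 25)
)
top_events(toy, "INJURIES")
top_events(toy, "CROPDMG")
```

With the real data, a call such as `top_events(df, "INJURIES", 100)` would reproduce the injuries data frame built above.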

Results

The top 25 weather events by injuries.

head(injuries, n=25)
##                EVENT INJURIES
## 713          tornado    91346
## 649 thunderstormwind     9363
## 139            flood     6789
## 104    excessiveheat     6525
## 383        lightning     5230
## 223             heat     2100
## 358         icestorm     1975
## 125       flashflood     1777
## 295         highwind     1439
## 193             hail     1361
## 816      winterstorm     1321
## 342 hurricanetyphoon     1275
## 253        heavysnow     1021
## 801         wildfire      911
## 20          blizzard      805
## 153              fog      734
## 803   wildforestfire      545
## 94         duststorm      440
## 819    winterweather      398
## 226         heatwave      379
## 67          densefog      342
## 727    tropicalstorm      340
## 491      ripcurrents      297
## 565       strongwind      280
## 233        heavyrain      251
library(ggplot2)
injuries$EVENT <- reorder(injuries$EVENT,-injuries$INJURIES)
ggplot(injuries,aes(EVENT,INJURIES))+
    geom_bar(stat="identity")+
    theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5))+
        xlab("Weather Event") + ylab("Number of Injuries") + ggtitle("Figure 1: Injuries by Weather Event (all years)")
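The call to `reorder` is what makes the bars appear in decreasing order: it sorts the factor levels of EVENT by the negated totals, and ggplot2 draws discrete axes in level order. A small sketch with made-up values (the vectors `event` and `count` are hypothetical) shows the resulting level order without needing ggplot2:

```r
# Toy values for illustration only.
event <- c("flood", "tornado", "hail")
count <- c(3, 15, 1)

# reorder() sorts levels by ascending value of the second argument,
# so negating the counts puts the largest count first.
ordered_event <- reorder(event, -count)
levels(ordered_event)
```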

The top 25 weather events by fatalities.

head(fatalities, n=25)
##                    EVENT FATALITIES
## 713              tornado       5633
## 104        excessiveheat       1903
## 125           flashflood        978
## 223                 heat        937
## 383            lightning        816
## 649     thunderstormwind        701
## 139                flood        470
## 490           ripcurrent        368
## 295             highwind        283
## 11             avalanche        224
## 816          winterstorm        206
## 491          ripcurrents        204
## 226             heatwave        172
## 113          extremecold        162
## 253            heavysnow        127
## 114 extremecoldwindchill        125
## 286             highsurf        104
## 565           strongwind        103
## 20              blizzard        101
## 233            heavyrain         98
## 115          extremeheat         96
## 57         coldwindchill         95
## 358             icestorm         89
## 801             wildfire         75
## 342     hurricanetyphoon         64
fatalities$EVENT<-reorder(fatalities$EVENT,-fatalities$FATALITIES)
ggplot(fatalities,aes(EVENT,FATALITIES))+
    geom_bar(stat="identity")+
    theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5))+
        xlab("Weather Event") + ylab("Number of Fatalities") + ggtitle("Figure 2: Fatalities by Weather Event (all years)")

The top 25 weather events by property damage (PROPDMG).

head(propdmg, n=25)
##                 EVENT    PROPDMG
## 713           tornado 3212.25816
## 649  thunderstormwind 2659.26646
## 125        flashflood 1420.67459
## 139             flood  899.93848
## 193              hail  688.69338
## 383         lightning  603.35178
## 295          highwind  380.40956
## 816       winterstorm  132.72059
## 253         heavysnow  122.25199
## 801          wildfire   84.45934
## 358          icestorm   66.00067
## 565        strongwind   63.01181
## 233         heavyrain   50.84214
## 727     tropicalstorm   48.42368
## 803    wildforestfire   39.34495
## 130     flashflooding   28.49715
## 772 urbansmlstreamfld   26.05194
## 20           blizzard   25.31848
## 560        stormsurge   19.39349
## 141   floodflashflood   19.16825
## 369         landslide   18.96194
## 333         hurricane   15.51368
## 366   lake-effectsnow   14.14100
## 494        riverflood   13.85570
## 40       coastalflood   13.53684

For our final plot we will look at PROPDMG. This will give us the most general indication of economic damage.

propdmg$EVENT <- reorder(propdmg$EVENT,-propdmg$PROPDMG)
ggplot(propdmg,aes(EVENT, PROPDMG))+
    geom_bar(stat="identity")+
    theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5))+
        xlab("Weather Event") + ylab("Summed PROPDMG / 1000 (PROPDMGEXP not applied)") + ggtitle("Figure 3: Property Damage by Weather Event (all years)")

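The top 25 weather events by crop damage (CROPDMG). The values printed below are identical to the fatality counts listed above, so they do not measure crop damage; the table needs to be regenerated from the CROPDMG column.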

head(cropdmg, n = 25)
##                    EVENT CROPDMG
## 713              tornado    5633
## 104        excessiveheat    1903
## 125           flashflood     978
## 223                 heat     937
## 383            lightning     816
## 649     thunderstormwind     701
## 139                flood     470
## 490           ripcurrent     368
## 295             highwind     283
## 11             avalanche     224
## 816          winterstorm     206
## 491          ripcurrents     204
## 226             heatwave     172
## 113          extremecold     162
## 253            heavysnow     127
## 114 extremecoldwindchill     125
## 286             highsurf     104
## 565           strongwind     103
## 20              blizzard     101
## 233            heavyrain      98
## 115          extremeheat      96
## 57         coldwindchill      95
## 358             icestorm      89
## 801             wildfire      75
## 342     hurricanetyphoon      64

Summary of Results

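From the graphs and data we can see that tornadoes are by far the most deleterious to human health, leading in both injuries and fatalities. Tornadoes and thunderstorm wind have the largest summed PROPDMG values and so appear most damaging to property. The crop damage table repeats the fatality counts, so no conclusion about crop damage can be drawn from it. From these data and results, tornadoes are the most damaging to human health and, judged by property damage, to the economy.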