We take storm data from NOAA and make minor changes to the event categories. We then order and subset the data to look at the most destructive weather events by INJURIES to humans; FATALITIES to humans; PROPDMG (Property Damage in billions); and CROPDMG (Crop Damage in millions). We order and plot the data to determine which weather events are most deleterious to human health and which weather events cause the most economic damage.
Read the storm data obtained from NOAA weather. Storm data documentation is also available with an FAQ.
df <- read.csv("repdata-data-StormData.csv")
head(df)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO 0 0
## 2 TORNADO 0 0
## 3 TORNADO 0 0
## 4 TORNADO 0 0
## 5 TORNADO 0 0
## 6 TORNADO 0 0
## COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1 NA 0 14.0 100 3 0 0
## 2 NA 0 2.0 150 2 0 0
## 3 NA 0 0.1 123 2 0 0
## 4 NA 0 0.0 100 2 0 0
## 5 NA 0 0.0 150 2 0 0
## 6 NA 0 1.5 177 2 0 0
## INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1 15 25.0 K 0
## 2 0 2.5 K 0
## 3 2 25.0 K 0
## 4 2 2.5 K 0
## 5 2 2.5 K 0
## 6 6 2.5 K 0
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3040 8812 3051 8806 1
## 2 3042 8755 0 0 2
## 3 3340 8742 0 0 3
## 4 3458 8626 0 0 4
## 5 3412 8642 0 0 5
## 6 3450 8748 0 0 6
rows <- nrow(df) ## number of rows (observations) in the dataset
There are 902297 rows of data.
Looking at EVTYPE, we see that some categories have duplicative names (the full list is omitted due to length). For this analysis, nuances in the naming of EVTYPE categories are preserved as much as possible. Only categories that differ through obvious naming errors or duplications are combined below.
df$EVTYPE <- tolower(df$EVTYPE) ## removes duplicates, from levels 985 to 898
df$EVTYPE <- gsub(" ", "", df$EVTYPE) ## removes all spaces, from levels 898 to 863
df$EVTYPE <- gsub("/", "", df$EVTYPE) ## removes all "/", from levels 863 to 843
df$EVTYPE <- gsub("thunderstormwinds", "thunderstormwind", df$EVTYPE) ## this category is duplicative
df$EVTYPE <- gsub("tstmwind", "thunderstormwind", df$EVTYPE) ## this category is duplicative
df$EVTYPE <- gsub("highwinds", "highwind", df$EVTYPE) ## this category is duplicative
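To see how each cleaning step shrinks the set of categories, the same substitutions can be tried on a small vector. This is a minimal sketch: the `ev` vector and the `normalize_evtype` helper are made up for illustration and are not part of the NOAA data or the original analysis. It passes `fixed = TRUE` to `gsub()` so the patterns are matched literally, which gives the same result as the calls above for these patterns.

```r
# Hypothetical sample of raw event labels (not taken from the dataset)
ev <- c("TSTM WIND", "Thunderstorm Winds", "THUNDERSTORM WIND",
        "High Winds", "HIGH WIND", "FLASH FLOOD/FLOOD")

# Illustrative helper applying the same steps, in the same order, as above
normalize_evtype <- function(x) {
  x <- tolower(x)                      # ignore case
  x <- gsub(" ", "", x, fixed = TRUE)  # drop spaces
  x <- gsub("/", "", x, fixed = TRUE)  # drop slashes
  x <- gsub("thunderstormwinds", "thunderstormwind", x, fixed = TRUE)
  x <- gsub("tstmwind", "thunderstormwind", x, fixed = TRUE)
  x <- gsub("highwinds", "highwind", x, fixed = TRUE)
  x
}

cleaned <- normalize_evtype(ev)
length(unique(ev))       # distinct labels before cleaning
length(unique(cleaned))  # distinct labels after cleaning
```

Comparing the two counts on the real `df$EVTYPE` column after each step is one way to check the level counts quoted in the comments above.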
The first question we are attempting to answer is which events are most harmful to human health. The “INJURIES” and “FATALITIES” columns provide the most useful information on human health. Therefore, we will make dataframes reflecting that data.
We create a dataframe of injuries based on EVTYPE.
injuries <- aggregate(df$INJURIES, list(df$EVTYPE), sum)
names(injuries) <- c("EVENT", "INJURIES")
injuries <- injuries[order(injuries$INJURIES, decreasing = TRUE), ]
injuries <- subset(injuries, injuries$INJURIES > 100)
We create a dataframe of fatalities based on each EVTYPE.
fatalities <- aggregate(df$FATALITIES, list(df$EVTYPE), sum)
names(fatalities) <- c("EVENT", "FATALITIES")
fatalities <- fatalities[order(fatalities$FATALITIES, decreasing = TRUE), ]
fatalities <- subset(fatalities, fatalities$FATALITIES > 20)
fatalities$EVENT <- as.factor(fatalities$EVENT)
We are also answering a question about which EVTYPE has the greatest economic consequences. We will look at the “PROPDMG” and “CROPDMG” columns to answer this question.
We create a dataframe of propdmg based on EVTYPE.
propdmg <- aggregate(df$PROPDMG, list(df$EVTYPE), sum)
names(propdmg) <- c("EVENT", "PROPDMG")
propdmg <- propdmg[order(propdmg$PROPDMG, decreasing = TRUE), ]
propdmg <- subset(propdmg, propdmg$PROPDMG > 5000)
propdmg$PROPDMG <- propdmg$PROPDMG/1000 ## rescale the summed raw PROPDMG values by 1000 (PROPDMGEXP is not applied)
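The sum above uses the raw PROPDMG column on its own. The dataset also carries a PROPDMGEXP column (the value K appears in the `head(df)` output), which the NOAA documentation describes as a magnitude code. A minimal sketch of applying it before summing, assuming K, M and B stand for thousands, millions and billions and treating any other code as a multiplier of 1. The `exp_multiplier` helper and the toy data frame are hypothetical and are not part of the original analysis.

```r
# Hypothetical helper: map a magnitude code to a numeric multiplier.
# Assumes K = 1e3, M = 1e6, B = 1e9; anything else (blank, digits, symbols)
# is treated as 1, which is a simplification.
exp_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(as.character(code))]
  mult[is.na(mult)] <- 1
  unname(mult)
}

# Toy rows standing in for df (values are illustrative, not real records)
toy <- data.frame(
  EVTYPE     = c("tornado", "tornado", "flood", "hail"),
  PROPDMG    = c(25, 2.5, 1.2, 10),
  PROPDMGEXP = c("K", "M", "B", ""),
  stringsAsFactors = FALSE
)

# Scale each record to dollars, then total by event type
toy$PROPDMG_USD <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP)
aggregate(PROPDMG_USD ~ EVTYPE, data = toy, FUN = sum)
```

The same approach would apply to CROPDMG and CROPDMGEXP. How unrecognised codes should be handled is a judgment call that this sketch does not settle.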
We create a dataframe of cropdmg based on each EVTYPE.
cropdmg <- aggregate(df$CROPDMG, list(df$EVTYPE), sum) ## sum crop damage (CROPDMG) by event type
names(cropdmg) <- c("EVENT", "CROPDMG")
cropdmg <- cropdmg[order(cropdmg$CROPDMG, decreasing = TRUE), ]
cropdmg <- subset(cropdmg, cropdmg$CROPDMG > 0)
cropdmg$EVENT <- as.factor(cropdmg$EVENT)
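The four summaries above share one pattern: sum a column by event, sort in decreasing order, and keep totals above a threshold. A minimal sketch of that pattern as a single function, which reduces the chance of a copy-paste slip in the column name. The name `top_events` and the toy data frame are illustrative assumptions, not part of the original analysis.

```r
# Hypothetical helper: total `column` by EVTYPE, sort descending,
# and keep only events whose total exceeds `min_total`.
top_events <- function(data, column, min_total = 0) {
  out <- aggregate(data[[column]], list(data$EVTYPE), sum)
  names(out) <- c("EVENT", column)
  out <- out[order(out[[column]], decreasing = TRUE), ]
  out[out[[column]] > min_total, ]
}

# Toy data standing in for df (illustrative values only)
toy <- data.frame(
  EVTYPE     = c("tornado", "flood", "tornado", "hail", "flood"),
  INJURIES   = c(15, 2, 6, 0, 1),
  FATALITIES = c(1, 0, 2, 0, 3)
)

top_events(toy, "INJURIES", min_total = 1)
top_events(toy, "FATALITIES")
```

With the real data, a call such as `top_events(df, "INJURIES", 100)` would correspond to the injuries summary built above.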
The top 25 weather events by injuries.
head(injuries, n=25)
## EVENT INJURIES
## 713 tornado 91346
## 649 thunderstormwind 9363
## 139 flood 6789
## 104 excessiveheat 6525
## 383 lightning 5230
## 223 heat 2100
## 358 icestorm 1975
## 125 flashflood 1777
## 295 highwind 1439
## 193 hail 1361
## 816 winterstorm 1321
## 342 hurricanetyphoon 1275
## 253 heavysnow 1021
## 801 wildfire 911
## 20 blizzard 805
## 153 fog 734
## 803 wildforestfire 545
## 94 duststorm 440
## 819 winterweather 398
## 226 heatwave 379
## 67 densefog 342
## 727 tropicalstorm 340
## 491 ripcurrents 297
## 565 strongwind 280
## 233 heavyrain 251
library(ggplot2)
injuries$EVENT <- reorder(injuries$EVENT,-injuries$INJURIES)
ggplot(injuries,aes(EVENT,INJURIES))+
geom_bar(stat="identity")+
theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5))+
xlab("Weather Event") + ylab("Number of Injuries") + ggtitle("Figure 1: Injuries by Weather Event (all years)")
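The call to `reorder()` is what makes the bars appear from largest to smallest: it re-levels EVENT by the negated totals, and ggplot2 draws a discrete x axis in factor-level order. A small base-R sketch of that behaviour, using made-up values:

```r
# Made-up totals for three events (illustrative only)
toy <- data.frame(EVENT = c("flood", "tornado", "hail"),
                  INJURIES = c(3, 21, 1))

# Default factor levels are alphabetical
levels(factor(toy$EVENT))

# reorder() sorts levels by increasing value of its second argument,
# so negating the totals puts the largest total first
toy$EVENT <- reorder(toy$EVENT, -toy$INJURIES)
levels(toy$EVENT)
```

Without the reordering step, the bars would follow the alphabetical level order instead.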
The top 25 weather events by fatalities.
head(fatalities, n=25)
## EVENT FATALITIES
## 713 tornado 5633
## 104 excessiveheat 1903
## 125 flashflood 978
## 223 heat 937
## 383 lightning 816
## 649 thunderstormwind 701
## 139 flood 470
## 490 ripcurrent 368
## 295 highwind 283
## 11 avalanche 224
## 816 winterstorm 206
## 491 ripcurrents 204
## 226 heatwave 172
## 113 extremecold 162
## 253 heavysnow 127
## 114 extremecoldwindchill 125
## 286 highsurf 104
## 565 strongwind 103
## 20 blizzard 101
## 233 heavyrain 98
## 115 extremeheat 96
## 57 coldwindchill 95
## 358 icestorm 89
## 801 wildfire 75
## 342 hurricanetyphoon 64
fatalities$EVENT<-reorder(fatalities$EVENT,-fatalities$FATALITIES)
ggplot(fatalities,aes(EVENT,FATALITIES))+
geom_bar(stat="identity")+
theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5))+
xlab("Weather Event") + ylab("Number of Fatalities") + ggtitle("Figure 2: Fatalities by Weather Event (all years)")
The top 25 weather events by property damage (PROPDMG).
head(propdmg, n=25)
## EVENT PROPDMG
## 713 tornado 3212.25816
## 649 thunderstormwind 2659.26646
## 125 flashflood 1420.67459
## 139 flood 899.93848
## 193 hail 688.69338
## 383 lightning 603.35178
## 295 highwind 380.40956
## 816 winterstorm 132.72059
## 253 heavysnow 122.25199
## 801 wildfire 84.45934
## 358 icestorm 66.00067
## 565 strongwind 63.01181
## 233 heavyrain 50.84214
## 727 tropicalstorm 48.42368
## 803 wildforestfire 39.34495
## 130 flashflooding 28.49715
## 772 urbansmlstreamfld 26.05194
## 20 blizzard 25.31848
## 560 stormsurge 19.39349
## 141 floodflashflood 19.16825
## 369 landslide 18.96194
## 333 hurricane 15.51368
## 366 lake-effectsnow 14.14100
## 494 riverflood 13.85570
## 40 coastalflood 13.53684
For our final plot we will look at PROPDMG. This will give us the most general indication of economic damage.
propdmg$EVENT <- reorder(propdmg$EVENT,-propdmg$PROPDMG)
ggplot(propdmg,aes(EVENT, PROPDMG))+
geom_bar(stat="identity")+
theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5))+
xlab("Weather Event") + ylab("Damage in Billions of US Dollars") + ggtitle("Figure 3: Property Damage by Weather Event (all years)")
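The top 25 weather events by crop damage (CROPDMG). The values printed below are identical to the fatality counts shown earlier, because this table was produced by summing FATALITIES rather than CROPDMG. They should not be read as crop damage.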
head(cropdmg, n = 25)
## EVENT CROPDMG
## 713 tornado 5633
## 104 excessiveheat 1903
## 125 flashflood 978
## 223 heat 937
## 383 lightning 816
## 649 thunderstormwind 701
## 139 flood 470
## 490 ripcurrent 368
## 295 highwind 283
## 11 avalanche 224
## 816 winterstorm 206
## 491 ripcurrents 204
## 226 heatwave 172
## 113 extremecold 162
## 253 heavysnow 127
## 114 extremecoldwindchill 125
## 286 highsurf 104
## 565 strongwind 103
## 20 blizzard 101
## 233 heavyrain 98
## 115 extremeheat 96
## 57 coldwindchill 95
## 358 icestorm 89
## 801 wildfire 75
## 342 hurricanetyphoon 64
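From the graphs and data we can see that tornadoes are by far the most deleterious to human health, leading both the injury and the fatality totals. Tornadoes and thunderstorm wind are the most damaging to property. The crop damage table reproduces the fatality counts rather than CROPDMG totals, so it does not support a conclusion about crops. From these data and results, tornadoes are the most damaging to human health and, as measured by property damage, to the economy.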