Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern. This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Data processing

Load the packages that will be used throughout the study

library(plyr)
library(ggplot2)

Load the data from the Coursera website: download the compressed file into your working directory and import the dataset into R. No separate unzip step is needed, because bzfile() reads the .bz2 archive directly.

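path <- getwd()
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
dest <- file.path(path, "repdata-data-StormData.csv.bz2")
## Download only once; mode = "wb" keeps the compressed file intact on Windows
if (!file.exists(dest)) download.file(url, dest, mode = "wb")
## Read the CSV straight from the bz2 archive
Storm_Data <- read.csv(bzfile(dest))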

Take a first glance at the data

names(Storm_Data)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"
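
Only a handful of these 37 columns are used in the rest of the analysis. As a minimal sketch, the data frame could be reduced to those columns before aggregating. The toy data frame below is a made-up stand-in for Storm_Data (its values are invented for illustration), and the names `toy`, `keep`, and `toy_small` are hypothetical.

```r
## Toy stand-in for Storm_Data: real column names, made-up values
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL"),
  FATALITIES = c(1, 0),
  INJURIES   = c(3, 0),
  PROPDMG    = c(25, 2.5),
  PROPDMGEXP = c("K", "M"),
  CROPDMG    = c(0, 1),
  CROPDMGEXP = c("", "K"),
  REMARKS    = c("text", "text"),
  stringsAsFactors = FALSE
)

## Columns needed for the health and economic questions
keep <- c("EVTYPE", "FATALITIES", "INJURIES",
          "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")

## Drop everything else (here: REMARKS)
toy_small <- toy[, keep]
names(toy_small)
```

The same subsetting applied to Storm_Data would be `Storm_Data[, keep]`.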

Results

Find the deadliest weather events

## Find the deadliest weather event
fatalityData <- aggregate(FATALITIES ~ EVTYPE, data = Storm_Data, FUN ="sum")
most_fatal <- which.max(fatalityData$FATALITIES)

fatalityData[most_fatal,]
##      EVTYPE FATALITIES
## 834 TORNADO       5633

## Find the 10 deadliest events
fatalityData <- arrange(fatalityData, desc(fatalityData[,2]))
top10fatData <- fatalityData[1:10,]
top10fatData
##            EVTYPE FATALITIES
## 1         TORNADO       5633
## 2  EXCESSIVE HEAT       1903
## 3     FLASH FLOOD        978
## 4            HEAT        937
## 5       LIGHTNING        816
## 6       TSTM WIND        504
## 7           FLOOD        470
## 8     RIP CURRENT        368
## 9       HIGH WIND        248
## 10      AVALANCHE        224
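
The aggregate, sort, and truncate steps above are repeated for injuries and damages later on. One possible way to avoid the repetition is a small helper written in base R. This is only a sketch: `top_events` is a hypothetical name, not a function from plyr or the original analysis, and the toy data are made up.

```r
## Hypothetical helper: total a numeric column by EVTYPE and
## return the n event types with the largest totals
top_events <- function(data, column, n = 10) {
  totals <- aggregate(data[[column]],
                      by = list(EVTYPE = data$EVTYPE),
                      FUN = sum)
  names(totals)[2] <- column
  ## Sort by descending total and reset row names
  totals <- totals[order(-totals[[column]]), ]
  rownames(totals) <- NULL
  head(totals, n)
}

## Made-up example data
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD"),
  FATALITIES = c(5, 2, 3, 1),
  stringsAsFactors = FALSE
)
top2 <- top_events(toy, "FATALITIES", n = 2)
top2
```

With the real data, the call would be `top_events(Storm_Data, "FATALITIES")`, and the same helper would work for `"INJURIES"`.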

Plot the 10 deadliest weather events

ggplot(top10fatData, aes(x = reorder(EVTYPE, -FATALITIES), y = FATALITIES)) + 
  geom_bar(stat = "identity") + 
  ggtitle("10 deadliest events") + 
  xlab("Event Type") +
  theme(axis.text.x = element_text(angle = 45, hjust =1))

Find the weather events that cause the most injuries

## Find the event that leads to the most injured people
injuriesData <- aggregate(INJURIES ~ EVTYPE, data=Storm_Data, FUN = "sum")
most_injuries <- which.max(injuriesData$INJURIES)

injuriesData[most_injuries,]
##      EVTYPE INJURIES
## 834 TORNADO    91346

## Find the 10 events that lead to the most injured people
injuriesData <- arrange(injuriesData, desc(injuriesData[,2]))
top10injData <- injuriesData[1:10,]
top10injData
##               EVTYPE INJURIES
## 1            TORNADO    91346
## 2          TSTM WIND     6957
## 3              FLOOD     6789
## 4     EXCESSIVE HEAT     6525
## 5          LIGHTNING     5230
## 6               HEAT     2100
## 7          ICE STORM     1975
## 8        FLASH FLOOD     1777
## 9  THUNDERSTORM WIND     1488
## 10              HAIL     1361
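
The table above lists TSTM WIND and THUNDERSTORM WIND as separate rows, even though the labels appear to describe the same kind of event. A rough sketch of how such spelling variants might be merged before aggregating is shown below. The function name `normalize_evtype` and the choice of which labels to merge are assumptions made for illustration; the original analysis does not do this, so its totals keep the variants separate.

```r
## Hypothetical cleaner: merge thunderstorm wind spelling variants
normalize_evtype <- function(evtype) {
  ## Upper-case, trim, and collapse repeated whitespace
  x <- toupper(trimws(as.character(evtype)))
  x <- gsub("[[:space:]]+", " ", x)
  ## Assumption: these variants all denote the same event type
  x[grepl("^(TSTM|THUNDERSTORM) WINDS?$", x)] <- "THUNDERSTORM WIND"
  x
}

labels <- c("TSTM WIND", "THUNDERSTORM WIND", "Thunderstorm Winds", " HAIL ")
cleaned <- normalize_evtype(labels)
cleaned
```

Applied to the real data, this would be `Storm_Data$EVTYPE <- normalize_evtype(Storm_Data$EVTYPE)` before calling aggregate(), which would change the rankings shown here.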

Plot the 10 events that lead to the most injured people

ggplot(top10injData, aes(x = reorder(EVTYPE, -INJURIES), y = INJURIES)) + 
  geom_bar(stat = "identity") + 
  ggtitle("10 events leading to the most injuries") + 
  xlab("Event Type") +
  theme(axis.text.x = element_text(angle = 45, hjust =1))

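Find the weather events that have the biggest negative impact on the economy (cost the most). Impact on the economy is characterized in this study by PROPDMG (property damage) and CROPDMG (crop damage). Note that the sum below uses the raw PROPDMG and CROPDMG values without applying the magnitude codes stored in PROPDMGEXP and CROPDMGEXP, so the totals are unscaled figures rather than dollar amounts.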

EconomicImpactData <- aggregate(PROPDMG + CROPDMG ~ EVTYPE, data = Storm_Data, FUN=sum)
names(EconomicImpactData) <- c("EVTYPE", "Damages")

EcoImpact <- arrange(EconomicImpactData, desc(EconomicImpactData[, 2]))
top10EcoData <- EcoImpact[1:10,]
top10EcoData
##                EVTYPE   Damages
## 1             TORNADO 3312276.7
## 2         FLASH FLOOD 1599325.1
## 3           TSTM WIND 1445168.2
## 4                HAIL 1268289.7
## 5               FLOOD 1067976.4
## 6   THUNDERSTORM WIND  943635.6
## 7           LIGHTNING  606932.4
## 8  THUNDERSTORM WINDS  464978.1
## 9           HIGH WIND  342014.8
## 10       WINTER STORM  134699.6
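
The totals above add PROPDMG and CROPDMG directly, while the PROPDMGEXP and CROPDMGEXP columns listed earlier hold a magnitude code for each value. A hedged sketch of scaling the values before summing follows. It assumes the codes H, K, M, and B stand for hundreds, thousands, millions, and billions, and that any other code (including blanks) means a multiplier of 1; both are assumptions to check against the dataset's documentation. The names `exp_to_multiplier` and `DamageUSD` are hypothetical, and the toy data are made up.

```r
## Hypothetical helper: map a magnitude code to a numeric multiplier.
## Assumption: H/K/M/B = 1e2/1e3/1e6/1e9; anything else = 1.
exp_to_multiplier <- function(code) {
  code <- toupper(as.character(code))
  mult <- rep(1, length(code))
  mult[code == "H"] <- 1e2
  mult[code == "K"] <- 1e3
  mult[code == "M"] <- 1e6
  mult[code == "B"] <- 1e9
  mult
}

## Made-up example rows
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "FLOOD"),
  PROPDMG    = c(25, 1.5, 2),
  PROPDMGEXP = c("K", "B", "m"),
  CROPDMG    = c(0, 10, 0),
  CROPDMGEXP = c("", "M", ""),
  stringsAsFactors = FALSE
)

## Scale each damage value, then add property and crop damage
toy$DamageUSD <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP) +
  toy$CROPDMG * exp_to_multiplier(toy$CROPDMGEXP)

damage_by_event <- aggregate(DamageUSD ~ EVTYPE, data = toy, FUN = sum)
damage_by_event
```

Running the same steps on Storm_Data would likely reorder the top 10, since a single event recorded in billions outweighs many events recorded in thousands.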

Plot the 10 weather events that have the biggest negative impact on the economy (cost the most).

ggplot(top10EcoData, aes(x = reorder(EVTYPE, -Damages), y = Damages)) + 
  geom_bar(stat = "identity") + 
  ggtitle("10 events having the greatest economic damages") + 
  xlab("Event Type") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Conclusions

This study suggests that tornadoes were the deadliest weather events in the US over the period covered by the data (1950 to 2011), with 5,633 fatalities. Tornadoes also caused by far the most injuries (91,346) and appear to be responsible for the greatest economic damage.

Excessive heat and flash floods follow tornadoes in the number of fatalities, with 1,903 and 978 respectively over the studied period.

Thunderstorm winds (TSTM WIND, 6,957) and floods (6,789) follow tornadoes in the number of injuries.

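As for economic impact, flash floods and thunderstorm winds follow tornadoes. These rankings rest on raw PROPDMG and CROPDMG sums that do not apply the PROPDMGEXP and CROPDMGEXP magnitude codes, so they should be read with caution.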

In summary, tornadoes warrant the closest monitoring to limit both human and economic costs (though property and crop damage may not be avoidable); flash floods, floods, and thunderstorm winds also deserve close attention.