Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

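We will answer two questions about weather events across the United States:

  1. Which types of events are most harmful with respect to population health?
  2. Which types of events have the greatest economic consequences?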

In summary, tornadoes are the most harmful events to population health, causing 5,633 deaths and 91,346 injuries. In terms of economic consequences, floods caused the largest property losses, while droughts caused the largest crop losses.

Data Processing

Getting Data

The data for this analysis can be downloaded from the Storm Data website.

Documentation of the database is also available; it describes how some of the variables are constructed and defined.

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
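One way to check how sparse the early years are is to count events per year from the BGN_DATE column. The following is a minimal sketch on a hypothetical toy data frame, assuming BGN_DATE strings look like "4/18/1950 0:00:00"; `toy` and `events_per_year` are illustrative names, not part of the original analysis.

```r
# Hypothetical toy rows, assuming BGN_DATE uses the "m/d/Y H:MM:SS" layout
toy <- data.frame(
  BGN_DATE = c("4/18/1950 0:00:00", "4/18/1950 0:00:00",
               "2/13/1995 0:00:00", "11/30/2011 0:00:00",
               "11/15/2011 0:00:00"),
  stringsAsFactors = FALSE
)

# Parse only the date part; strptime ignores the trailing time characters
toy$YEAR <- as.integer(format(as.Date(toy$BGN_DATE, format = "%m/%d/%Y"), "%Y"))

# Number of recorded events per year
events_per_year <- table(toy$YEAR)
print(events_per_year)
```

Applied to the full data set, the same `table()` call would show how the number of recorded events grows over time.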

Loading Data

First, we download the compressed data file from the Storm Data site, skipping the download if the file already exists.

if (!file.exists("StormData.csv.bz2")) {
    fileURL <- 'https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2'
    download.file(fileURL, destfile='StormData.csv.bz2')
}

Then we read the bzip2-compressed CSV file directly, without unzipping it first.

stormData <- read.csv(bzfile('StormData.csv.bz2'),header=TRUE, stringsAsFactors = FALSE)

Processing Data

The file contains 902297 rows and 37 columns. The names of the columns are:

names(stormData)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

We create a subset containing only the columns relevant to this analysis: the event date and type, and its health and economic impact. These columns are:

  • BGN_DATE: Beginning date of the event.
  • EVTYPE: Type of weather event.
  • FATALITIES: Number of fatalities.
  • INJURIES: Number of injuries.
  • PROPDMG: Amount of property damage.
  • PROPDMGEXP: Magnitude code (power of ten) for PROPDMG.
  • CROPDMG: Amount of crop damage.
  • CROPDMGEXP: Magnitude code (power of ten) for CROPDMG.
col<-c("BGN_DATE","EVTYPE","FATALITIES","INJURIES","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")
subsetData <-stormData[col]
even<-unique(subsetData$EVTYPE)

length(even)
## [1] 985

There are 985 distinct event types. We focus on those with the largest health and economic impact.
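Part of the 985 count reflects spelling variants: the tables in the Results section list related labels such as "TSTM WIND" and "THUNDERSTORM WIND" as separate event types. A minimal sketch of normalizing labels before aggregating is shown below; `normalize_evtype` is a hypothetical helper, the TSTM to THUNDERSTORM mapping is an illustrative assumption, and the results reported in this document were computed on the raw labels.

```r
# Hypothetical sample of raw EVTYPE labels with typical inconsistencies
raw <- c("TSTM WIND", " Tstm Wind", "THUNDERSTORM WIND", "Flood", "FLOOD ")

normalize_evtype <- function(x) {
  x <- toupper(trimws(x))                       # unify case, strip padding
  x <- gsub("\\s+", " ", x)                     # collapse repeated whitespace
  x <- sub("^TSTM ", "THUNDERSTORM ", x)        # expand abbreviation (assumed mapping)
  x
}

clean <- normalize_evtype(raw)
print(unique(clean))
```

A complete clean-up would map every label onto the event types defined in the database documentation; this sketch only handles case, spacing and one abbreviation.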

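PROPDMGEXP and CROPDMGEXP are character fields that give the power of ten by which PROPDMG and CROPDMG must be multiplied: H for hundreds (10^2), K for thousands (10^3), M for millions (10^6) and B for billions (10^9), in upper or lower case. Some records use digits or the symbols +, - and ? instead; below, digits are used directly as exponents and the symbols are treated as 10^0. Before the analysis we convert both fields into two new numeric exponent columns, PROPex and CROPex.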

unique(subsetData$PROPDMGEXP)
##  [1] "K" "M" ""  "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-"
## [18] "1" "8"
## Convert PROPDMGEXP codes to numeric exponents (powers of ten)
library(plyr)
subsetData$PROPex<-subsetData$PROPDMGEXP
# Letter codes map to their power of ten; "+", "-" and "?" are treated as 10^0
subsetData$PROPex <- revalue(subsetData$PROPex, c("K"="3", "M"="6","m"="6","B"="9","+"="0","h"="2","H"="2","-"="0","?"="0"))

subsetData$PROPex[subsetData$PROPex==""]<-"0"
subsetData$PROPex<-as.numeric(subsetData$PROPex)

unique(subsetData$CROPDMGEXP)
## [1] ""  "M" "K" "m" "B" "?" "0" "k" "2"
## Convert CROPDMGEXP codes to numeric exponents (powers of ten)
subsetData$CROPex<-subsetData$CROPDMGEXP
subsetData$CROPex <- revalue(subsetData$CROPex, c("K"="3","k"="3", "M"="6","m"="6","B"="9","?"="0"))
subsetData$CROPex[subsetData$CROPex==""]<-"0"
subsetData$CROPex<-as.numeric(subsetData$CROPex)
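The two `revalue()` blocks repeat the same mapping. As an alternative sketch, a single hypothetical helper `exp_to_power` could handle both columns with a lookup vector in base R, leaving any unrecognized code as `NA` so it is visible instead of silently converted. The treatment of "", "+", "-" and "?" as 10^0 mirrors the choice made above and is an assumption, not something the documentation prescribes.

```r
# Hypothetical helper: convert PROPDMGEXP/CROPDMGEXP codes to powers of ten
exp_to_power <- function(code) {
  code <- toupper(trimws(code))
  power <- rep(NA_real_, length(code))          # unknown codes stay NA
  letter_map <- c(H = 2, K = 3, M = 6, B = 9)   # hundred, thousand, million, billion
  is_letter <- code %in% names(letter_map)
  power[is_letter] <- letter_map[code[is_letter]]
  is_digit <- grepl("^[0-9]$", code)            # single digits used directly as exponents
  power[is_digit] <- as.numeric(code[is_digit])
  power[code %in% c("", "+", "-", "?")] <- 0    # same choice as the revalue() calls above
  power
}

exp_to_power(c("K", "m", "", "5", "?", "h", "B"))
```

Here the call returns 3, 6, 0, 5, 0, 2, 9; it could be used as `subsetData$PROPex <- exp_to_power(subsetData$PROPDMGEXP)`.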

We then compute the total property damage and total crop damage by multiplying each amount by 10 raised to its exponent:

subsetData$TOTALPROPDMG <- subsetData$PROPDMG * (10^subsetData$PROPex)
subsetData$TOTALCROPDMG <- subsetData$CROPDMG * (10^subsetData$CROPex)
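As a worked check of the formula amount × 10^exponent, consider three hypothetical rows: 25 with code K (exponent 3), 2.5 with code B (exponent 9), and 10 with no code (exponent 0).

```r
# Hypothetical rows with already-converted exponents
toy <- data.frame(PROPDMG = c(25, 2.5, 10), PROPex = c(3, 9, 0))

# Same computation as for TOTALPROPDMG above
toy$TOTALPROPDMG <- toy$PROPDMG * 10^toy$PROPex
print(toy)
```

The totals are 25,000, 2,500,000,000 and 10, i.e. 25 thousand, 2.5 billion and 10 dollars.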

Results

1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

In this section we identify the 10 weather event types that caused the most fatalities and the 10 that caused the most injuries.

Fatalities

agrfatalities<-aggregate(FATALITIES~EVTYPE, data = subsetData, "sum")
fatalities<-agrfatalities[order(-agrfatalities$FATALITIES), ][1:10, ]
fatalities
##             EVTYPE FATALITIES
## 834        TORNADO       5633
## 130 EXCESSIVE HEAT       1903
## 153    FLASH FLOOD        978
## 275           HEAT        937
## 464      LIGHTNING        816
## 856      TSTM WIND        504
## 170          FLOOD        470
## 585    RIP CURRENT        368
## 359      HIGH WIND        248
## 19       AVALANCHE        224

Injuries

agrinjuries<-aggregate(INJURIES~EVTYPE, data = subsetData, "sum")
injuries<-agrinjuries[order(-agrinjuries$INJURIES), ][1:10, ]
injuries
##                EVTYPE INJURIES
## 834           TORNADO    91346
## 856         TSTM WIND     6957
## 170             FLOOD     6789
## 130    EXCESSIVE HEAT     6525
## 464         LIGHTNING     5230
## 275              HEAT     2100
## 427         ICE STORM     1975
## 153       FLASH FLOOD     1777
## 760 THUNDERSTORM WIND     1488
## 244              HAIL     1361
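The same aggregate, order and take-10 pattern is used for fatalities, injuries, property damage and crop damage. A minimal sketch of a hypothetical helper `top_n_by` that factors it out is shown below on toy data; it uses `head()` instead of `[1:10, ]`, which would add rows of `NA` if there were fewer than 10 event types.

```r
# Hypothetical helper generalizing the aggregate/order/[1:10, ] pattern
top_n_by <- function(data, value, group = "EVTYPE", n = 10) {
  f <- reformulate(group, response = value)   # e.g. FATALITIES ~ EVTYPE
  agg <- aggregate(f, data = data, FUN = sum)
  agg <- agg[order(-agg[[value]]), ]          # largest totals first
  head(agg, n)                                # safe when fewer than n groups exist
}

# Hypothetical toy data
toy <- data.frame(EVTYPE = c("TORNADO", "HEAT", "TORNADO", "FLOOD"),
                  FATALITIES = c(3, 2, 4, 1),
                  stringsAsFactors = FALSE)
result <- top_n_by(toy, "FATALITIES", n = 2)
print(result)
```

With the real data, `top_n_by(subsetData, "INJURIES")` would correspond to the `injuries` table above.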

Let's compare these results graphically:

library(ggplot2)
library(gridExtra)
# First plot
plotfatalities <-qplot(EVTYPE, data = fatalities, weight = FATALITIES, geom = "bar", fill = I("cyan")) + 
    xlim("TORNADO","EXCESSIVE HEAT","FLASH FLOOD","HEAT","LIGHTNING", "TSTM WIND","FLOOD","RIP CURRENT","HIGH WIND","AVALANCHE")+
    scale_y_continuous("Number of Fatalities") +  theme(axis.text.x = element_text(angle = 45, hjust = 1)) + xlab("Weather Type") + 
    ggtitle("The Top 10 Weather Events\n by fatalities")

# Second plot
plotinjuries<-qplot(EVTYPE, data = injuries, weight = INJURIES, geom = "bar", fill = I("violet")) +   xlim("TORNADO","TSTM WIND","FLOOD","EXCESSIVE HEAT","LIGHTNING", "HEAT","ICE STORM","FLASH FLOOD","THUNDERSTORM WIND","HAIL")+
    scale_y_continuous("Number of injuries") + 
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) + xlab("Weather Type") + 
    ggtitle("The Top 10 Weather Events\n by injuries")

grid.arrange(plotfatalities, plotinjuries, ncol = 2)
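The plots above fix the bar order by listing every event type in `xlim()`, which must be edited whenever the table changes. A sketch of an alternative, assuming ggplot2 2.2.0 or later for `geom_col()`, sorts the bars with `reorder()` instead; `top` is a hypothetical stand-in holding the first three rows of the `fatalities` table.

```r
library(ggplot2)

# Hypothetical stand-in: first three rows of the fatalities table above
top <- data.frame(EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD"),
                  FATALITIES = c(5633, 1903, 978),
                  stringsAsFactors = FALSE)

# reorder() sets the factor levels by decreasing FATALITIES, replacing xlim()
top$EVTYPE <- reorder(top$EVTYPE, -top$FATALITIES)

p <- ggplot(top, aes(x = EVTYPE, y = FATALITIES)) +
  geom_col(fill = "cyan") +
  labs(x = "Weather Type", y = "Number of Fatalities",
       title = "The Top Weather Events\n by fatalities") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
print(p)
```

`geom_col()` plots the precomputed totals directly, which avoids the `weight =` trick used with `qplot()`.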

Tornadoes are by far the leading cause of both fatalities (5,633) and injuries (91,346).

2. Across the United States, which types of events have the greatest economic consequences?

In this section we identify the 10 weather event types with the greatest economic impact on property and on crops.

Property Damage

agrproperties<-aggregate(TOTALPROPDMG~EVTYPE, data = subsetData, "sum")
properties<-agrproperties[order(-agrproperties$TOTALPROPDMG), ][1:10, ]
properties
##                EVTYPE TOTALPROPDMG
## 170             FLOOD 144657709807
## 411 HURRICANE/TYPHOON  69305840000
## 834           TORNADO  56947380677
## 670       STORM SURGE  43323536000
## 153       FLASH FLOOD  16822673979
## 244              HAIL  15735267513
## 402         HURRICANE  11868319010
## 848    TROPICAL STORM   7703890550
## 972      WINTER STORM   6688497251
## 359         HIGH WIND   5270046295

Crop Damage

agrcrop<-aggregate(TOTALCROPDMG~EVTYPE, data = subsetData, "sum")
crop<-agrcrop[order(-agrcrop$TOTALCROPDMG), ][1:10, ]
crop
##                EVTYPE TOTALCROPDMG
## 95            DROUGHT  13972566000
## 170             FLOOD   5661968450
## 590       RIVER FLOOD   5029459000
## 427         ICE STORM   5022113500
## 244              HAIL   3025954473
## 402         HURRICANE   2741910000
## 411 HURRICANE/TYPHOON   2607872800
## 153       FLASH FLOOD   1421317100
## 140      EXTREME COLD   1292973000
## 212      FROST/FREEZE   1094086000
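Totals with 11 or 12 digits are hard to compare at a glance. A small sketch that expresses them in billions of dollars, using as a stand-in the top row of each damage table above (`top_damage` is a hypothetical name):

```r
# Stand-in rows copied from the property and crop tables above
top_damage <- data.frame(
  EVTYPE = c("FLOOD", "DROUGHT"),
  KIND   = c("property", "crop"),
  USD    = c(144657709807, 13972566000),
  stringsAsFactors = FALSE
)

# Express in billions of dollars, rounded to one decimal place
top_damage$USD_BILLIONS <- round(top_damage$USD / 1e9, 1)
print(top_damage)
```

This gives about 144.7 billion for flood property damage and 14.0 billion for drought crop damage.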
# First plot
plotproperties<-qplot(EVTYPE, data = properties, weight = TOTALPROPDMG, geom = "bar", fill = I("cyan")) + 
    xlim("FLOOD","HURRICANE/TYPHOON","TORNADO","STORM SURGE","FLASH FLOOD","HAIL","HURRICANE","TROPICAL STORM","WINTER STORM","HIGH WIND")+
    scale_y_continuous("Property Damage (USD)") + 
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) + xlab("Weather Type") + 
    ggtitle("The Top 10 Weather Events\n with Highest Property Damage")


# Second plot
plotcrop<-qplot(EVTYPE, data = crop, weight = TOTALCROPDMG, geom = "bar", fill = I("violet")) + 
    xlim("DROUGHT","FLOOD","RIVER FLOOD","ICE STORM","HAIL","HURRICANE","HURRICANE/TYPHOON","FLASH FLOOD","EXTREME COLD","FROST/FREEZE")+
    scale_y_continuous("Crop Damage (USD)") + 
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) + xlab("Weather Type") + 
    ggtitle("The Top 10 Weather Events\n with Highest Crop Damage")
grid.arrange(plotproperties, plotcrop, ncol = 2)

Floods cause the most property damage (about USD 144.7 billion), and droughts cause the most crop damage (about USD 14.0 billion).