Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

We will be exploring the NOAA Storm Database to answer the following questions related to weather events:

  1. Across the United States, which types of events are most harmful with respect to population health?
  2. Across the United States, which types of events have the greatest economic consequences?

The analysis found that tornadoes are the most harmful event type, causing 5,633 deaths and 91,346 injuries. In terms of economic losses, floods are responsible for the most property damage, while drought is the largest contributor to crop losses.

Data Processing

Getting and loading the data

The NOAA Storm Database is obtained from the URL assigned to fileURL in the code below.

Once the file is downloaded, it is read directly from the compressed archive. We record the date for future reference.

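# Download the compressed file only if it is not already present
if (!file.exists("StormData.csv.bz2")) {
    fileURL <- 'https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2'
    download.file(fileURL, destfile='StormData.csv.bz2', method = 'curl')
}
# read.csv reads the bzip2-compressed file directly through a bzfile connection
stormData <- read.csv(bzfile('StormData.csv.bz2'), header = TRUE, stringsAsFactors = FALSE)
# Record when the data was loaded
downloadedData <- date()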

We will perform some basic analysis.

summary(stormData)
##     STATE__       BGN_DATE           BGN_TIME          TIME_ZONE        
##  Min.   : 1.0   Length:902297      Length:902297      Length:902297     
##  1st Qu.:19.0   Class :character   Class :character   Class :character  
##  Median :30.0   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :31.2                                                           
##  3rd Qu.:45.0                                                           
##  Max.   :95.0                                                           
##                                                                         
##      COUNTY     COUNTYNAME           STATE              EVTYPE         
##  Min.   :  0   Length:902297      Length:902297      Length:902297     
##  1st Qu.: 31   Class :character   Class :character   Class :character  
##  Median : 75   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :101                                                           
##  3rd Qu.:131                                                           
##  Max.   :873                                                           
##                                                                        
##    BGN_RANGE      BGN_AZI           BGN_LOCATI          END_DATE        
##  Min.   :   0   Length:902297      Length:902297      Length:902297     
##  1st Qu.:   0   Class :character   Class :character   Class :character  
##  Median :   0   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :   1                                                           
##  3rd Qu.:   1                                                           
##  Max.   :3749                                                           
##                                                                         
##    END_TIME           COUNTY_END COUNTYENDN       END_RANGE  
##  Length:902297      Min.   :0    Mode:logical   Min.   :  0  
##  Class :character   1st Qu.:0    NA's:902297    1st Qu.:  0  
##  Mode  :character   Median :0                   Median :  0  
##                     Mean   :0                   Mean   :  1  
##                     3rd Qu.:0                   3rd Qu.:  0  
##                     Max.   :0                   Max.   :925  
##                                                              
##    END_AZI           END_LOCATI            LENGTH           WIDTH     
##  Length:902297      Length:902297      Min.   :   0.0   Min.   :   0  
##  Class :character   Class :character   1st Qu.:   0.0   1st Qu.:   0  
##  Mode  :character   Mode  :character   Median :   0.0   Median :   0  
##                                        Mean   :   0.2   Mean   :   8  
##                                        3rd Qu.:   0.0   3rd Qu.:   0  
##                                        Max.   :2315.0   Max.   :4400  
##                                                                       
##        F               MAG          FATALITIES     INJURIES     
##  Min.   :0        Min.   :    0   Min.   :  0   Min.   :   0.0  
##  1st Qu.:0        1st Qu.:    0   1st Qu.:  0   1st Qu.:   0.0  
##  Median :1        Median :   50   Median :  0   Median :   0.0  
##  Mean   :1        Mean   :   47   Mean   :  0   Mean   :   0.2  
##  3rd Qu.:1        3rd Qu.:   75   3rd Qu.:  0   3rd Qu.:   0.0  
##  Max.   :5        Max.   :22000   Max.   :583   Max.   :1700.0  
##  NA's   :843563                                                 
##     PROPDMG      PROPDMGEXP           CROPDMG       CROPDMGEXP       
##  Min.   :   0   Length:902297      Min.   :  0.0   Length:902297     
##  1st Qu.:   0   Class :character   1st Qu.:  0.0   Class :character  
##  Median :   0   Mode  :character   Median :  0.0   Mode  :character  
##  Mean   :  12                      Mean   :  1.5                     
##  3rd Qu.:   0                      3rd Qu.:  0.0                     
##  Max.   :5000                      Max.   :990.0                     
##                                                                      
##      WFO             STATEOFFIC         ZONENAMES            LATITUDE   
##  Length:902297      Length:902297      Length:902297      Min.   :   0  
##  Class :character   Class :character   Class :character   1st Qu.:2802  
##  Mode  :character   Mode  :character   Mode  :character   Median :3540  
##                                                           Mean   :2875  
##                                                           3rd Qu.:4019  
##                                                           Max.   :9706  
##                                                           NA's   :47    
##    LONGITUDE        LATITUDE_E     LONGITUDE_       REMARKS         
##  Min.   :-14451   Min.   :   0   Min.   :-14455   Length:902297     
##  1st Qu.:  7247   1st Qu.:   0   1st Qu.:     0   Class :character  
##  Median :  8707   Median :   0   Median :     0   Mode  :character  
##  Mean   :  6940   Mean   :1452   Mean   :  3509                     
##  3rd Qu.:  9605   3rd Qu.:3549   3rd Qu.:  8735                     
##  Max.   : 17124   Max.   :9706   Max.   :106220                     
##                   NA's   :40                                        
##      REFNUM      
##  Min.   :     1  
##  1st Qu.:225575  
##  Median :451149  
##  Mean   :451149  
##  3rd Qu.:676723  
##  Max.   :902297  
## 
head(stormData)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
## 4        2     2.5          K       0                                    
## 5        2     2.5          K       0                                    
## 6        6     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6
names(stormData)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

Of all the columns, only a few are relevant to our analysis:

  * EVTYPE: the type of weather event
  * FATALITIES: the number of fatalities
  * INJURIES: the number of injuries
  * PROPDMG: the amount of property damage (in US dollars)
  * PROPDMGEXP: a multiplier for PROPDMG
  * CROPDMG: the amount of crop damage (in US dollars)
  * CROPDMGEXP: a multiplier for CROPDMG
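As a minimal sketch of narrowing the table to these columns, the example below builds a two-row stand-in for stormData from the first two rows of the head() output above and keeps only the relevant columns. The names toyStorm, relevantCols, and stormSubset are illustrative assumptions and do not appear in the original analysis.

```r
# Two-row stand-in for stormData, copied from the head() output above;
# only a few of the 37 columns are included here.
toyStorm <- data.frame(
    STATE      = c("AL", "AL"),
    EVTYPE     = c("TORNADO", "TORNADO"),
    FATALITIES = c(0, 0),
    INJURIES   = c(15, 0),
    PROPDMG    = c(25.0, 2.5),
    PROPDMGEXP = c("K", "K"),
    CROPDMG    = c(0, 0),
    CROPDMGEXP = c("", ""),
    REFNUM     = c(1, 2),
    stringsAsFactors = FALSE
)

# The seven columns used in the rest of the analysis
relevantCols <- c("EVTYPE", "FATALITIES", "INJURIES",
                  "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")

# Keep only those columns; STATE and REFNUM are dropped
stormSubset <- toyStorm[, relevantCols]
str(stormSubset)
```

The same subsetting expression could be applied to the full stormData table to reduce memory use, although the analysis below works on the full table.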

More information about the database is available in the following documents:

  * National Weather Service Storm Data Documentation
  * National Climatic Data Center Storm Events FAQ

As explained in Section 2.7 of the manual, the database records the magnitude of each damage value with a non-numeric code. For example, B stands for billions and K for thousands. For this reason we convert the property damage and crop damage values to numeric dollar amounts. A new variable called TOTALPROPDMG will contain the total property damage cost.

stormData$PROPDMGEXP <- as.character(stormData$PROPDMGEXP)
# Map each magnitude code (upper or lower case) to its power of ten
stormData$PROPDMGEXP[toupper(stormData$PROPDMGEXP) == 'H'] <- "2"
stormData$PROPDMGEXP[toupper(stormData$PROPDMGEXP) == 'K'] <- "3"
stormData$PROPDMGEXP[toupper(stormData$PROPDMGEXP) == 'M'] <- "6"
stormData$PROPDMGEXP[toupper(stormData$PROPDMGEXP) == 'B'] <- "9"
# Any remaining non-numeric code becomes NA (with a coercion warning), then 0
stormData$PROPDMGEXP <- as.numeric(stormData$PROPDMGEXP)
stormData$PROPDMGEXP[is.na(stormData$PROPDMGEXP)] <- 0
# Total property damage in US dollars
stormData$TOTALPROPDMG <- stormData$PROPDMG * 10^stormData$PROPDMGEXP

The same conversion is applied to the crop damage values, stored in a new variable called TOTALCROPDMG.

stormData$CROPDMGEXP <- as.character(stormData$CROPDMGEXP)
# Map each magnitude code (upper or lower case) to its power of ten
stormData$CROPDMGEXP[toupper(stormData$CROPDMGEXP) == 'H'] <- "2"
stormData$CROPDMGEXP[toupper(stormData$CROPDMGEXP) == 'K'] <- "3"
stormData$CROPDMGEXP[toupper(stormData$CROPDMGEXP) == 'M'] <- "6"
stormData$CROPDMGEXP[toupper(stormData$CROPDMGEXP) == 'B'] <- "9"
# Any remaining non-numeric code becomes NA (with a coercion warning), then 0
stormData$CROPDMGEXP <- as.numeric(stormData$CROPDMGEXP)
stormData$CROPDMGEXP[is.na(stormData$CROPDMGEXP)] <- 0
# Total crop damage in US dollars
stormData$TOTALCROPDMG <- stormData$CROPDMG * 10^stormData$CROPDMGEXP
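The property and crop conversions follow the same pattern, so they could be factored into a single helper. The sketch below is one possible way to do that, not the code used to produce the results in this report: expToPower is a hypothetical function name, and the toy data frame contains made-up values chosen only to exercise each branch. It is intended to reproduce the same rules as the code above: H, K, M, and B (in either case) map to 2, 3, 6, and 9, numeric codes are kept as exponents, and anything else is treated as 0.

```r
# Hypothetical helper (not part of the original analysis):
# convert a vector of magnitude codes to powers of ten.
expToPower <- function(code) {
    code <- toupper(as.character(code))
    lookup <- c(H = 2, K = 3, M = 6, B = 9)
    # Named lookup; codes not in the table come back as NA
    power <- unname(lookup[code])
    # Codes that are already digits (for example "5") are kept as exponents
    digits <- suppressWarnings(as.numeric(code))
    power[is.na(power)] <- digits[is.na(power)]
    # Everything else (empty strings, "?", "+", "-") is treated as 10^0
    power[is.na(power)] <- 0
    power
}

# Made-up values, one per branch of the helper
toy <- data.frame(DMG = c(25, 2.5, 1.2, 3, 10),
                  EXP = c("K", "M", "b", "", "?"),
                  stringsAsFactors = FALSE)
toy$TOTAL <- toy$DMG * 10^expToPower(toy$EXP)
toy
```

With such a helper, each of the two conversions would reduce to a single line of the form `stormData$PROPDMG * 10^expToPower(stormData$PROPDMGEXP)`, and the coercion warnings raised by as.numeric would be suppressed deliberately rather than left in the output.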

Results

1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

To answer this question we present the results separately for fatalities and injuries. First, we use aggregate() to sum the fatalities caused by each event type. We then sort the table in decreasing order and keep the top 10.

sumFatalities <- aggregate(stormData$FATALITIES, by = list(stormData$EVTYPE), "sum")
names(sumFatalities) <- c("Event", "Fatalities")
sumFatalities <- sumFatalities[order(-sumFatalities$Fatalities), ][1:10, ]
sumFatalities
##              Event Fatalities
## 834        TORNADO       5633
## 130 EXCESSIVE HEAT       1903
## 153    FLASH FLOOD        978
## 275           HEAT        937
## 464      LIGHTNING        816
## 856      TSTM WIND        504
## 170          FLOOD        470
## 585    RIP CURRENT        368
## 359      HIGH WIND        248
## 19       AVALANCHE        224

The same will be done for injuries.

sumInjuries <- aggregate(stormData$INJURIES, by = list(stormData$EVTYPE), "sum")
names(sumInjuries) <- c("Event", "Injuries")
sumInjuries <- sumInjuries[order(-sumInjuries$Injuries), ][1:10, ]
sumInjuries
##                 Event Injuries
## 834           TORNADO    91346
## 856         TSTM WIND     6957
## 170             FLOOD     6789
## 130    EXCESSIVE HEAT     6525
## 464         LIGHTNING     5230
## 275              HEAT     2100
## 427         ICE STORM     1975
## 153       FLASH FLOOD     1777
## 760 THUNDERSTORM WIND     1488
## 244              HAIL     1361
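Both tables are built with the same three steps: aggregate by event type, rename the columns, then sort and truncate. As a sketch of how those steps could be wrapped in one function, the example below defines a hypothetical helper named topEvents and runs it on a four-row toy table with made-up counts. Neither the helper nor the toy values are part of the original analysis.

```r
# Hypothetical helper (not part of the original analysis):
# sum one numeric column by EVTYPE and return the n largest totals.
topEvents <- function(data, column, n = 10) {
    totals <- aggregate(data[[column]], by = list(Event = data$EVTYPE), FUN = sum)
    names(totals)[2] <- column                   # aggregate() names the sum column "x"
    head(totals[order(-totals[[column]]), ], n)  # largest totals first
}

# Made-up counts, only to show the shape of the result
toy <- data.frame(EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL"),
                  FATALITIES = c(3, 1, 2, 0),
                  stringsAsFactors = FALSE)
topTwo <- topEvents(toy, "FATALITIES", n = 2)
topTwo
```

Called as `topEvents(stormData, "FATALITIES")` and `topEvents(stormData, "INJURIES")`, a helper of this kind would be expected to give the same rankings as the two tables above, with the value column named after the source column.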

We will plot both tables as bar plots, side by side.

par(mfrow = c(1, 2), mar = c(12, 5, 3, 2), mgp = c(3, 1, 0), cex = 0.8, las = 3)
barplot(sumFatalities$Fatalities, names.arg = sumFatalities$Event, col = 'red',
        main = 'Top 10 Weather Events for Fatalities', ylab = 'Number of Fatalities')
barplot(sumInjuries$Injuries, names.arg = sumInjuries$Event, col = 'blue',
        main = 'Top 10 Weather Events for Injuries', ylab = 'Number of Injuries')

Figure: Top 10 weather events for fatalities (left) and injuries (right).

2. Across the United States, which types of events have the greatest economic consequences?

An approach similar to the one used for the first question is applied here.

The 10 event types that caused the most property damage, by total cost in US dollars:

sumPropDmg <- aggregate(stormData$TOTALPROPDMG, by = list(stormData$EVTYPE), "sum")
names(sumPropDmg) <- c("Event", "Cost")
sumPropDmg <- sumPropDmg[order(-sumPropDmg$Cost), ][1:10, ]
sumPropDmg
##                 Event      Cost
## 170             FLOOD 1.447e+11
## 411 HURRICANE/TYPHOON 6.931e+10
## 834           TORNADO 5.695e+10
## 670       STORM SURGE 4.332e+10
## 153       FLASH FLOOD 1.682e+10
## 244              HAIL 1.574e+10
## 402         HURRICANE 1.187e+10
## 848    TROPICAL STORM 7.704e+09
## 972      WINTER STORM 6.688e+09
## 359         HIGH WIND 5.270e+09

The 10 event types that caused the most crop damage, by total cost in US dollars:

sumCropDmg <- aggregate(stormData$TOTALCROPDMG, by = list(stormData$EVTYPE), "sum")
names(sumCropDmg) <- c("Event", "Cost")
sumCropDmg <- sumCropDmg[order(-sumCropDmg$Cost), ][1:10, ]
sumCropDmg
##                 Event      Cost
## 95            DROUGHT 1.397e+10
## 170             FLOOD 5.662e+09
## 590       RIVER FLOOD 5.029e+09
## 427         ICE STORM 5.022e+09
## 244              HAIL 3.026e+09
## 402         HURRICANE 2.742e+09
## 411 HURRICANE/TYPHOON 2.608e+09
## 153       FLASH FLOOD 1.421e+09
## 140      EXTREME COLD 1.293e+09
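## 212      FROST/FREEZE 1.094e+09

Finally, we plot property and crop damage together for every event type that appears in either top 10.

library(reshape2)
library(ggplot2)
# Full outer join of the two top-10 tables; an event missing from one table gets NA
damageByEvent <- merge(x = sumPropDmg, y = sumCropDmg, by = "Event", all = TRUE)
names(damageByEvent) <- c("Event", "Property", "Crop")
# Reshape to long format (one row per event and damage type), dropping the NA rows
damageByEvent <- melt(damageByEvent, id.vars = 'Event', na.rm = TRUE)
ggplot(damageByEvent, aes(Event, value)) +
        geom_bar(aes(fill = variable), position = "dodge", stat = "identity") +
        theme(axis.text.x = element_text(angle = 90, hjust = 1)) + xlab("Event Type") +
        ylab("Damage, USD") + ggtitle("Crop/Property damage by type")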

Figure: Crop and property damage by event type.
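Because each table holds only its own top 10, the merge with all = TRUE acts as a full outer join: an event that appears in one table but not the other gets NA in the missing column. The base R sketch below illustrates this with made-up costs. The data frames prop, crop, and both are illustrative names, and the suffixes argument is used here only to make the merged column names readable.

```r
# Made-up costs; each table lists a different set of events
prop <- data.frame(Event = c("FLOOD", "TORNADO"), Cost = c(100, 50),
                   stringsAsFactors = FALSE)
crop <- data.frame(Event = c("DROUGHT", "FLOOD"), Cost = c(30, 10),
                   stringsAsFactors = FALSE)

# Full outer join on Event: every event from either table is kept
both <- merge(prop, crop, by = "Event", all = TRUE,
              suffixes = c(".Property", ".Crop"))
both

# Events present in only one of the two tables
both$Event[is.na(both$Cost.Property) | is.na(both$Cost.Crop)]
```

In the plot above, this means that an event with a bar for only one damage type was outside the top 10 for the other type, not that the other damage was zero.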