Synopsis

This report analyzes the U.S. National Oceanic and Atmospheric Administration’s (NOAA) data about climate disasters. We identify which event types are most harmful to the population and which cause the greatest economic damage. Tornadoes are the most harmful to the population, while floods cause the greatest economic damage.

Load libraries

library(R.utils)
## Loading required package: R.oo
## Loading required package: R.methodsS3
## R.methodsS3 v1.6.1 (2014-01-04) successfully loaded. See ?R.methodsS3 for help.
## R.oo v1.18.0 (2014-02-22) successfully loaded. See ?R.oo for help.
## 
## Attaching package: 'R.oo'
## 
## The following objects are masked from 'package:methods':
## 
##     getClasses, getMethods
## 
## The following objects are masked from 'package:base':
## 
##     attach, detach, gc, load, save
## 
## R.utils v1.33.0 (2014-08-24) successfully loaded. See ?R.utils for help.
## 
## Attaching package: 'R.utils'
## 
## The following object is masked from 'package:utils':
## 
##     timestamp
## 
## The following objects are masked from 'package:base':
## 
##     cat, commandArgs, getOption, inherits, isOpen, parse, warnings
library(ggplot2)
library(scales)
library(reshape2)

Data Processing

First we download the data file, unzip it, and read it.

url <- 'http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2'
file <- 'storm-data.csv.bz2'

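# Download and decompress only once; keep the compressed archive
if(!file.exists(file)) {
    download.file(url, file, mode='wb')
    bunzip2(file, overwrite=TRUE, remove=FALSE)
}

# Read the CSV only if it is not already in the workspace
if(!'originalData' %in% ls()) {
    originalData <- read.csv(sub('\\.bz2$', '', file))
}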
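As an aside, base R can usually read a bzip2-compressed CSV without a separate unzip step, because read.csv opens a file path through file(), which detects compression when reading. A minimal sketch on a throwaway file (the toy data frame and the temporary path are made up for illustration and are not part of the analysis):

```r
# Write a tiny CSV through a bzip2 connection
tmp <- tempfile(fileext = '.csv.bz2')
con <- bzfile(tmp, open = 'w')
write.csv(data.frame(EVTYPE = c('TORNADO', 'FLOOD'), FATALITIES = c(3, 1)),
          con, row.names = FALSE)
close(con)

# Read the compressed file back directly, with no explicit decompression
toy <- read.csv(tmp)
toy
```

Whether this is preferable to bunzip2 depends on how often the file is re-read, since decompression then happens on every read.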

data <- originalData

Then we take a brief look at the data.

dim(data)
## [1] 902297     37
head(data)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
## 4        2     2.5          K       0                                    
## 5        2     2.5          K       0                                    
## 6        6     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6

The data set has 902297 rows and 37 columns.

Next we keep only the useful fields and remove NAs. Dropping the fields that this specific analysis does not need makes the data set lighter.

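# Parse the event start date (the time part is always 0:00:00 and is dropped)
data$date <- as.Date(data$BGN_DATE, format = "%m/%d/%Y %H:%M:%S")
usefulFields <- c('date', 'EVTYPE', 'FATALITIES', 'INJURIES', 'PROPDMG',
                  'PROPDMGEXP', 'CROPDMG', 'CROPDMGEXP')
data <- data[usefulFields]
# Keep only rows with no missing values in the selected fields
data <- data[complete.cases(data), ]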

The data goes from 1950-01-03 to 2011-11-30.
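A range like the one quoted above can be obtained with range() on the parsed dates. A small sketch using three BGN_DATE strings copied from the head() output, so the range it prints covers only those rows and not the full data set:

```r
# Three BGN_DATE values taken from the first rows shown earlier
raw <- c('4/18/1950 0:00:00', '2/20/1951 0:00:00', '11/15/1951 0:00:00')

# as.Date keeps only the calendar date; the time part is parsed and dropped
dates <- as.Date(raw, format = '%m/%d/%Y %H:%M:%S')

# Earliest and latest date in the sample
range(dates)
```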

Each damage variable is split across two fields: one holds a numeric value and the other holds a character that encodes a base-10 exponent (for example, K for thousands). Here’s a function that returns the multiplier corresponding to that character.

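# Map an exponent code to its multiplier: B = 10^9, M = 10^6, K = 10^3, H = 10^2.
# Any other value (blank, digits, symbols, lowercase letters) gives 10^0 = 1.
getMultiplier <- function(char) {
    exp <- if(is.null(char)) { 0 } 
    else if(char == 'B') { 9 }
    else if(char == 'M') { 6 }
    else if(char == 'K') { 3 }
    else if(char == 'H') { 2 }
    else { 0 }
    10^exp
}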

We apply the getMultiplier function to both damage fields and store their sum in a new column, TOTALDMG.

data$PROPDMG <- data$PROPDMG * sapply(data$PROPDMGEXP, getMultiplier)
data$CROPDMG <- data$CROPDMG * sapply(data$CROPDMGEXP, getMultiplier)
data$TOTALDMG <- data$PROPDMG + data$CROPDMG
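Calling a function once per row with sapply can be slow on roughly 900,000 rows. The same mapping can be written as a vectorized lookup in a named vector. The sketch below is an alternative, not the code used in this report; getMultiplierVec and exponents are hypothetical names, and the codes and fallback mirror getMultiplier:

```r
# Exponent codes recognized by getMultiplier, as a named lookup vector
exponents <- c(B = 9, M = 6, K = 3, H = 2)

# Hypothetical vectorized variant: takes a whole column of codes at once
getMultiplierVec <- function(chars) {
    exp <- exponents[as.character(chars)]   # NA for any code not in the table
    exp[is.na(exp)] <- 0                    # unknown codes fall back to 10^0 = 1
    unname(10^exp)
}

# Known codes, a blank, and an unrecognized symbol
getMultiplierVec(c('K', 'M', '', 'B', 'H', '?'))
```

The as.character call matters when the column is a factor, because indexing by a factor would use its integer codes rather than its labels.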

We aggregate the data by event type and keep the top 10 event types for each measure (fatalities, injuries, total damages).

data.sum <- aggregate(cbind(FATALITIES, INJURIES, PROPDMG, CROPDMG, TOTALDMG) ~ EVTYPE,
                      data = data, FUN = sum)

fatalities <- data.sum[order(data.sum$FATALITIES, decreasing=T)[1:10],
                       which(names(data.sum) %in% c('EVTYPE', 'FATALITIES'))]
injuries <- data.sum[order(data.sum$INJURIES, decreasing=T)[1:10],
                     which(names(data.sum) %in% c('EVTYPE', 'INJURIES'))]
damages <- data.sum[order(data.sum$TOTALDMG, decreasing=T)[1:10],
                     which(names(data.sum) %in% c('EVTYPE', 'PROPDMG', 'CROPDMG', 'TOTALDMG'))]
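The three selections above follow the same pattern: sort by one column, keep a few columns, take the first 10 rows. A minimal sketch of how that pattern could be factored into a helper; topEvents is a hypothetical name and the toy data frame is made up for illustration:

```r
# Hypothetical helper: top n rows of df ranked by column `by`, keeping `cols`
topEvents <- function(df, by, cols = c('EVTYPE', by), n = 10) {
    ranked <- df[order(df[[by]], decreasing = TRUE), cols, drop = FALSE]
    head(ranked, n)
}

# Toy totals per event type, not taken from the storm data
toy <- data.frame(EVTYPE = c('A', 'B', 'C', 'D'),
                  FATALITIES = c(5, 40, 12, 0),
                  INJURIES = c(100, 7, 30, 2))

# Two event types with the most fatalities
topEvents(toy, 'FATALITIES', n = 2)
```

With such a helper, the damages selection would presumably be a call like topEvents(data.sum, 'TOTALDMG', c('EVTYPE', 'PROPDMG', 'CROPDMG', 'TOTALDMG')).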

Results

Fatalities

fatalities$EVTYPE <- factor(fatalities$EVTYPE, levels=fatalities$EVTYPE[order(fatalities$FATALITIES, decreasing=T)])
ggplot(fatalities, aes(EVTYPE, FATALITIES)) + geom_bar(stat='identity') +
    xlab("Weather Event") +
    ylab("Fatalities") +
    ggtitle(paste('Total Fatalities by Weather Events in the U.S.\n from ',
                    format(min(data$date), "%Y"), ' - ', format(max(data$date), "%Y"), sep=''))

fatalities
##             EVTYPE FATALITIES
## 834        TORNADO       5633
## 130 EXCESSIVE HEAT       1903
## 153    FLASH FLOOD        978
## 275           HEAT        937
## 464      LIGHTNING        816
## 856      TSTM WIND        504
## 170          FLOOD        470
## 585    RIP CURRENT        368
## 359      HIGH WIND        248
## 19       AVALANCHE        224

Tornadoes are the deadliest weather event. They account for around 37% of total fatalities.
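A share like this is the top event's total divided by the sum over all event types, that is, over the whole aggregated column rather than only the top 10 rows. A minimal sketch with made-up numbers; topShare is a hypothetical helper, and in this report it would presumably be applied to data.sum$FATALITIES:

```r
# Hypothetical helper: percentage of the grand total contributed by the largest value
topShare <- function(x) {
    round(100 * max(x) / sum(x))
}

# Made-up totals per event type, for illustration only
toyTotals <- c(TORNADO = 50, HEAT = 30, FLOOD = 20)

# The largest value is 50 out of a total of 100
topShare(toyTotals)
```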

Injuries

injuries$EVTYPE <- factor(injuries$EVTYPE, levels=injuries$EVTYPE[order(injuries$INJURIES, decreasing=T)])
ggplot(injuries, aes(EVTYPE, INJURIES)) + geom_bar(stat='identity') +
    xlab("Weather Event") +
    ylab("Injuries") +
    ggtitle(paste('Total Injuries by Weather Events in the U.S.\n from ',
                    format(min(data$date), "%Y"), ' - ', format(max(data$date), "%Y"), sep=''))

injuries
##                EVTYPE INJURIES
## 834           TORNADO    91346
## 856         TSTM WIND     6957
## 170             FLOOD     6789
## 130    EXCESSIVE HEAT     6525
## 464         LIGHTNING     5230
## 275              HEAT     2100
## 427         ICE STORM     1975
## 153       FLASH FLOOD     1777
## 760 THUNDERSTORM WIND     1488
## 244              HAIL     1361

Tornadoes also cause the most injuries. They account for around 65% of total injuries.

Damages

damagesMelt <- damages[1:3]
names(damagesMelt) <- c('EVTYPE', 'Property damages', 'Crop damages')
damagesMelt$EVTYPE <- factor(damages$EVTYPE, levels=damages$EVTYPE[order(damages$TOTALDMG, decreasing=T)])
damagesMelt <- melt(damagesMelt, id.vars='EVTYPE')
ggplot(damagesMelt, aes(EVTYPE, value, fill=variable)) + geom_bar(stat='identity') +
    xlab("Weather Event") +
    scale_y_continuous('Damages in dollars') +
    ggtitle(paste('Damages by Weather Events in the U.S.\n from ',
                    format(min(data$date), "%Y"), ' - ', format(max(data$date), "%Y"), sep=''))

damages
##                EVTYPE      PROPDMG     CROPDMG     TOTALDMG
## 170             FLOOD 144657709807  5661968450 150319678257
## 411 HURRICANE/TYPHOON  69305840000  2607872800  71913712800
## 834           TORNADO  56925660790   414953270  57340614060
## 670       STORM SURGE  43323536000        5000  43323541000
## 244              HAIL  15727367548  3025537890  18752905438
## 153       FLASH FLOOD  16140812067  1421317100  17562129167
## 95            DROUGHT   1046106000 13972566000  15018672000
## 402         HURRICANE  11868319010  2741910000  14610229010
## 590       RIVER FLOOD   5118945500  5029459000  10148404500
## 427         ICE STORM   3944927860  5022113500   8967041360

The most expensive event is flood, accounting for around 37% of total damages, but hurricanes, typhoons, tornadoes and storm surges also cause great destruction.
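Dollar figures with eleven or twelve digits are hard to compare at a glance. A small sketch that rescales them to billions of dollars, using the TOTALDMG values of the first three rows of the table above:

```r
# TOTALDMG values copied from the first three rows of the damages table
total <- c(FLOOD = 150319678257,
           `HURRICANE/TYPHOON` = 71913712800,
           TORNADO = 57340614060)

# Express in billions of dollars, rounded to one decimal place
round(total / 1e9, 1)
```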

Conclusion

Run away from these weather events.