NOAA Storm Data Analysis (1950-2011)

Wednesday, October 26, 2014

Analysis of the NOAA Storm Database to predict severe weather events

1. Summary

The intention of this study is to present a high-level analysis of the consequences of severe weather events in the US from 1950 to 2011, in terms of the total damage caused and the impact on public health.

The analysis is based on the NOAA Storm Database. Details on the data may be found in the National Weather Service Storm Data Documentation and in the National Climatic Data Center Storm Events FAQ.

The original data may be found at the Coursera Reproducible Research course web site: Storm Data [47Mb]

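This study tries to answer two questions: which event types have the greatest impact on public health, and which event types cause the greatest total damage.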

This study is separated into three parts: loading the data, processing the data, and presenting the results.

For reproducibility purposes, we have chosen not to supply the cleaned data, but to present a way to load and read the data from its original source, as described in the summary section.

Load required packages

library(reshape2)
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.1.1

For reasons of efficiency, speed and memory management, we have opted to read only the specific columns of the data that we are going to work on.

Specifically, we need the event types, the fatalities, the injuries, the property cost, the property cost level, the crop cost and the crop cost level. These are columns 8 and 23 to 28 of the data set.

Read only the desired columns from the data file

DATA <- read.csv("repdata_data_StormData.csv", colClasses = c(rep("NULL", 7), 
    NA, rep("NULL", 14), rep(NA, 6), rep("NULL", 9)), header = TRUE, 
    as.is = TRUE)
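The column selection above relies on counting positions by hand. As a minimal sketch of the same `colClasses` mechanism, the example below builds the vector from column names instead. The inline `csv_text` and its column names `A` to `E` are hypothetical stand-ins for the real file, used only so the snippet runs on its own.

```r
# Hypothetical miniature data set standing in for the storm data file.
csv_text <- "A,B,C,D,E\n1,x,2,3,y\n4,z,5,6,w"

# Read a single row just to obtain the column names.
header <- names(read.csv(text = csv_text, nrows = 1))

# "NULL" skips a column; NA lets read.csv infer the column type.
wanted <- c("B", "D")
col_classes <- rep("NULL", length(header))
col_classes[header %in% wanted] <- NA

small <- read.csv(text = csv_text, colClasses = col_classes,
                  header = TRUE, as.is = TRUE)
print(names(small))
```

Selecting by name keeps the call valid if the column order changes, at the cost of one extra pass over the header.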

2.2. Part 2: Processing the Data

In order to calculate the total cost, the cost values contained in the data must be transformed to actual USD values. The notation used is:

• H = 100 USD
• K = 1,000 USD
• M = 1,000,000 USD
• B = 1,000,000,000 USD

Any other value in the cost level columns is treated as a multiplier of 1.

Convert the H, K, M, B notation values to USD

DATA$PROPDMGVAL <- DATA$PROPDMG * sapply(toupper(DATA$PROPDMGEXP), function(x) {
    switch(x, H = 10^2, K = 10^3, M = 10^6, B = 10^9, 1)
})
DATA$CROPDMGVAL <- DATA$CROPDMG * sapply(toupper(DATA$CROPDMGEXP), function(x) {
    switch(x, H = 10^2, K = 10^3, M = 10^6, B = 10^9, 1)
})
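The conversion above calls `switch` once per row, which can be slow on a data set of this size. A possible alternative, shown here only as a sketch, is a vectorised lookup in a named vector. The names `multiplier` and `to_usd` are introduced for illustration and are not part of the original analysis.

```r
# Multipliers for the H, K, M, B notation; any other code maps to 1.
multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

to_usd <- function(amount, exponent) {
  m <- multiplier[toupper(exponent)]  # NA for codes not in the lookup
  m[is.na(m)] <- 1                    # same default as the switch() call
  amount * unname(m)
}

# Small made-up example: 25K, 2.5m, 10 with no code, 3B.
print(to_usd(c(25, 2.5, 10, 3), c("K", "m", "", "B")))
```

With this helper, the property damage column could be computed as `to_usd(DATA$PROPDMG, DATA$PROPDMGEXP)`, assuming the column names used in the code above.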

Compute total damage cost and total health events

DATA$COST <- DATA$PROPDMGVAL + DATA$CROPDMGVAL
DATA$HEALTHDMG <- DATA$FATALITIES + DATA$INJURIES

Convert the EVTYPE column to a factor and set it to 'Ordered' for use in the plots

DATA$EVTYPE <- factor(DATA$EVTYPE, levels = unique(DATA$EVTYPE), ordered = T)

In order to process the data, we have used the "melt" and "dcast" functions from the reshape2 R package to aggregate the health events and the damage cost over the event types, using the "sum" function.

Melt the damage costs and compute the aggregate of sums per Event Type

mcost <- melt(DATA, id.vars = "EVTYPE", measure.vars = c("PROPDMGVAL", "CROPDMGVAL", 
    "COST"))
dcost <- dcast(mcost, EVTYPE ~ variable, sum)
dcost <- dcost[order(-dcost$COST), ]
dcost$EVTYPE <- factor(dcost$EVTYPE, levels = unique(dcost$EVTYPE), ordered = T)

Melt the health events and compute the aggregate of sums per Event Type

mhealth <- melt(DATA, id.vars = "EVTYPE", measure.vars = c("FATALITIES", "INJURIES", 
    "HEALTHDMG"))
dhealth <- dcast(mhealth, EVTYPE ~ variable, sum)
dhealth <- dhealth[order(-dhealth$HEALTHDMG), ]
dhealth$EVTYPE <- factor(dhealth$EVTYPE, levels = unique(dhealth$EVTYPE), ordered = T)
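The per-type sums can be cross-checked without reshape2. The sketch below uses base R's `aggregate` on a small made-up data frame (`toy` and its values are hypothetical) to show the same "sum per event type, then sort descending" step.

```r
# Made-up records; only the column names mirror the storm data.
toy <- data.frame(
  EVTYPE     = c("FLOOD", "TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(1, 5, 2, 0),
  INJURIES   = c(10, 50, 20, 3),
  stringsAsFactors = FALSE
)
toy$HEALTHDMG <- toy$FATALITIES + toy$INJURIES

# Sum every measure per event type, then sort by total impact.
sums <- aggregate(cbind(FATALITIES, INJURIES, HEALTHDMG) ~ EVTYPE,
                  data = toy, FUN = sum)
sums <- sums[order(-sums$HEALTHDMG), ]
print(sums)
```

If the base R result and the `dcast` result disagree on the real data, that would point to a problem in one of the two aggregation steps.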
3. Results

In this part we present the results of the analysis. For presentation purposes we have created two plots, using the ggplot2 graphical R package.

3.1 Health Impact by Event Type

The health impact analysis presents the top 20 event types with the highest impact on the population's health. The measure has been calculated by adding the fatalities and injuries caused by each event type. For each event type, the fatalities and the injuries are also shown in the same plot.

Analyse the impact on health: plot the top 20 event types with the highest impact on the population's health

healthplot <- ggplot(data = dhealth[1:20, ], aes(x = reorder(EVTYPE, -HEALTHDMG), 
    y = HEALTHDMG)) + xlab("Event Type") + ylab("Total Impact")
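The `reorder(EVTYPE, -HEALTHDMG)` call in the plot controls the left-to-right order of the bars: ggplot2 draws a discrete axis in the order of the factor levels. A minimal base R sketch of that mechanism, using made-up values (`ev` and `impact` are hypothetical):

```r
# Made-up event types and impact totals.
ev     <- c("HAIL", "TORNADO", "FLOOD")
impact <- c(19, 150, 57)

# reorder() sorts the levels by increasing value of its second argument,
# so negating the totals puts the largest impact first.
f <- reorder(factor(ev), -impact)
print(levels(f))
```

Without the reordering, the levels of `factor(ev)` would be alphabetical and the bars would not appear in descending order.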

Create a bar chart

healthplot <- healthplot + geom_bar(fill = "red", stat = "identity") + theme(axis.text.x = element_text(angle = 90, 
    hjust = 0.5, vjust = 1))

Add separate lines for FATALITIES and INJURIES

healthplot <- healthplot + geom_line(aes(x = reorder(EVTYPE, -HEALTHDMG), y = FATALITIES, 
    group = 1, colour = "Fatalities"))
healthplot <- healthplot + geom_point(aes(x = reorder(EVTYPE, -HEALTHDMG), y = FATALITIES, 
    group = 1, colour = "Fatalities"))
healthplot <- healthplot + geom_line(aes(x = reorder(EVTYPE, -HEALTHDMG), y = INJURIES, 
    group = 2, colour = "Injuries"))
healthplot <- healthplot + geom_point(aes(x = reorder(EVTYPE, -HEALTHDMG), y = INJURIES, 
    group = 2, colour = "Injuries"))

Set the colours for each line

healthplot <- healthplot + scale_colour_manual("Type of Impact", breaks = c("Fatalities", 
    "Injuries"), values = c("yellow", "blue"))

Set the plot's title

healthplot <- healthplot + ggtitle("Top 20 Health Impacts by Event Type (1950-2011)")

Print the plot

print(healthplot)


As we may see, tornadoes are by far the leading cause of health events, in both fatalities and injuries. The table of the top 20 event types is shown below:

print(dhealth[1:20, ])
##                 EVTYPE FATALITIES INJURIES HEALTHDMG
## 1              TORNADO       5633    91346     96979
## 99      EXCESSIVE HEAT       1903     6525      8428
## 2            TSTM WIND        504     6957      7461
## 36               FLOOD        470     6789      7259
## 15           LIGHTNING        816     5230      6046
## 27                HEAT        937     2100      3037
## 20         FLASH FLOOD        978     1777      2755
## 65           ICE STORM         89     1975      2064
## 16   THUNDERSTORM WIND        133     1488      1621
## 8         WINTER STORM        206     1321      1527
## 46           HIGH WIND        248     1137      1385
## 3                 HAIL         15     1361      1376
## 973  HURRICANE/TYPHOON         64     1275      1339
## 53          HEAVY SNOW        127     1021      1148
## 221           WILDFIRE         75      911       986
## 10  THUNDERSTORM WINDS         64      908       972
## 47            BLIZZARD        101      805       906
## 276                FOG         62      734       796
## 18         RIP CURRENT        368      232       600
## 227   WILD/FOREST FIRE         12      545       557
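To put the tornado figures in perspective, the short sketch below computes the tornado share of fatalities and injuries among the 20 event types listed. The two vectors are copied from the table above, in the same row order; the shares are relative to these 20 rows only, not to the full data set.

```r
# FATALITIES and INJURIES columns, copied from the printed table.
fatalities <- c(5633, 1903, 504, 470, 816, 937, 978, 89, 133, 206,
                248, 15, 64, 127, 75, 64, 101, 62, 368, 12)
injuries   <- c(91346, 6525, 6957, 6789, 5230, 2100, 1777, 1975, 1488, 1321,
                1137, 1361, 1275, 1021, 911, 908, 805, 734, 232, 545)

# The first row is TORNADO.
tornado_fatality_share <- fatalities[1] / sum(fatalities)
tornado_injury_share   <- injuries[1] / sum(injuries)

print(round(c(fatalities = tornado_fatality_share,
              injuries   = tornado_injury_share), 2))
```

Among these rows, tornadoes account for well over half of the injuries but less than half of the fatalities, where excessive heat and flash floods also contribute substantially.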

3.2 Damage by Event Type

The damage analysis presents the top 20 event types responsible for the highest damages. The total damage has been calculated by adding the property damages and the crop damages caused by each event type. For each event type, the property and crop damages are also shown in the same plot.

Analyse the damage caused: plot the top 20 event types which caused the highest damage

costplot <- ggplot(data = dcost[1:20, ], aes(x = reorder(EVTYPE, -COST), y = COST)) + 
    xlab("Event Type") + ylab("Total Cost")

Create a bar chart

costplot <- costplot + geom_bar(fill = "blue", stat = "identity", group = 1) + 
    theme(axis.text.x = element_text(angle = 90, hjust = 0.5, vjust = 1))

Add separate lines for property damages and for crop damages

costplot <- costplot + geom_line(aes(x = reorder(EVTYPE, -COST), y = PROPDMGVAL, 
    group = 2, colour = "Property damage"))
costplot <- costplot + geom_point(aes(x = reorder(EVTYPE, -COST), y = PROPDMGVAL, 
    group = 2, colour = "Property damage"))
costplot <- costplot + geom_line(aes(x = reorder(EVTYPE, -COST), y = CROPDMGVAL, 
    group = 3, colour = "Crop damage"))
costplot <- costplot + geom_point(aes(x = reorder(EVTYPE, -COST), y = CROPDMGVAL, 
    group = 3, colour = "Crop damage"))

Set the colours for each line

costplot <- costplot + scale_colour_manual("Type of damage", breaks = c("Property damage", 
    "Crop damage"), values = c("red", "green"))

Set the plot's title

costplot <- costplot + ggtitle("Top 20 Damages by Event Type (1950-2011)")

Print the plot

print(costplot)


As we may notice, the order of the event types which have caused the biggest damages is not the same as that of the event types responsible for health events. The biggest damages have been caused by floods and hurricanes. Drought deserves special mention, as it has caused severe crop damage over the years. The table of the top 20 event types is shown below:

print(dcost[1:20, ])
##                        EVTYPE PROPDMGVAL CROPDMGVAL      COST
## 36                      FLOOD  1.447e+11  5.662e+09 1.503e+11
## 973         HURRICANE/TYPHOON  6.931e+10  2.608e+09 7.191e+10
## 1                     TORNADO  5.694e+10  4.150e+08 5.735e+10
## 204               STORM SURGE  4.332e+10  5.000e+03 4.332e+10
## 3                        HAIL  1.573e+10  3.026e+09 1.876e+10
## 20                FLASH FLOOD  1.614e+10  1.421e+09 1.756e+10
## 194                   DROUGHT  1.046e+09  1.397e+10 1.502e+10
## 226                 HURRICANE  1.187e+10  2.742e+09 1.461e+10
## 52                RIVER FLOOD  5.119e+09  5.029e+09 1.015e+10
## 65                  ICE STORM  3.945e+09  5.022e+09 8.967e+09
## 209            TROPICAL STORM  7.704e+09  6.783e+08 8.382e+09
## 8                WINTER STORM  6.688e+09  2.694e+07 6.715e+09
## 46                  HIGH WIND  5.270e+09  6.386e+08 5.909e+09
## 221                  WILDFIRE  4.765e+09  2.955e+08 5.061e+09
## 2                   TSTM WIND  4.485e+09  5.540e+08 5.039e+09
## 976          STORM SURGE/TIDE  4.641e+09  8.500e+05 4.642e+09
## 16          THUNDERSTORM WIND  3.483e+09  4.148e+08 3.898e+09
## 13             HURRICANE OPAL  3.173e+09  1.900e+07 3.192e+09
## 227          WILD/FOREST FIRE  3.002e+09  1.068e+08 3.109e+09
## 313 HEAVY RAIN/SEVERE WEATHER  2.500e+09  0.000e+00 2.500e+09
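The remark about drought can be checked from the table by computing the crop share of the total cost. The sketch below does this for three rows; the values are copied from the table above (rounded as printed), and the `rows` data frame is introduced only for illustration.

```r
# Selected rows from the printed table (values as shown, in USD).
rows <- data.frame(
  EVTYPE     = c("FLOOD", "DROUGHT", "ICE STORM"),
  CROPDMGVAL = c(5.662e9, 1.397e10, 5.022e9),
  COST       = c(1.503e11, 1.502e10, 8.967e9),
  stringsAsFactors = FALSE
)

# Fraction of each event type's total damage that is crop damage.
rows$CROPSHARE <- rows$CROPDMGVAL / rows$COST
print(rows[order(-rows$CROPSHARE), c("EVTYPE", "CROPSHARE")])
```

For drought, crop damage makes up the large majority of the total cost, whereas for floods it is only a small fraction.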