Synopsis

One of the most important and difficult tasks for goverment / municipal managers are related with the effects of wheather events and how to prioritize the resources to handle accordingly.

In order to provide a guide to them, in this report we will use the data of U.S. National Oceanic and Atmospheric Administration’s (NOAA) to evaluate the Economic and Population Impact (Fatalities and Injuries) of all the wheather events in the U.S. from 1950 and end in November 2011.

The Database could be download it from the following URL:

http://www.noaa.gov

With this report We will answer two main questions:

  1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

  2. Across the United States, which types of events have the greatest economic consequences?

We will base our analysis in the following variables that are included in the NOAA Database:

The analysis will be made using R Programming Language, Rstudio to write the report and RPubs to publicate it.

In order to get the most harmful wheather event, we will create a Data Set that aggregate all the events by number of FATALITIES and INJURIES. After that we will make two plot’s of the top-10 events.

Regarding Economic Impact, we will create a Data Set that aggregate (sum) all the events by total amount of damage (property + crop). We will also make a plot of the top-10 events.

The Report will have two main sections:

We hope that this report could be used in order to create some strategy to act accordingly regarding the different wheather events.

This report will not have any specific recommendations.

Data Processing

Read the data into a Dataset

dt <- read.csv("repdata-data-StormData.csv.bz2", stringsAsFactors = FALSE)
colnames(dt)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

Let’s answer the two questions:

1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

Create a new Data Set with only the EVTYPE and summarize the Injuries and Fatalities by Wheather Event (EVTYPE)

dt2 <- aggregate(cbind(INJURIES,FATALITIES)~EVTYPE,dt,sum, na.rm = TRUE )
colnames(dt2)
## [1] "EVTYPE"     "INJURIES"   "FATALITIES"

For information purposes, let’s create two new column named PERCENTAGE_INJURIES and PERCENTAGE_FATALITIES that calculate the percentage for each event

total_injuries <- sum(dt2$INJURIES)
dt2$PERCENTAGE_INJURIES <- as.numeric((dt2$INJURIES / total_injuries) * 100)

total_fatalities <- sum(dt2$FATALITIES)
dt2$PERCENTAGE_FATALITIES <- as.numeric((dt2$FATALITIES / total_fatalities) * 100)

We will make two plot, Top_10 for Fatalities and Top_10 for Injuries

Top_10 Fatalities

Let’s get the top_10 values for fatalities, for it let’s order the dataset, choose the first 10 rows and add a final rows that include all the other values (other). Finally let’s change it to numeric format.

top10_fatalities <- dt2[with(dt2,order(-dt2$PERCENTAGE_FATALITIES)),]

l <- length(dt2$EVTYPE)

total_other_injuries <- sum(top10_fatalities$INJURIES[11:l])
total_other_fatalities <- sum(top10_fatalities$FATALITIES[11:l])
total_other_p_injuries <- sum(top10_fatalities$PERCENTAGE_INJURIES[11:l])
total_other_p_fatalities <- sum(top10_fatalities$PERCENTAGE_FATALITIES[11:l])

top10_fatalities <- top10_fatalities[1:10,]

top10_fatalities <- rbind(top10_fatalities,
                c("Others",
                  total_other_injuries,
                  total_other_fatalities,
                  total_other_p_injuries,
                  total_other_p_fatalities))

top10_fatalities$INJURIES <- 
  as.numeric(top10_fatalities$INJURIES)

top10_fatalities$FATALITIES <- 
  as.numeric(top10_fatalities$FATALITIES)


top10_fatalities$PERCENTAGE_INJURIES <- 
  as.numeric(top10_fatalities$PERCENTAGE_INJURIES)

top10_fatalities$PERCENTAGE_FATALITIES <- 
  as.numeric(top10_fatalities$PERCENTAGE_FATALITIES)


options(digits = 2)
top10_fatalities
##             EVTYPE INJURIES FATALITIES PERCENTAGE_INJURIES
## 834        TORNADO    91346       5633               65.00
## 130 EXCESSIVE HEAT     6525       1903                4.64
## 153    FLASH FLOOD     1777        978                1.26
## 275           HEAT     2100        937                1.49
## 464      LIGHTNING     5230        816                3.72
## 856      TSTM WIND     6957        504                4.95
## 170          FLOOD     6789        470                4.83
## 585    RIP CURRENT      232        368                0.17
## 359      HIGH WIND     1137        248                0.81
## 19       AVALANCHE      170        224                0.12
## 11          Others    18265       3064               13.00
##     PERCENTAGE_FATALITIES
## 834                  37.2
## 130                  12.6
## 153                   6.5
## 275                   6.2
## 464                   5.4
## 856                   3.3
## 170                   3.1
## 585                   2.4
## 359                   1.6
## 19                    1.5
## 11                   20.2

Let’s plot the result:

library(ggplot2)

g_top10_fatalities <- 
    ggplot(top10_fatalities,aes(x=top10_fatalities$EVTYPE,
                            y=top10_fatalities$FATALITIES, 
                            ymax=max(top10_fatalities$FATALITIES)*1.05,
                            fill=factor(EVTYPE))) +
      geom_bar(stat="identity") + 
        coord_flip() + 
      geom_text(aes(x=top10_fatalities$EVTYPE, y=top10_fatalities$FATALITIES,
                  label=format(top10_fatalities$FATALITIES,digits=3),size=1), 
                  position = position_dodge(width=0.9)) +
      ylab('Fatalities') +
      xlab('Wheather Event') +
      theme(legend.position="none") +
      ggtitle("U.S. Total Fatalities per Wheather Event (Top 10)")
print(g_top10_fatalities)

Top_10 Injuries

Let’s get the top_10 values for injuries, for it let’s order the dataset, choose the first 10 rows and add a final rows that include all the other values (other). Finally let’s change it to numeric format.

top10_injuries <- dt2[with(dt2,order(-dt2$PERCENTAGE_INJURIES)),]

l <- length(dt2$EVTYPE)

total_other_injuries <- sum(top10_injuries$INJURIES[11:l])
total_other_fatalities <- sum(top10_injuries$FATALITIES[11:l])
total_other_p_injuries <- sum(top10_injuries$PERCENTAGE_INJURIES[11:l])
total_other_p_fatalities <- sum(top10_injuries$PERCENTAGE_FATALITIES[11:l])

top10_injuries <- top10_injuries[1:10,]

top10_injuries <- rbind(top10_injuries,
                c("Others",
                  total_other_injuries,
                  total_other_fatalities,
                  total_other_p_injuries,
                  total_other_p_fatalities))

top10_injuries$INJURIES <- 
  as.numeric(top10_injuries$INJURIES)

top10_injuries$FATALITIES <- 
  as.numeric(top10_injuries$FATALITIES)

top10_injuries$PERCENTAGE_INJURIES <- 
  as.numeric(top10_injuries$PERCENTAGE_INJURIES)

top10_injuries$PERCENTAGE_FATALITIES <- 
  as.numeric(top10_injuries$PERCENTAGE_FATALITIES)

options(digits = 2)
top10_injuries
##                EVTYPE INJURIES FATALITIES PERCENTAGE_INJURIES
## 834           TORNADO    91346       5633               65.00
## 856         TSTM WIND     6957        504                4.95
## 170             FLOOD     6789        470                4.83
## 130    EXCESSIVE HEAT     6525       1903                4.64
## 464         LIGHTNING     5230        816                3.72
## 275              HEAT     2100        937                1.49
## 427         ICE STORM     1975         89                1.41
## 153       FLASH FLOOD     1777        978                1.26
## 760 THUNDERSTORM WIND     1488        133                1.06
## 244              HAIL     1361         15                0.97
## 11             Others    14980       3667               10.66
##     PERCENTAGE_FATALITIES
## 834                37.194
## 856                 3.328
## 170                 3.103
## 130                12.565
## 464                 5.388
## 275                 6.187
## 427                 0.588
## 153                 6.458
## 760                 0.878
## 244                 0.099
## 11                 24.213

Let’s plot the result:

g_top10_injuries <- 
    ggplot(top10_injuries,aes(x=top10_injuries$EVTYPE,
                            y=top10_injuries$INJURIES,  
                            fill=factor(EVTYPE))) +
      geom_bar(stat="identity") + 
        coord_flip() + 
      geom_text(aes(x=top10_injuries$EVTYPE, y=top10_injuries$INJURIES,
                    ymax=max(top10_injuries$INJURIES)*1.05,
                    label=format(top10_injuries$INJURIES,digits=3),size=1),
                position = position_dodge(width=0.9)) +
      ylab('Injuries') +
      xlab('Wheather Event') +
      theme(legend.position="none") +
      ggtitle("U.S. Total Injuries per Wheather Event (Top 10)")
print(g_top10_injuries)

2. Across the United States, which types of events have the greatest economic consequences?

Create a new Data Set with only four columns (EVTYPE,PROPDMG, PROPDMGEXP, CROPMDG, CROPDMGEXP). Create a new variable (PROPDMG_2 and CROPDMG_2) that consider the exponential value (k,M,b)

dt3 <- dt[c(8,25:28)]

dt3$PROPDMG_2 <- ifelse(  
            dt3$PROPDMGEXP %in% c("K","k"), 
                dt3$PROPDMG * 1000, 

            (ifelse( dt3$PROPDMGEXP %in% c("M","m"), 
                dt3$PROPDMG * 1000000,

             (ifelse( dt3$PROPDMGEXP %in% c("B","b"), 
                dt3$PROPDMG * 1000000000, 
                dt3$PROPDMG)))))

dt3$CROPDMG_2 <- ifelse(  
            dt3$CROPDMGEXP %in% c("K","k"), 
                dt3$CROPDMG * 1000, 

            (ifelse( dt3$CROPDMGEXP %in% c("M","m"), 
                dt3$CROPDMG * 1000000,

             (ifelse( dt3$CROPDMGEXP %in% c("B","b"), 
                dt3$CROPDMG * 1000000000, 
                dt3$CROPDMG)))))

Lets summarize the damage (property + crop) and create a new dataset

dt4 <- aggregate(PROPDMG_2+CROPDMG_2~EVTYPE,dt3,sum,na.rm = TRUE )
names(dt4) <- c("EVTYPE","ECONOMIC_DAMAGE")

For information purposes, let’s create a new column named PERCENTAGE_ECONOMIC_DAMAGE that calculate the percentage for each event

total_damage <- sum(dt4$ECONOMIC_DAMAGE)
dt4$PERCENTAGE_ECONOMIC_DAMAGE <- 
  as.numeric((dt4$ECONOMIC_DAMAGE / total_damage) * 100)

Let’s get the top_10 values, for it let’s order the dataset, choose the first 10 rows and add a final rows that include all the other values (other)

l <- length(dt4$EVTYPE)

dt5 <- dt4[with(dt4,order(-dt4$PERCENTAGE_ECONOMIC_DAMAGE)),]

top10_economic <- dt5[1:10,]

top10_economic <- rbind(top10_economic,
                c("Others",
                  sum(dt5$ECONOMIC_DAMAGE[11:l]),
                  sum(dt5$PERCENTAGE_ECONOMIC_DAMAGE[11:l])))


top10_economic$ECONOMIC_DAMAGE <- 
  as.numeric(top10_economic$ECONOMIC_DAMAGE)

top10_economic$PERCENTAGE_ECONOMIC_DAMAGE <- 
  as.numeric(top10_economic$PERCENTAGE_ECONOMIC_DAMAGE)

options(digits = 2)
top10_economic
##                EVTYPE ECONOMIC_DAMAGE PERCENTAGE_ECONOMIC_DAMAGE
## 170             FLOOD         1.5e+11                       31.6
## 411 HURRICANE/TYPHOON         7.2e+10                       15.1
## 834           TORNADO         5.7e+10                       12.0
## 670       STORM SURGE         4.3e+10                        9.1
## 244              HAIL         1.9e+10                        3.9
## 153       FLASH FLOOD         1.8e+10                        3.7
## 95            DROUGHT         1.5e+10                        3.2
## 402         HURRICANE         1.5e+10                        3.1
## 590       RIVER FLOOD         1.0e+10                        2.1
## 427         ICE STORM         9.0e+09                        1.9
## 11             Others         6.8e+10                       14.4

Let’s plot it, we will change the scale to billions USD:

top10_economic$ECONOMIC_DAMAGE <- 
  as.numeric(top10_economic$ECONOMIC_DAMAGE) / 1000000000

g_top10_economic <- 
  ggplot(top10_economic,aes(x=top10_economic$EVTYPE,
                            y=top10_economic$ECONOMIC_DAMAGE,   
        fill=factor(EVTYPE))) +
      geom_bar(stat="identity") + 
      coord_flip() + 
    geom_text(aes(x=top10_economic$EVTYPE, 
                  y=top10_economic$ECONOMIC_DAMAGE,
                  ymax=max(top10_economic$ECONOMIC_DAMAGE)*1.05,
                  label=format(top10_economic$ECONOMIC_DAMAGE,digits=3),size=1),
                  position = position_dodge(width=0.9)) +
    ylab('Economic Damage (Billion Dollars)') +
    xlab('Wheather Event') +
    theme(legend.position="none") +
    ggtitle("U.S. Total Economic Damage per Wheather Event (Top 10)")
print(g_top10_economic)

Results

Those are the results of the report:

Top_10 Whether Events in the U.S. by Fatalities

EVTYPE INJURIES FATALITIES PERCENTAGE_INJURIES PERCENTAGE_FATALITIES
834 TORNADO 91346 5633 65.00 37.2
130 EXCESSIVE HEAT 6525 1903 4.64 12.6
153 FLASH FLOOD 1777 978 1.26 6.5
275 HEAT 2100 937 1.49 6.2
464 LIGHTNING 5230 816 3.72 5.4
856 TSTM WIND 6957 504 4.95 3.3
170 FLOOD 6789 470 4.83 3.1
585 RIP CURRENT 232 368 0.17 2.4
359 HIGH WIND 1137 248 0.81 1.6
19 AVALANCHE 170 224 0.12 1.5
11 Others 18265 3064 13.00 20.2

Top_10 Whether Events in the U.S. by Injuries

EVTYPE INJURIES FATALITIES PERCENTAGE_INJURIES PERCENTAGE_FATALITIES
834 TORNADO 91346 5633 65.00 37.19
856 TSTM WIND 6957 504 4.95 3.33
170 FLOOD 6789 470 4.83 3.10
130 EXCESSIVE HEAT 6525 1903 4.64 12.57
464 LIGHTNING 5230 816 3.72 5.39
275 HEAT 2100 937 1.49 6.19
427 ICE STORM 1975 89 1.41 0.59
153 FLASH FLOOD 1777 978 1.26 6.46
760 THUNDERSTORM WIND 1488 133 1.06 0.88
244 HAIL 1361 15 0.97 0.10
11 Others 14980 3667 10.66 24.21

Top_10 Whether Events in the U.S. by Economic Damage

EVTYPE ECONOMIC_DAMAGE PERCENTAGE_ECONOMIC_DAMAGE
170 FLOOD 150 31.6
411 HURRICANE/TYPHOON 72 15.1
834 TORNADO 57 12.0
670 STORM SURGE 43 9.1
244 HAIL 19 3.9
153 FLASH FLOOD 18 3.7
95 DROUGHT 15 3.1
402 HURRICANE 15 3.1
590 RIVER FLOOD 10 2.1
427 ICE STORM 9 1.9
11 Others 68 14.4