1. Synopsis

In this report we analyse the public health and economic impact of the storms and other major weather events in United States. Many of such severe events can result in fatalities, injuries, crop and property damage, and preventing such outcomes is a key concern for federal and state government and for municipalities. The following analysis investigates across the United States, which types of events:

1. are the most harmful with respect to population health?
2. have the greatest economic consequences?

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete. The data source is in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size.

The Database was downloaded from this link pointed to the cloudfront web site: NOAA Storm Database

Information related to this Databes can be found in this link: Documentation.

2. Data Processing

2.1 Data Loading

After establishing working folder for the analysis, must download NOAA Storm Database from the website and load the database placed in the working directory.

## Preparing folder and download the file
bz2Url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(bz2Url, destfile = "./repdata%2Fdata%2FStormData.csv.bz2", method = "curl")

## Readind Database
stormDB <- read.csv(bzfile("repdata%2Fdata%2FStormData.csv.bz2"), header = TRUE)

## Load libraries
library("dplyr")
library("ggplot2")
library("xtable")

## Set properly if different from USA
Sys.setlocale("LC_TIME", "English")
## [1] "English_United States.1252"

2.2 Looking at Columns Names

## Converting data.table to data.frame
stormDP <- tbl_df(stormDB)

## Examining columns names???
colnames(stormDP)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"
## What kind of information we have???
str(stormDP)
## tibble [902,297 x 37] (S3: tbl_df/tbl/data.frame)
##  $ STATE__   : num [1:902297] 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr [1:902297] "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr [1:902297] "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr [1:902297] "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num [1:902297] 97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr [1:902297] "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr [1:902297] "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr [1:902297] "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num [1:902297] 0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr [1:902297] "" "" "" "" ...
##  $ BGN_LOCATI: chr [1:902297] "" "" "" "" ...
##  $ END_DATE  : chr [1:902297] "" "" "" "" ...
##  $ END_TIME  : chr [1:902297] "" "" "" "" ...
##  $ COUNTY_END: num [1:902297] 0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi [1:902297] NA NA NA NA NA NA ...
##  $ END_RANGE : num [1:902297] 0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr [1:902297] "" "" "" "" ...
##  $ END_LOCATI: chr [1:902297] "" "" "" "" ...
##  $ LENGTH    : num [1:902297] 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num [1:902297] 100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int [1:902297] 3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num [1:902297] 0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num [1:902297] 0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num [1:902297] 15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num [1:902297] 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr [1:902297] "K" "K" "K" "K" ...
##  $ CROPDMG   : num [1:902297] 0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr [1:902297] "" "" "" "" ...
##  $ WFO       : chr [1:902297] "" "" "" "" ...
##  $ STATEOFFIC: chr [1:902297] "" "" "" "" ...
##  $ ZONENAMES : chr [1:902297] "" "" "" "" ...
##  $ LATITUDE  : num [1:902297] 3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num [1:902297] 8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num [1:902297] 3051 0 0 0 0 ...
##  $ LONGITUDE_: num [1:902297] 8806 0 0 0 0 ...
##  $ REMARKS   : chr [1:902297] "" "" "" "" ...
##  $ REFNUM    : num [1:902297] 1 2 3 4 5 6 7 8 9 10 ...

2.3 Data Subsetting: Finding the most harmful events with respect to population health.

Subset the Database on the parameters of interest, Injuries and fatalities. Tables with summaries are created with this new variables for each event: Num (total number), Fatalities, Injuries, FatalitiesAVG (average number of fatalities), InjuriesAVG, PercWithFatalities (percentage of events with at least one dead) PercWithInjuries (percentage of events with at least one injury).

## Table: Total injuries and fatalities for event.
StormTB <- stormDP %>% group_by(EVTYPE) %>% summarize(Num = n(), Fatalities = sum(FATALITIES), Fatalities_AVG = round(mean(FATALITIES), 2), Injuries = sum(INJURIES), Injuries_AVG = round(mean(INJURIES), 2))

## Table: Total events with at least one injury.
InjurTB <- stormDP %>% filter(INJURIES > 0) %>% group_by(EVTYPE) %>% summarize(WithInjuries = n())

## Table: Total events with at least one death.
DeadsTB <- stormDP %>% filter(FATALITIES > 0) %>% group_by(EVTYPE) %>% summarize(WithDeads = n())

## Joining to the summary table.
StormTB <- left_join(StormTB, InjurTB, by = "EVTYPE")
StormTB <- left_join(StormTB, DeadsTB, by = "EVTYPE")

## Percentage with at least one injury.
StormTB <- mutate(StormTB, Perc_with_Injuries = round(WithInjuries / Num * 100, 2))

## Percentage with at least one dead.
StormTB <- mutate(StormTB, Perc_with_Fatalities = round(WithDeads / Num * 100, 2))

## Final summary table for the analysis - Storm Harmful Events
StormHE <- StormTB %>% select(EVTYPE, Num, Fatalities, Fatalities_AVG, Perc_with_Fatalities, Injuries, Injuries_AVG, Perc_with_Injuries) %>% arrange(desc(Num))

2.4 Data Subsetting: Finding the events that have the larger economic consequences.

In order to organize the monetary values it’s necesary to transform the exponents for property and crop damage for each level. So the exponents are transformed in numeric values before the calculation of the total value for damages.

## See the values found for PROPDMGEXP
unique(stormDP$PROPDMGEXP)
##  [1] "K" "M" ""  "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"
## We took the information result to transform in numeric values the PROPDMGEXP
stormDP$PropExpN <- 0
stormDP$PropExpN[stormDP$PROPDMGEXP == ""] <- 1
stormDP$PropExpN[stormDP$PROPDMGEXP == "-"] <- 0
stormDP$PropExpN[stormDP$PROPDMGEXP == "?"] <- 0
stormDP$PropExpN[stormDP$PROPDMGEXP == "+"] <- 0
stormDP$PropExpN[stormDP$PROPDMGEXP == "0"] <- 1
stormDP$PropExpN[stormDP$PROPDMGEXP == "1"] <- 10
stormDP$PropExpN[stormDP$PROPDMGEXP == "2"] <- 100
stormDP$PropExpN[stormDP$PROPDMGEXP == "3"] <- 1000
stormDP$PropExpN[stormDP$PROPDMGEXP == "4"] <- 10000
stormDP$PropExpN[stormDP$PROPDMGEXP == "5"] <- 100000
stormDP$PropExpN[stormDP$PROPDMGEXP == "6"] <- 1000000
stormDP$PropExpN[stormDP$PROPDMGEXP == "7"] <- 10000000
stormDP$PropExpN[stormDP$PROPDMGEXP == "8"] <- 100000000
stormDP$PropExpN[stormDP$PROPDMGEXP == "B"] <- 1000000000
stormDP$PropExpN[stormDP$PROPDMGEXP == "h"] <- 100
stormDP$PropExpN[stormDP$PROPDMGEXP == "H"] <- 100
stormDP$PropExpN[stormDP$PROPDMGEXP == "K"] <- 1000
stormDP$PropExpN[stormDP$PROPDMGEXP == "m"] <- 1000000
stormDP$PropExpN[stormDP$PROPDMGEXP == "M"] <- 1000000

## Final numeric value for property damages.
stormDP$PropDMGN <- stormDP$PROPDMG*stormDP$PropExpN
## Let's take a peak into the values 
summary(stormDP$PropDMGN)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 0.000e+00 0.000e+00 0.000e+00 4.746e+05 5.000e+02 1.150e+11
## See the values found for CROPDMGEXP
unique(stormDP$CROPDMGEXP)
## [1] ""  "M" "K" "m" "B" "?" "0" "k" "2"
## We took the information result to transform in numeric values the CROPDMGEXP
stormDP$CropExpN <- 0
stormDP$CropExpN[stormDP$CROPDMGEXP == ""] <- 1
stormDP$CropExpN[stormDP$CROPDMGEXP == "?"] <- 0
stormDP$CropExpN[stormDP$CROPDMGEXP == "0"] <- 1
stormDP$CropExpN[stormDP$CROPDMGEXP == "2"] <- 100
stormDP$CropExpN[stormDP$CROPDMGEXP == "B"] <- 1000000000
stormDP$CropExpN[stormDP$CROPDMGEXP == "k"] <- 1000
stormDP$CropExpN[stormDP$CROPDMGEXP == "K"] <- 1000
stormDP$CropExpN[stormDP$CROPDMGEXP == "m"] <- 1000000
stormDP$CropExpN[stormDP$CROPDMGEXP == "M"] <- 1000000

## Final numeric value for crop damages.
stormDP$CropDMGN <- stormDP$CROPDMG*stormDP$CropExpN
## Let's take a peak into the values 
summary(stormDP$CropDMGN)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 0.000e+00 0.000e+00 0.000e+00 5.442e+04 0.000e+00 5.000e+09
## Final summary table for the analysis - Storm Economics Consequences
StormEC <- stormDP %>% group_by(EVTYPE) %>% summarize(PropDam = round(sum(PropDMGN), 2), PropDam_AVG = round(mean(PropDMGN), 2), CropDam = sum(CropDMGN), CropDam_AVG = round(mean(CropDMGN), 2), TotalDamages = round(sum(PropDMGN) + sum(CropDMGN), 2), TotalDamages_AVG = round(mean(sum(PropDMGN) + sum(CropDMGN)), 2)) %>% arrange(desc(TotalDamages))

3. Results

3.1 Which types of events are most harmful with respect to population health?

Let´see the Head and the graph for the events with the larger number of fatalities.

## Top 10 for FATALITIES Events
print(xtable(as.data.frame(StormHE %>% arrange(desc(Fatalities)))[1:10, ], auto = TRUE), type = "html")
EVTYPE Num Fatalities Fatalities_AVG Perc_with_Fatalities Injuries Injuries_AVG Perc_with_Injuries
1 TORNADO 60652 5633 0.09 2.64 91346 1.51 12.70
2 EXCESSIVE HEAT 1678 1903 1.13 34.39 6525 3.89 9.83
3 FLASH FLOOD 54277 978 0.02 1.17 1777 0.03 0.70
4 HEAT 767 937 1.22 23.21 2100 2.74 6.00
5 LIGHTNING 15754 816 0.05 4.82 5230 0.33 17.83
6 TSTM WIND 219940 504 0.00 0.18 6957 0.03 1.21
7 FLOOD 25326 470 0.02 1.18 6789 0.27 0.61
8 RIP CURRENT 470 368 0.78 69.57 232 0.49 25.32
9 HIGH WIND 20212 248 0.01 0.91 1137 0.06 2.03
10 AVALANCHE 386 224 0.58 45.08 170 0.44 27.72
# Top 10 Events

## Modification for the graph label
levels(StormHE$EVTYPE) <- gsub(" ", "\n", levels(StormHE$EVTYPE))

## Order the column decreasing view
StormHE$EVTYPE <- factor(StormHE$EVTYPE, levels = StormHE$EVTYPE[order(StormHE$Fatalities, decreasing = TRUE)])

## Graph - Storm Harmful Events
g <- ggplot(head(as.data.frame(StormHE), n = 8), aes(EVTYPE, Fatalities))
g + geom_bar(stat = 'identity') + labs(title = "Top 8 Fatalities events in USA", x = "Event Type", y = "Fatalities Count")

## Top 10 for INJURIES Events
print(xtable(as.data.frame(StormHE %>% arrange(desc(Injuries)))[1:10, ], auto = TRUE), type = "html")
EVTYPE Num Fatalities Fatalities_AVG Perc_with_Fatalities Injuries Injuries_AVG Perc_with_Injuries
1 TORNADO 60652 5633 0.09 2.64 91346 1.51 12.70
2 TSTM WIND 219940 504 0.00 0.18 6957 0.03 1.21
3 FLOOD 25326 470 0.02 1.18 6789 0.27 0.61
4 EXCESSIVE HEAT 1678 1903 1.13 34.39 6525 3.89 9.83
5 LIGHTNING 15754 816 0.05 4.82 5230 0.33 17.83
6 HEAT 767 937 1.22 23.21 2100 2.74 6.00
7 ICE STORM 2006 89 0.04 2.84 1975 0.98 3.09
8 FLASH FLOOD 54277 978 0.02 1.17 1777 0.03 0.70
9 THUNDERSTORM WIND 82563 133 0.00 0.13 1488 0.02 0.74
10 HAIL 288661 15 0.00 0.00 1361 0.00 0.10
# Top 10 Events

## Modification for the graph label
levels(StormHE$EVTYPE) <- gsub(" ", "\n", levels(StormHE$EVTYPE))

## Order the column decreasing view
StormHE$EVTYPE <- factor(StormHE$EVTYPE, levels = StormHE$EVTYPE[order(StormHE$Injuries, decreasing = TRUE)])

## Graph - Storm Injuries Events
j <- ggplot(head(as.data.frame(StormHE), n = 8), aes(EVTYPE, Injuries))
j + geom_bar(stat = 'identity') + labs(title = "Top 8 Injuries events in USA", x = "Event Type", y = "Injuries Count")

Analysis: The graphics shows us, that Tornados caused the maximun number of fatalities and injuries, and for this reason it’s the most harmful with respect to population health.

3.2 Which types of events have the greatest economic consequences?

Let´see the Head and the graph for the events with the greatest economic consequences.

## Top 10 for Economic Consequences
print(xtable(as.data.frame(StormEC)[1:10, ], digits = 0, auto = TRUE), type = "html")
## Warning in storage.mode(x) <- "integer": NAs introduced by coercion to integer
## range
EVTYPE PropDam PropDam_AVG CropDam CropDam_AVG TotalDamages TotalDamages_AVG
1 FLOOD 144657709807 5711826 NA 223563 150319678257 150319678257
2 HURRICANE/TYPHOON 69305840000 787566364 NA 29634918 71913712800 71913712800
3 TORNADO 56947380617 938920 414953270 6842 57362333887 57362333887
4 STORM SURGE 43323536000 165990559 5000 19 43323541000 43323541000
5 HAIL 15735267513 54511 NA 10483 18761221986 18761221986
6 FLASH FLOOD 16822673979 309941 1421317100 26186 18243991079 18243991079
7 DROUGHT 1046106000 420461 NA 5615983 15018672000 15018672000
8 HURRICANE 11868319010 68208730 NA 15758103 14610229010 14610229010
9 RIVER FLOOD 5118945500 29589280 NA 29072017 10148404500 10148404500
10 ICE STORM 3944927860 1966564 NA 2503546 8967041360 8967041360
# Top 10 Events

## Modification for the graph label
levels(StormEC$EVTYPE) <- gsub(" ", "\n", levels(StormEC$EVTYPE))

## Order the column decreasing view
StormEC$EVTYPE <- factor(StormEC$EVTYPE, levels = StormEC$EVTYPE[order(StormEC$TotalDamages, decreasing = TRUE)])

## Graph - Storm Harmful Events
h <- ggplot(head(as.data.frame(StormEC), n = 8), aes(EVTYPE, TotalDamages/1000000000))
h + geom_bar(stat = 'identity') + labs(title="Top 8 events causing Economic Consequences", x = "Event Type", y = "Cost in dollars")

Analysis: The graphics shows us, that Flood caused the maximun number of economic losses (150 billions of dollars), and for this reason this event causes the mosts losses and the major impact in economics consequences.