In this report we analyse the public health and economic impact of the storms and other major weather events in United States. Many of such severe events can result in fatalities, injuries, crop and property damage, and preventing such outcomes is a key concern for federal and state government and for municipalities. The following analysis investigates across the United States, which types of events:
1. are the most harmful with respect to population health?
2. have the greatest economic consequences?
The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete. The data source is in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size.
The Database was downloaded from this link pointed to the cloudfront web site: NOAA Storm Database
Information related to this Databes can be found in this link: Documentation.
After establishing working folder for the analysis, must download NOAA Storm Database from the website and load the database placed in the working directory.
## Preparing folder and download the file
bz2Url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(bz2Url, destfile = "./repdata%2Fdata%2FStormData.csv.bz2", method = "curl")
## Readind Database
stormDB <- read.csv(bzfile("repdata%2Fdata%2FStormData.csv.bz2"), header = TRUE)
## Load libraries
library("dplyr")
library("ggplot2")
library("xtable")
## Set properly if different from USA
Sys.setlocale("LC_TIME", "English")
## [1] "English_United States.1252"
## Converting data.table to data.frame
stormDP <- tbl_df(stormDB)
## Examining columns names???
colnames(stormDP)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
## What kind of information we have???
str(stormDP)
## tibble [902,297 x 37] (S3: tbl_df/tbl/data.frame)
## $ STATE__ : num [1:902297] 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr [1:902297] "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr [1:902297] "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr [1:902297] "CST" "CST" "CST" "CST" ...
## $ COUNTY : num [1:902297] 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr [1:902297] "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr [1:902297] "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr [1:902297] "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num [1:902297] 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr [1:902297] "" "" "" "" ...
## $ BGN_LOCATI: chr [1:902297] "" "" "" "" ...
## $ END_DATE : chr [1:902297] "" "" "" "" ...
## $ END_TIME : chr [1:902297] "" "" "" "" ...
## $ COUNTY_END: num [1:902297] 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi [1:902297] NA NA NA NA NA NA ...
## $ END_RANGE : num [1:902297] 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr [1:902297] "" "" "" "" ...
## $ END_LOCATI: chr [1:902297] "" "" "" "" ...
## $ LENGTH : num [1:902297] 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num [1:902297] 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int [1:902297] 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num [1:902297] 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num [1:902297] 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num [1:902297] 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num [1:902297] 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr [1:902297] "K" "K" "K" "K" ...
## $ CROPDMG : num [1:902297] 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr [1:902297] "" "" "" "" ...
## $ WFO : chr [1:902297] "" "" "" "" ...
## $ STATEOFFIC: chr [1:902297] "" "" "" "" ...
## $ ZONENAMES : chr [1:902297] "" "" "" "" ...
## $ LATITUDE : num [1:902297] 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num [1:902297] 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num [1:902297] 3051 0 0 0 0 ...
## $ LONGITUDE_: num [1:902297] 8806 0 0 0 0 ...
## $ REMARKS : chr [1:902297] "" "" "" "" ...
## $ REFNUM : num [1:902297] 1 2 3 4 5 6 7 8 9 10 ...
Subset the Database on the parameters of interest, Injuries and fatalities. Tables with summaries are created with this new variables for each event: Num (total number), Fatalities, Injuries, FatalitiesAVG (average number of fatalities), InjuriesAVG, PercWithFatalities (percentage of events with at least one dead) PercWithInjuries (percentage of events with at least one injury).
## Table: Total injuries and fatalities for event.
StormTB <- stormDP %>% group_by(EVTYPE) %>% summarize(Num = n(), Fatalities = sum(FATALITIES), Fatalities_AVG = round(mean(FATALITIES), 2), Injuries = sum(INJURIES), Injuries_AVG = round(mean(INJURIES), 2))
## Table: Total events with at least one injury.
InjurTB <- stormDP %>% filter(INJURIES > 0) %>% group_by(EVTYPE) %>% summarize(WithInjuries = n())
## Table: Total events with at least one death.
DeadsTB <- stormDP %>% filter(FATALITIES > 0) %>% group_by(EVTYPE) %>% summarize(WithDeads = n())
## Joining to the summary table.
StormTB <- left_join(StormTB, InjurTB, by = "EVTYPE")
StormTB <- left_join(StormTB, DeadsTB, by = "EVTYPE")
## Percentage with at least one injury.
StormTB <- mutate(StormTB, Perc_with_Injuries = round(WithInjuries / Num * 100, 2))
## Percentage with at least one dead.
StormTB <- mutate(StormTB, Perc_with_Fatalities = round(WithDeads / Num * 100, 2))
## Final summary table for the analysis - Storm Harmful Events
StormHE <- StormTB %>% select(EVTYPE, Num, Fatalities, Fatalities_AVG, Perc_with_Fatalities, Injuries, Injuries_AVG, Perc_with_Injuries) %>% arrange(desc(Num))
In order to organize the monetary values it’s necesary to transform the exponents for property and crop damage for each level. So the exponents are transformed in numeric values before the calculation of the total value for damages.
## See the values found for PROPDMGEXP
unique(stormDP$PROPDMGEXP)
## [1] "K" "M" "" "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"
## We took the information result to transform in numeric values the PROPDMGEXP
stormDP$PropExpN <- 0
stormDP$PropExpN[stormDP$PROPDMGEXP == ""] <- 1
stormDP$PropExpN[stormDP$PROPDMGEXP == "-"] <- 0
stormDP$PropExpN[stormDP$PROPDMGEXP == "?"] <- 0
stormDP$PropExpN[stormDP$PROPDMGEXP == "+"] <- 0
stormDP$PropExpN[stormDP$PROPDMGEXP == "0"] <- 1
stormDP$PropExpN[stormDP$PROPDMGEXP == "1"] <- 10
stormDP$PropExpN[stormDP$PROPDMGEXP == "2"] <- 100
stormDP$PropExpN[stormDP$PROPDMGEXP == "3"] <- 1000
stormDP$PropExpN[stormDP$PROPDMGEXP == "4"] <- 10000
stormDP$PropExpN[stormDP$PROPDMGEXP == "5"] <- 100000
stormDP$PropExpN[stormDP$PROPDMGEXP == "6"] <- 1000000
stormDP$PropExpN[stormDP$PROPDMGEXP == "7"] <- 10000000
stormDP$PropExpN[stormDP$PROPDMGEXP == "8"] <- 100000000
stormDP$PropExpN[stormDP$PROPDMGEXP == "B"] <- 1000000000
stormDP$PropExpN[stormDP$PROPDMGEXP == "h"] <- 100
stormDP$PropExpN[stormDP$PROPDMGEXP == "H"] <- 100
stormDP$PropExpN[stormDP$PROPDMGEXP == "K"] <- 1000
stormDP$PropExpN[stormDP$PROPDMGEXP == "m"] <- 1000000
stormDP$PropExpN[stormDP$PROPDMGEXP == "M"] <- 1000000
## Final numeric value for property damages.
stormDP$PropDMGN <- stormDP$PROPDMG*stormDP$PropExpN
## Let's take a peak into the values
summary(stormDP$PropDMGN)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000e+00 0.000e+00 0.000e+00 4.746e+05 5.000e+02 1.150e+11
## See the values found for CROPDMGEXP
unique(stormDP$CROPDMGEXP)
## [1] "" "M" "K" "m" "B" "?" "0" "k" "2"
## We took the information result to transform in numeric values the CROPDMGEXP
stormDP$CropExpN <- 0
stormDP$CropExpN[stormDP$CROPDMGEXP == ""] <- 1
stormDP$CropExpN[stormDP$CROPDMGEXP == "?"] <- 0
stormDP$CropExpN[stormDP$CROPDMGEXP == "0"] <- 1
stormDP$CropExpN[stormDP$CROPDMGEXP == "2"] <- 100
stormDP$CropExpN[stormDP$CROPDMGEXP == "B"] <- 1000000000
stormDP$CropExpN[stormDP$CROPDMGEXP == "k"] <- 1000
stormDP$CropExpN[stormDP$CROPDMGEXP == "K"] <- 1000
stormDP$CropExpN[stormDP$CROPDMGEXP == "m"] <- 1000000
stormDP$CropExpN[stormDP$CROPDMGEXP == "M"] <- 1000000
## Final numeric value for crop damages.
stormDP$CropDMGN <- stormDP$CROPDMG*stormDP$CropExpN
## Let's take a peak into the values
summary(stormDP$CropDMGN)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000e+00 0.000e+00 0.000e+00 5.442e+04 0.000e+00 5.000e+09
## Final summary table for the analysis - Storm Economics Consequences
StormEC <- stormDP %>% group_by(EVTYPE) %>% summarize(PropDam = round(sum(PropDMGN), 2), PropDam_AVG = round(mean(PropDMGN), 2), CropDam = sum(CropDMGN), CropDam_AVG = round(mean(CropDMGN), 2), TotalDamages = round(sum(PropDMGN) + sum(CropDMGN), 2), TotalDamages_AVG = round(mean(sum(PropDMGN) + sum(CropDMGN)), 2)) %>% arrange(desc(TotalDamages))
Let´see the Head and the graph for the events with the larger number of fatalities.
## Top 10 for FATALITIES Events
print(xtable(as.data.frame(StormHE %>% arrange(desc(Fatalities)))[1:10, ], auto = TRUE), type = "html")
| EVTYPE | Num | Fatalities | Fatalities_AVG | Perc_with_Fatalities | Injuries | Injuries_AVG | Perc_with_Injuries | |
|---|---|---|---|---|---|---|---|---|
| 1 | TORNADO | 60652 | 5633 | 0.09 | 2.64 | 91346 | 1.51 | 12.70 |
| 2 | EXCESSIVE HEAT | 1678 | 1903 | 1.13 | 34.39 | 6525 | 3.89 | 9.83 |
| 3 | FLASH FLOOD | 54277 | 978 | 0.02 | 1.17 | 1777 | 0.03 | 0.70 |
| 4 | HEAT | 767 | 937 | 1.22 | 23.21 | 2100 | 2.74 | 6.00 |
| 5 | LIGHTNING | 15754 | 816 | 0.05 | 4.82 | 5230 | 0.33 | 17.83 |
| 6 | TSTM WIND | 219940 | 504 | 0.00 | 0.18 | 6957 | 0.03 | 1.21 |
| 7 | FLOOD | 25326 | 470 | 0.02 | 1.18 | 6789 | 0.27 | 0.61 |
| 8 | RIP CURRENT | 470 | 368 | 0.78 | 69.57 | 232 | 0.49 | 25.32 |
| 9 | HIGH WIND | 20212 | 248 | 0.01 | 0.91 | 1137 | 0.06 | 2.03 |
| 10 | AVALANCHE | 386 | 224 | 0.58 | 45.08 | 170 | 0.44 | 27.72 |
# Top 10 Events
## Modification for the graph label
levels(StormHE$EVTYPE) <- gsub(" ", "\n", levels(StormHE$EVTYPE))
## Order the column decreasing view
StormHE$EVTYPE <- factor(StormHE$EVTYPE, levels = StormHE$EVTYPE[order(StormHE$Fatalities, decreasing = TRUE)])
## Graph - Storm Harmful Events
g <- ggplot(head(as.data.frame(StormHE), n = 8), aes(EVTYPE, Fatalities))
g + geom_bar(stat = 'identity') + labs(title = "Top 8 Fatalities events in USA", x = "Event Type", y = "Fatalities Count")
## Top 10 for INJURIES Events
print(xtable(as.data.frame(StormHE %>% arrange(desc(Injuries)))[1:10, ], auto = TRUE), type = "html")
| EVTYPE | Num | Fatalities | Fatalities_AVG | Perc_with_Fatalities | Injuries | Injuries_AVG | Perc_with_Injuries | |
|---|---|---|---|---|---|---|---|---|
| 1 | TORNADO | 60652 | 5633 | 0.09 | 2.64 | 91346 | 1.51 | 12.70 |
| 2 | TSTM WIND | 219940 | 504 | 0.00 | 0.18 | 6957 | 0.03 | 1.21 |
| 3 | FLOOD | 25326 | 470 | 0.02 | 1.18 | 6789 | 0.27 | 0.61 |
| 4 | EXCESSIVE HEAT | 1678 | 1903 | 1.13 | 34.39 | 6525 | 3.89 | 9.83 |
| 5 | LIGHTNING | 15754 | 816 | 0.05 | 4.82 | 5230 | 0.33 | 17.83 |
| 6 | HEAT | 767 | 937 | 1.22 | 23.21 | 2100 | 2.74 | 6.00 |
| 7 | ICE STORM | 2006 | 89 | 0.04 | 2.84 | 1975 | 0.98 | 3.09 |
| 8 | FLASH FLOOD | 54277 | 978 | 0.02 | 1.17 | 1777 | 0.03 | 0.70 |
| 9 | THUNDERSTORM WIND | 82563 | 133 | 0.00 | 0.13 | 1488 | 0.02 | 0.74 |
| 10 | HAIL | 288661 | 15 | 0.00 | 0.00 | 1361 | 0.00 | 0.10 |
# Top 10 Events
## Modification for the graph label
levels(StormHE$EVTYPE) <- gsub(" ", "\n", levels(StormHE$EVTYPE))
## Order the column decreasing view
StormHE$EVTYPE <- factor(StormHE$EVTYPE, levels = StormHE$EVTYPE[order(StormHE$Injuries, decreasing = TRUE)])
## Graph - Storm Injuries Events
j <- ggplot(head(as.data.frame(StormHE), n = 8), aes(EVTYPE, Injuries))
j + geom_bar(stat = 'identity') + labs(title = "Top 8 Injuries events in USA", x = "Event Type", y = "Injuries Count")
Analysis: The graphics shows us, that Tornados caused the maximun number of fatalities and injuries, and for this reason it’s the most harmful with respect to population health.
Let´see the Head and the graph for the events with the greatest economic consequences.
## Top 10 for Economic Consequences
print(xtable(as.data.frame(StormEC)[1:10, ], digits = 0, auto = TRUE), type = "html")
## Warning in storage.mode(x) <- "integer": NAs introduced by coercion to integer
## range
| EVTYPE | PropDam | PropDam_AVG | CropDam | CropDam_AVG | TotalDamages | TotalDamages_AVG | |
|---|---|---|---|---|---|---|---|
| 1 | FLOOD | 144657709807 | 5711826 | NA | 223563 | 150319678257 | 150319678257 |
| 2 | HURRICANE/TYPHOON | 69305840000 | 787566364 | NA | 29634918 | 71913712800 | 71913712800 |
| 3 | TORNADO | 56947380617 | 938920 | 414953270 | 6842 | 57362333887 | 57362333887 |
| 4 | STORM SURGE | 43323536000 | 165990559 | 5000 | 19 | 43323541000 | 43323541000 |
| 5 | HAIL | 15735267513 | 54511 | NA | 10483 | 18761221986 | 18761221986 |
| 6 | FLASH FLOOD | 16822673979 | 309941 | 1421317100 | 26186 | 18243991079 | 18243991079 |
| 7 | DROUGHT | 1046106000 | 420461 | NA | 5615983 | 15018672000 | 15018672000 |
| 8 | HURRICANE | 11868319010 | 68208730 | NA | 15758103 | 14610229010 | 14610229010 |
| 9 | RIVER FLOOD | 5118945500 | 29589280 | NA | 29072017 | 10148404500 | 10148404500 |
| 10 | ICE STORM | 3944927860 | 1966564 | NA | 2503546 | 8967041360 | 8967041360 |
# Top 10 Events
## Modification for the graph label
levels(StormEC$EVTYPE) <- gsub(" ", "\n", levels(StormEC$EVTYPE))
## Order the column decreasing view
StormEC$EVTYPE <- factor(StormEC$EVTYPE, levels = StormEC$EVTYPE[order(StormEC$TotalDamages, decreasing = TRUE)])
## Graph - Storm Harmful Events
h <- ggplot(head(as.data.frame(StormEC), n = 8), aes(EVTYPE, TotalDamages/1000000000))
h + geom_bar(stat = 'identity') + labs(title="Top 8 events causing Economic Consequences", x = "Event Type", y = "Cost in dollars")
Analysis: The graphics shows us, that Flood caused the maximun number of economic losses (150 billions of dollars), and for this reason this event causes the mosts losses and the major impact in economics consequences.