One of the most important and difficult tasks for goverment / municipal managers are related with the effects of wheather events and how to prioritize the resources to handle accordingly.
In order to provide a guide to them, in this report we will use the data of U.S. National Oceanic and Atmospheric Administration’s (NOAA) to evaluate the Economic and Population Impact (Fatalities and Injuries) of all the wheather events in the U.S. from 1950 and end in November 2011.
The Database could be download it from the following URL:
With this report We will answer two main questions:
Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?
We will base our analysis in the following variables that are included in the NOAA Database:
The analysis will be made using R Programming Language, Rstudio to write the report and RPubs to publicate it.
In order to get the most harmful wheather event, we will create a Data Set that aggregate all the events by number of FATALITIES and INJURIES. After that we will make two plot’s of the top-10 events.
Regarding Economic Impact, we will create a Data Set that aggregate (sum) all the events by total amount of damage (property + crop). We will also make a plot of the top-10 events.
The Report will have two main sections:
We hope that this report could be used in order to create some strategy to act accordingly regarding the different wheather events.
This report will not have any specific recommendations.
Read the data into a Dataset
dt <- read.csv("repdata-data-StormData.csv.bz2", stringsAsFactors = FALSE)
colnames(dt)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
Let’s answer the two questions:
Create a new Data Set with only the EVTYPE and summarize the Injuries and Fatalities by Wheather Event (EVTYPE)
dt2 <- aggregate(cbind(INJURIES,FATALITIES)~EVTYPE,dt,sum, na.rm = TRUE )
colnames(dt2)
## [1] "EVTYPE" "INJURIES" "FATALITIES"
For information purposes, let’s create two new column named PERCENTAGE_INJURIES and PERCENTAGE_FATALITIES that calculate the percentage for each event
total_injuries <- sum(dt2$INJURIES)
dt2$PERCENTAGE_INJURIES <- as.numeric((dt2$INJURIES / total_injuries) * 100)
total_fatalities <- sum(dt2$FATALITIES)
dt2$PERCENTAGE_FATALITIES <- as.numeric((dt2$FATALITIES / total_fatalities) * 100)
We will make two plot, Top_10 for Fatalities and Top_10 for Injuries
Let’s get the top_10 values for fatalities, for it let’s order the dataset, choose the first 10 rows and add a final rows that include all the other values (other). Finally let’s change it to numeric format.
top10_fatalities <- dt2[with(dt2,order(-dt2$PERCENTAGE_FATALITIES)),]
l <- length(dt2$EVTYPE)
total_other_injuries <- sum(top10_fatalities$INJURIES[11:l])
total_other_fatalities <- sum(top10_fatalities$FATALITIES[11:l])
total_other_p_injuries <- sum(top10_fatalities$PERCENTAGE_INJURIES[11:l])
total_other_p_fatalities <- sum(top10_fatalities$PERCENTAGE_FATALITIES[11:l])
top10_fatalities <- top10_fatalities[1:10,]
top10_fatalities <- rbind(top10_fatalities,
c("Others",
total_other_injuries,
total_other_fatalities,
total_other_p_injuries,
total_other_p_fatalities))
top10_fatalities$INJURIES <-
as.numeric(top10_fatalities$INJURIES)
top10_fatalities$FATALITIES <-
as.numeric(top10_fatalities$FATALITIES)
top10_fatalities$PERCENTAGE_INJURIES <-
as.numeric(top10_fatalities$PERCENTAGE_INJURIES)
top10_fatalities$PERCENTAGE_FATALITIES <-
as.numeric(top10_fatalities$PERCENTAGE_FATALITIES)
options(digits = 2)
top10_fatalities
## EVTYPE INJURIES FATALITIES PERCENTAGE_INJURIES
## 834 TORNADO 91346 5633 65.00
## 130 EXCESSIVE HEAT 6525 1903 4.64
## 153 FLASH FLOOD 1777 978 1.26
## 275 HEAT 2100 937 1.49
## 464 LIGHTNING 5230 816 3.72
## 856 TSTM WIND 6957 504 4.95
## 170 FLOOD 6789 470 4.83
## 585 RIP CURRENT 232 368 0.17
## 359 HIGH WIND 1137 248 0.81
## 19 AVALANCHE 170 224 0.12
## 11 Others 18265 3064 13.00
## PERCENTAGE_FATALITIES
## 834 37.2
## 130 12.6
## 153 6.5
## 275 6.2
## 464 5.4
## 856 3.3
## 170 3.1
## 585 2.4
## 359 1.6
## 19 1.5
## 11 20.2
Let’s plot the result:
library(ggplot2)
g_top10_fatalities <-
ggplot(top10_fatalities,aes(x=top10_fatalities$EVTYPE,
y=top10_fatalities$FATALITIES,
ymax=max(top10_fatalities$FATALITIES)*1.05,
fill=factor(EVTYPE))) +
geom_bar(stat="identity") +
coord_flip() +
geom_text(aes(x=top10_fatalities$EVTYPE, y=top10_fatalities$FATALITIES,
label=format(top10_fatalities$FATALITIES,digits=3),size=1),
position = position_dodge(width=0.9)) +
ylab('Fatalities') +
xlab('Wheather Event') +
theme(legend.position="none") +
ggtitle("U.S. Total Fatalities per Wheather Event (Top 10)")
print(g_top10_fatalities)
Let’s get the top_10 values for injuries, for it let’s order the dataset, choose the first 10 rows and add a final rows that include all the other values (other). Finally let’s change it to numeric format.
top10_injuries <- dt2[with(dt2,order(-dt2$PERCENTAGE_INJURIES)),]
l <- length(dt2$EVTYPE)
total_other_injuries <- sum(top10_injuries$INJURIES[11:l])
total_other_fatalities <- sum(top10_injuries$FATALITIES[11:l])
total_other_p_injuries <- sum(top10_injuries$PERCENTAGE_INJURIES[11:l])
total_other_p_fatalities <- sum(top10_injuries$PERCENTAGE_FATALITIES[11:l])
top10_injuries <- top10_injuries[1:10,]
top10_injuries <- rbind(top10_injuries,
c("Others",
total_other_injuries,
total_other_fatalities,
total_other_p_injuries,
total_other_p_fatalities))
top10_injuries$INJURIES <-
as.numeric(top10_injuries$INJURIES)
top10_injuries$FATALITIES <-
as.numeric(top10_injuries$FATALITIES)
top10_injuries$PERCENTAGE_INJURIES <-
as.numeric(top10_injuries$PERCENTAGE_INJURIES)
top10_injuries$PERCENTAGE_FATALITIES <-
as.numeric(top10_injuries$PERCENTAGE_FATALITIES)
options(digits = 2)
top10_injuries
## EVTYPE INJURIES FATALITIES PERCENTAGE_INJURIES
## 834 TORNADO 91346 5633 65.00
## 856 TSTM WIND 6957 504 4.95
## 170 FLOOD 6789 470 4.83
## 130 EXCESSIVE HEAT 6525 1903 4.64
## 464 LIGHTNING 5230 816 3.72
## 275 HEAT 2100 937 1.49
## 427 ICE STORM 1975 89 1.41
## 153 FLASH FLOOD 1777 978 1.26
## 760 THUNDERSTORM WIND 1488 133 1.06
## 244 HAIL 1361 15 0.97
## 11 Others 14980 3667 10.66
## PERCENTAGE_FATALITIES
## 834 37.194
## 856 3.328
## 170 3.103
## 130 12.565
## 464 5.388
## 275 6.187
## 427 0.588
## 153 6.458
## 760 0.878
## 244 0.099
## 11 24.213
Let’s plot the result:
g_top10_injuries <-
ggplot(top10_injuries,aes(x=top10_injuries$EVTYPE,
y=top10_injuries$INJURIES,
fill=factor(EVTYPE))) +
geom_bar(stat="identity") +
coord_flip() +
geom_text(aes(x=top10_injuries$EVTYPE, y=top10_injuries$INJURIES,
ymax=max(top10_injuries$INJURIES)*1.05,
label=format(top10_injuries$INJURIES,digits=3),size=1),
position = position_dodge(width=0.9)) +
ylab('Injuries') +
xlab('Wheather Event') +
theme(legend.position="none") +
ggtitle("U.S. Total Injuries per Wheather Event (Top 10)")
print(g_top10_injuries)
Create a new Data Set with only four columns (EVTYPE,PROPDMG, PROPDMGEXP, CROPMDG, CROPDMGEXP). Create a new variable (PROPDMG_2 and CROPDMG_2) that consider the exponential value (k,M,b)
dt3 <- dt[c(8,25:28)]
dt3$PROPDMG_2 <- ifelse(
dt3$PROPDMGEXP %in% c("K","k"),
dt3$PROPDMG * 1000,
(ifelse( dt3$PROPDMGEXP %in% c("M","m"),
dt3$PROPDMG * 1000000,
(ifelse( dt3$PROPDMGEXP %in% c("B","b"),
dt3$PROPDMG * 1000000000,
dt3$PROPDMG)))))
dt3$CROPDMG_2 <- ifelse(
dt3$CROPDMGEXP %in% c("K","k"),
dt3$CROPDMG * 1000,
(ifelse( dt3$CROPDMGEXP %in% c("M","m"),
dt3$CROPDMG * 1000000,
(ifelse( dt3$CROPDMGEXP %in% c("B","b"),
dt3$CROPDMG * 1000000000,
dt3$CROPDMG)))))
Lets summarize the damage (property + crop) and create a new dataset
dt4 <- aggregate(PROPDMG_2+CROPDMG_2~EVTYPE,dt3,sum,na.rm = TRUE )
names(dt4) <- c("EVTYPE","ECONOMIC_DAMAGE")
For information purposes, let’s create a new column named PERCENTAGE_ECONOMIC_DAMAGE that calculate the percentage for each event
total_damage <- sum(dt4$ECONOMIC_DAMAGE)
dt4$PERCENTAGE_ECONOMIC_DAMAGE <-
as.numeric((dt4$ECONOMIC_DAMAGE / total_damage) * 100)
Let’s get the top_10 values, for it let’s order the dataset, choose the first 10 rows and add a final rows that include all the other values (other)
l <- length(dt4$EVTYPE)
dt5 <- dt4[with(dt4,order(-dt4$PERCENTAGE_ECONOMIC_DAMAGE)),]
top10_economic <- dt5[1:10,]
top10_economic <- rbind(top10_economic,
c("Others",
sum(dt5$ECONOMIC_DAMAGE[11:l]),
sum(dt5$PERCENTAGE_ECONOMIC_DAMAGE[11:l])))
top10_economic$ECONOMIC_DAMAGE <-
as.numeric(top10_economic$ECONOMIC_DAMAGE)
top10_economic$PERCENTAGE_ECONOMIC_DAMAGE <-
as.numeric(top10_economic$PERCENTAGE_ECONOMIC_DAMAGE)
options(digits = 2)
top10_economic
## EVTYPE ECONOMIC_DAMAGE PERCENTAGE_ECONOMIC_DAMAGE
## 170 FLOOD 1.5e+11 31.6
## 411 HURRICANE/TYPHOON 7.2e+10 15.1
## 834 TORNADO 5.7e+10 12.0
## 670 STORM SURGE 4.3e+10 9.1
## 244 HAIL 1.9e+10 3.9
## 153 FLASH FLOOD 1.8e+10 3.7
## 95 DROUGHT 1.5e+10 3.2
## 402 HURRICANE 1.5e+10 3.1
## 590 RIVER FLOOD 1.0e+10 2.1
## 427 ICE STORM 9.0e+09 1.9
## 11 Others 6.8e+10 14.4
Let’s plot it, we will change the scale to billions USD:
top10_economic$ECONOMIC_DAMAGE <-
as.numeric(top10_economic$ECONOMIC_DAMAGE) / 1000000000
g_top10_economic <-
ggplot(top10_economic,aes(x=top10_economic$EVTYPE,
y=top10_economic$ECONOMIC_DAMAGE,
fill=factor(EVTYPE))) +
geom_bar(stat="identity") +
coord_flip() +
geom_text(aes(x=top10_economic$EVTYPE,
y=top10_economic$ECONOMIC_DAMAGE,
ymax=max(top10_economic$ECONOMIC_DAMAGE)*1.05,
label=format(top10_economic$ECONOMIC_DAMAGE,digits=3),size=1),
position = position_dodge(width=0.9)) +
ylab('Economic Damage (Billion Dollars)') +
xlab('Wheather Event') +
theme(legend.position="none") +
ggtitle("U.S. Total Economic Damage per Wheather Event (Top 10)")
print(g_top10_economic)
Those are the results of the report:
| EVTYPE | INJURIES | FATALITIES | PERCENTAGE_INJURIES | PERCENTAGE_FATALITIES | |
|---|---|---|---|---|---|
| 834 | TORNADO | 91346 | 5633 | 65.00 | 37.2 |
| 130 | EXCESSIVE HEAT | 6525 | 1903 | 4.64 | 12.6 |
| 153 | FLASH FLOOD | 1777 | 978 | 1.26 | 6.5 |
| 275 | HEAT | 2100 | 937 | 1.49 | 6.2 |
| 464 | LIGHTNING | 5230 | 816 | 3.72 | 5.4 |
| 856 | TSTM WIND | 6957 | 504 | 4.95 | 3.3 |
| 170 | FLOOD | 6789 | 470 | 4.83 | 3.1 |
| 585 | RIP CURRENT | 232 | 368 | 0.17 | 2.4 |
| 359 | HIGH WIND | 1137 | 248 | 0.81 | 1.6 |
| 19 | AVALANCHE | 170 | 224 | 0.12 | 1.5 |
| 11 | Others | 18265 | 3064 | 13.00 | 20.2 |
| EVTYPE | INJURIES | FATALITIES | PERCENTAGE_INJURIES | PERCENTAGE_FATALITIES | |
|---|---|---|---|---|---|
| 834 | TORNADO | 91346 | 5633 | 65.00 | 37.19 |
| 856 | TSTM WIND | 6957 | 504 | 4.95 | 3.33 |
| 170 | FLOOD | 6789 | 470 | 4.83 | 3.10 |
| 130 | EXCESSIVE HEAT | 6525 | 1903 | 4.64 | 12.57 |
| 464 | LIGHTNING | 5230 | 816 | 3.72 | 5.39 |
| 275 | HEAT | 2100 | 937 | 1.49 | 6.19 |
| 427 | ICE STORM | 1975 | 89 | 1.41 | 0.59 |
| 153 | FLASH FLOOD | 1777 | 978 | 1.26 | 6.46 |
| 760 | THUNDERSTORM WIND | 1488 | 133 | 1.06 | 0.88 |
| 244 | HAIL | 1361 | 15 | 0.97 | 0.10 |
| 11 | Others | 14980 | 3667 | 10.66 | 24.21 |
| EVTYPE | ECONOMIC_DAMAGE | PERCENTAGE_ECONOMIC_DAMAGE | |
|---|---|---|---|
| 170 | FLOOD | 150 | 31.6 |
| 411 | HURRICANE/TYPHOON | 72 | 15.1 |
| 834 | TORNADO | 57 | 12.0 |
| 670 | STORM SURGE | 43 | 9.1 |
| 244 | HAIL | 19 | 3.9 |
| 153 | FLASH FLOOD | 18 | 3.7 |
| 95 | DROUGHT | 15 | 3.1 |
| 402 | HURRICANE | 15 | 3.1 |
| 590 | RIVER FLOOD | 10 | 2.1 |
| 427 | ICE STORM | 9 | 1.9 |
| 11 | Others | 68 | 14.4 |