This report is intended to show briefly most deadly nature event across the US and how these severe incidents have direct impact on people health (injuries and fatalities) and costs on properties damage. The main dataset to drive the analysis is taken from the National Weather Service Storm Data Documentation As advised in project instructions, more recent years should be considered more complete information provided on datasets of NOAA’s National Weather Service.
The dataset used to address the analysis is located on the page of National Weather Service under the following link DataSet
Dataset for files
setwd("/Users/elobo/HomeRstudio/Curso5/PeerAssessment2/Data")
file_names <- list.files()
file_names
## [1] "Stormdata_2005.csv" "Stormdata_2006.csv" "Stormdata_2007.csv"
## [4] "Stormdata_2008.csv" "Stormdata_2009.csv" "stormdata_2010.csv"
## [7] "stormdata_2011.csv" "stormdata_2012.csv" "stormdata_2013.csv"
# Loading info into one compiled dataset. Taking into account period 2005-2013 mainly.
out.file<-""
for(i in 1:9){
message("Looping through i value: ", i )
file <- read.csv(file_names[i], header=TRUE, stringsAsFactors=TRUE)
out.file <- rbind(out.file, file)
}
## Looping through i value: 1
## Looping through i value: 2
## Looping through i value: 3
## Looping through i value: 4
## Looping through i value: 5
## Looping through i value: 6
## Looping through i value: 7
## Looping through i value: 8
## Looping through i value: 9
# Loading needed libraries
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
library(ggplot2)
library(lattice)
library(gridExtra)
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
In order to prepare the dataset suitable for proper mnipulation, some transformation should be done:
## Needed Data Transformation
DataIn <- out.file #safe copy of original output file
# we need to convert target variable of death & injuries to integer
DataIn$INJURIES_DIRECT <- as.integer(DataIn$INJURIES_DIRECT)
DataIn$INJURIES_INDIRECT <- as.integer(DataIn$INJURIES_INDIRECT)
DataIn$DEATHS_DIRECT <- as.integer(DataIn$DEATHS_DIRECT)
DataIn$DEATHS_INDIRECT <- as.integer(DataIn$DEATHS_INDIRECT)
# get rid off of missing values
DataIn <- DataIn[complete.cases(DataIn$INJURIES_DIRECT),]
Then we check an overall conditions on main variable for basis calculation
# checking main variable for analisys basis
summary(DataIn$EVENT_TYPE)
str(DataIn$EVENT_TYPE)
sum(is.na(DataIn$EVENT_TYPE))
DataIn <- DataIn[!is.na(DataIn$EVENT_TYPE),]
subset(DataIn, EVENT_TYPE =="NA")
DataIn$YEAR <- as.factor(DataIn$YEAR)
sum(is.na(DataIn$YEAR))
## IMPACT ON FATALITIES & INJURIES BY YEAR
Deaths1 <- aggregate(DEATHS_DIRECT ~ YEAR, data=DataIn, sum)
Deaths2 <- aggregate(DEATHS_INDIRECT ~ YEAR, data=DataIn, sum)
TotalFatalities <- merge(Deaths1, Deaths2)
TotalFatalities <- TotalFatalities %>% mutate(TOTAL_FATALITIES = DEATHS_DIRECT + DEATHS_INDIRECT)
TotalFatalities[order(TotalFatalities$TOTAL_FATALITIES, decreasing = T),]
## YEAR DEATHS_DIRECT DEATHS_INDIRECT TOTAL_FATALITIES
## 7 2011 1096 240 1336
## 4 2008 572 255 827
## 8 2012 536 170 706
## 3 2007 515 190 705
## 6 2010 491 186 677
## 5 2009 373 185 558
## 2 2006 542 0 542
## 9 2013 391 94 485
## 1 2005 469 0 469
Deaths1 <- aggregate(INJURIES_DIRECT ~ YEAR, data=DataIn, sum)
Deaths2 <- aggregate(INJURIES_INDIRECT ~ YEAR, data=DataIn, sum)
TotalInjuries <- merge(Deaths1, Deaths2)
TotalInjuries <- TotalInjuries %>% mutate(TOTAL_INJURIES = INJURIES_DIRECT + INJURIES_INDIRECT)
TotalInjuries[order(TotalInjuries$TOTAL_INJURIES, decreasing = T),]
## YEAR INJURIES_DIRECT INJURIES_INDIRECT TOTAL_INJURIES
## 7 2011 8859 1039 9898
## 4 2008 2905 3615 6520
## 3 2007 3737 812 4549
## 8 2012 2652 633 3285
## 6 2010 2374 860 3234
## 2 2006 3195 0 3195
## 9 2013 2267 837 3104
## 5 2009 1876 961 2837
## 1 2005 1834 0 1834
## IMPACT ON FATALITIES/INJURIES BY EVENTTYPE & YEAR
Deaths1 <- aggregate(DEATHS_DIRECT ~ EVENT_TYPE + YEAR, data=DataIn, sum)
Deaths2 <- aggregate(DEATHS_INDIRECT ~ EVENT_TYPE + YEAR, data=DataIn, sum)
TotFatalbyType <- merge(Deaths1, Deaths2)
TotFatalbyType <- TotFatalbyType %>% mutate(TOTAL_FATALITIES = DEATHS_DIRECT + DEATHS_INDIRECT) %>% select(YEAR, EVENT_TYPE, TOTAL_FATALITIES )
sum1 <- TotFatalbyType[order(TotFatalbyType$TOTAL_FATALITIES, decreasing = T),]
head(sum1, 20)
## YEAR EVENT_TYPE TOTAL_FATALITIES
## 344 2011 Tornado 567
## 160 2006 Heat 252
## 159 2005 Heat 158
## 96 2011 Excessive Heat 131
## 341 2008 Tornado 131
## 401 2008 Winter Weather 105
## 97 2012 Excessive Heat 97
## 165 2011 Heat 88
## 202 2012 High Wind 87
## 340 2007 Tornado 81
## 403 2010 Winter Weather 79
## 111 2010 Flash Flood 75
## 95 2010 Excessive Heat 74
## 164 2010 Heat 73
## 108 2007 Flash Flood 71
## 112 2011 Flash Flood 71
## 345 2012 Tornado 71
## 166 2012 Heat 70
## 109 2008 Flash Flood 65
## 402 2009 Winter Weather 64
Deaths1 <- aggregate(INJURIES_DIRECT ~ EVENT_TYPE + YEAR, data=DataIn, sum)
Deaths2 <- aggregate(INJURIES_INDIRECT ~ EVENT_TYPE + YEAR, data=DataIn, sum)
TotInjurbyType <- merge(Deaths1, Deaths2)
TotInjurbyType <- TotInjurbyType %>% mutate(TOTAL_INJURIES = INJURIES_DIRECT + INJURIES_INDIRECT) %>% select(YEAR, EVENT_TYPE,TOTAL_INJURIES)
sum2 <- TotInjurbyType[order(TotInjurbyType$TOTAL_INJURIES, decreasing = T),]
head(sum2, 20)
## YEAR EVENT_TYPE TOTAL_INJURIES
## 344 2011 Tornado 5646
## 209 2008 Hurricane (Typhoon) 2413
## 341 2008 Tornado 1731
## 96 2011 Excessive Heat 1594
## 160 2006 Heat 1513
## 92 2007 Excessive Heat 1188
## 165 2011 Heat 936
## 339 2006 Tornado 881
## 98 2013 Excessive Heat 857
## 345 2012 Tornado 846
## 343 2010 Tornado 707
## 161 2007 Heat 698
## 340 2007 Tornado 667
## 401 2008 Winter Weather 666
## 167 2013 Heat 664
## 97 2012 Excessive Heat 638
## 402 2009 Winter Weather 589
## 338 2005 Tornado 537
## 166 2012 Heat 505
## 95 2010 Excessive Heat 494
Check which event had a maximun deadly rate by year
maxfatalities <- filter(sum1, TOTAL_FATALITIES %in% with(sum1, tapply(TOTAL_FATALITIES, YEAR, max)))
maxfatalities
## YEAR EVENT_TYPE TOTAL_FATALITIES
## 1 2011 Tornado 567
## 2 2006 Heat 252
## 3 2005 Heat 158
## 4 2011 Excessive Heat 131
## 5 2008 Tornado 131
## 6 2012 Excessive Heat 97
## 7 2007 Tornado 81
## 8 2010 Winter Weather 79
## 9 2009 Winter Weather 64
## 10 2013 Excessive Heat 61
Check which event had a maximun injuries rate by year
maxinjuries <- filter(sum2, TOTAL_INJURIES %in% with(sum2, tapply(TOTAL_INJURIES, YEAR, max)))
maxinjuries
## YEAR EVENT_TYPE TOTAL_INJURIES
## 1 2011 Tornado 5646
## 2 2008 Hurricane (Typhoon) 2413
## 3 2006 Heat 1513
## 4 2007 Excessive Heat 1188
## 5 2013 Excessive Heat 857
## 6 2012 Tornado 846
## 7 2010 Tornado 707
## 8 2009 Winter Weather 589
## 9 2005 Tornado 537
Taking a plot to check graphically the most impacting weather type of events on fatalities and injuries
p1 <- ggplot(maxfatalities, aes(YEAR, TOTAL_FATALITIES, fill=YEAR)) +
geom_bar(stat = "identity", alpha=1/2) + facet_grid( . ~ EVENT_TYPE) + theme_bw() + labs(x="Year", y="Fatalities [#]") +
ggtitle(expression("Most Impacting Climate Events across US [Period 2005-2013]")) +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
p1
p2 <- ggplot(maxinjuries, aes(YEAR, TOTAL_INJURIES, fill=YEAR)) +
geom_bar(stat = "identity", alpha=1/2) + facet_grid( . ~ EVENT_TYPE) + theme_bw() + labs(x="Year", y="Injuries [#]") +
ggtitle(expression("Most Impacting Climate Events across US [Period 2005-2013]")) +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
p2
Taking a look at two of the most severe events: Heat and Tornados
Deaths1 <- aggregate(DEATHS_DIRECT ~ STATE + EVENT_TYPE + YEAR, data=DataIn, sum)
Deaths2 <- aggregate(DEATHS_INDIRECT ~ STATE + EVENT_TYPE + YEAR, data=DataIn, sum)
FatalState <- merge(Deaths1, Deaths2)
FatalState <- FatalState %>% mutate(TOTAL_FATALITIES = DEATHS_DIRECT + DEATHS_INDIRECT) %>% select(STATE, YEAR, EVENT_TYPE, TOTAL_FATALITIES)
p3 <- ggplot(filter(FatalState,EVENT_TYPE %in% c("Heat", "Tornado")), aes(STATE, TOTAL_FATALITIES, fill=YEAR)) +
geom_bar(stat = "identity", alpha=1/2) + facet_grid( . ~ EVENT_TYPE) + theme_bw() + labs(x="Year", y="Fatalities [#]") +
ggtitle(expression("Most Impacting Climate Events across US [Period 2005-2013]")) +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
p3
Deaths1 <- aggregate(INJURIES_DIRECT ~ STATE + EVENT_TYPE + YEAR, data=DataIn, sum)
Deaths2 <- aggregate(INJURIES_INDIRECT ~ STATE + EVENT_TYPE + YEAR, data=DataIn, sum)
InjuState <- merge(Deaths1, Deaths2)
InjuState <- InjuState %>% mutate(TOTAL_INJURIES = INJURIES_DIRECT + INJURIES_INDIRECT) %>% select(STATE, YEAR, EVENT_TYPE, TOTAL_INJURIES)
p4 <- ggplot(filter(InjuState,EVENT_TYPE %in% c("Heat", "Tornado")), aes(STATE, TOTAL_INJURIES, fill=YEAR)) +
geom_bar(stat = "identity", alpha=1/2) + facet_grid( . ~ EVENT_TYPE) + theme_bw() + labs(x="Year", y="Injuries [#]") +
ggtitle(expression("Most Impacting Climate Events across US [Period 2005-2013]")) +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
#grid.arrange(p3, p4, ncol=2, nrow =1)
Take the cost of damage and set it into a equivalent scale, for this case “K” is the basis for calculations
DataIn$DAMAGE_PROPERTY <- as.character(DataIn$DAMAGE_PROPERTY)
value_B <- grep("B", DataIn$DAMAGE_PROPERTY)
for(i in 1:length(value_B)){
DataIn$DAMAGE_PROPERTY[value_B[i]] <- extract_numeric(DataIn$DAMAGE_PROPERTY[value_B[i]])*1000000
}
value_M <- grep("M", DataIn$DAMAGE_PROPERTY)
for(i in 1:length(value_M)){
DataIn$DAMAGE_PROPERTY[value_M[i]] <- extract_numeric(DataIn$DAMAGE_PROPERTY[value_M[i]])*1000
}
# For the rest of values wich remaing with "K" indictor
DataIn$DAMAGE_PROPERTY<- extract_numeric(DataIn$DAMAGE_PROPERTY)
DataIn2 <- DataIn[complete.cases(DataIn$DAMAGE_PROPERTY),]
#Property Damage by STATE in Billions
PDMG_STAT <- aggregate(DAMAGE_PROPERTY ~ STATE + YEAR, data=DataIn2, sum)
PDMG_STAT <- PDMG_STAT %>% mutate(DAMAGE_PROPERTY = DAMAGE_PROPERTY/1000000)
head(PDMG_STAT[order(PDMG_STAT$DAMAGE_PROPERTY, decreasing = T),],20)
## STATE YEAR DAMAGE_PROPERTY
## 26 LOUISIANA 2005 50.6773235
## 32 MISSISSIPPI 2005 21.7760894
## 498 NEW JERSEY 2012 12.8000585
## 428 MISSOURI 2011 3.3726548
## 391 ALABAMA 2011 2.4763409
## 13 FLORIDA 2005 1.9914335
## 328 ARIZONA 2010 1.9615733
## 381 TENNESSEE 2010 1.8582183
## 240 OHIO 2008 1.6601980
## 52 TEXAS 2005 1.6354822
## 249 TEXAS 2008 1.5366927
## 214 IOWA 2008 1.2748661
## 223 LOUISIANA 2008 1.2431781
## 435 NEW YORK 2011 1.1991875
## 262 ARKANSAS 2009 1.1161169
## 447 TEXAS 2011 1.0995390
## 529 COLORADO 2013 0.9703768
## 175 OKLAHOMA 2007 0.9154703
## 8 CALIFORNIA 2005 0.8848858
## 241 OKLAHOMA 2008 0.8344753
Checking Luisiana as the most affected in property damage
PDMG_STAT2 <- aggregate(DAMAGE_PROPERTY ~ EVENT_TYPE +STATE + YEAR, data=DataIn2, sum)
PDMG_STAT2 <- PDMG_STAT2 %>% filter(STATE=="LOUISIANA") %>% mutate(DAMAGE_PROPERTY = DAMAGE_PROPERTY /1000) # In Million Dollars
head(PDMG_STAT2[order(PDMG_STAT2$DAMAGE_PROPERTY, decreasing = T), ],20)
## EVENT_TYPE STATE YEAR DAMAGE_PROPERTY
## 5 Storm Surge/Tide LOUISIANA 2005 30637.71100
## 3 Hurricane (Typhoon) LOUISIANA 2005 19978.58600
## 40 Hurricane (Typhoon) LOUISIANA 2008 518.92250
## 43 Storm Surge/Tide LOUISIANA 2008 493.85500
## 123 Storm Surge/Tide LOUISIANA 2012 493.50000
## 132 Flood LOUISIANA 2013 193.06000
## 114 Flash Flood LOUISIANA 2012 175.99700
## 34 Flash Flood LOUISIANA 2008 123.98700
## 121 Hurricane (Typhoon) LOUISIANA 2012 117.34050
## 47 Tropical Storm LOUISIANA 2008 83.03100
## 131 Flash Flood LOUISIANA 2013 63.16100
## 8 Tropical Storm LOUISIANA 2005 48.00000
## 106 Tornado LOUISIANA 2011 37.42720
## 84 Tornado LOUISIANA 2010 36.62700
## 65 Tornado LOUISIANA 2009 33.66125
## 94 Flood LOUISIANA 2011 24.70000
## 54 Flood LOUISIANA 2009 24.57300
## 64 Thunderstorm Wind LOUISIANA 2009 16.54330
## 46 Tornado LOUISIANA 2008 13.76330
## 126 Tornado LOUISIANA 2012 12.05400
On the other hand, the weather conditions that most cause Injuries on population are Tornado & Hurricane
This behavior can also be seen accross each state, where Alabama seems to be one most impacted by Tornados. On the contrary, the state of Texas shows the most fatalities number because of Heat