Synopsis

This report is intended to show briefly most deadly nature event across the US and how these severe incidents have direct impact on people health (injuries and fatalities) and costs on properties damage. The main dataset to drive the analysis is taken from the National Weather Service Storm Data Documentation As advised in project instructions, more recent years should be considered more complete information provided on datasets of NOAA’s National Weather Service.

Loading and Processing the Raw Data

The dataset used to address the analysis is located on the page of National Weather Service under the following link DataSet

Dataset for files

        setwd("/Users/elobo/HomeRstudio/Curso5/PeerAssessment2/Data")
        file_names <- list.files()
        file_names
## [1] "Stormdata_2005.csv" "Stormdata_2006.csv" "Stormdata_2007.csv"
## [4] "Stormdata_2008.csv" "Stormdata_2009.csv" "stormdata_2010.csv"
## [7] "stormdata_2011.csv" "stormdata_2012.csv" "stormdata_2013.csv"
        # Loading info into one compiled dataset. Taking into account period 2005-2013 mainly.      
        out.file<-""
        for(i in 1:9){
                message("Looping through i value: ", i )
                file <- read.csv(file_names[i], header=TRUE, stringsAsFactors=TRUE)
                out.file <- rbind(out.file, file)
        }
## Looping through i value: 1
## Looping through i value: 2
## Looping through i value: 3
## Looping through i value: 4
## Looping through i value: 5
## Looping through i value: 6
## Looping through i value: 7
## Looping through i value: 8
## Looping through i value: 9
        # Loading needed libraries
        library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
        library(tidyr)
        library(ggplot2)
        library(lattice)
        library(gridExtra)
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine

Processing the Raw Data

In order to prepare the dataset suitable for proper mnipulation, some transformation should be done:

## Needed Data Transformation
        DataIn <- out.file #safe copy of original output file
        
        # we need to convert target variable of death & injuries to integer
        DataIn$INJURIES_DIRECT  <- as.integer(DataIn$INJURIES_DIRECT)
        DataIn$INJURIES_INDIRECT <- as.integer(DataIn$INJURIES_INDIRECT)
        DataIn$DEATHS_DIRECT    <- as.integer(DataIn$DEATHS_DIRECT)
        DataIn$DEATHS_INDIRECT  <- as.integer(DataIn$DEATHS_INDIRECT)
        
        # get rid off of missing values
        DataIn <- DataIn[complete.cases(DataIn$INJURIES_DIRECT),]

Then we check an overall conditions on main variable for basis calculation

        # checking main variable for analisys basis
        summary(DataIn$EVENT_TYPE)
        str(DataIn$EVENT_TYPE)
        sum(is.na(DataIn$EVENT_TYPE))
        DataIn <- DataIn[!is.na(DataIn$EVENT_TYPE),]
        subset(DataIn, EVENT_TYPE =="NA")
        
        
        DataIn$YEAR <- as.factor(DataIn$YEAR)
        sum(is.na(DataIn$YEAR))

Results

1. Across the United States, which types of events (as indicated in the 𝙴𝚅𝚃𝚈𝙿𝙴 variable) are most harmful with respect to population health?

        ## IMPACT ON FATALITIES & INJURIES BY YEAR

        Deaths1 <- aggregate(DEATHS_DIRECT ~ YEAR, data=DataIn, sum)
        Deaths2 <- aggregate(DEATHS_INDIRECT ~ YEAR, data=DataIn, sum)
        TotalFatalities <- merge(Deaths1, Deaths2)
        TotalFatalities <- TotalFatalities %>% mutate(TOTAL_FATALITIES = DEATHS_DIRECT + DEATHS_INDIRECT)
        TotalFatalities[order(TotalFatalities$TOTAL_FATALITIES, decreasing = T),]
##   YEAR DEATHS_DIRECT DEATHS_INDIRECT TOTAL_FATALITIES
## 7 2011          1096             240             1336
## 4 2008           572             255              827
## 8 2012           536             170              706
## 3 2007           515             190              705
## 6 2010           491             186              677
## 5 2009           373             185              558
## 2 2006           542               0              542
## 9 2013           391              94              485
## 1 2005           469               0              469
        Deaths1 <- aggregate(INJURIES_DIRECT ~ YEAR, data=DataIn, sum)
        Deaths2 <- aggregate(INJURIES_INDIRECT ~ YEAR, data=DataIn, sum)
        TotalInjuries <- merge(Deaths1, Deaths2)
        TotalInjuries <- TotalInjuries %>% mutate(TOTAL_INJURIES = INJURIES_DIRECT + INJURIES_INDIRECT)
        TotalInjuries[order(TotalInjuries$TOTAL_INJURIES, decreasing = T),]
##   YEAR INJURIES_DIRECT INJURIES_INDIRECT TOTAL_INJURIES
## 7 2011            8859              1039           9898
## 4 2008            2905              3615           6520
## 3 2007            3737               812           4549
## 8 2012            2652               633           3285
## 6 2010            2374               860           3234
## 2 2006            3195                 0           3195
## 9 2013            2267               837           3104
## 5 2009            1876               961           2837
## 1 2005            1834                 0           1834
        ## IMPACT ON FATALITIES/INJURIES BY EVENTTYPE & YEAR
        
        Deaths1 <- aggregate(DEATHS_DIRECT ~ EVENT_TYPE + YEAR, data=DataIn, sum)
        Deaths2 <- aggregate(DEATHS_INDIRECT ~ EVENT_TYPE + YEAR, data=DataIn, sum)
        TotFatalbyType <- merge(Deaths1, Deaths2)
        TotFatalbyType <- TotFatalbyType %>% mutate(TOTAL_FATALITIES = DEATHS_DIRECT + DEATHS_INDIRECT) %>% select(YEAR, EVENT_TYPE, TOTAL_FATALITIES )
        sum1 <- TotFatalbyType[order(TotFatalbyType$TOTAL_FATALITIES, decreasing = T),]
        head(sum1, 20)
##     YEAR     EVENT_TYPE TOTAL_FATALITIES
## 344 2011        Tornado              567
## 160 2006           Heat              252
## 159 2005           Heat              158
## 96  2011 Excessive Heat              131
## 341 2008        Tornado              131
## 401 2008 Winter Weather              105
## 97  2012 Excessive Heat               97
## 165 2011           Heat               88
## 202 2012      High Wind               87
## 340 2007        Tornado               81
## 403 2010 Winter Weather               79
## 111 2010    Flash Flood               75
## 95  2010 Excessive Heat               74
## 164 2010           Heat               73
## 108 2007    Flash Flood               71
## 112 2011    Flash Flood               71
## 345 2012        Tornado               71
## 166 2012           Heat               70
## 109 2008    Flash Flood               65
## 402 2009 Winter Weather               64
        Deaths1 <- aggregate(INJURIES_DIRECT ~ EVENT_TYPE + YEAR, data=DataIn, sum)
        Deaths2 <- aggregate(INJURIES_INDIRECT ~ EVENT_TYPE + YEAR, data=DataIn, sum)
        TotInjurbyType <- merge(Deaths1, Deaths2)
        TotInjurbyType <- TotInjurbyType %>% mutate(TOTAL_INJURIES = INJURIES_DIRECT + INJURIES_INDIRECT) %>% select(YEAR, EVENT_TYPE,TOTAL_INJURIES)
        sum2 <- TotInjurbyType[order(TotInjurbyType$TOTAL_INJURIES, decreasing = T),]
        head(sum2, 20)
##     YEAR          EVENT_TYPE TOTAL_INJURIES
## 344 2011             Tornado           5646
## 209 2008 Hurricane (Typhoon)           2413
## 341 2008             Tornado           1731
## 96  2011      Excessive Heat           1594
## 160 2006                Heat           1513
## 92  2007      Excessive Heat           1188
## 165 2011                Heat            936
## 339 2006             Tornado            881
## 98  2013      Excessive Heat            857
## 345 2012             Tornado            846
## 343 2010             Tornado            707
## 161 2007                Heat            698
## 340 2007             Tornado            667
## 401 2008      Winter Weather            666
## 167 2013                Heat            664
## 97  2012      Excessive Heat            638
## 402 2009      Winter Weather            589
## 338 2005             Tornado            537
## 166 2012                Heat            505
## 95  2010      Excessive Heat            494

Check which event had a maximun deadly rate by year

        maxfatalities <- filter(sum1, TOTAL_FATALITIES %in% with(sum1, tapply(TOTAL_FATALITIES, YEAR, max)))
        maxfatalities
##    YEAR     EVENT_TYPE TOTAL_FATALITIES
## 1  2011        Tornado              567
## 2  2006           Heat              252
## 3  2005           Heat              158
## 4  2011 Excessive Heat              131
## 5  2008        Tornado              131
## 6  2012 Excessive Heat               97
## 7  2007        Tornado               81
## 8  2010 Winter Weather               79
## 9  2009 Winter Weather               64
## 10 2013 Excessive Heat               61

Check which event had a maximun injuries rate by year

        maxinjuries <- filter(sum2, TOTAL_INJURIES %in% with(sum2, tapply(TOTAL_INJURIES, YEAR, max)))
        maxinjuries
##   YEAR          EVENT_TYPE TOTAL_INJURIES
## 1 2011             Tornado           5646
## 2 2008 Hurricane (Typhoon)           2413
## 3 2006                Heat           1513
## 4 2007      Excessive Heat           1188
## 5 2013      Excessive Heat            857
## 6 2012             Tornado            846
## 7 2010             Tornado            707
## 8 2009      Winter Weather            589
## 9 2005             Tornado            537

Taking a plot to check graphically the most impacting weather type of events on fatalities and injuries

       p1 <- ggplot(maxfatalities, aes(YEAR, TOTAL_FATALITIES, fill=YEAR)) +
                geom_bar(stat = "identity", alpha=1/2) + facet_grid( . ~ EVENT_TYPE) + theme_bw()  + labs(x="Year", y="Fatalities [#]") +
                ggtitle(expression("Most Impacting Climate Events across US [Period 2005-2013]")) +
                theme(axis.text.x = element_text(angle = 90, hjust = 1)) 
      

        p1

        p2 <- ggplot(maxinjuries, aes(YEAR, TOTAL_INJURIES, fill=YEAR)) +
                geom_bar(stat = "identity", alpha=1/2) + facet_grid( . ~ EVENT_TYPE) + theme_bw()  + labs(x="Year", y="Injuries [#]") +
                ggtitle(expression("Most Impacting Climate Events across US [Period 2005-2013]")) +
                theme(axis.text.x = element_text(angle = 90, hjust = 1))
   

        p2

Taking a look at two of the most severe events: Heat and Tornados

        Deaths1 <- aggregate(DEATHS_DIRECT ~ STATE + EVENT_TYPE + YEAR, data=DataIn, sum)
        Deaths2 <- aggregate(DEATHS_INDIRECT ~ STATE + EVENT_TYPE + YEAR, data=DataIn, sum)
        FatalState <- merge(Deaths1, Deaths2)
        FatalState <- FatalState %>% mutate(TOTAL_FATALITIES = DEATHS_DIRECT + DEATHS_INDIRECT) %>% select(STATE, YEAR, EVENT_TYPE, TOTAL_FATALITIES)

        p3 <- ggplot(filter(FatalState,EVENT_TYPE %in% c("Heat", "Tornado")), aes(STATE, TOTAL_FATALITIES, fill=YEAR)) +
                geom_bar(stat = "identity", alpha=1/2) + facet_grid( . ~ EVENT_TYPE) + theme_bw()  + labs(x="Year", y="Fatalities [#]") +
                ggtitle(expression("Most Impacting Climate Events across US [Period 2005-2013]")) +
                theme(axis.text.x = element_text(angle = 90, hjust = 1))

        p3

        Deaths1 <- aggregate(INJURIES_DIRECT ~ STATE + EVENT_TYPE + YEAR, data=DataIn, sum)
        Deaths2 <- aggregate(INJURIES_INDIRECT ~ STATE + EVENT_TYPE + YEAR, data=DataIn, sum)
        InjuState <- merge(Deaths1, Deaths2)
        InjuState <- InjuState %>% mutate(TOTAL_INJURIES = INJURIES_DIRECT + INJURIES_INDIRECT) %>% select(STATE, YEAR, EVENT_TYPE, TOTAL_INJURIES)
        
        p4 <- ggplot(filter(InjuState,EVENT_TYPE %in% c("Heat", "Tornado")), aes(STATE, TOTAL_INJURIES, fill=YEAR)) +
                geom_bar(stat = "identity", alpha=1/2) + facet_grid( . ~ EVENT_TYPE) + theme_bw()  + labs(x="Year", y="Injuries [#]") +
                ggtitle(expression("Most Impacting Climate Events across US [Period 2005-2013]")) +
                theme(axis.text.x = element_text(angle = 90, hjust = 1))
        
        #grid.arrange(p3, p4, ncol=2, nrow =1)

2. Across the United States, which types of events have the greatest economic consequences?

Take the cost of damage and set it into a equivalent scale, for this case “K” is the basis for calculations

  DataIn$DAMAGE_PROPERTY <- as.character(DataIn$DAMAGE_PROPERTY)
                
  value_B <- grep("B", DataIn$DAMAGE_PROPERTY)
        for(i in 1:length(value_B)){
                DataIn$DAMAGE_PROPERTY[value_B[i]] <- extract_numeric(DataIn$DAMAGE_PROPERTY[value_B[i]])*1000000
        }
                        
                        
  value_M <- grep("M", DataIn$DAMAGE_PROPERTY)
        for(i in 1:length(value_M)){
                DataIn$DAMAGE_PROPERTY[value_M[i]] <- extract_numeric(DataIn$DAMAGE_PROPERTY[value_M[i]])*1000
        }     

  # For the rest of values wich remaing with "K" indictor
  DataIn$DAMAGE_PROPERTY<- extract_numeric(DataIn$DAMAGE_PROPERTY)
        
  DataIn2 <- DataIn[complete.cases(DataIn$DAMAGE_PROPERTY),]
                        
  #Property Damage by STATE  in Billions              
   PDMG_STAT <- aggregate(DAMAGE_PROPERTY ~ STATE + YEAR, data=DataIn2, sum)
   PDMG_STAT <- PDMG_STAT %>% mutate(DAMAGE_PROPERTY = DAMAGE_PROPERTY/1000000)
        
  head(PDMG_STAT[order(PDMG_STAT$DAMAGE_PROPERTY, decreasing = T),],20)
##           STATE YEAR DAMAGE_PROPERTY
## 26    LOUISIANA 2005      50.6773235
## 32  MISSISSIPPI 2005      21.7760894
## 498  NEW JERSEY 2012      12.8000585
## 428    MISSOURI 2011       3.3726548
## 391     ALABAMA 2011       2.4763409
## 13      FLORIDA 2005       1.9914335
## 328     ARIZONA 2010       1.9615733
## 381   TENNESSEE 2010       1.8582183
## 240        OHIO 2008       1.6601980
## 52        TEXAS 2005       1.6354822
## 249       TEXAS 2008       1.5366927
## 214        IOWA 2008       1.2748661
## 223   LOUISIANA 2008       1.2431781
## 435    NEW YORK 2011       1.1991875
## 262    ARKANSAS 2009       1.1161169
## 447       TEXAS 2011       1.0995390
## 529    COLORADO 2013       0.9703768
## 175    OKLAHOMA 2007       0.9154703
## 8    CALIFORNIA 2005       0.8848858
## 241    OKLAHOMA 2008       0.8344753

Checking Luisiana as the most affected in property damage

        PDMG_STAT2 <- aggregate(DAMAGE_PROPERTY ~ EVENT_TYPE +STATE + YEAR, data=DataIn2, sum)
        PDMG_STAT2 <- PDMG_STAT2 %>% filter(STATE=="LOUISIANA") %>% mutate(DAMAGE_PROPERTY = DAMAGE_PROPERTY /1000) # In Million Dollars
        head(PDMG_STAT2[order(PDMG_STAT2$DAMAGE_PROPERTY, decreasing = T), ],20)
##              EVENT_TYPE     STATE YEAR DAMAGE_PROPERTY
## 5      Storm Surge/Tide LOUISIANA 2005     30637.71100
## 3   Hurricane (Typhoon) LOUISIANA 2005     19978.58600
## 40  Hurricane (Typhoon) LOUISIANA 2008       518.92250
## 43     Storm Surge/Tide LOUISIANA 2008       493.85500
## 123    Storm Surge/Tide LOUISIANA 2012       493.50000
## 132               Flood LOUISIANA 2013       193.06000
## 114         Flash Flood LOUISIANA 2012       175.99700
## 34          Flash Flood LOUISIANA 2008       123.98700
## 121 Hurricane (Typhoon) LOUISIANA 2012       117.34050
## 47       Tropical Storm LOUISIANA 2008        83.03100
## 131         Flash Flood LOUISIANA 2013        63.16100
## 8        Tropical Storm LOUISIANA 2005        48.00000
## 106             Tornado LOUISIANA 2011        37.42720
## 84              Tornado LOUISIANA 2010        36.62700
## 65              Tornado LOUISIANA 2009        33.66125
## 94                Flood LOUISIANA 2011        24.70000
## 54                Flood LOUISIANA 2009        24.57300
## 64    Thunderstorm Wind LOUISIANA 2009        16.54330
## 46              Tornado LOUISIANA 2008        13.76330
## 126             Tornado LOUISIANA 2012        12.05400

Conclusions

  1. For the question about which are the event that have big impact on population health, we can say that Fatalities, leaving are most affected by Tornados and Heat, having a really big impact on year 2011.

On the other hand, the weather conditions that most cause Injuries on population are Tornado & Hurricane
This behavior can also be seen accross each state, where Alabama seems to be one most impacted by Tornados. On the contrary, the state of Texas shows the most fatalities number because of Heat

  1. Respect to economics reason, in therms of most damage to properties we found state Luisiana and Missisipe, as most state affected. The climate event more contributing on this effect was “Storm Surge/Tide”" and “Hurricane” for Luisiana,