Synopsis

Using the National Weather Service Storm Data, we identify which types of storm event cause the most economic damage and the most harm to public health across the USA. Based on our findings, tornadoes are the main cause of deaths and injuries, floods are the main cause of property damage, and droughts are the main cause of crop damage. We also show how those consequences were distributed across the US states between 1950 and 2011.

Data Processing

First, we are going to load the dplyr and ggplot2 libraries, to help manage the dataframes and to plot our results.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)

Now we download the file to our previously created data folder and load it into the StormData variable.

if(!file.exists("./data/stormdata.csv")){
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",destfile="./data/stormdata.csv")
}  

StormData <- read.csv("./data/stormdata.csv")
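The download step above assumes that the data folder already exists. A minimal sketch of a more defensive version follows; the helper name prepare_destination is hypothetical and not part of the original analysis. The download itself is left commented out because it needs network access. Note that the file is bzip2-compressed even though it is saved with a .csv name; read.csv can read such files directly, which is what the code above relies on.

```r
# Hypothetical helper (not in the original report): make sure the target
# folder exists and return the full path to use for the download.
prepare_destination <- function(dir_path, file_name) {
  if (!dir.exists(dir_path)) {
    dir.create(dir_path, recursive = TRUE)
  }
  file.path(dir_path, file_name)
}

# Example usage (commented out because it needs network access):
# url  <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# dest <- prepare_destination("./data", "stormdata.csv")
# if (!file.exists(dest)) {
#   # mode = "wb" keeps the compressed bytes intact on Windows
#   download.file(url, destfile = dest, mode = "wb")
# }
# StormData <- read.csv(dest)
```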

To determine which events are most harmful, we need to look at the variables FATALITIES, INJURIES, PROPDMG and CROPDMG. However, the National Weather Service Storm Data Documentation shows that the variables PROPDMGEXP and CROPDMGEXP act as multipliers for their respective damage values. So we look at the frequency of the values those variables take:

table(StormData$PROPDMGEXP)
## 
##             -      ?      +      0      1      2      3      4      5      6 
## 465934      1      8      5    216     25     13      4      4     28      4 
##      7      8      B      h      H      K      m      M 
##      5      1     40      1      6 424665      7  11330
table(StormData$CROPDMGEXP)
## 
##             ?      0      2      B      k      K      m      M 
## 618413      7     19      1      9     21 281832      1   1994

We need to work out how these codes translate into multipliers. Thanks to Flying Disc, we came up with the function convertMultiplier, which returns the multiplier for each entry.

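# Map a single exponent code to its numeric multiplier
convertMultiplier <- function(x){
  if (x == "B"){
    return(1000000000)   # billions
  } else if (x == "M" || x == "m"){
    return(1000000)      # millions
  } else if (x == "K" || x == "k"){
    return(1000)         # thousands
  } else if (x == "H" || x == "h"){
    return(100)          # hundreds
  } else if (x %in% as.character(c(0:8))){
    return(10)           # digits 0 to 8
  } else {
    return(1)            # blank and any other symbol
  }
}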
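Because convertMultiplier handles one value at a time, it has to be called once per row. As a sketch only, the same mapping can be written as a vectorised lookup; the names multiplier_lookup and convertMultiplierVec are hypothetical and do not appear in the original analysis. The mapping itself is copied from the function above, including the fallback of 1 for blanks and unrecognised symbols.

```r
# Hypothetical vectorised variant of convertMultiplier (same mapping).
multiplier_lookup <- c(B = 1e9, M = 1e6, m = 1e6,
                       K = 1e3, k = 1e3, H = 1e2, h = 1e2)

convertMultiplierVec <- function(x) {
  x <- as.character(x)
  out <- rep(1, length(x))                       # default multiplier
  hit <- x %in% names(multiplier_lookup)         # letter codes
  out[hit] <- unname(multiplier_lookup[x[hit]])
  out[x %in% as.character(0:8)] <- 10            # digit codes 0 to 8
  out
}

# Example on a few codes taken from the frequency tables above
convertMultiplierVec(c("B", "m", "K", "h", "5", "", "?"))
```

With such a function, the damage columns could be computed as PROPDMG * convertMultiplierVec(PROPDMGEXP) without looping over rows.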

Then, using that function, we will create the new variables: DamageProperty, DamageCrop and TotalDamage.

StormData$DamageProperty <- StormData$PROPDMG*as.numeric(lapply(StormData$PROPDMGEXP,FUN=convertMultiplier))
StormData$DamageCrop <- StormData$CROPDMG *as.numeric(lapply(StormData$CROPDMGEXP,FUN=convertMultiplier))
StormData$TotalDamage <- StormData$DamageCrop+StormData$DamageProperty

Exploratory Analysis

Now, we will create a new dataframe called StormDataEventsSummary to easily search for the events we need. We will group the StormData dataframe by event:

StormDataEventsSummary <- StormData %>% group_by(EVTYPE) %>%
  summarise(Fatalities=sum(FATALITIES),Injuries=sum(INJURIES),
            DamageProperty=sum(DamageProperty),DamageCrop=sum(DamageCrop),
            TotalDamage=sum(TotalDamage),events=length(EVTYPE),.groups="drop")
head(StormDataEventsSummary)
## # A tibble: 6 x 7
##   EVTYPE        Fatalities Injuries DamageProperty DamageCrop TotalDamage events
##   <chr>              <dbl>    <dbl>          <dbl>      <dbl>       <dbl>  <int>
## 1 "   HIGH SUR~          0        0         200000          0      200000      1
## 2 " COASTAL FL~          0        0              0          0           0      1
## 3 " FLASH FLOO~          0        0          50000          0       50000      1
## 4 " LIGHTNING"           0        0              0          0           0      1
## 5 " TSTM WIND"           0        0        8100000          0     8100000      4
## 6 " TSTM WIND ~          0        0           8000          0        8000      1
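The first rows of the summary show event names with leading spaces (for example " TSTM WIND"), which are counted as separate event types. The sketch below, on invented toy data, shows one way such labels could be normalised before grouping. It is not part of the original analysis, and applying it to StormData would change the totals reported in the rest of this document.

```r
# Toy data, invented for illustration only
toy <- data.frame(
  EVTYPE = c(" TSTM WIND", "TSTM WIND", "tstm wind ", "FLOOD"),
  FATALITIES = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# Strip surrounding whitespace and unify the case of the labels
toy$EVTYPE_clean <- toupper(trimws(toy$EVTYPE))

# Sum fatalities per cleaned event type (base R equivalent of
# group_by + summarise)
clean_summary <- aggregate(FATALITIES ~ EVTYPE_clean, data = toy, FUN = sum)
clean_summary
```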

With that dataframe, we can search for what we need.

Events that cause the most fatalities

The five events with the most fatalities are:

top5Fatalities <- StormDataEventsSummary %>% arrange(desc(Fatalities)) %>% select(EVTYPE,Fatalities) %>% head(5)
head(top5Fatalities)
## # A tibble: 5 x 2
##   EVTYPE         Fatalities
##   <chr>               <dbl>
## 1 TORNADO              5633
## 2 EXCESSIVE HEAT       1903
## 3 FLASH FLOOD           978
## 4 HEAT                  937
## 5 LIGHTNING             816

The states that suffer the most from those events are:

StormData %>% group_by(STATE) %>% filter(EVTYPE %in% top5Fatalities$EVTYPE)  %>%
  summarize(Fatalities=sum(FATALITIES),.groups='drop') %>% 
  arrange(desc(Fatalities)) %>% head(5)
## # A tibble: 5 x 2
##   STATE Fatalities
##   <chr>      <dbl>
## 1 IL          1215
## 2 TX          1067
## 3 MO           710
## 4 AL           684
## 5 PA           533

These are Illinois, Texas, Missouri, Alabama and Pennsylvania.

Events that cause the most injuries

Proceeding the same way as before:

top5Injuries <- StormDataEventsSummary %>% arrange(desc(Injuries)) %>% select(EVTYPE,Injuries) %>% head(5)
head(top5Injuries)
## # A tibble: 5 x 2
##   EVTYPE         Injuries
##   <chr>             <dbl>
## 1 TORNADO           91346
## 2 TSTM WIND          6957
## 3 FLOOD              6789
## 4 EXCESSIVE HEAT     6525
## 5 LIGHTNING          5230

The states that suffer the most from those events are:

StormData %>% group_by(STATE) %>% filter(EVTYPE %in% top5Injuries$EVTYPE)  %>%
  summarize(Injuries=sum(INJURIES),.groups='drop') %>% 
  arrange(desc(Injuries)) %>% head(5)
## # A tibble: 5 x 2
##   STATE Injuries
##   <chr>    <dbl>
## 1 TX       15290
## 2 AL        8442
## 3 MO        8174
## 4 MS        6504
## 5 AR        5390

These are Texas, Alabama, Missouri, Mississippi and Arkansas.

Events that cause the most damage to property

Now, for the events that cause the most damage to property:

top5DamageProperty <- StormDataEventsSummary %>% arrange(desc(DamageProperty)) %>% select(EVTYPE,DamageProperty) %>% head(5)

head(top5DamageProperty)
## # A tibble: 5 x 2
##   EVTYPE            DamageProperty
##   <chr>                      <dbl>
## 1 FLOOD               144657709807
## 2 HURRICANE/TYPHOON    69305840000
## 3 TORNADO              56937162900
## 4 STORM SURGE          43323536000
## 5 FLASH FLOOD          16140815218

The states that suffer the most from those events are:

StormData %>% group_by(STATE) %>% filter(EVTYPE %in% top5DamageProperty$EVTYPE )  %>%
  summarize(DamageProperty=sum(DamageProperty),.groups='drop') %>% 
  arrange(desc(DamageProperty)) %>% head(5)
## # A tibble: 5 x 2
##   STATE DamageProperty
##   <chr>          <dbl>
## 1 CA      117127356965
## 2 LA       54735277990
## 3 FL       31036822693
## 4 MS       28665469630
## 5 AL       11357170060

These are California, Louisiana, Florida, Mississippi and Alabama.

Events that cause the most damage to crops

And finally, the events that cause the most damage to crops:

top5DamageCrop <- StormDataEventsSummary %>% arrange(desc(DamageCrop)) %>% select(EVTYPE,DamageCrop) %>% head(5)

head(top5DamageCrop)
## # A tibble: 5 x 2
##   EVTYPE       DamageCrop
##   <chr>             <dbl>
## 1 DROUGHT     13972566000
## 2 FLOOD        5661968450
## 3 RIVER FLOOD  5029459000
## 4 ICE STORM    5022113500
## 5 HAIL         3025954653

The states that suffer the most from those events are:

StormData %>% group_by(STATE) %>% filter(EVTYPE %in% top5DamageCrop$EVTYPE )  %>%
  summarize(DamageCrop=sum(DamageCrop),.groups='drop') %>% 
  arrange(desc(DamageCrop)) %>% head(5)
## # A tibble: 5 x 2
##   STATE DamageCrop
##   <chr>      <dbl>
## 1 TX    6915808600
## 2 IL    5330037600
## 3 MS    5016506000
## 4 IA    3893394450
## 5 NE    1490910650

These are Texas, Illinois, Mississippi, Iowa and Nebraska.

Results

Tornadoes are the main cause of both fatalities (5633 deaths) and injuries (91346 injuries) in Storm Data. We now look at both variables at the state level. First, we need to tidy our dataframe so that type (Fatality, Injury) becomes a factor variable for each state.

statesTornado <- StormData %>% group_by(STATE) %>% filter(EVTYPE=="TORNADO")  %>%   summarize(Fatalities=sum(FATALITIES),Injuries=sum(INJURIES),.groups='drop')
Fatalities <- statesTornado[,c(1,2)]
names(Fatalities) <- c("STATE","Quantity")
Injuries <- statesTornado[,c(1,3)]
names(Injuries) <- c("STATE","Quantity")
healthDamage <- rbind(Fatalities,Injuries)
healthDamage$type <- rep(c("Fatality","Injury"),each=nrow(statesTornado))
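To make the reshaping step above easier to follow, here is a small sketch on invented toy values (three states, made-up counts). It builds the same long layout, with one row per state and type, and derives the number of repetitions from the data instead of a fixed count.

```r
# Toy wide table, invented for illustration only
toy <- data.frame(
  STATE = c("AL", "TX", "IL"),
  Fatalities = c(3, 5, 1),
  Injuries = c(40, 60, 10),
  stringsAsFactors = FALSE
)

# Stack the two measures and label each block with its type
toy_long <- data.frame(
  STATE = rep(toy$STATE, times = 2),
  Quantity = c(toy$Fatalities, toy$Injuries),
  type = rep(c("Fatality", "Injury"), each = nrow(toy)),
  stringsAsFactors = FALSE
)
toy_long
```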

Now we plot both fatalities and injuries by state:

g <- ggplot(healthDamage,aes(STATE,Quantity,fill=type))
g+geom_bar(stat="identity", position=position_dodge())+
  scale_fill_manual(values=c('blue','orange'))+
  theme(axis.text.x = element_text(angle=90))+
  labs(x="State",y="Quantity",title="Injuries and Deaths caused by tornadoes in the US by State")

Deaths and injuries caused by tornadoes by state

Floods are the main cause of property damage in the US. Filtering our dataset for FLOOD as the type of event, we have:

statesFlood<- StormData %>% group_by(STATE) %>% filter(EVTYPE=="FLOOD")  %>%
  summarize(DamageProperty=sum(DamageProperty),.groups='drop')

Plotting property damage (in billions of dollars) by state as a bar chart:

g <- ggplot(statesFlood,aes(STATE,DamageProperty/1000000000))
g+geom_bar(stat="identity", position=position_dodge())+
  theme(axis.text.x = element_text(angle=90))+
  labs(x="State",y="Damage (in Billions of Dollars)",title="Property Damage caused by flood in the US by State")

Property Damage caused by floods by state

Finally, droughts are the main cause of crop damage in the US. Filtering our dataset for DROUGHT as the type of event, we have:

statesDrought<- StormData %>% group_by(STATE) %>% filter(EVTYPE=="DROUGHT")  %>%
  summarize(DamageCrop=sum(DamageCrop),.groups='drop')

Plotting the result by state:

g <- ggplot(statesDrought,aes(STATE,DamageCrop/1000000))
g+geom_bar(stat="identity", position=position_dodge())+
  theme(axis.text.x = element_text(angle=90))+
  labs(x="State",y="Damage to crops (in millions)",title="Crop Damage caused by droughts in the US by State")

Crop Damage caused by droughts by state