Synopsis

The goal of this report is to provide sufficient information to a government or a municipal manager who might be responsible for preparing for severe weather events and will need to prioritize resources for different types of events.

To that end, the report considers only the most recent years in the data (2001-2011), as they are likely to be the most representative of what can happen in the coming years.

Data processing

The analysis relies on the dplyr, lubridate and ggplot2 packages, plus a melt() function assumed here to come from reshape2; these must be loaded before running the code below.

First the data is downloaded.

    if (!file.exists("repdata-data-StormData.bz2")) {
        download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                      destfile = "repdata-data-StormData.bz2", method="curl")
    }

Then it is loaded from the “.bz2” file and cached.

    dfStorm <- read.csv(bzfile("repdata-data-StormData.bz2"), header = TRUE, stringsAsFactors = FALSE)

Columns are selected and formatted appropriately: we keep the BGN_DATE, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP columns.

Data is filtered to keep events that began after 1 January 2001, which restricts the analysis to the years 2001-2011.

    tbldfStorm <- select(tbl_df(dfStorm), BGN_DATE, EVTYPE, FATALITIES, INJURIES, PROPDMG, 
                         PROPDMGEXP, CROPDMG, CROPDMGEXP) %>%
                  mutate(BGN_DATE = mdy_hms(BGN_DATE)) %>% 
                  filter(BGN_DATE > ymd("2001-01-01"))
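
Note that the comparison in the filter is strict. As a minimal sketch in base R (the report itself uses lubridate; the sample strings and the `cutoff` name are illustrative assumptions), an event stamped exactly at midnight on 1 January 2001 fails a `>` test against that date, whereas `>=` would keep it:

```r
# Hypothetical BGN_DATE strings, assuming a month/day/year layout with a time part
bgn <- c("12/31/2000 0:00:00", "1/1/2001 0:00:00", "1/2/2001 0:00:00")

# Parse as UTC timestamps (base-R stand-in for lubridate's mdy_hms)
parsed <- as.POSIXct(bgn, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
cutoff <- as.POSIXct("2001-01-01", tz = "UTC")

strict    <- parsed > cutoff   # drops the event stamped exactly at the cutoff
inclusive <- parsed >= cutoff  # keeps it

print(data.frame(bgn, strict, inclusive))
```

Whether the boundary day matters depends on how many events in the raw file begin on that date; the sketch only shows the behaviour of the two operators.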

The PROPDMGEXP and CROPDMGEXP columns are converted to numeric multipliers in order to compute economic damage in dollars.

    map <- c("","B","K","M")
    value <- c(1,1000000000,1000,1000000)
    for (i in seq_along(map)) {
        tbldfStorm$PROPDMGEXP[which(tbldfStorm$PROPDMGEXP==map[i])] <- value[i]
        tbldfStorm$CROPDMGEXP[which(tbldfStorm$CROPDMGEXP==map[i])] <- value[i]
    }
    tbldfStorm <- mutate(tbldfStorm, PROPDMGEXP = as.numeric(PROPDMGEXP),
                                     CROPDMGEXP = as.numeric(CROPDMGEXP))
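
In the loop above, any code that is not listed in `map` is left unchanged and then passed to as.numeric(). An alternative, shown here only as a sketch, is a named lookup vector that makes unmapped codes explicit as NA. The `multiplier`, `codes` and `expValue` names and the sample codes are illustrative assumptions, not part of the report:

```r
# Multipliers for the exponent codes handled in the report
multiplier <- c(B = 1e9, K = 1e3, M = 1e6)

# Hypothetical sample of raw exponent codes, including a blank and an unexpected "H"
codes <- c("K", "M", "", "B", "H")

# Index the named vector by code; codes without an entry give NA
expValue <- unname(multiplier[codes])
# Blank codes mean "no multiplier", i.e. a factor of 1
expValue[codes == ""] <- 1

print(expValue)
```

With this approach, unexpected codes surface as NA and can be counted or handled deliberately instead of being coerced silently.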

Finally some summaries are printed:

    summary(select(tbldfStorm, FATALITIES, INJURIES))
##    FATALITIES           INJURIES       
##  Min.   :  0.00000   Min.   :0.00e+00  
##  1st Qu.:  0.00000   1st Qu.:0.00e+00  
##  Median :  0.00000   Median :0.00e+00  
##  Mean   :  0.01129   Mean   :6.61e-02  
##  3rd Qu.:  0.00000   3rd Qu.:0.00e+00  
##  Max.   :158.00000   Max.   :1.15e+03
    summary(select(tbldfStorm, PROPDMGEXP, CROPDMGEXP))
##    PROPDMGEXP          CROPDMGEXP       
##  Min.   :0.000e+00   Min.   :1.000e+00  
##  1st Qu.:1.000e+00   1st Qu.:1.000e+00  
##  Median :1.000e+03   Median :1.000e+03  
##  Mean   :6.873e+04   Mean   :1.096e+04  
##  3rd Qu.:1.000e+03   3rd Qu.:1.000e+03  
##  Max.   :1.000e+09   Max.   :1.000e+09

Results

Question 1

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

  • Data is grouped by event types
  • Fatalities and injuries are summed over events
  • Only the twenty most harmful event types are retained

    # Total fatalities and injuries per event type, sorted in ascending order
    dfSum <- group_by(tbldfStorm, EVTYPE) %>% 
             summarise(Fatalities=sum(FATALITIES), Injuries=sum(INJURIES)) %>%
             filter(Fatalities+Injuries>0) %>%
             mutate(totalH=Fatalities+Injuries) %>%
             arrange(totalH)

    # Keep the 20 largest totals; fixing the factor levels preserves this order in the plot
    dfPlot <- tail(select(dfSum, EVTYPE, Fatalities, Injuries), n=20)
    dfPlot$EVTYPE <- factor(dfPlot$EVTYPE, levels=dfPlot$EVTYPE)

    # Reshape to long format: one row per event type and casualty kind
    dfPlot <- melt(dfPlot,id.vars = "EVTYPE", 
                   measure.vars = c("Fatalities","Injuries"),
                   variable.name="Casualties")
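
The reshaping step turns the two casualty columns into a single value column so that the bars can be stacked by casualty kind. A minimal base-R sketch of the same wide-to-long transformation, on made-up numbers (the `wide` and `long` names and all values are illustrative assumptions, not the report's data):

```r
# Hypothetical totals for three event types (illustrative numbers only)
wide <- data.frame(EVTYPE = c("A", "B", "C"),
                   Fatalities = c(1, 5, 10),
                   Injuries = c(20, 15, 100),
                   stringsAsFactors = FALSE)

# Stack the two measure columns: one row per (event type, casualty kind)
long <- data.frame(EVTYPE = rep(wide$EVTYPE, times = 2),
                   Casualties = rep(c("Fatalities", "Injuries"), each = nrow(wide)),
                   value = c(wide$Fatalities, wide$Injuries),
                   stringsAsFactors = FALSE)

print(long)
```

Each event type now appears twice, once per casualty kind, which is the layout a `fill` aesthetic expects.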

The data is plotted in order to compare fatalities with injuries.

    ggplot(dfPlot,aes(EVTYPE,value)) + 
        geom_bar(aes(fill=Casualties), stat = "identity") + 
        coord_flip() + 
        theme_bw() +
        ylab("Total injured or killed") +
        xlab("Weather events") +
        ggtitle("Human casualties between 2001 and 2011") +
        theme(legend.position=c(0.7,0.7),
              legend.text = element_text(size=16),
              legend.title = element_blank())

Figure 1: Human casualties between 2001 and 2011 for the 20 most harmful weather events

From the plot we can see that Tornado and Excessive heat are the two most harmful event types for both injuries and fatalities. Beyond these two, the ranking depends on whether injuries or fatalities are considered. As expected, injuries outnumber fatalities.

Finally we check whether casualties show an increasing trend over the period 2001-2011.

    print(group_by(tbldfStorm, Year=year(BGN_DATE)) %>%
              summarise(Fatalities=sum(FATALITIES), Injuries=sum(INJURIES)) %>%
              mutate(Total=Fatalities+Injuries))
## Source: local data frame [11 x 4]
## 
##     Year Fatalities Injuries Total
##    (dbl)      (dbl)    (dbl) (dbl)
## 1   2001        469     2716  3185
## 2   2002        498     3155  3653
## 3   2003        443     2931  3374
## 4   2004        370     2426  2796
## 5   2005        469     1834  2303
## 6   2006        599     3368  3967
## 7   2007        421     2191  2612
## 8   2008        488     2703  3191
## 9   2009        333     1354  1687
## 10  2010        425     1855  2280
## 11  2011       1002     7792  8794

There is no apparent upward trend over the period. The exception is 2011, whose total of 8794 casualties is more than twice that of any other year.
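
As a rough check of this reading, the yearly totals from the table can be compared against a 2001-2010 baseline and fitted with a straight line. This is only a sketch: the variable names are illustrative, and a linear fit on eleven points is an assumption of convenience rather than a formal trend test.

```r
# Yearly totals (fatalities + injuries) copied from the table above
year  <- 2001:2011
total <- c(3185, 3653, 3374, 2796, 2303, 3967, 2612, 3191, 1687, 2280, 8794)

# Baseline: mean of the 2001-2010 totals
baseline <- mean(total[year < 2011])
ratio2011 <- total[year == 2011] / baseline

# Slope of a straight-line fit, with and without 2011, as a rough trend check
slopeAll <- coef(lm(total ~ year))[["year"]]
slopeNo2011 <- coef(lm(total ~ year, subset = year < 2011))[["year"]]

print(c(baseline = baseline, ratio2011 = ratio2011,
        slopeAll = slopeAll, slopeNo2011 = slopeNo2011))
```

On these figures the 2011 total is about three times the 2001-2010 mean, and the slope fitted without 2011 is negative.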

Question 2

Across the United States, which types of events have the greatest economic consequences?

  • Property and crop damages are converted into dollars using the “*EXP” columns
  • Data is grouped by event types
  • Damages are summed over events
  • Dollars are converted into billions of dollars
  • Only the twenty most damaging events are retained

    # Dollar damages per record, then totals per event type in billions of dollars
    dfSumEc <- mutate(tbldfStorm, totalPROP=PROPDMG*PROPDMGEXP,
                                  totalCROP=CROPDMG*CROPDMGEXP) %>%
               select(EVTYPE,totalPROP,totalCROP) %>%
               group_by(EVTYPE) %>% 
               summarise(Properties=sum(totalPROP), Crop=sum(totalCROP)) %>%  
               filter(Properties+Crop>0) %>%
               mutate(Properties=Properties/1e9,Crop=Crop/1e9, total=Properties+Crop) %>%    
               arrange(total)

    # Keep the 20 largest totals; fixing the factor levels preserves this order in the plot
    dfPlotEc <- tail(select(dfSumEc, EVTYPE, Properties, Crop), n=20)
    dfPlotEc$EVTYPE <- factor(dfPlotEc$EVTYPE, levels=dfPlotEc$EVTYPE)

    # Reshape to long format: one row per event type and damage type
    dfPlotEc <- melt(dfPlotEc,id.vars = "EVTYPE", 
                   measure.vars = c("Properties","Crop"),
                   variable.name="Type")
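
To make the arithmetic concrete, the following base-R sketch applies the same steps (amount times multiplier, scaling to billions, sum per event type) to three made-up records. The `toy` data frame and its values are illustrative assumptions, not rows from the storm database:

```r
# Hypothetical records: damage amount and numeric multiplier (illustrative values)
toy <- data.frame(EVTYPE = c("FLOOD", "FLOOD", "DROUGHT"),
                  PROPDMG = c(2.5, 500, 0),
                  PROPDMGEXP = c(1e6, 1e3, 1),
                  CROPDMG = c(0, 10, 1.2),
                  CROPDMGEXP = c(1, 1e3, 1e9),
                  stringsAsFactors = FALSE)

# Dollar amount = amount * multiplier, then scaled to billions of dollars
toy$totalBn <- (toy$PROPDMG * toy$PROPDMGEXP + toy$CROPDMG * toy$CROPDMGEXP) / 1e9

# Sum per event type and sort in descending order
perEvent <- sort(tapply(toy$totalBn, toy$EVTYPE, sum), decreasing = TRUE)
print(perEvent)
```

Here the two FLOOD records add up to 0.00301 billion dollars, while the single DROUGHT record contributes 1.2 billion through its crop damage alone.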

The data is plotted in order to compare property and crop damage.

    ggplot(dfPlotEc,aes(EVTYPE,value)) + 
        geom_bar(aes(fill=Type), stat = "identity") + 
        coord_flip() + 
        theme_bw() +
        ylab("Total damages in billion $") +
        xlab("Weather events") +
        ggtitle("Damages in billion dollars between 2001 and 2011") +
        theme(legend.position=c(0.7,0.7),
              legend.text = element_text(size=16),
              legend.title = element_blank())

Figure 2: Damages in billion dollars between 2001 and 2011 for the 20 most harmful weather events

From the plot we can see that Flood and Hurricane/Typhoon are the most economically devastating event types. In general, crop damage is lower than property damage, except for drought.

Finally we check whether damages show an increasing trend over the period 2001-2011.

    group_by(tbldfStorm, Year=year(BGN_DATE)) %>%
        summarise(Properties=sum(PROPDMG*PROPDMGEXP), Crop=sum(CROPDMG*CROPDMGEXP)) %>%
        mutate(Total=Properties+Crop) %>%
        mutate(Properties=format(Properties,big.mark="'"),
               Crop=format(Crop,big.mark="'"),
               Total=format(Total,big.mark="'"))
## Source: local data frame [11 x 4]
## 
##     Year      Properties          Crop           Total
##    (dbl)           (chr)         (chr)           (chr)
## 1   2001  10'026'988'670 1'780'588'100  11'807'576'770
## 2   2002   4'100'882'450 1'410'368'140   5'511'250'590
## 3   2003  10'254'548'240 1'143'070'350  11'397'618'590
## 4   2004  25'346'598'870 1'452'177'850  26'798'776'720
## 5   2005  96'789'791'170 4'035'202'300 100'824'993'470
## 6   2006 121'937'434'190 3'534'238'700 125'471'672'890
## 7   2007   5'788'934'160 1'691'152'000   7'480'086'160
## 8   2008  15'568'383'080 2'209'793'000  17'778'176'080
## 9   2009   5'227'204'130   522'220'000   5'749'424'130
## 10  2010   9'246'487'640 1'785'286'000  11'031'773'640
## 11  2011  20'888'981'960   666'742'000  21'555'723'960

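The years 2005 and 2006 stand out, with total damages of about 101 and 125 billion dollars respectively. The totals for 2004 (about 27 billion) and 2011 (about 22 billion) are also high compared with the remaining years.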
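
To put the yearly totals in proportion, the sketch below computes the share of the 2001-2011 damages that falls in 2005 and 2006 and ranks the years by total damage. The figures are copied from the table above; the variable names are illustrative.

```r
# Yearly total damages in dollars, copied from the table above
year  <- 2001:2011
total <- c(11807576770, 5511250590, 11397618590, 26798776720, 100824993470,
           125471672890, 7480086160, 17778176080, 5749424130, 11031773640,
           21555723960)

# Fraction of the 2001-2011 total that falls in 2005 and 2006
sharePeak <- sum(total[year %in% c(2005, 2006)]) / sum(total)

# Years ranked by total damage, largest first
ranking <- year[order(total, decreasing = TRUE)]

print(round(sharePeak, 3))
print(ranking)
```

On these figures, 2005 and 2006 together account for roughly two thirds of the damages recorded over the eleven years, followed by 2004 and 2011.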