Synopsis

Weather events can be hazardous to human life. They also affect public health and cause considerable damage to property and crops each year, and as a result the US government spends substantial resources on them. In this analysis we use the ‘Storm Data’ released by the National Climatic Data Center. We examine which weather events cause the most fatalities and which cause the most injuries. We also compare the damage caused to property and to crops. Finally, we give two separate plots for the damage caused to crops and to property.

Data Processing

The libraries that we will use in this analysis are

library(lubridate)
library(plyr)
library(dplyr)
library(lattice)

We set the proper working directory

setwd("D:/Data Science/Coursera/Reprod research/Final Assignment")

and we create the directory in which the downloaded data will be stored

if (!file.exists("./data")) {
  dir.create("./data")
}

Then we download the data file from the given URL

fileUrl <- "http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"

download.file(fileUrl, destfile = "./data/data.csv.bz2")

R can read the compressed “bz2” file directly as a “csv” file, so there is no need to decompress it first. We keep only the relevant columns.

relcol <- c ("BGN_DATE", "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")

Data <- read.csv("./data/data.csv.bz2")[ ,relcol] 
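The call above reads every column and only then discards most of them. The following is a minimal sketch of one way to skip unwanted columns at read time, relying on the colClasses argument of read.csv, where a "NULL" entry drops a column and NA lets R infer the type. The toy file and the names keep, header, classes and small are illustrative and not part of the original analysis.

```r
# Toy CSV standing in for the storm data (illustrative values only)
tmp <- tempfile(fileext = ".csv")
writeLines(c("STATE,EVTYPE,FATALITIES,REMARKS",
             "AL,TORNADO,0,none",
             "TX,FLOOD,2,none"), tmp)

keep <- c("EVTYPE", "FATALITIES")   # hypothetical subset of columns to keep

# Read only the header and one row to learn the column names
header <- names(read.csv(tmp, nrows = 1))

# "NULL" skips a column entirely; NA lets read.csv infer its type
classes <- ifelse(header %in% keep, NA, "NULL")

small <- read.csv(tmp, colClasses = classes)
print(names(small))
```

On a file the size of the storm data this may reduce memory use, at the cost of reading the header twice.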

The columns describe data as follows:

  1. BGN_DATE is the date on which the event began

  2. EVTYPE is a description of the type of the event

  3. FATALITIES is the number of deaths caused by the event

  4. INJURIES is the number of injuries caused by the event

  5. PROPDMG is the amount of damage caused to property by the event, before scaling by PROPDMGEXP

  6. PROPDMGEXP is the order of magnitude (power of ten) by which PROPDMG is scaled

  7. CROPDMG is the amount of damage caused to crops by the event, before scaling by CROPDMGEXP

  8. CROPDMGEXP is the order of magnitude (power of ten) by which CROPDMG is scaled

Using the lubridate package we parse the first column as a date and keep only the year of the event

Data$BGN_DATE <- year(mdy_hms(Data$BGN_DATE))
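For readers without lubridate, the same year extraction can be sketched in base R. This assumes the month/day/year hour:minute:second layout implied by the use of mdy_hms; the two timestamps below are illustrative inputs.

```r
# Illustrative timestamps in month/day/year hour:minute:second layout
x <- c("4/18/1950 0:00:00", "11/15/1951 0:00:00")

# Parse with an explicit format and a fixed time zone
parsed <- as.POSIXct(x, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# Keep only the year, as an integer
yr <- as.integer(format(parsed, "%Y"))
print(yr)
```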

Also, the EXP factors need some manipulation. In particular, we make the following ‘revalues’ for the two ‘EXP’ columns (" "=0, "-"=0, "?"=0, "+"=0, "B"=9, "h"=2, "H"=2, "k"=3, "K"=3, "m"=6, "M"=6). Using the revalue function of the ‘plyr’ package we make the adjustments

Data$PROPDMGEXP <- revalue(Data$PROPDMGEXP,c(" " =0,  "-"=0, "?"=0, "+"=0, "B"=9, "h"=2, "H"=2, "K"=3, "m"=6, "M"=6))
Data$CROPDMGEXP <- revalue(Data$CROPDMGEXP,c(" "=0,  "-"=0, "?"=0, "+"=0, "B"=9, "h"=2, "H"=2,"k"=3, "K"=3, "m"=6, "M"=6))

After this we can compute the actual values of the damage on properties and crops respectively

Data1 <- mutate(Data, Property_Damage = as.numeric(PROPDMG) * 10^as.numeric(PROPDMGEXP))[,c(-5,-6)]

Data2 <- mutate(Data1, Crops_Damage = as.numeric(CROPDMG) * 10^as.numeric(CROPDMGEXP))[,c(-5,-6)]
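One detail is worth checking in this step. If the EXP columns are factors, as.numeric() returns their internal level codes rather than the numbers stored in the labels. Whether the columns are factors depends on the R version and the read.csv settings, which are not stated here, so the following is only a sketch with toy values (not taken from the data set) of converting through as.character() first.

```r
# Toy exponent column stored as a factor (read.csv's default for strings before R 4.0)
expo <- factor(c("3", "6", "9", "3"))

codes  <- as.numeric(expo)                 # internal level codes
values <- as.numeric(as.character(expo))   # the numbers written in the labels

# Toy mantissas, scaled by the label values rather than the codes
mantissa <- c(25, 2.5, 1, 10)
damage <- mantissa * 10^values

print(data.frame(codes, values, damage))
```

Comparing the codes and values columns on the real data is a quick way to see whether the computed damage is scaled by the intended power of ten.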

rm(Data, Data1)
head(Data2)
##   BGN_DATE  EVTYPE FATALITIES INJURIES Property_Damage Crops_Damage
## 1     1950 TORNADO          0       15         2500000            0
## 2     1950 TORNADO          0        0          250000            0
## 3     1951 TORNADO          0        2         2500000            0
## 4     1951 TORNADO          0        2          250000            0
## 5     1951 TORNADO          0        2          250000            0
## 6     1951 TORNADO          0        6          250000            0

The number of reported events varies significantly over the years. In the 1950s, for example, tornadoes may have been reported more systematically than thunderstorms, which would bias a comparison between event types. On the other hand, we need a time period longer than a decade, which is a typical time scale for weather oscillations. For that reason we set a threshold of 10^4 reported events per year and look for the first year that exceeds it.

which.min(which(table(Data2$BGN_DATE) >10^4))
## 1989 
##    1

Thus, we take into account observations from 1989 onwards. The subsetting is done as follows

Data2 <- Data2[Data2$BGN_DATE>=1989,]
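The year selection can also be written so that the cutoff year is stored in a variable instead of being read off the console output. This is a minimal sketch on toy data; the names years, threshold, per_year, first_year and recent are illustrative, and the small threshold stands in for 10^4.

```r
# Toy vector of event years (illustrative, not the real counts)
years <- c(rep(1987, 2), rep(1988, 3), rep(1989, 5), rep(1990, 6))
threshold <- 4                      # stands in for 10^4 in the analysis

# Number of events per year
per_year <- table(years)

# First year whose count exceeds the threshold
first_year <- min(as.integer(names(per_year)[per_year > threshold]))
print(first_year)

# Keep observations from that year onwards
recent <- years[years >= first_year]
```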

Finally, the data set that we will use consists of the sum of each variable for each event type

 Data_sum <- aggregate(Data2[,-c(1,2)], by=list(Event = Data2$EVTYPE),  sum)
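To make the aggregation and the later ranking concrete, here is a small sketch on a toy data frame with made-up values. It mirrors the aggregate() call above and the order() calls used in the Results section.

```r
# Toy data frame with made-up values
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(1, 0, 2, 1, 0),
  INJURIES   = c(10, 1, 5, 0, 2)
)

# Sum every numeric column within each event type
toy_sum <- aggregate(toy[, -1], by = list(Event = toy$EVTYPE), sum)

# Rank event types by total fatalities, largest first
top <- toy_sum[order(-toy_sum$FATALITIES), ]
print(top)
```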

Results

In the following two subsections we will investigate the effect of weather events on human life and on the economy respectively.

Fatalities and Injuries

From the last data set, we can easily inspect the ten weather events with the most fatalities for the time period 1989-2011.

head(Data_sum[order(-Data_sum$FATALITIES),],10)
##              Event FATALITIES INJURIES Property_Damage Crops_Damage
## 130 EXCESSIVE HEAT       1903     6525    7.753700e+08    492600000
## 834        TORNADO       1802    27944    3.222879e+12  10269737000
## 153    FLASH FLOOD        978     1777    1.682267e+12  19039070000
## 275           HEAT        937     2100    1.797000e+08     66954000
## 464      LIGHTNING        816     5230    9.303794e+10    365729000
## 170          FLOOD        470     6789    1.446577e+13  21753275000
## 585    RIP CURRENT        368      232    1.000000e+05            0
## 856      TSTM WIND        356     5404    4.484928e+11  11320985000
## 359      HIGH WIND        248     1137    5.270046e+11   2288040000
## 19       AVALANCHE        224      170    3.721800e+08            0

The most fatal event is excessive heat, followed by tornado, with flash flood third.

On the other hand, the ten events with the most reported injuries are

head(Data_sum[order(-Data_sum$INJURIES),],10)
##                 Event FATALITIES INJURIES Property_Damage Crops_Damage
## 834           TORNADO       1802    27944    3.222879e+12  10269737000
## 170             FLOOD        470     6789    1.446577e+13  21753275000
## 130    EXCESSIVE HEAT       1903     6525    7.753700e+08    492600000
## 856         TSTM WIND        356     5404    4.484928e+11  11320985000
## 464         LIGHTNING        816     5230    9.303794e+10    365729000
## 275              HEAT        937     2100    1.797000e+08     66954000
## 427         ICE STORM         89     1975    3.944928e+11    186850000
## 153       FLASH FLOOD        978     1777    1.682267e+12  19039070000
## 760 THUNDERSTORM WIND        133     1488    3.483122e+11   6992705000
## 972      WINTER STORM        206     1321    6.688497e+11    220390000

For injuries, as the above table shows, tornado ranks first, far ahead of the other events. Flood comes second and excessive heat third.

Tornadoes account for almost 28,000 injuries, while the next event, flood, accounts for about 6,800. We will create a horizontal bar plot (so that the event names are easy to read) of the twenty events with the most injuries, and we will limit the x-axis to 8,000 counts, which truncates the bar for tornado injuries.

dat_pl <- Data_sum[order(-Data_sum$INJURIES),][1:20,]

barchart(reorder(Event, INJURIES) ~ FATALITIES + INJURIES, dat_pl, xlab = "Counts", xlim=c(0,8000), auto.key=list(columns = 2),par.settings=list(superpose.polygon=list(col=c("red", "green"))))

From this plot we conclude that the tornado is the most harmful event for human health overall. Excessive heat follows, with slightly fewer injuries than flood but about four times as many fatalities. Note also that flash flood and heat cause relatively few injuries, yet they are the third and fourth most fatal events.

Property and Crops Damage

Apart from their effect on human health, weather events cause damage to property and crops. The top ten weather events by property damage are

head(Data_sum[order(-round(Data_sum$Property_Damage)),],10)
##                 Event FATALITIES INJURIES Property_Damage Crops_Damage
## 170             FLOOD        470     6789    1.446577e+13  21753275000
## 411 HURRICANE/TYPHOON         64     1275    6.930584e+12   1464465100
## 670       STORM SURGE         13       38    4.332354e+12       500000
## 834           TORNADO       1802    27944    3.222879e+12  10269737000
## 153       FLASH FLOOD        978     1777    1.682267e+12  19039070000
## 244              HAIL         15     1162    1.573527e+12  60161277030
## 402         HURRICANE         61       46    1.186832e+12   2999310000
## 848    TROPICAL STORM         58      340    7.703891e+11   1195720000
## 972      WINTER STORM        206     1321    6.688497e+11    220390000
## 359         HIGH WIND        248     1137    5.270046e+11   2288040000

and for Crops

head(Data_sum[order(-Data_sum$Crops_Damage),],10)
##                  Event FATALITIES INJURIES Property_Damage Crops_Damage
## 244               HAIL         15     1162    1.573527e+12  60161277030
## 170              FLOOD        470     6789    1.446577e+13  21753275000
## 153        FLASH FLOOD        978     1777    1.682267e+12  19039070000
## 95             DROUGHT          0        4    1.046106e+11  14595735000
## 856          TSTM WIND        356     5404    4.484928e+11  11320985000
## 834            TORNADO       1802    27944    3.222879e+12  10269737000
## 760  THUNDERSTORM WIND        133     1488    3.483122e+11   6992705000
## 402          HURRICANE         61       46    1.186832e+12   2999310000
## 359          HIGH WIND        248     1137    5.270046e+11   2288040000
## 786 THUNDERSTORM WINDS         64      908    1.944591e+11   2014708080

From the previous two tables we can see that the damage to property is typically one to three orders of magnitude larger than the damage to crops. Among the events, flood is the most catastrophic for property and hail for crops. The following two plots show the damage caused to property and crops respectively for the events in dat_pl, that is, the twenty events with the most injuries

barchart(reorder(Event, Property_Damage) ~ Property_Damage*10^-12, dat_pl, main = "Total Damage to Properties", xlab = "Damage (Trillions of USD)", col = "blue")

barchart(reorder(Event, Crops_Damage) ~ Crops_Damage*10^-9, dat_pl, main = "Total Damage to Crops", xlab = "Damage (Billions of USD)", col = "yellow")

Concluding Remarks

In conclusion, among the plotted events (the twenty with the most injuries) the greatest economic consequences for property come from floods, followed by hurricanes/typhoons, with tornadoes third. In the full property table, storm surge ranks third and tornadoes fourth. For crops, hail is the most damaging weather event, followed by floods, flash floods and thunderstorm wind. Tornadoes rank fifth among the plotted events and sixth in the full crops table, where drought ranks fourth. Overall, tornadoes and floods seem the most disastrous weather events, since both have a high impact on human health and both rank near the top for damage to property and to crops.