Weather events can be hazardous to human life. They also affect public health and cause considerable damage to properties and crops each year, and as a result the US government spends substantial resources on them every year. In this analysis we use the ‘Storm Data’ released by the National Climatic Data Center. We examine which weather events are the most fatal and which cause the most injuries. We also compare the damage caused to properties and crops. Finally, we give two separate plots for the damage caused to properties and crops.
The libraries that we will use in this analysis are
library(lubridate)
library(plyr)
library(dplyr)
library(lattice)
We set the proper working directory
setwd("D:/Data Science/Coursera/Reprod research/Final Assignment")
and we create the directory into which the data will be downloaded, if it does not already exist
if(!file.exists("./data")){
dir.create("./data")
}
Then we download the data file from the given URL
fileUrl <- "http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(fileUrl, destfile = "./data/data.csv.bz2")
The “bz2” file is compressed, but read.csv can read it directly as a “csv” file without decompressing it first. We read the file and keep only the relevant columns.
relcol <- c ("BGN_DATE", "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
Data <- read.csv("./data/data.csv.bz2")[ ,relcol]
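The call above loads every column of the file before subsetting. If memory or reading time is a concern, one possible alternative is to skip the unwanted columns at read time by setting their class to "NULL" in colClasses. The following is a minimal sketch on a small temporary file; the toy columns A, B, C and the names tmp, keep and classes are illustrative assumptions and not part of the original analysis.

```r
# Write a tiny CSV to a temporary file (toy data, for illustration only)
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(A = 1:3, B = letters[1:3], C = c(0.5, 1.5, 2.5)),
          tmp, row.names = FALSE)

keep <- c("A", "C")                         # columns we want to keep

# Read just the first row to learn the column names
header <- names(read.csv(tmp, nrows = 1))

# NA lets read.csv guess the class; "NULL" skips the column entirely
classes <- ifelse(header %in% keep, NA, "NULL")

small <- read.csv(tmp, colClasses = classes)
unlink(tmp)

names(small)   # "A" "C"
```

The same idea should carry over to the storm data by replacing keep with relcol and tmp with the path of the downloaded file.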
The columns describe data as follows:
BGN_DATE column is the date on which the event began
EVTYPE column is a description of the type of the event
FATALITIES column is a count of the deaths caused by the event
INJURIES column is a count of the injuries caused by the event
PROPDMG column is the numeric part of the damage caused to properties by the event
PROPDMGEXP column is the order of magnitude (power of ten) of the damage caused to properties by the event
CROPDMG column is the numeric part of the damage caused to crops by the event
CROPDMGEXP column is the order of magnitude (power of ten) of the damage caused to crops by the event
Using the lubridate package we convert the first column to a date and keep only the year of the event
Data$BGN_DATE <- year(mdy_hms(Data$BGN_DATE))
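For readers without lubridate, the same year extraction can be sketched in base R. The example strings below are an assumption about the raw format (month/day/year followed by a time, which is what mdy_hms expects); the names raw and yrs are illustrative.

```r
# Assumed raw format: "month/day/year hour:minute:second"
raw <- c("4/18/1950 0:00:00", "6/8/1951 0:00:00", "11/15/1989 0:00:00")

# as.Date parses only the date part; the trailing time is ignored
dates <- as.Date(raw, format = "%m/%d/%Y")

# Keep only the year, as an integer
yrs <- as.integer(format(dates, "%Y"))
yrs   # 1950 1951 1989
```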
Also, the EXP factors need some manipulation. In particular, we make the following ‘revalues’ for the two ‘EXP’ columns (" "=0, "-"=0, "?"=0, "+"=0, "B"=9, "h"=2, "H"=2, "k"=3, "K"=3, "m"=6, "M"=6), so that each symbol is replaced by the corresponding power of ten. Using the revalue function of the ‘plyr’ package we make the adjustments
Data$PROPDMGEXP <- revalue(Data$PROPDMGEXP,c(" " =0, "-"=0, "?"=0, "+"=0, "B"=9, "h"=2, "H"=2, "K"=3, "m"=6, "M"=6))
Data$CROPDMGEXP <- revalue(Data$CROPDMGEXP,c(" "=0, "-"=0, "?"=0, "+"=0, "B"=9, "h"=2, "H"=2,"k"=3, "K"=3, "m"=6, "M"=6))
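To make the effect of this mapping concrete, here is a small base R sketch that applies the same symbol-to-exponent table to a toy vector through a named lookup. It is not the plyr call used above; the names exp_lookup, symbols and amount, and the toy values, are assumptions for illustration.

```r
# Symbol -> power of ten, following the mapping described in the text
exp_lookup <- c("B" = 9, "h" = 2, "H" = 2, "k" = 3, "K" = 3, "m" = 6, "M" = 6)

symbols <- c("K", "M", "", "B", "?", "h")   # toy exponent codes
amount  <- c(25, 2.5, 0, 1, 5, 3)           # toy numeric parts

# Look up each symbol; anything not in the table (blank, "?", "-", "+") gives NA
exponent <- unname(exp_lookup[symbols])
exponent[is.na(exponent)] <- 0              # treat unknown symbols as 10^0

damage <- amount * 10^exponent
damage   # 25000, 2500000, 0, 1e9, 5, 300
```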
After this we can compute the actual values of the damage on properties and crops respectively
Data1 <- mutate(Data, Property_Damage = as.numeric(PROPDMG) * 10^as.numeric(PROPDMGEXP))[,c(-5,-6)]
Data2 <- mutate(Data1, Crops_Damage = as.numeric(CROPDMG) * 10^as.numeric(CROPDMGEXP))[,c(-5,-6)]
rm(Data, Data1)
head(Data2)
## BGN_DATE EVTYPE FATALITIES INJURIES Property_Damage Crops_Damage
## 1 1950 TORNADO 0 15 2500000 0
## 2 1950 TORNADO 0 0 250000 0
## 3 1951 TORNADO 0 2 2500000 0
## 4 1951 TORNADO 0 2 250000 0
## 5 1951 TORNADO 0 2 250000 0
## 6 1951 TORNADO 0 6 250000 0
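One point worth verifying in the computation above: if the EXP columns are factors (read.csv converted strings to factors by default before R 4.0.0), as.numeric() returns the internal level codes rather than the digits shown in the labels, which would change the exponent that is applied. The toy factor below illustrates the difference; the names expo, codes and values are illustrative assumptions.

```r
# A factor whose labels are the exponents we actually want
expo <- factor(c("3", "6", "9", "3"))

# as.numeric() on a factor gives the level codes, not the labels
codes <- as.numeric(expo)
codes    # 1 2 3 1

# Converting through character recovers the intended numbers
values <- as.numeric(as.character(expo))
values   # 3 6 9 3
```

If the columns are factors in your session, wrapping them as as.numeric(as.character(PROPDMGEXP)) inside mutate is one way to obtain the intended exponents.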
The number of events reported varies significantly over the years. For that reason we expect the early records to be biased towards certain event types; during the 50’s, for example, tornadoes may have been reported more systematically than thunderstorms. On the other hand, we need a time period longer than a decade, which is a typical time interval for weather oscillations. For that reason we keep the data from the first year in which the count of events exceeds 10^4.
which.min(which(table(Data2$BGN_DATE) >10^4))
## 1989
## 1
Thus, we will take into account observations from 1989 onwards. The subsetting is done as follows
Data2 <- Data2[Data2$BGN_DATE>=1989,]
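The year selection can also be written so that the cutoff year is returned directly as a number rather than read off the printed name. The sketch below uses a toy vector of years and a toy threshold of 10 in place of 10^4; years, threshold and first_year are illustrative names.

```r
# Toy data: one entry per reported event, holding its year
years <- c(rep(1987, 3), rep(1988, 5), rep(1989, 12), rep(1990, 15))

counts <- table(years)        # number of events per year
threshold <- 10               # stands in for 10^4 in the real data

# Years whose count exceeds the threshold, then the earliest of them
busy_years <- names(counts)[as.vector(counts) > threshold]
first_year <- min(as.integer(busy_years))
first_year   # 1989

# Keep only events from that year onwards
kept <- years[years >= first_year]
length(kept)   # 27
```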
Finally, the data set that we will use consists of the sum of each variable for each event type
Data_sum <- aggregate(Data2[,-c(1,2)], by=list(Event = Data2$EVTYPE), sum)
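As a small illustration of what this aggregation produces, the sketch below sums two variables per event type on a toy data frame and then ranks the events, in the same way as the tables that follow. The data frame toy and its values are made up for the example.

```r
# Toy data: several records per event type
toy <- data.frame(
  EVTYPE     = c("FLOOD", "HAIL", "FLOOD", "TORNADO", "HAIL"),
  FATALITIES = c(1, 0, 2, 5, 0),
  INJURIES   = c(3, 1, 0, 10, 2)
)

# One row per event type, each variable summed within the type
toy_sum <- aggregate(toy[, c("FATALITIES", "INJURIES")],
                     by = list(Event = toy$EVTYPE), FUN = sum)

# Rank by fatalities, largest first, and keep the top two
top <- head(toy_sum[order(-toy_sum$FATALITIES), ], 2)
top$Event   # TORNADO, then FLOOD
```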
In the following two subsections we will investigate the effect of weather events on human life and on the economy respectively.
From the last data set, we can easily inspect the ten most lethal weather events for the time period 1989-2011.
head(Data_sum[order(-Data_sum$FATALITIES),],10)
## Event FATALITIES INJURIES Property_Damage Crops_Damage
## 130 EXCESSIVE HEAT 1903 6525 7.753700e+08 492600000
## 834 TORNADO 1802 27944 3.222879e+12 10269737000
## 153 FLASH FLOOD 978 1777 1.682267e+12 19039070000
## 275 HEAT 937 2100 1.797000e+08 66954000
## 464 LIGHTNING 816 5230 9.303794e+10 365729000
## 170 FLOOD 470 6789 1.446577e+13 21753275000
## 585 RIP CURRENT 368 232 1.000000e+05 0
## 856 TSTM WIND 356 5404 4.484928e+11 11320985000
## 359 HIGH WIND 248 1137 5.270046e+11 2288040000
## 19 AVALANCHE 224 170 3.721800e+08 0
The most fatal event for human life is excessive heat, followed by tornadoes, with flash floods third.
On the other hand, the top ten events for which injuries were reported are
head(Data_sum[order(-Data_sum$INJURIES),],10)
## Event FATALITIES INJURIES Property_Damage Crops_Damage
## 834 TORNADO 1802 27944 3.222879e+12 10269737000
## 170 FLOOD 470 6789 1.446577e+13 21753275000
## 130 EXCESSIVE HEAT 1903 6525 7.753700e+08 492600000
## 856 TSTM WIND 356 5404 4.484928e+11 11320985000
## 464 LIGHTNING 816 5230 9.303794e+10 365729000
## 275 HEAT 937 2100 1.797000e+08 66954000
## 427 ICE STORM 89 1975 3.944928e+11 186850000
## 153 FLASH FLOOD 978 1777 1.682267e+12 19039070000
## 760 THUNDERSTORM WIND 133 1488 3.483122e+11 6992705000
## 972 WINTER STORM 206 1321 6.688497e+11 220390000
In the case of injuries, as we can see in the above table, the tornado ranks first and is far ahead of the other events. Second comes flood and third excessive heat.
Tornadoes account for almost 28,000 injuries, while the next event, flood, has about 6,800. We will create a horizontal bar plot (so that the names of the events are easy to read) and limit the x-axis to 8,000 counts, so the tornado injuries bar is truncated while the remaining events stay visible.
dat_pl <- Data_sum[order(-Data_sum$INJURIES),][1:20,]
barchart(reorder(Event, INJURIES) ~ FATALITIES + INJURIES, dat_pl, xlab = "Counts", xlim=c(0,8000), auto.key=list(columns = 2),par.settings=list(superpose.polygon=list(col=c("red", "green"))))
From this plot we conclude that the tornado has the largest overall impact on human health: it causes by far the most injuries and the second most fatalities. Next comes excessive heat, with slightly fewer injuries than flood but four times as many fatalities. Also, notice that flash flood and heat have relatively small counts of injuries but are the third and fourth most fatal events.
Apart from their effect on human health, weather events cause damage to properties and crops. The top ten weather events by damage to properties are
head(Data_sum[order(-round(Data_sum$Property_Damage)),],10)
## Event FATALITIES INJURIES Property_Damage Crops_Damage
## 170 FLOOD 470 6789 1.446577e+13 21753275000
## 411 HURRICANE/TYPHOON 64 1275 6.930584e+12 1464465100
## 670 STORM SURGE 13 38 4.332354e+12 500000
## 834 TORNADO 1802 27944 3.222879e+12 10269737000
## 153 FLASH FLOOD 978 1777 1.682267e+12 19039070000
## 244 HAIL 15 1162 1.573527e+12 60161277030
## 402 HURRICANE 61 46 1.186832e+12 2999310000
## 848 TROPICAL STORM 58 340 7.703891e+11 1195720000
## 972 WINTER STORM 206 1321 6.688497e+11 220390000
## 359 HIGH WIND 248 1137 5.270046e+11 2288040000
and by damage to crops
head(Data_sum[order(-Data_sum$Crops_Damage),],10)
## Event FATALITIES INJURIES Property_Damage Crops_Damage
## 244 HAIL 15 1162 1.573527e+12 60161277030
## 170 FLOOD 470 6789 1.446577e+13 21753275000
## 153 FLASH FLOOD 978 1777 1.682267e+12 19039070000
## 95 DROUGHT 0 4 1.046106e+11 14595735000
## 856 TSTM WIND 356 5404 4.484928e+11 11320985000
## 834 TORNADO 1802 27944 3.222879e+12 10269737000
## 760 THUNDERSTORM WIND 133 1488 3.483122e+11 6992705000
## 402 HURRICANE 61 46 1.186832e+12 2999310000
## 359 HIGH WIND 248 1137 5.270046e+11 2288040000
## 786 THUNDERSTORM WINDS 64 908 1.944591e+11 2014708080
From the previous two tables we can see that the damage to properties is two to three orders of magnitude larger than the damage to crops. Among the events, flood is the most catastrophic for properties and hail for crops. In the following two plots we can inspect the damage caused to properties and crops respectively
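dat_prop <- head(Data_sum[order(-Data_sum$Property_Damage),], 20)  # top 20 events by property damage (dat_pl is ordered by injuries)
barchart(reorder(Event, Property_Damage) ~ Property_Damage*10^-12, dat_prop, main = "Total Damage to Properties", xlab = "Damage (Trillions of USD)", col = "blue")
dat_crop <- head(Data_sum[order(-Data_sum$Crops_Damage),], 20)  # top 20 events by crop damage
barchart(reorder(Event, Crops_Damage) ~ Crops_Damage*10^-9, dat_crop, main = "Total Damage to Crops", xlab = "Damage (Billions of USD)", col = "yellow")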
In conclusion, according to the tables above, the greatest economic consequences for properties come from floods, followed by hurricanes/typhoons and storm surges, with tornadoes fourth. For crops, hail is the most disastrous weather event, followed by floods, flash floods and drought, while tornadoes are sixth. Among the weather events, tornadoes and floods seem the most disastrous overall, since both have a high impact on human health and both rank in the top four for damage to properties and in the top six for damage to crops.