Weather events affect our everyday life. Storms, tornadoes, and many other events cause not only heavy property damage, but also serious injuries and deaths. Preventing such outcomes to the extent possible is a key concern.
This project uses the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database as its primary source of information. The database tracks major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. Further details are available from NOAA.
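The main questions to answer are which types of events are most harmful to human health, measured by injuries and fatalities, and which cause the most damage to property and crops.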
First, we need to download the raw data and, if necessary, make some changes.
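filename <- "WeatherRepData.bz2"
# Download the compressed CSV only if it is not already present
if (!file.exists(filename)) {
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", filename, mode = "wb")
}
# read.csv reads the bzip2-compressed file directly
data <- read.csv(filename)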
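The full file is large, and only a few columns are needed for the questions here. As a minimal sketch of one possible change at load time, the example below reads only selected columns by passing "NULL" in colClasses. The inline CSV text and the names csv_text, keep, classes and small are invented for illustration; for the real file, filename would be passed instead of the text argument.

```r
# Tiny inline CSV standing in for the real file (invented for this example)
csv_text <- "EVTYPE,FATALITIES,INJURIES,REMARKS
TORNADO,1,5,long free text
HAIL,0,2,more free text"

# Columns we want to keep
keep <- c("EVTYPE", "FATALITIES", "INJURIES")

# Read just one data row to learn the column names
header <- names(read.csv(text = csv_text, nrows = 1))

# "NULL" drops a column; NA lets read.csv choose the column type
classes <- ifelse(header %in% keep, NA, "NULL")

small <- read.csv(text = csv_text, colClasses = classes)
print(names(small))
```

Dropping unused columns such as the free-text remarks should reduce memory use, at the cost of having to reload the file if another column is needed later.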
Since every question asks about different types of events, it is convenient to split the data by event type, stored in the EVTYPE variable.
EVlist<-split(data, data$EVTYPE)
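Splitting on the raw EVTYPE string treats spelling variants as separate events; the rankings in this report, for instance, list TSTM WIND, THUNDERSTORM WIND and THUNDERSTORM WINDS as distinct categories. A minimal sketch of normalising labels before splitting is shown here on a made-up vector. The helper name normalize_evtype and its rewrite rules are assumptions for illustration and are not part of the analysis in this report.

```r
# Hypothetical helper: collapse a few spelling variants of event labels
normalize_evtype <- function(ev) {
  ev <- toupper(trimws(as.character(ev)))  # uniform case, no outer blanks
  ev <- gsub("\\s+", " ", ev)              # collapse repeated whitespace
  ev <- sub("^TSTM", "THUNDERSTORM", ev)   # expand the TSTM abbreviation
  ev <- sub("WINDS$", "WIND", ev)          # fold the plural into the singular
  ev
}

# Toy input, invented for this example
ev  <- c("TSTM WIND", "Thunderstorm Wind", "THUNDERSTORM WINDS", " hail ")
inj <- c(3, 2, 1, 4)

clean  <- normalize_evtype(ev)
totals <- sapply(split(inj, clean), sum)
print(totals)
```

Any such rules change the totals per category, so they would need to be checked against the actual set of labels in the data before use.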
This dataset has many variables - 37 to be exact.
dim(data)
## [1] 902297 37
names(data)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
To answer the first question we use only two of them: “FATALITIES” and “INJURIES”.
We can total each of these variables by weather event, then either use the “which.max” function to find the single most harmful event, or sort the totals and print as many as we like. Ten events in each category will work fine.
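The two options can be compared on a small example. This is a sketch on invented data, not on the storm database; the names toy, worst and ranking are made up for illustration.

```r
# Toy data, invented for this example
toy <- data.frame(
  EVTYPE   = c("TORNADO", "HAIL", "TORNADO", "FLOOD", "HAIL"),
  INJURIES = c(10, 1, 5, 7, 2)
)

# Total injuries per event type
totals <- sapply(split(toy$INJURIES, toy$EVTYPE), sum)

# which.max gives the position of the single largest total; names() gives its label
worst <- names(which.max(totals))

# sort + head gives a ranking of as many event types as requested
ranking <- head(sort(totals, decreasing = TRUE), 2)

print(worst)
print(ranking)
```

which.max is enough when only the top event matters, while the sorted vector also shows how far apart the leading events are.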
sort(sapply(split(data$INJURIES, data$EVTYPE), sum), decreasing = TRUE)[1:10]
## TORNADO TSTM WIND FLOOD EXCESSIVE HEAT
## 91346 6957 6789 6525
## LIGHTNING HEAT ICE STORM FLASH FLOOD
## 5230 2100 1975 1777
## THUNDERSTORM WIND HAIL
## 1488 1361
sort(sapply(split(data$FATALITIES, data$EVTYPE), sum), decreasing = TRUE)[1:10]
## TORNADO EXCESSIVE HEAT FLASH FLOOD HEAT LIGHTNING
## 5633 1903 978 937 816
## TSTM WIND FLOOD RIP CURRENT HIGH WIND AVALANCHE
## 504 470 368 248 224
The graph below shows the ten event types with the most injuries and the ten with the most fatalities, on a log10 scale.
library(ggplot2)
library(ggpubr)
# Top ten event types by total injuries and by total fatalities
topInj <- sort(sapply(split(data$INJURIES, data$EVTYPE), sum), decreasing = TRUE)[1:10]
topFat <- sort(sapply(split(data$FATALITIES, data$EVTYPE), sum), decreasing = TRUE)[1:10]
INJLAB <- names(topInj)
FATLAB <- names(topFat)
x <- data.frame(ID = 1:10, Injuries = topInj, Fatal = topFat)
# Points on a log10 scale, each labelled with its event type just below it
dp <- ggplot(data = x, aes(ID, log10(Injuries))) + geom_point(col = "blue") +
  geom_text(aes(ID, log10(Injuries) - 0.2, label = INJLAB), size = 2.2) + coord_cartesian(xlim = c(0.8, 11))
pd <- ggplot(data = x, aes(ID, log10(Fatal))) + geom_point(col = "red") +
  geom_text(aes(ID, log10(Fatal) - 0.2, label = FATLAB), size = 2.5) + coord_cartesian(xlim = c(0.8, 11))
# Stack the two panels vertically
ggarrange(dp, pd, nrow = 2, ncol = 1)
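An alternative layout puts the event names on an axis instead of drawing them as text next to each point, which avoids overlapping labels. The following is a sketch only, on invented totals; the data frame top and its values are made up, and the plotting step is guarded so the example still runs where ggplot2 is not installed.

```r
# Toy totals, invented for this example (not values from the storm data)
top <- data.frame(
  Event    = c("TORNADO", "FLOOD", "HAIL"),
  Injuries = c(1500, 70, 30)
)

# Order the factor levels by total so the largest event is drawn at the top
top$Event <- reorder(top$Event, top$Injuries)

# Draw only if ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(top, aes(x = Injuries, y = Event)) +
    geom_point(colour = "blue") +
    scale_x_log10() +   # log10 applied to the axis rather than to the data
    labs(x = "Total injuries (log10 scale)", y = "Event type")
  print(p)
}
```

With scale_x_log10 the axis ticks show the original counts, so the reader does not have to convert log values back by hand.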
Based on these results, tornadoes caused by far the highest number of injuries and deaths, and heat events (EXCESSIVE HEAT and HEAT) also caused a large number of deaths.
For the second question, we again need to find the variables that describe damage done to property.
names(data)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
The variables are “PROPDMG” and “CROPDMG”. We follow the same steps as in the previous question.
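The storm database stores each damage figure as a number (PROPDMG, CROPDMG) together with a magnitude code in a companion column (PROPDMGEXP, CROPDMGEXP), both of which appear in the variable list above. The totals in this report add the numbers as they are. A minimal sketch of how the codes could be applied first is shown below, assuming K, M and B stand for thousand, million and billion and H for hundred; the helper name exp_multiplier, the treatment of all other codes as a multiplier of 1, and the toy rows are assumptions for illustration.

```r
# Hypothetical helper: map a magnitude code to a numeric multiplier.
# Assumes H/K/M/B = hundred/thousand/million/billion; any other code
# (blank, digits, symbols) is treated as 1, which is a simplification.
exp_multiplier <- function(code) {
  code <- toupper(as.character(code))
  mult <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)[code]
  mult[is.na(mult)] <- 1
  unname(mult)
}

# Toy rows, invented for this example
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL"),
  PROPDMG    = c(25, 2, 500, 10),
  PROPDMGEXP = c("K", "B", "", "m")
)

# Scale each damage figure, then total per event type
toy$PROPUSD <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP)
totals <- sort(sapply(split(toy$PROPUSD, toy$EVTYPE), sum), decreasing = TRUE)
print(totals)
```

In this toy case the ranking is driven by the magnitude codes rather than by the raw numbers, which is why scaling can change which events come out on top.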
The graph below shows the most destructive events, again on a log10 scale.
# Top ten event types by total property damage and by total crop damage
topProp <- sort(sapply(split(data$PROPDMG, data$EVTYPE), sum), decreasing = TRUE)[1:10]
topCrop <- sort(sapply(split(data$CROPDMG, data$EVTYPE), sum), decreasing = TRUE)[1:10]
PROPLAB <- names(topProp)
CROPLAB <- names(topCrop)
x <- data.frame(ID = 1:10, CROPDMG = topCrop, PROPDMG = topProp)
# Points on a log10 scale, each labelled with its event type just below it
dp <- ggplot(data = x, aes(ID, log10(CROPDMG))) + geom_point(col = "blue") +
  geom_text(aes(ID, log10(CROPDMG) - 0.05, label = CROPLAB), size = 2.5) + coord_cartesian(xlim = c(0.8, 11))
pd <- ggplot(data = x, aes(ID, log10(PROPDMG))) + geom_point(col = "red") +
  geom_text(aes(ID, log10(PROPDMG) - 0.05, label = PROPLAB), size = 2.5) + coord_cartesian(xlim = c(0.8, 11))
# Stack the two panels vertically
ggarrange(dp, pd, nrow = 2, ncol = 1)
The same totals in text form:
sort(sapply(split(data$PROPDMG, data$EVTYPE), sum), decreasing = TRUE)[1:10]
## TORNADO FLASH FLOOD TSTM WIND FLOOD
## 3212258.2 1420124.6 1335965.6 899938.5
## THUNDERSTORM WIND HAIL LIGHTNING THUNDERSTORM WINDS
## 876844.2 688693.4 603351.8 446293.2
## HIGH WIND WINTER STORM
## 324731.6 132720.6
sort(sapply(split(data$CROPDMG, data$EVTYPE), sum), decreasing = TRUE)[1:10]
## HAIL FLASH FLOOD FLOOD TSTM WIND
## 579596.28 179200.46 168037.88 109202.60
## TORNADO THUNDERSTORM WIND DROUGHT THUNDERSTORM WINDS
## 100018.52 66791.45 33898.62 18684.93
## HIGH WIND HEAVY RAIN
## 17283.21 11122.80
By these totals, tornadoes are the most destructive to property and hail is the most destructive to crops.
From this analysis, we identified the ten most dangerous event types for human lives and for property.
For injuries:
labels(sort(sapply(split(data$INJURIES, data$EVTYPE), sum), decreasing = TRUE)[1:10])
## [1] "TORNADO" "TSTM WIND" "FLOOD"
## [4] "EXCESSIVE HEAT" "LIGHTNING" "HEAT"
## [7] "ICE STORM" "FLASH FLOOD" "THUNDERSTORM WIND"
## [10] "HAIL"
For deaths:
labels(sort(sapply(split(data$FATALITIES, data$EVTYPE), sum), decreasing = TRUE)[1:10])
## [1] "TORNADO" "EXCESSIVE HEAT" "FLASH FLOOD" "HEAT"
## [5] "LIGHTNING" "TSTM WIND" "FLOOD" "RIP CURRENT"
## [9] "HIGH WIND" "AVALANCHE"
For property:
labels(sort(sapply(split(data$PROPDMG, data$EVTYPE), sum), decreasing = TRUE)[1:10])
## [1] "TORNADO" "FLASH FLOOD" "TSTM WIND"
## [4] "FLOOD" "THUNDERSTORM WIND" "HAIL"
## [7] "LIGHTNING" "THUNDERSTORM WINDS" "HIGH WIND"
## [10] "WINTER STORM"
For crops:
labels(sort(sapply(split(data$CROPDMG, data$EVTYPE), sum), decreasing = TRUE)[1:10])
## [1] "HAIL" "FLASH FLOOD" "FLOOD"
## [4] "TSTM WIND" "TORNADO" "THUNDERSTORM WIND"
## [7] "DROUGHT" "THUNDERSTORM WINDS" "HIGH WIND"
## [10] "HEAVY RAIN"
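The four rankings above all repeat the same split, sum and sort expression with a different column. As a sketch, that expression could be wrapped in a small helper; the name top_events and its arguments are hypothetical, and the example runs on invented data instead of the storm database.

```r
# Hypothetical helper: total `column` per event type and return the top n
top_events <- function(df, column, n = 10) {
  totals <- sapply(split(df[[column]], df$EVTYPE), sum)
  head(sort(totals, decreasing = TRUE), n)
}

# Toy data, invented for this example
toy <- data.frame(
  EVTYPE   = c("TORNADO", "HAIL", "TORNADO", "FLOOD"),
  INJURIES = c(10, 1, 5, 7),
  CROPDMG  = c(0, 50, 5, 20)
)

inj_top  <- top_events(toy, "INJURIES", n = 2)
crop_top <- top_events(toy, "CROPDMG", n = 2)

print(names(inj_top))
print(names(crop_top))
```

On the real data, a call such as top_events(data, "FATALITIES") would be expected to give the same ranking as the longer expression used in this report, since it performs the same steps.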