Synopsis

Weather events affect our everyday lives. Storms, tornadoes, floods and many other events cause not only heavy property damage, but also serious injuries and deaths. Preventing such outcomes to the extent possible is a key concern.

This project uses the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database as its primary source of information. This database tracks major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. More information can be found here.

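The main questions to answer are:

1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
2. Across the United States, which types of events have the greatest economic consequences?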

Data Processing

First of all, we need to download the raw data and, if necessary, make some changes.

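filename <- "WeatherRepData.bz2"
# Download the compressed CSV only once; mode = "wb" keeps the binary file intact on Windows
if (!file.exists(filename)) {
    download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                  filename, mode = "wb")
}
# read.csv reads the bzip2-compressed file directly, no manual unpacking needed
data <- read.csv(filename)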

Both questions require information about different types of events, so it makes sense to split the data by event type, which is stored in the EVTYPE variable.

EVlist<-split(data, data$EVTYPE)
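Splitting a data frame on EVTYPE gives a named list with one data frame per event type. The sketch below shows what such a list looks like and how per-event totals can be read from it. It uses a small, made-up data frame `toy` with the storm data's column names rather than the real data, so it is an illustration and not part of the original analysis.

```r
# Hypothetical toy data with the same column names as the storm data
toy <- data.frame(EVTYPE     = c("TORNADO", "HAIL", "TORNADO", "FLOOD"),
                  FATALITIES = c(3, 0, 1, 2),
                  INJURIES   = c(10, 2, 5, 7))

# Named list with one data frame per event type (names follow the factor levels)
EVlist <- split(toy, toy$EVTYPE)
names(EVlist)   # "FLOOD" "HAIL" "TORNADO"

# Summarise each group: total fatalities and injuries per event type
per_event <- t(sapply(EVlist, function(d) c(FATALITIES = sum(d$FATALITIES),
                                            INJURIES   = sum(d$INJURIES))))
per_event
```

Each row of `per_event` is one event type, so the same pattern scales to the full `EVlist` built from the storm data.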

Question 1: harmful to health

This dataset has many variables, 37 to be exact.

dim(data)
## [1] 902297     37
names(data)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

But to answer this question we’ll be using only two of them: “FATALITIES” and “INJURIES”.

We can sum these variables by weather event, then either use the “which.max” function to find the single most harmful event, or sort the totals and print as many as we like. Ten events in each category will work fine.
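The two options differ only in the last step. The sketch below uses made-up toy data with the storm data's column names to contrast them. `which.max` returns only the position (and name) of the largest total, while `sort` followed by `head` returns a ranked top-n.

```r
# Hypothetical toy data with the same column names as the storm data
toy <- data.frame(EVTYPE   = c("TORNADO", "HAIL", "TORNADO", "FLOOD", "HAIL"),
                  INJURIES = c(10, 2, 5, 7, 1))

# Total injuries per event type: FLOOD 7, HAIL 3, TORNADO 15
totals <- sapply(split(toy$INJURIES, toy$EVTYPE), sum)

# Option 1: only the single most harmful event type
names(which.max(totals))   # "TORNADO"

# Option 2: the top n event types, largest first
head(sort(totals, decreasing = TRUE), 2)
```

Using `head(..., n)` instead of indexing with `[1:n]` avoids `NA` entries when there are fewer than n event types.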

sort(sapply(split(data$INJURIES, data$EVTYPE), sum), decreasing = TRUE)[1:10]
##           TORNADO         TSTM WIND             FLOOD    EXCESSIVE HEAT 
##             91346              6957              6789              6525 
##         LIGHTNING              HEAT         ICE STORM       FLASH FLOOD 
##              5230              2100              1975              1777 
## THUNDERSTORM WIND              HAIL 
##              1488              1361
sort(sapply(split(data$FATALITIES, data$EVTYPE), sum), decreasing = TRUE)[1:10]
##        TORNADO EXCESSIVE HEAT    FLASH FLOOD           HEAT      LIGHTNING 
##           5633           1903            978            937            816 
##      TSTM WIND          FLOOD    RIP CURRENT      HIGH WIND      AVALANCHE 
##            504            470            368            248            224

A graph showing the ten event types that caused the most injuries and the most fatalities (totals on a log10 scale).

library(ggplot2)
library(ggpubr)

# Total injuries and fatalities per event type, top 10 of each (computed once)
injuries   <- sort(sapply(split(data$INJURIES, data$EVTYPE), sum), decreasing = TRUE)[1:10]
fatalities <- sort(sapply(split(data$FATALITIES, data$EVTYPE), sum), decreasing = TRUE)[1:10]

INJLAB <- names(injuries)
FATLAB <- names(fatalities)

x <- data.frame(ID = 1:10, Injuries = injuries, Fatal = fatalities)

# log10 scale, because tornado totals dwarf every other event type
dp <- ggplot(data = x, aes(ID, log10(Injuries))) + geom_point(col = "blue") +
    geom_text(aes(ID, log10(Injuries) - 0.2, label = INJLAB), size = 2.2) +
    coord_cartesian(xlim = c(0.8, 11)) + labs(x = "Rank", y = "log10(total injuries)")
pd <- ggplot(data = x, aes(ID, log10(Fatal))) + geom_point(col = "red") +
    geom_text(aes(ID, log10(Fatal) - 0.2, label = FATLAB), size = 2.5) +
    coord_cartesian(xlim = c(0.8, 11)) + labs(x = "Rank", y = "log10(total fatalities)")
ggarrange(dp, pd, nrow = 2, ncol = 1)

Based on these results, tornadoes caused by far the highest number of injuries and deaths, while heat events (EXCESSIVE HEAT and HEAT) also caused a large number of deaths.

Question 2: greatest economic consequences

Again, we need to find the variables that describe the damage done to property and crops.

names(data)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

The relevant variables are “PROPDMG” and “CROPDMG”. We follow the same steps as in the previous question.
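The PROPDMG and CROPDMG values are stored together with the magnitude codes PROPDMGEXP and CROPDMGEXP (for example “K” for thousands, “M” for millions and “B” for billions). The totals in this section sum the raw values without applying these codes. The sketch below shows one way to apply them, using a hypothetical helper `exp_to_multiplier()` and made-up toy data. Mapping H, K, M and B to 10^2, 10^3, 10^6 and 10^9 follows the common reading of these codes. Treating every other code (blank, digits, symbols) as 1 is a simplifying assumption.

```r
# Hypothetical helper: map a damage EXP code to a numeric multiplier.
# Assumption: H = 1e2, K = 1e3, M = 1e6, B = 1e9 (case-insensitive);
# any other code, including blanks and NA, is treated as 1.
exp_to_multiplier <- function(code) {
    code <- toupper(as.character(code))          # works for factors and characters
    mult <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)[code]
    mult[is.na(mult)] <- 1                       # unknown codes leave the value unscaled
    unname(mult)
}

# Hypothetical toy data: raw values alone would rank TORNADO first
toy <- data.frame(EVTYPE     = c("FLOOD", "TORNADO", "HAIL"),
                  PROPDMG    = c(2.5, 250, 10),
                  PROPDMGEXP = c("B", "K", "M"))
toy$PROPDOLLARS <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP)

# Totals in dollars per event type, largest first
prop_totals <- sort(tapply(toy$PROPDOLLARS, toy$EVTYPE, sum), decreasing = TRUE)
prop_totals
```

With the real data, the same idea would be `data$PROPDMG * exp_to_multiplier(data$PROPDMGEXP)` (and likewise for crops) before summing by EVTYPE. The results shown in this report were not recomputed this way.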

A graph showing the ten event types that caused the most crop and property damage (totals on a log10 scale).

# Total crop and property damage per event type, top 10 of each
# (raw PROPDMG/CROPDMG values, without the PROPDMGEXP/CROPDMGEXP multipliers)
cropdmg <- sort(sapply(split(data$CROPDMG, data$EVTYPE), sum), decreasing = TRUE)[1:10]
propdmg <- sort(sapply(split(data$PROPDMG, data$EVTYPE), sum), decreasing = TRUE)[1:10]

PROPLAB <- names(propdmg)
CROPLAB <- names(cropdmg)

x <- data.frame(ID = 1:10, CROPDMG = cropdmg, PROPDMG = propdmg)
dp <- ggplot(data = x, aes(ID, log10(CROPDMG))) + geom_point(col = "blue") +
    geom_text(aes(ID, log10(CROPDMG) - 0.05, label = CROPLAB), size = 2.5) +
    coord_cartesian(xlim = c(0.8, 11)) + labs(x = "Rank", y = "log10(total CROPDMG)")
pd <- ggplot(data = x, aes(ID, log10(PROPDMG))) + geom_point(col = "red") +
    geom_text(aes(ID, log10(PROPDMG) - 0.05, label = PROPLAB), size = 2.5) +
    coord_cartesian(xlim = c(0.8, 11)) + labs(x = "Rank", y = "log10(total PROPDMG)")
ggarrange(dp, pd, nrow = 2, ncol = 1)

The same totals as text:

sort(sapply(split(data$PROPDMG, data$EVTYPE), sum), decreasing = TRUE)[1:10]
##            TORNADO        FLASH FLOOD          TSTM WIND              FLOOD 
##          3212258.2          1420124.6          1335965.6           899938.5 
##  THUNDERSTORM WIND               HAIL          LIGHTNING THUNDERSTORM WINDS 
##           876844.2           688693.4           603351.8           446293.2 
##          HIGH WIND       WINTER STORM 
##           324731.6           132720.6
sort(sapply(split(data$CROPDMG, data$EVTYPE), sum), decreasing = TRUE)[1:10]
##               HAIL        FLASH FLOOD              FLOOD          TSTM WIND 
##          579596.28          179200.46          168037.88          109202.60 
##            TORNADO  THUNDERSTORM WIND            DROUGHT THUNDERSTORM WINDS 
##          100018.52           66791.45           33898.62           18684.93 
##          HIGH WIND         HEAVY RAIN 
##           17283.21           11122.80

We can see that tornadoes cause the most property damage, while hail causes the most crop damage.
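The tables above list TSTM WIND, THUNDERSTORM WIND and THUNDERSTORM WINDS as separate event types, although they appear to describe the same kind of event. The sketch below normalizes such labels before splitting, using a hypothetical helper `clean_evtype()`. The three rules it applies are assumptions chosen for these particular variants, not an official mapping of NOAA event types.

```r
# Hypothetical cleanup of free-text EVTYPE labels (rules are assumptions)
clean_evtype <- function(ev) {
    ev <- toupper(trimws(as.character(ev)))          # unify case, drop stray spaces
    ev <- sub("^TSTM ", "THUNDERSTORM ", ev)         # expand the TSTM abbreviation
    ev <- sub("WINDS$", "WIND", ev)                  # merge plural and singular forms
    ev
}

clean_evtype(c("TSTM WIND", "Thunderstorm Winds ", "THUNDERSTORM WIND"))
# all three become "THUNDERSTORM WIND"
```

In this report the helper would be applied as `data$EVTYPE <- clean_evtype(data$EVTYPE)` before the `split()` calls. That would change the totals, which are not recomputed here.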

Results

From this analysis, we identified the 10 most dangerous event types for human health, property and crops.

For injuries:

labels(sort(sapply(split(data$INJURIES, data$EVTYPE), sum), decreasing = TRUE)[1:10])
##  [1] "TORNADO"           "TSTM WIND"         "FLOOD"            
##  [4] "EXCESSIVE HEAT"    "LIGHTNING"         "HEAT"             
##  [7] "ICE STORM"         "FLASH FLOOD"       "THUNDERSTORM WIND"
## [10] "HAIL"

For fatalities:

labels(sort(sapply(split(data$FATALITIES, data$EVTYPE), sum), decreasing = TRUE)[1:10])
##  [1] "TORNADO"        "EXCESSIVE HEAT" "FLASH FLOOD"    "HEAT"          
##  [5] "LIGHTNING"      "TSTM WIND"      "FLOOD"          "RIP CURRENT"   
##  [9] "HIGH WIND"      "AVALANCHE"

For property damage:

labels(sort(sapply(split(data$PROPDMG, data$EVTYPE), sum), decreasing = TRUE)[1:10])
##  [1] "TORNADO"            "FLASH FLOOD"        "TSTM WIND"         
##  [4] "FLOOD"              "THUNDERSTORM WIND"  "HAIL"              
##  [7] "LIGHTNING"          "THUNDERSTORM WINDS" "HIGH WIND"         
## [10] "WINTER STORM"

For crop damage:

labels(sort(sapply(split(data$CROPDMG, data$EVTYPE), sum), decreasing = TRUE)[1:10])
##  [1] "HAIL"               "FLASH FLOOD"        "FLOOD"             
##  [4] "TSTM WIND"          "TORNADO"            "THUNDERSTORM WIND" 
##  [7] "DROUGHT"            "THUNDERSTORM WINDS" "HIGH WIND"         
## [10] "HEAVY RAIN"