## Loading libraries
library(dplyr)
library(lubridate)
Storms are severe weather events that cause damage to people and buildings. It would be of great value if we could predict the weather, and storms in particular. Before prediction, a precise analysis of historical weather data may give us some insight into the damage and loss caused by severe events. Using the National Weather Storm Data provided by the U.S. National Oceanic and Atmospheric Administration ([NOAA](https://www.noaa.gov)), we will answer two questions:
Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?
URL <- "http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(URL, destfile = "test.bz2", method="curl")
storm<-read.csv("test.bz2")
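`read.csv()` can read a bzip2-compressed file directly, which is why no separate decompression step is needed. The following minimal sketch illustrates this on a tiny throwaway file; the file contents and the name `demo` are made up for illustration and are not part of the storm data.

```r
# Write a small bzip2-compressed CSV to a temporary file (illustrative data only)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, "w")
writeLines(c("EVTYPE,INJURIES", "TORNADO,3", "HEAT,1"), con)
close(con)

# read.csv() detects the compression and reads the file as a normal CSV
demo <- read.csv(tmp)
print(demo)

# Remove the temporary file again
unlink(tmp)
```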
## Count the distinct event types named in the database
evlen<- length(unique(storm$EVTYPE))
## Fatalities and injuries are summarized in two columns of the database named: FATALITIES, INJURIES
## reduce dataframe to necessary variables
storm1<-storm[,c(1,2,7,8,23:28)]
## Change BGN_DATE from factor to date (values are stored as month/day/year)
storm1$BGN_DATE<-as.Date(storm1$BGN_DATE, format="%m/%d/%Y")
## Extract year
storm1$year<-year(storm1$BGN_DATE)
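The date conversion depends on the order of day and month in the format string. The sketch below assumes that the raw BGN_DATE values look like `4/18/1950 0:00:00` (month/day/year followed by a time), as in the public NOAA file; the two sample strings are illustrative. It uses base R only to extract the year.

```r
# Illustrative raw values in the assumed month/day/year layout
raw <- c("4/18/1950 0:00:00", "12/5/2011 0:00:00")

# Month first: both values parse; the trailing time part is ignored
as_mdy <- as.Date(raw, format = "%m/%d/%Y")

# Day first: "4/18/1950" has no month 18 and becomes NA,
# while "12/5/2011" is silently read as 12 May 2011
as_dmy <- as.Date(raw, format = "%d/%m/%Y")

# Year extraction without additional packages
years <- as.integer(format(as_mdy, "%Y"))

print(data.frame(raw, as_mdy, as_dmy, years))
```

Rows whose date becomes NA also get an NA year and are dropped by a later `split()` on the year, so the format string affects which events enter the analysis.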
Anyone reading this analysis needs to understand the problems arising from the quality of the data. There are 985 types of events in the file.
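Part of the large number of event types may come from spelling variants of the same event. The sketch below shows, on a made-up vector, how differences in case and surrounding whitespace inflate the count of unique labels; the vector `evtype` is an assumption for illustration and the analysis itself does not apply this normalization.

```r
# Illustrative labels: the same events written in different ways
evtype <- c("TSTM WIND", "Tstm Wind", " TSTM WIND",
            "Heavy Rain", "HEAVY RAIN", "TORNADO")

# Number of distinct labels as they are written
n_raw <- length(unique(evtype))

# Number of distinct labels after trimming whitespace and upper-casing
evtype_clean <- toupper(trimws(evtype))
n_clean <- length(unique(evtype_clean))

cat("raw:", n_raw, "normalized:", n_clean, "\n")
```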
The data are very inconsistent and are spread over 61 years. Because the value of money changes over this time, the economic data are difficult to compare. Instead of converting the economic consequences to a common base, I will use another approach.
For each question, the top 10 events of each year in each category (harm to people, economic consequences) are extracted.
From these “top events”, the causing event types are classified.
Harm to people covers “Fatalities” and “Injuries”. If the two are correlated with each other, it would be enough to analyze one of them.
## Plotting FATALITIES versus INJURIES
plot(storm1$INJURIES,storm1$FATALITIES,main="Correlation between INJURIES and FATALITIES", xlab="INJURIES", ylab="FATALITIES")
## Correlation
cr<- cor(storm1$INJURIES,storm1$FATALITIES)
The correlation is 0.3216808, so injuries and fatalities are only weakly correlated: an increase in injuries does not reliably come with an increase in fatalities. Consequently, I will look at both effects separately.
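One way to read a correlation of this size is through its square, which for a simple linear relationship gives the share of variance in one variable that is explained by the other. A short sketch using the coefficient reported above:

```r
# Correlation coefficient reported in the text
r <- 0.3216808

# Squared correlation: share of variance explained by a linear relationship
r_squared <- r^2

# Roughly a tenth of the variance is shared
print(round(r_squared, 3))
```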
## Split by year
str_sph<-split(storm1,storm1$year)
## order by INJURIES
str_spordh<-lapply(str_sph,function(x) x[order(-x$INJURIES),])
## extract first 10 events (with highest number of injuries)
str_spord10h<-lapply(str_spordh, function(x) x[1:10,])
## Concatenate split dataframes
storm_eventh<-do.call(rbind,str_spord10h)
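The split, order, extract and concatenate steps are repeated for each measure, so they can be wrapped in a small helper. The following is a sketch on toy data, not the code used for the reported numbers; the function name `top_n_by_year` and the data frame `toy` are hypothetical. It uses `head()` instead of `x[1:10,]`, because `head()` does not pad a year with fewer than `n` rows with NA rows.

```r
# Hypothetical helper: the n rows with the largest value_col within each year
top_n_by_year <- function(df, value_col, n = 10) {
  pieces <- split(df, df$year)
  top <- lapply(pieces, function(x) {
    # order decreasing by the chosen column, then keep at most n rows
    head(x[order(-x[[value_col]]), , drop = FALSE], n)
  })
  do.call(rbind, top)
}

# Toy data for illustration only
toy <- data.frame(
  year = c(1950, 1950, 1950, 1951, 1951),
  EVTYPE = c("TORNADO", "HAIL", "TORNADO", "HEAT", "TORNADO"),
  INJURIES = c(5, 0, 12, 3, 7)
)

top2 <- top_n_by_year(toy, "INJURIES", n = 2)
print(top2)
```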
What do the causing events in this subset look like? `r table(droplevels(storm_eventh$EVTYPE))`
Most of them seem to belong to “wind related” items, so an extra column marks the wind-related events.
## make extra column marking all wind related events
storm_eventh$type<-ifelse(grepl("STORM|TORN|HURR",storm_eventh$EVTYPE),"windrelated","")
## Sum of windrelated events
s10_inj<-sum(storm_eventh$type=="windrelated")
l10_inj<-nrow(storm_eventh)
l10p_inj<-round((s10_inj/l10_inj) *100)
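To see which labels the three patterns STORM, TORN and HURR actually catch, the sketch below applies them to a few example event names. The vector `ev` is made up for illustration. Note that the match is case-sensitive unless `ignore.case = TRUE` is set, that labels such as ICE STORM and WINTER STORM are flagged, and that HIGH WIND and TSTM WIND are not.

```r
# Illustrative event labels
ev <- c("TORNADO", "ICE STORM", "WINTER STORM", "HIGH WIND",
        "TSTM WIND", "HURRICANE/TYPHOON", "Tornado")

# The three patterns combined into one regular expression
wind <- grepl("STORM|TORN|HURR", ev)

# Same patterns, ignoring upper and lower case
wind_ci <- grepl("STORM|TORN|HURR", ev, ignore.case = TRUE)

print(data.frame(ev, wind, wind_ci))
```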
# The events calculated above are now ordered decreasing by injuries
inj_order<-storm_eventh[order(-storm_eventh$INJURIES),]
# First 50 are extracted
inj_order<-inj_order[1:50,]
There are 499 wind-related events among the top 10 injury events of each year.
The total number of top 10 events is 620, corresponding to a rate of 80 percent.
The 50 severe weather events with the most injured persons are caused by:
table(droplevels(inj_order$EVTYPE))
##
##    EXCESSIVE HEAT              HEAT HURRICANE/TYPHOON         ICE STORM
##                 3                 4                 1                 1
##           TORNADO
##                41
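The counts in the table above can be turned into shares of the 50 events. A short sketch, with the counts typed in by hand from the printed table:

```r
# Counts copied from the table of the 50 events with the most injuries
top50_inj <- c("EXCESSIVE HEAT" = 3, "HEAT" = 4, "HURRICANE/TYPHOON" = 1,
               "ICE STORM" = 1, "TORNADO" = 41)

# Share of each event type in percent, largest first
share <- round(100 * top50_inj / sum(top50_inj))
print(sort(share, decreasing = TRUE))
```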
## order by FATALITIES
str_spordhf<-lapply(str_sph,function(x) x[order(-x$FATALITIES),])
## extract first 10 events (with highest number of fatalities)
str_spord10hf<-lapply(str_spordhf, function(x) x[1:10,])
## Concatenate split dataframes
storm_eventhf<-do.call(rbind,str_spord10hf)
## make extra column marking all wind related events
storm_eventhf$type<-ifelse(grepl("STORM|TORN|HURR",storm_eventhf$EVTYPE),"windrelated","")
## Sum of windrelated events
s10_fat<-sum(storm_eventhf$type=="windrelated")
l10_fat<-nrow(storm_eventhf)
l10p_fat<-round((s10_fat/l10_fat) *100)
# The events calculated above are now ordered decreasing by fatalities
fat_order<-storm_eventhf[order(-storm_eventhf$FATALITIES),]
# First 50 are extracted
fat_order<-fat_order[1:50,]
There are 396 wind-related events among the top 10 fatality events of each year.
The total number of top 10 events is 620, corresponding to a rate of 64 percent.
The 50 severe weather events with the most fatalities are caused by:
table(droplevels(fat_order$EVTYPE))
##
##                 BLIZZARD           EXCESSIVE HEAT                     HEAT
##                        1                        3                        1
##               HEAVY SNOW MARINE THUNDERSTORM WIND             RIP CURRENTS
##                        1                        1                        1
##        THUNDERSTORM WIND                  TORNADO
##                        1                       41
## Function for dealing with PROPDMG and PROPDMGEXP
## Only the k, m and b exponents are translated, because the question is about the most harmful events
## Translates the value in PROPDMGEXP into a number and multiplies it with PROPDMG to give "Total Damage" $tdmg
exfun<-function(x,y){
  if (x %in% c("k","K")) {
    y*1000
  } else if (x %in% c("m","M")) {
    y*1000000
  } else if (x %in% c("b","B")) {
    y*1000000000
  } else {
    y
  }
}
storm1$tdmg<-mapply(exfun,storm1$PROPDMGEXP,storm1$PROPDMG)
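The same translation can also be written without a row-by-row `mapply()` call, using a named lookup vector. This is a sketch of an alternative, not the code used for the reported numbers; the names `exp_mult` and `to_dollars` and the sample values are hypothetical. As in `exfun`, any exponent other than k, m or b leaves the value unchanged.

```r
# Multipliers for the exponent codes handled in the analysis
exp_mult <- c(K = 1e3, M = 1e6, B = 1e9)

# Hypothetical vectorized helper: damage value times the multiplier of its code
to_dollars <- function(dmg, exp) {
  mult <- exp_mult[toupper(as.character(exp))]
  mult[is.na(mult)] <- 1   # unknown or empty codes: keep the value as it is
  unname(dmg * mult)
}

# Illustrative values
dollars <- to_dollars(c(25, 2.5, 1, 10), c("K", "m", "B", ""))
print(dollars)
```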
## Split by year
str_sp<-split(storm1,storm1$year)
## order by tdmg
str_spord<-lapply(str_sp,function(x) x[order(-x$tdmg),])
## extract first 10 events (with highest tdmg)
str_spord10<-lapply(str_spord, function(x) x[1:10,])
## Concatenate split dataframes
storm_event<-do.call(rbind,str_spord10)
## make extra column marking all wind related events
storm_event$type<-ifelse(grepl("STORM|TORN|HURR",storm_event$EVTYPE),"windrelated","")
## Sum of windrelated events
s10<-sum(storm_event$type=="windrelated")
l10<-nrow(storm_event)
l10p<-round((s10/l10) *100)
If we look at the ten most expensive events in each year since 1950, we find that 518 of the 620 events in total are wind related.
This is equal to 84 percent.
From the Storm database three parameters were calculated: injuries, fatalities and property damage.
For all three parameters, the type of event causing the highest amount of “damage” was determined. As a result, I found “wind-related” events to be the major cause of damage for all three parameters: among the top 10 events of each year, wind-related events made up the majority in every case.
Wind-related events are the most harmful for the US: they cause the most harm to people and the greatest economic consequences.
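The three rates reported in this analysis can be recomputed from the counts given in the text. A short sketch with the counts typed in by hand; the data frame name `top10` is chosen for illustration.

```r
# Wind-related counts among the yearly top 10 events, as reported in the text
top10 <- data.frame(
  parameter = c("injuries", "fatalities", "property damage"),
  wind_related = c(499, 396, 518),
  total = c(620, 620, 620)
)

# Share of wind-related events in percent, rounded as in the analysis
top10$percent <- round(100 * top10$wind_related / top10$total)
print(top10)
```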