Public Health and Economic Consequences of Various Weather Events in the US from 1950 to 2011

Methodology

Database

The database used in this study contains data from January 1950 to November 2011. It is distributed as a bz2-compressed file, available at the following link.

Preprocessing

The libraries needed for the analysis were loaded first.

#libraries
library(car)
library(ggplot2)
library(dplyr)
library(data.table)
library(stringdist)
library(Hmisc)
library(gridExtra)

The data were downloaded, extracted, and read as a CSV file using fread() from the data.table package, which reads large data sets quickly.

setwd("c:/users/ahmed/desktop/assignment 4")
storm<- fread("storm.csv", sep = ",")
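As a side note on the extraction step, base R can read a bzip2-compressed CSV without unpacking it first. The following is a minimal sketch on a toy data frame, not the actual storm file; the temporary path and the toy rows are assumptions made for illustration.

```r
# Toy stand-in for the storm data; column names mirror those used in this report
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"),
                  FATALITIES = c(3, 0),
                  stringsAsFactors = FALSE)

# Write a bzip2-compressed CSV to a temporary (hypothetical) path
bz_path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(bz_path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() opens the path with file(), which detects bzip2 compression when reading
toy_back <- read.csv(bz_path, stringsAsFactors = FALSE)
print(toy_back)
```

read.csv() is slower than fread() on a file of this size, so extracting once and using fread(), as done here, remains a reasonable choice.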

The needed columns were then selected from the data: state, date, event type, fatalities, injuries, and crop and property damage (values and exponents) in USD.

cols<- c("BGN_DATE", "STATE", "EVTYPE", "FATALITIES", "INJURIES", agrep("DMG", names(storm), value = T))

Process data to determine public health consequences, which are injuries and fatalities.

*Extract events that caused either fatalities or injuries and exclude the others, reducing the dataset to the lowest number of observations possible.

health<- storm %>% select(one_of(cols)) %>% filter(INJURIES >0 | FATALITIES >0)
length(unique(health$EVTYPE))

## [1] 220

*The number of event types was too high because of spelling variants and inconsistent recording conventions over time, so similar events were clustered together using the stringdist package. This was done with the function clustfunc(), which builds a string distance matrix using the "jw" method, runs hierarchical clustering on it, and cuts the tree at a given height. A cut height of 0.32 was used, as it was found to produce the best results.

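#Create function that takes a data frame and the index of the column to cluster on,
#and returns the data frame with an added Cluster column
clustfunc<- function(data, x, i){
data<- as.data.frame(data) #fread() returns a data.table; use plain data.frame indexing
data[,x]<- tolower(data[,x])
events<- unique(data[,x])
#Pairwise string distances between the unique event names
distmat<- stringdistmatrix(events, events, method = "jw")
rownames(distmat)<- events
clus<- hclust(as.dist(distmat))
#Cut the dendrogram at height i to assign a cluster id to each event name
clusters<- as.data.frame(cutree(clus, h=i))
clusters$Event<- row.names(clusters)
row.names(clusters)<- NULL
colnames(clusters)<- c("Cluster", "Event")
#Attach the cluster id to every observation
merge(x=data, y=clusters, by.x = names(data)[x],  by.y = "Event", all.x = TRUE, all.y = FALSE)
}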

*Now we apply clustfunc() to the health data frame.

df<- clustfunc(data=health, x=3, i=0.32)
length(unique(df$Cluster))

## [1] 80
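To make the clustering step easier to follow, here is a minimal sketch of the same pipeline on a handful of made-up event labels (the labels are assumptions, not values taken from the storm data). It uses the same stringdist "jw" method and the same cut height of 0.32.

```r
library(stringdist)

# Hypothetical, already lower-cased event labels
events <- c("tornado", "tornadoes", "flood", "floods", "lightning")

# Pairwise "jw" distances between the labels (0 = identical, 1 = nothing in common)
d <- stringdistmatrix(events, events, method = "jw")
rownames(d) <- events

# Hierarchical clustering (hclust default: complete linkage), cut at height 0.32
tree <- hclust(as.dist(d))
cluster_id <- cutree(tree, h = 0.32)
print(cluster_id)
```

With these labels, the singular and plural spellings fall into the same cluster while unrelated events stay apart. A larger cut height merges more aggressively, which is why the threshold has to be tuned by inspecting the resulting clusters.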

TSTM wind and Thunderstorm wind are the same so they can be further combined (clusters 9 and 2).

df$Cluster[df$Cluster==9]<- 2

A data summary should now be constructed to compute the sum() of fatalities and injuries by cluster. The most common event in each cluster is used to represent it; the eventfreq() function was written for that purpose.

#Most common event in a cluster
eventfreq<- function(x){
        names(sort(table(x), decreasing = T)[1])
}
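A quick illustration of what eventfreq() returns, using made-up labels for a single cluster (the labels are assumptions for illustration only):

```r
# Same helper as defined in this report
eventfreq <- function(x) {
  names(sort(table(x), decreasing = TRUE)[1])
}

# Hypothetical event labels that ended up in one cluster
cluster_events <- c("tstm wind", "thunderstorm wind", "tstm wind", "tstm winds")

# The label that occurs most often represents the cluster
representative <- eventfreq(cluster_events)
print(representative)
```

When two labels are equally frequent, the one returned depends on how sort() orders ties, so the representative name is only a convenience label for the cluster.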

Summarize data then derive 2 data frames with fatalities and injuries sorted in a descending order, respectively.

#Summary
healthcon<- df %>% arrange(Cluster) %>% group_by(Cluster) %>% summarise(Event= eventfreq(EVTYPE), Fatalities= sum(FATALITIES), Injuries=sum(INJURIES))
#Fatalities df
fats<- arrange(healthcon,desc(Fatalities))
#Injuries df
injs<- arrange(healthcon, desc(Injuries))

Process data to determine economic consequences, which are property damage and crop damage.

Select events that affected properties or crops.

dmg<- storm %>% select(one_of(cols)) %>% filter(PROPDMG > 0 | CROPDMG > 0)

Use clustfunc() to cluster the events as before.

# a cut height of 0.42 gave better clustering for these events
df2<-clustfunc(dmg,3, i=0.42)
length(unique(df2$EVTYPE)) #before clustering

## [1] 397

length(unique(df2$Cluster)) #after clustering

## [1] 70

Recode() from the car package was used to transform the exponent codes into multipliers (k = 1000, m = 10^6 and b = 10^9) after converting all characters to lower case. Other values were assigned NA.

#Convert exponent codes to numeric multipliers
df2<-df2 %>% mutate(PROPDMGEXP=tolower(PROPDMGEXP), CROPDMGEXP=tolower(CROPDMGEXP))
df2$PROPDMGEXP<- Recode(df2$PROPDMGEXP, "'k'=1000; 'm'=10^6; 'b'=10^9; else=NA", as.factor.result = F)
df2$CROPDMGEXP<- Recode(df2$CROPDMGEXP, "'k'=1000; 'm'=10^6; 'b'=10^9; else=NA", as.factor.result = F)

Two variables, property.damage and crop.damage, were computed to estimate damage in USD.

#Calculate property and crop damage
df2<- df2 %>% mutate(property.damage=PROPDMG*PROPDMGEXP, 
                     crop.damage=CROPDMGEXP*CROPDMG)
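The exponent handling can be sketched on a few made-up rows. This version swaps car::Recode() for a named lookup vector in base R so that it has no package dependencies; the toy values are assumptions, and only the column names follow the storm data.

```r
# Hypothetical toy rows; column names follow the storm data
toy <- data.frame(
  PROPDMG = c(25, 2.5, 1, 10),
  PROPDMGEXP = c("K", "m", "B", ""),
  stringsAsFactors = FALSE
)

# Named lookup vector: k = thousand, m = million, b = billion
multiplier <- c(k = 1e3, m = 1e6, b = 1e9)

# Codes not present in the lookup (here the empty string) index to NA
toy$mult <- unname(multiplier[tolower(toy$PROPDMGEXP)])
toy$property.damage <- toy$PROPDMG * toy$mult

# Rows with an unrecognised exponent are NA and drop out of the total
total <- sum(toy$property.damage, na.rm = TRUE)
print(toy)
print(total)
```

As in the analysis above, a damage value whose exponent is not k, m or b becomes NA and is excluded from later sums, so the totals only include records with a recognised exponent code.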

Data were summarized by calculating the sum() of property damage and crop damage by cluster, and the most common event in each cluster was used to label the cluster using eventfreq(). Two data frames, prop and crop, were generated, sorted in descending order of damage.

#Calculate most frequent event in the cluster and aggregate by cluster
ecocon<- df2 %>% arrange(Cluster) %>% group_by(Cluster) %>% 
        summarise(Event= eventfreq(EVTYPE), property.damage= sum(property.damage, na.rm=T),
                  crop.damage=sum(crop.damage, na.rm= T))

#Create 2 data frames for property and crop damage arranged in descending order
prop<- ecocon %>% arrange(desc(property.damage))
crop<- ecocon %>% arrange(desc(crop.damage))

Plots

Public health consequences plots

Bar charts were chosen to visualize fatalities and injuries for the top 10 events.

##ggplot
p1<- ggplot(data=fats[1:10,], aes(y=Fatalities, x=reorder(capitalize(Event), Fatalities)))+
                    geom_bar(stat="identity", fill="cyan3")+ coord_flip() +theme_light() + 
        geom_text(data=fats[1:10,],aes(label= Fatalities,vjust=0)) +
        labs(x="Events", y="Fatalities") +
        ggtitle("Fatalities by Event type", 
subtitle = "This chart shows the top 10 causes of fatalities from 1950 to 2011")

p2<- ggplot(data=injs[1:10,], aes(y=Injuries, x=reorder(capitalize(Event), Injuries)))+
        geom_bar(stat="identity", fill= "#FF9999")+ coord_flip() +labs(x="Events", y="Injuries") +
                geom_text(data=injs[1:10,],aes(label= Injuries,vjust=0)) +
        ggtitle("Injuries by Event type",
subtitle = "This chart shows the top 10 causes of injuries from 1950 to 2011") +theme_light()

p<- grid.arrange(p1,p2, ncol = 2)

print(p)

## TableGrob (1 x 2) "arrange": 2 grobs
##   z     cells    name           grob
## 1 1 (1-1,1-1) arrange gtable[layout]
## 2 2 (1-1,2-2) arrange gtable[layout]

Economic consequences plots

p1<- ggplot(data=prop[1:10,], aes(y=round(property.damage/10^9, digits = 0), x=reorder(capitalize(Event), property.damage)))+
        geom_bar(stat="identity", fill="cyan3")+ coord_flip() +theme_light() + 
        labs(x="Events", y="Property damage in billions USD") +
        geom_text(data=prop[1:10,],aes(label=round(property.damage/10^9, digits = 0),vjust=0)) +
        ggtitle("Property damage by Event type", 
                subtitle = "This chart shows the top 10 causes of property damage from 1950 to 2011")

p2<- ggplot(data=crop[1:10,], aes(y=round(crop.damage/10^9, digits = 0), x=reorder(capitalize(Event), crop.damage)))+
        geom_bar(stat="identity", fill= "#FF9999")+ coord_flip() +
        labs(x="Events", y="Crop damage in billions USD") +
        geom_text(data=crop[1:10,],aes(label=round(crop.damage/10^9, digits = 0),vjust=0)) +
        ggtitle("Crop damage by Event type",
                subtitle = "This chart shows the top 10 causes of crop damage from 1950 to 2011") +theme_light()

p<- grid.arrange(p1,p2, ncol = 2)

Results and discussion

Public health consequences

The results show that tornadoes were the leading cause of both fatalities and injuries, accounting for nearly 5,000 fatalities and 91,000 injuries. Excessive heat was the second leading cause of fatalities, while thunderstorms ranked second as a cause of injury. Floods, wind, lightning and extreme cold were also major causes of fatalities and injuries.

Economic consequences

The analysis showed that floods were the leading cause of property damage, with an estimated 145 billion USD in damage from 1950 to 2011. Estimated property damage from hurricanes and tornadoes was 85 and 45 billion USD, respectively. The leading cause of crop damage was drought (14 billion USD), followed by floods (6 billion USD) and hurricanes (6 billion USD).

Ahmed M. Kamel

January 31, 2017
