Weather Events impacting the United States

SONJA OFFWOOD

Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This analysis explores which of the weather events recorded by the U.S. National Oceanic and Atmospheric Administration (NOAA) have the greatest impact on the health of the population of the United States, and which have the greatest economic impact. Knowing this helps ensure that budget and preparation are assigned to the weather events that have the most destructive impact on the United States.

Data Processing

The following analysis utilises the dataset from the location https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2. It assumes the following steps have been performed to get the data onto your local machine:

  • the .csv.bz2 file from the above website is downloaded onto your local machine,
  • the file is unzipped into the working directory, under the name ‘repdata-data-StormData (1).csv’.

The following code reads the data into R and loads the required packages for the analysis into the R session:

library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.2.1
orig_data= read.csv("repdata-data-StormData (1).csv")
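
As a side note, read.csv can usually read a bzip2-compressed file directly, so the manual unzip step may be optional. The sketch below demonstrates this on a small temporary file. The file and its contents are made up for illustration and are not the storm data.

```r
# Write a tiny, made-up CSV through a bzip2 connection (illustrative data only)
toy <- data.frame(EVTYPE = c("TORNADO", "HEAT"), FATALITIES = c(12, 30))
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv opens the path with file(), which detects the compression itself
toy_back <- read.csv(tmp, stringsAsFactors = FALSE)
print(toy_back)
```

If this works in your R version, the same call pattern should apply to the downloaded .csv.bz2 file, at the cost of a slower read.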

We can now do the required processing on the data to be able to conduct our analysis.

Data Preprocessing for Events impacting Population Health

This section outlines the preprocessing which has been done on the data to analyse the first question:

  • Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

To answer this question, we restrict the data to those entries that have caused more than 10 deaths (FATALITIES > 10). We convert the EVTYPE factor variable to a character variable and rename any heat-related event types under a consistent name, namely “HEAT”.

data=subset(orig_data,orig_data$FATALITIES>10)
data$EVTYPE = as.character(data$EVTYPE)
data$EVTYPE2 = ifelse(grepl("HEAT", data$EVTYPE),"HEAT", data$EVTYPE)
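
To make the relabelling rule concrete, here is a minimal sketch on a hypothetical vector of event labels (these labels are invented for illustration). It also shows a case-insensitive variant, which is an assumption on our part and not what the analysis above uses.

```r
# Hypothetical event labels, chosen only to illustrate the matching rule
ev <- c("EXCESSIVE HEAT", "HEAT WAVE", "TORNADO", "Heat Wave", "FLOOD")

# Case-sensitive match, as in the analysis above: "Heat Wave" is NOT relabelled
ev_cs <- ifelse(grepl("HEAT", ev), "HEAT", ev)

# Case-insensitive variant (an assumption, not used in this analysis)
ev_ci <- ifelse(grepl("HEAT", ev, ignore.case = TRUE), "HEAT", ev)

print(ev_cs)
print(ev_ci)
```

Whether the case-insensitive variant changes the results depends on how consistently the labels are capitalised in the rows that survive the filter.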

To analyse the fatalities and injuries per event type, the aggregate function is used separately on the fatalities variable and on the injuries variable. The resulting datasets are combined back into one dataset with the variables

  • “Event” being the weather event
  • “Indicator” being either “Deaths” or “Injuries”
  • “Value” being the number of occurrences of the indicator under the event

data_FAT=aggregate(data$FATALITIES, list(data$EVTYPE2), sum)
data_INJ=aggregate(data$INJURIES, list(data$EVTYPE2), sum)
data_FAT$Indicator = "Deaths"
data_INJ$Indicator = "Injuries"
data_plot = rbind(data_FAT,data_INJ)
names(data_plot)=c("Event","Value","Indicator")
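
The same aggregate-then-stack pattern can be traced on a toy data frame. The values below are invented purely to show the shape of the result and are not taken from the storm data.

```r
# Toy input with made-up counts (not from the NOAA data)
toy <- data.frame(
  EVTYPE2    = c("HEAT", "HEAT", "TORNADO", "FLOOD"),
  FATALITIES = c(20, 15, 30, 11),
  INJURIES   = c(5, 0, 100, 2)
)

# One aggregate per measure; each returns columns Group.1 and x
fat <- aggregate(toy$FATALITIES, list(toy$EVTYPE2), sum)
inj <- aggregate(toy$INJURIES, list(toy$EVTYPE2), sum)

# Label each block, then stack them into one long table
fat$Indicator <- "Deaths"
inj$Indicator <- "Injuries"
toy_plot <- rbind(fat, inj)
names(toy_plot) <- c("Event", "Value", "Indicator")
print(toy_plot)
```

The long layout (one row per event and indicator) is what lets a single fill aesthetic split the bars by indicator in the plots.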

Let's have a look at the first 6 rows of this new data frame:

head(data_plot)
##           Event Value Indicator
## 1 COLD AND SNOW    14    Deaths
## 2   FLASH FLOOD    47    Deaths
## 3         FLOOD    26    Deaths
## 4           FOG    11    Deaths
## 5          HEAT  1682    Deaths
## 6    HEAVY RAIN    19    Deaths

Data Preprocessing for Events impacting the economy

This section outlines the preprocessing which has been done on the data to analyse the second question:

  • Across the United States, which types of events have the greatest economic consequences?

To answer this question, we restrict the original data to those entries whose property damage or crop damage is recorded in billions of dollars (exponent code “B”). We again convert the EVTYPE variable to a character variable and rename any heat-related event types under one consistent name, namely “HEAT”. We produce new variables “PropertyDamage” and “CropDamage” to give the dollar value of the damage caused.

data2=subset(orig_data,orig_data$PROPDMGEXP=="B" | orig_data$CROPDMGEXP=="B")
data2$PropertyDamage = ifelse(data2$PROPDMGEXP=="B", data2$PROPDMG*1000000000, 
                                ifelse(data2$PROPDMGEXP=="M", data2$PROPDMG*1000000,
                                        ifelse(data2$PROPDMGEXP=="K", data2$PROPDMG*1000,data2$PROPDMG)))
data2$CropDamage = ifelse(data2$CROPDMGEXP=="B", data2$CROPDMG*1000000000, 
                                ifelse(data2$CROPDMGEXP=="M", data2$CROPDMG*1000000,
                                        ifelse(data2$CROPDMGEXP=="K", data2$CROPDMG*1000,data2$CROPDMG)))
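
The nested ifelse calls can also be written as a named lookup, which may be easier to extend. This is only a sketch: the names mult and to_dollars are hypothetical helpers, and treating any code other than K, M or B as a multiplier of 1 is an assumption that mirrors the fallback in the code above.

```r
# Named lookup of multipliers for the exponent codes used above
mult <- c(K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: convert an amount and its exponent code to dollars
to_dollars <- function(amount, exponent) {
  m <- mult[as.character(exponent)]  # NA for codes not in the lookup
  m[is.na(m)] <- 1                   # assumed fallback: leave the amount as-is
  amount * unname(m)
}

# Made-up amounts and codes, for illustration only
dollars <- to_dollars(c(2.5, 10, 3, 7), c("B", "M", "K", ""))
print(dollars)
```

With this helper, a line such as data2$PropertyDamage = to_dollars(data2$PROPDMG, data2$PROPDMGEXP) would be expected to give the same values as the nested version.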

data2$EVTYPE = as.character(data2$EVTYPE)
data2$EVTYPE2 = ifelse(grepl("HEAT", data2$EVTYPE),"HEAT", data2$EVTYPE)

As above, to analyse the damage to properties and crops in the United States per event type, the aggregate function is used separately on the newly created PropertyDamage variable and on the CropDamage variable. The resulting datasets are combined back into one dataset with the variables

  • “Event” being the weather event
  • “Indicator” being either “PropertyDamage” or “CropDamage”
  • “Value” being the dollar value of the damage of the indicator under the event

data2_Prop=aggregate(data2$PropertyDamage, list(data2$EVTYPE2), sum)
data2_Crop=aggregate(data2$CropDamage, list(data2$EVTYPE2), sum)

data2_Prop$Indicator = "PropertyDamage"
data2_Crop$Indicator = "CropDamage"

data2_plot = rbind(data2_Prop,data2_Crop)
names(data2_plot)=c("Event","Value","Indicator")

Let's have a look at the first 6 rows of this new data frame:

head(data2_plot)
##         Event     Value      Indicator
## 1     DROUGHT 0.000e+00 PropertyDamage
## 2 FLASH FLOOD 1.000e+09 PropertyDamage
## 3       FLOOD 1.225e+11 PropertyDamage
## 4      FREEZE 0.000e+00 PropertyDamage
## 5        HAIL 1.800e+09 PropertyDamage
## 6        HEAT 0.000e+00 PropertyDamage

Results

Results for Events impacting Population Health

Using the preprocessed data, we can produce the following plot showing which events have the greatest impact on population health:

g = ggplot(data=data_plot, aes(x=Event,y=Value, fill = Indicator))+geom_bar(stat = "identity")
g= g + theme(axis.text.x = element_text(angle = 90, hjust = 1), axis.title=element_text(size=10), axis.text=element_text(size=8)) + theme(plot.title = element_text(size=16,face="bold"))
g=g + labs(title=expression("Weather Events causing Fatalities and Injuries"), x="Events", y="Incidents")
print(g)

From this we can clearly see that the two events with the most significant impact are

  • Tornadoes, and
  • Heatwaves

Any preparation to reduce the impact of weather on population health should be focussed on these two events. It is also worth looking at which other events affect population health once these two are excluded, because their bars are so tall that little else can be read from the above plot.
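
When a couple of bars dominate a plot, a sorted table of totals is another way to read off the ranking. The sketch below uses a made-up data frame with the same three columns as data_plot, so the numbers are illustrative only.

```r
# Made-up long table with the same columns as data_plot
toy_plot <- data.frame(
  Event     = c("FLOOD", "HEAT", "TORNADO", "FLOOD", "HEAT", "TORNADO"),
  Value     = c(11, 35, 30, 2, 5, 100),
  Indicator = rep(c("Deaths", "Injuries"), each = 3)
)

# Sum deaths and injuries per event, then sort from largest to smallest
totals <- aggregate(Value ~ Event, data = toy_plot, FUN = sum)
totals <- totals[order(-totals$Value), ]
print(totals)
```

Applied to data_plot, the same two lines would list every event in order of combined deaths and injuries, including those too small to see in the plot.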

The following code does the same pre-processing as described above, but excludes the Tornado and Heat event entries.

data_exT = subset(data,data$EVTYPE2!="TORNADO")
data_exT = subset(data_exT,data_exT$EVTYPE2!="HEAT")
data_FAT_exT = aggregate(data_exT$FATALITIES, list(data_exT$EVTYPE2), sum)
data_INJ_exT = aggregate(data_exT$INJURIES, list(data_exT$EVTYPE2), sum)
data_FAT_exT$Indicator = "Deaths"
data_INJ_exT$Indicator = "Injuries"
data_exT_plot = rbind(data_FAT_exT,data_INJ_exT)
names(data_exT_plot) = c("Event","Value","Indicator")
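
The two subset calls can be collapsed into one with %in%, which scales better if more events need to be excluded later. A minimal sketch on invented data (the excluded vector is a hypothetical name):

```r
# Toy rows with made-up fatality counts
toy <- data.frame(
  EVTYPE2    = c("HEAT", "TORNADO", "FLOOD", "FOG"),
  FATALITIES = c(20, 30, 11, 12)
)

# Events to drop, kept in one place so the list is easy to change
excluded <- c("TORNADO", "HEAT")
toy_ex <- subset(toy, !(EVTYPE2 %in% excluded))
print(toy_ex)
```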

This produces the following plot:

g = ggplot(data=data_exT_plot, aes(x=Event,y=Value, fill=factor(Indicator)))+geom_bar(stat = "identity")
g= g + theme(axis.text.x = element_text(angle = 90, hjust = 1), axis.title=element_text(size=10), axis.text=element_text(size=8)) + theme(plot.title = element_text(size=16,face="bold"))
g= g + labs(title=expression("Weather Events causing Fatalities and Injuries (excl. Tornados and Heat)"), x="Events", y="Incidents")
print(g)

The events with the next greatest impact on population health are clearly floods. It might be worth putting some extra budget and preparation into flood-type weather events as well.

Results for Events impacting the Economy

Using the preprocessed data, we can produce the following plot showing which events have the greatest impact on the US economy:

g = ggplot(data=data2_plot, aes(x=Event,y=Value, fill = Indicator))+geom_bar(stat = "identity")
g= g + theme(axis.text.x = element_text(angle = 90, hjust = 1), axis.title=element_text(size=10), axis.text=element_text(size=8)) + theme(plot.title = element_text(size=16,face="bold"))
g=g + labs(title=expression("Weather Events causing Property and Crop Damage"), x="Events", y="Damage")
print(g)

We can see here that the events having the greatest impact on the economy are all flood related, such as:

  • Flood
  • Hurricane/Typhoon
  • Storm Surge

Based on these, the recommendation would be to direct resources and budget towards weather events that can cause flooding, as these appear to have the largest impact on the economy.

Conclusion

The above analysis indicates that the government should focus its resources and budget on tornadoes, excessive heat, and weather events causing floods. Tornadoes are by far the biggest killer and have a very negative effect on the health of citizens. Floods, and flood-related weather events such as hurricanes, typhoons and storm surges, have a negative impact on both the health of citizens and the economy, and should therefore be taken very seriously by the government.