Impact of Severe Weather Events on population health and the economy

Synopsis

The following analysis looks at the US National Oceanic and Atmospheric Administration's (NOAA) storm database, which tracks characteristics of major storms and weather events in the US. Severe weather events often cause fatalities and injuries as well as vast amounts of property damage, so it is important to determine which events cause the most harm to the population and the economy. For population health, the storm data has been transformed and analysed to show the types of weather event that cause the most fatalities and injuries. Similarly, for economic consequences, the data has been transformed to show the weather events that cause the most property and crop damage. We find that Excessive Heat, Tornadoes and Floods cause the most harm to the population, while thunderstorm winds, Tornadoes, Floods and Hail cause the most harm to the economy.

Data Processing

In order to transform and process the data we need to load the following packages:

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(gridExtra)
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
## 
##     date

We first need to download the data from the course website and then set our working directory to where the downloaded file has been saved.

setwd("C:/Users/AUgart01/Downloads")
stormz <- read.csv("repdata%2Fdata%2FStormData.csv.bz2")
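
A hard-coded setwd() ties the script to one machine. As a minimal sketch of a more portable alternative, the file can be read from an explicit path. The helper name load_storm_data and its path argument are hypothetical and not part of the original analysis. read.csv can read a bzip2-compressed file directly, so no manual decompression is needed.

```r
# Hypothetical helper: read the storm data from an explicit path instead of
# relying on the current working directory.
load_storm_data <- function(path) {
  if (!file.exists(path)) {
    stop("Storm data file not found: ", path)
  }
  # read.csv() opens bzip2-compressed files transparently.
  read.csv(path, stringsAsFactors = FALSE)
}

# Usage (file name as in the original analysis):
# stormz <- load_storm_data("repdata%2Fdata%2FStormData.csv.bz2")
```
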

We take a quick snapshot of the data so that we can understand how we should go about transforming the data.

dim(stormz)
## [1] 902297     37
head(stormz, n=5)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
## 4        2     2.5          K       0                                    
## 5        2     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
stormz$BGN_DATE <- as.Date(stormz$BGN_DATE, format = "%m/%d/%Y")
summary(stormz$BGN_DATE)
##         Min.      1st Qu.       Median         Mean      3rd Qu. 
## "1950-01-03" "1995-04-20" "2002-03-18" "1998-12-27" "2007-07-28" 
##         Max. 
## "2011-11-30"

From the output of dim we can see that this is a large file with 902,297 rows and 37 columns. Previewing the first 5 rows shows that each row documents one weather event. The type of weather event is stored in the column EVTYPE. We can also see the number of fatalities and injuries and the property and crop damage recorded for each event. Finally, the first rows date from the 1950s. Converting the BGN_DATE column into a date format lets us call the summary function on it; the results show that the data spans 1950 to 2011.
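
To make the date conversion concrete, here is a small sketch on two values copied from the preview above. It shows that the format string only needs to describe the date part, and it extracts the year with base R as an alternative to lubridate's year(); the variable names are illustrative only.

```r
# Sample values copied from the BGN_DATE column in the preview.
raw_dates <- c("4/18/1950 0:00:00", "11/15/1951 0:00:00")

# The format string describes the date part; the trailing time is ignored.
parsed <- as.Date(raw_dates, format = "%m/%d/%Y")

# Extract the year as an integer using base R.
years <- as.integer(format(parsed, "%Y"))

print(parsed)
print(years)
```
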

As technology has progressed, more weather events have been recorded and in richer detail. We therefore consider the data in later years to be more complete and restrict the analysis to that subset. To choose the cut-off we extract the year from BGN_DATE, which is already in a date format, and plot a histogram of the number of events recorded each year. The histogram shows a large increase in the number of events recorded between 1993 and 1994, so we subset the data to keep only 1994-2011.

stormz$year <- year(stormz$BGN_DATE)
qplot(stormz$year,geom = "histogram",binwidth = 1,fill=I("lightblue"),
      col=I("blue"),xlab = "Year",ylab = "Number of Events")

# Compare against a number, not a string, so the filter is a numeric test
stormz2 <- subset(stormz, year > 1993)
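
The year filter is safest as a numeric comparison. The sketch below uses a made-up data frame (not NOAA data) to show the subset, to show why a quoted year such as "1993" is fragile, and to count events per year in the way the histogram does.

```r
# Toy stand-in for the storm data; the values are invented for illustration.
toy <- data.frame(
  EVTYPE = c("TORNADO", "HAIL", "FLOOD", "TORNADO"),
  year   = c(1950, 1993, 1994, 2011)
)

# Numeric comparison keeps only 1994 onwards.
recent <- subset(toy, year >= 1994)

# Comparing a number with a string coerces the number to text, so the test
# becomes alphabetical: "999" sorts after "1993" because "9" > "1".
string_compare <- 999 > "1993"

# Events recorded per year, the counts a histogram with binwidth 1 displays.
events_per_year <- table(toy$year)

print(recent)
print(string_compare)
print(events_per_year)
```

The string comparison happens to work for four-digit years, which is why it goes unnoticed in this data set.
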

1. Weather Events and Consequences on Public Health

In order to determine which weather events cause the most harm we need to aggregate the storm data to show the total number of fatalities for each EVTYPE. We can do this using the aggregate function. We then sort the result by number of fatalities and select the top 10 weather events.

stormz_aggs_fatal <- aggregate(stormz2$FATALITIES, by = list(stormz2$EVTYPE),sum)
colnames(stormz_aggs_fatal) <- c("EventType","Fatalities")
stormz_aggs_fatal <- stormz_aggs_fatal[order(stormz_aggs_fatal$Fatalities),]
top10_fatal <- stormz_aggs_fatal %>% top_n(10)
## Selecting by Fatalities

Similarly, we use the same method to look at the number of injuries caused by different weather events.

stormz_aggs_injury <- aggregate(stormz2$INJURIES, by = list(stormz2$EVTYPE),sum)
colnames(stormz_aggs_injury) <- c("EventType","Injuries")
stormz_aggs_injury <- stormz_aggs_injury[order(stormz_aggs_injury$Injuries),]
top10_injuries <- stormz_aggs_injury %>% top_n(10)
## Selecting by Injuries
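
The same aggregate, sort and select steps are repeated for each measure, so they can be wrapped in one function. The following is a minimal base R sketch under stated assumptions: top_events is a hypothetical helper that is not part of the original analysis, and the toy data is invented.

```r
# Hypothetical helper: total one numeric column per event type and return
# the n largest totals, biggest first.
top_events <- function(data, column, n = 10) {
  totals <- aggregate(data[[column]],
                      by = list(EventType = data$EVTYPE),
                      FUN = sum, na.rm = TRUE)
  names(totals)[2] <- column
  totals <- totals[order(totals[[column]], decreasing = TRUE), ]
  rownames(totals) <- NULL
  head(totals, n)
}

# Invented example data, not taken from the NOAA file.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "HAIL", "HEAT"),
  FATALITIES = c(2, 5, 1, 0, 4),
  INJURIES   = c(10, 3, 7, 1, 0)
)

print(top_events(toy, "FATALITIES", n = 2))
print(top_events(toy, "INJURIES", n = 2))
```

Unlike top_n, which keeps tied rows, head always returns at most n rows, so the two approaches can differ when totals are tied at the cut-off.
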

1. Results

We can now plot this data using ggplot to display the data as a bar chart showing the top 10 weather events for fatalities and injuries. We arrange this in a panel plot using the grid.arrange function from the gridExtra package.

# Refer to columns by name inside aes(); the data argument supplies them
fatal <- ggplot(top10_fatal, aes(EventType, Fatalities)) +
  geom_bar(stat = "identity", fill = "steelblue") + coord_flip() + xlab("Weather Event") +
  ylab("Number of Fatalities") + ggtitle("Top 10 Weather Events by Number of Fatalities")

injure <- ggplot(top10_injuries, aes(EventType, Injuries)) +
  geom_bar(stat = "identity", fill = "limegreen") + coord_flip() + xlab("Weather Event") +
  ylab("Number of Injuries") + ggtitle("Top 10 Weather Events by Number of Injuries")

grid.arrange(fatal,injure)

From these results we can see that Excessive Heat causes the most fatalities, followed by Tornadoes and Flash Floods. Tornadoes cause the most injuries, followed by Excessive Heat and Floods.
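
Rankings like these are easier to read off a chart when the bars are drawn in order of size. ggplot2 orders a discrete axis by factor level rather than by row order, so sorting the data frame alone does not change the plot. A hedged sketch using reorder on invented totals follows; the plotting step only runs if ggplot2 is installed.

```r
# Invented totals for illustration only.
toy_totals <- data.frame(
  EventType  = c("FLOOD", "HEAT", "TORNADO"),
  Fatalities = c(2, 9, 3)
)

# reorder() turns EventType into a factor whose levels follow Fatalities,
# smallest first, which is the order the bars are drawn in.
toy_totals$EventType <- reorder(toy_totals$EventType, toy_totals$Fatalities)
bar_order <- levels(toy_totals$EventType)
print(bar_order)

# Build the chart only if ggplot2 is available; print(p) would display it.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(toy_totals, aes(x = EventType, y = Fatalities)) +
    geom_col(fill = "steelblue") +
    coord_flip() +
    labs(x = "Weather Event", y = "Number of Fatalities")
}
```

With coord_flip the first level sits at the bottom, so the largest total ends up at the top of the chart.
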

2. Weather Events and Consequences on the Economy

In order to identify the weather events that cause the most harm to the economy we must, again, aggregate the storm data, this time to show the property damage recorded for each EVTYPE. As before, we aggregate the data using the aggregate function and then sort and select the top 10 events with the highest property damage.

stormz_aggs_prop <- aggregate(stormz2$PROPDMG, by = list(stormz2$EVTYPE),sum)
colnames(stormz_aggs_prop) <- c("EventType","PropertyDamage")
stormz_aggs_prop <- stormz_aggs_prop[order(stormz_aggs_prop$PropertyDamage),]
top10_prop <- stormz_aggs_prop %>% top_n(10)
## Selecting by PropertyDamage

We do the same for Crop Damages:

stormz_aggs_crop <- aggregate(stormz2$CROPDMG, by = list(stormz2$EVTYPE),sum)
colnames(stormz_aggs_crop) <- c("EventType","CropDamage")
stormz_aggs_crop <- stormz_aggs_crop[order(stormz_aggs_crop$CropDamage),]
top10_crop <- stormz_aggs_crop %>% top_n(10)
## Selecting by CropDamage
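
The totals above sum the raw PROPDMG and CROPDMG figures. In the NOAA data these are paired with the PROPDMGEXP and CROPDMGEXP columns, where K, M and B stand for thousands, millions and billions of dollars. The sketch below shows one way the figures could be converted to dollars before aggregating. The helper name damage_in_usd is hypothetical, and treating every other code as a multiplier of 1 is an assumption, since the file also contains codes that this sketch does not interpret.

```r
# Hypothetical helper: convert a damage figure and its exponent code to USD.
damage_in_usd <- function(amount, exp_code) {
  multipliers <- c(K = 1e3, M = 1e6, B = 1e9)
  code <- toupper(as.character(exp_code))
  mult <- unname(multipliers[code])
  # Assumption: blank or unrecognised codes leave the amount unscaled.
  mult[is.na(mult)] <- 1
  amount * mult
}

# The first two pairs come from the preview above; the rest are invented.
usd <- damage_in_usd(c(25.0, 2.5, 1.2, 3), c("K", "K", "m", ""))
print(usd)

# Possible use on the subset (column name PropUSD is illustrative):
# stormz2$PropUSD <- damage_in_usd(stormz2$PROPDMG, stormz2$PROPDMGEXP)
```

Applying such a conversion would be expected to change the ranking, because a single event recorded in billions outweighs many events recorded in thousands.
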

2. Results

Now that the storm data has been transformed we can plot it. As before, we use ggplot to display the data in a panel of bar charts.

# Refer to columns by name inside aes(); the data argument supplies them
property <- ggplot(top10_prop, aes(EventType, PropertyDamage)) +
  geom_bar(stat = "identity", fill = "orange") + coord_flip() + xlab("Weather Event") +
  ylab("Property Damage USD") + ggtitle("Top 10 Weather Events by Property Damages") +
  scale_y_continuous(labels = scales::comma)

crops <- ggplot(top10_crop, aes(EventType, CropDamage)) +
  geom_bar(stat = "identity", fill = "purple") + coord_flip() + xlab("Weather Event") +
  ylab("Crop Damage USD") + ggtitle("Top 10 Weather Events by Crop Damages") +
  scale_y_continuous(labels = scales::comma)

grid.arrange(property,crops)

These results show that the weather event that incurs the most cost in terms of property damage is TSTM WINDS followed by Tornadoes and Flash Floods. The bar chart for crop damages shows that the most harmful event is Hail, followed by Flash Floods and Floods.

Conclusion

From this analysis we conclude that the most harmful events to the population are Excessive Heat, Tornadoes and Floods, whilst the most harmful events to the economy are Strong Winds (tornado/TSTM wind), Floods and Hail.