Set Global Options

knitr::opts_chunk$set(warning=FALSE, message = FALSE, echo = TRUE)

Synopsis

The main goal of this study is determine which severe weather events are most harmful to the public health, and which weather events cause the most economic damage, both to crops and property , in the United States. These two insights will be gleaned by analyzing data from the NOAA Storm Database.

Data Processing

Before beginning data cleaning and analysis, we will load in the necessary libraries for this project.

library(ggplot2)

The first step in processing the data is to load it into R Studio. To do this we will call the read.csv() with the assumption that the file has been saved within the project folder.

weather_csv <- read.csv(file = "repdata_data_StormData.csv.bz2", stringsAsFactors = FALSE)

Next, we will pull the 7 fields we will do analyis on and store them in a new data frame.

project_fields <- c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
weather_select <- weather_csv[,project_fields]

Now that we have the data we will need for the rest of the report, we will want to change the values given in PROPDMGEXP and CROPDMGEXP from letters to numbers that we can use to determine the amount of damage done by each recorded weather event.

weather_select$PROPDMGEXP[weather_select$PROPDMGEXP == "H" | weather_select$PROPDMGEXP == "h"] <- 100
weather_select$PROPDMGEXP[weather_select$PROPDMGEXP == "K" | weather_select$PROPDMGEXP == "k"] <- 1000
weather_select$PROPDMGEXP[weather_select$PROPDMGEXP == "M" | weather_select$PROPDMGEXP == "m"] <- 1000000
weather_select$PROPDMGEXP[weather_select$PROPDMGEXP == "B" | weather_select$PROPDMGEXP == "b"] <- 1000000000
weather_select$PROPDMGEXP[weather_select$PROPDMGEXP == ""] <- 1

weather_select$CROPDMGEXP[weather_select$CROPDMGEXP == "H" | weather_select$CROPDMGEXP == "h"] <- 100
weather_select$CROPDMGEXP[weather_select$CROPDMGEXP == "K" | weather_select$CROPDMGEXP == "k"] <- 1000
weather_select$CROPDMGEXP[weather_select$CROPDMGEXP == "M" | weather_select$CROPDMGEXP == "m"] <- 1000000
weather_select$CROPDMGEXP[weather_select$CROPDMGEXP == "B" | weather_select$CROPDMGEXP == "b"] <- 1000000000
weather_select$CROPDMGEXP[weather_select$CROPDMGEXP == ""] <- 1

We will then create two new columns “Crop_Damage” and “Property_Damage” to store the total amount of damage calculated with our new CROPDMGEXP and PROPDMGEXP values.

weather_select$Property_Damage <- as.numeric(weather_select$PROPDMG)*as.numeric(weather_select$PROPDMGEXP)
weather_select$Crop_Damage <- as.numeric(weather_select$CROPDMG)*as.numeric(weather_select$CROPDMGEXP)

For the final data preprocessing, we will use the aggregrate function with a sum function to gather all the events together and sum their total crop and property damage, respectively. We will store these fields in a new data frame called “group_weather.”

group_weather <- aggregate(weather_select[, c("FATALITIES", "INJURIES", "Property_Damage", "Crop_Damage")],
                           by = list(weather_select$EVTYPE), "sum")
colnames(group_weather) <- c("Weather_Event", "FATALITIES", "INJURIES", "Property_Damage", "Crop_Damage")

Results

Weather Events: Health Impact

The first results we will look at are the health results. Before plotting the health results, we will create a total health impact field that shows the sum of “INJURIES” and “FATALITIES,” but applies a weight to “FATALITIES.” We then order the events be descending impact and subset only the top ten events for ease of readability.

group_weather$Health <- (group_weather$FATALITIES*10) + group_weather$INJURIES
health_weather <- group_weather[order(-group_weather$Health),][1:10,]
health_weather$Weather_Event <- factor(health_weather$Weather_Event, level = health_weather$Weather_Event)

Next, we will plot the top ten weather events based on their impact to public health by using ggplot.

health_plot <- ggplot(data = health_weather, mapping = aes(x = Weather_Event, y = Health)) +
  geom_bar(stat = "identity", fill = "red") + 
  labs(title = "Top 10 Weather Events by Magnitude on Public Health from 1950 and 2011", 
       x = "Type of Weather Event", Y = "Injuries and Fatalities") +
    theme(axis.text.x = element_text(angle = 90))
health_plot

As you can see from the graph, by far the largest impacter on public health was Tornadoes.

Weather Events: Crop Impact

We will next look at weather events by the impact they had crops. To do this, we will again order the weather events by decreasing order and create a new data frame to store our fields in which is named “crop_weather.”

crop_weather <- group_weather[order(-group_weather$Crop_Damage),][1:10,]
crop_weather$Weather_Event <- factor(crop_weather$Weather_Event, level = crop_weather$Weather_Event)

We will then use ggplot to plot the top ten weather events based on their economic impact on crops.

crop_plot <- ggplot(data = crop_weather, mapping = aes(x = Weather_Event, y = Crop_Damage)) +
  geom_bar(stat = "identity", fill = "yellow") +
  labs(title = "Top 10 Weather Events by Magnitude of Crop Damage from 1950 to 2011", 
       x = "Type of Weather Event", y = "Damage Done in USD") +
    theme(axis.text.x = element_text(angle = 90))
crop_plot

Again, we have one clear leader in causes of crop damage as drought has caused more than twice the amount of damage as the next closest event. However, the next three events: flood, river flood and ice storm, are higher than the three events that followed tornadoes in effect on public health.

Weather Events: Property Damage

The final impacted area we will observe in this report is property damage. As we have done with the last two plots, we will need to order the events in descending order and create a new data frame, “property_weather.”

property_weather <- group_weather[order(-group_weather$Property_Damage),][1:10,]
property_weather$Weather_Event <- factor(property_weather$Weather_Event, level = property_weather$Weather_Event)

Next we will use ggplot to graph the top 10 events by their impact on property damage.

property_plot <- ggplot(data = property_weather, mapping = aes(x = Weather_Event, y = Property_Damage)) +
  geom_bar(stat = "identity", fill = "green") +
  labs(title = "Top 10 Weather Events by Magnitude of Property Damage form 1950 to 2011",
       x = "Type of Weather Event", y = "Damage Done in USD") +
    theme(axis.text.x = element_text(angle = 90))
property_plot

From our graph, we can see that the top impacter on property damage is Flood. Looking at the top 5, we can see that most property damage is caused by water, be it tropical storm or inland flooding.