Impact of Storm Events on Population Health and Economy

Synopsis

The purpose of this report is to analyze storm data in order to quantify the impact of storm events on population health and the economy. The data are obtained from the US National Oceanic and Atmospheric Administration’s (NOAA) storm database. The effect on population health is measured as the total number of fatalities and injuries caused by each type of storm event. Similarly, the effect on the economy is measured as the damage to property and crops.

Approach

In order to accurately quantify and depict the parameters mentioned above, we have chosen to define the following questions.

Across the United States, which types of events are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?

The approach for each question is described below. A step common to both questions, and to any data analysis report, is obtaining the data. To ensure reproducibility, this step is included in the report: the analysis begins by downloading, extracting and loading the data from the official URL given on the Coursera website. Once the data is loaded, it can be cleaned up and the analyses can be performed.

The approach for question 1 has been detailed below:

Step 1: Calculate the sum of fatalities grouped by event type

Step 2: Calculate the sum of injuries grouped by event type

Step 3: Subset both the data sets mentioned above to get the TOP 15 ABOVE AVERAGE values

Step 4: Stack the data sets

Step 5: Plot the data and draw appropriate inferences

Similarly, the approach for question 2 has been outlined below:

Step 1: Calculate the sum of property damage grouped by event type

Step 2: Calculate the sum of crop damage grouped by event type

Step 3: Subset both the data sets mentioned above to get the TOP 15 ABOVE AVERAGE values

Step 4: Stack the data sets

Step 5: Plot the data and draw appropriate inferences
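
Step 3 is the same in both approaches, so it can be sketched once as a small helper. This is only an illustration: the function name top.above.average and the toy data frame are assumptions made here and are not part of the analysis code in this report.

```r
# Hypothetical helper (not part of the original analysis code):
# keep the rows whose `count` exceeds the mean count, then return
# the n largest of those rows.
top.above.average <- function(totals, n = 15) {
    cutoff <- mean(totals$count, na.rm = TRUE)
    above <- totals[totals$count > cutoff, ]
    head(above[order(-above$count), ], n)
}

# Toy totals, invented purely for illustration
toy.totals <- data.frame(
    EVTYPE = c("A", "B", "C", "D", "E"),
    count  = c(10, 200, 0, 50, 120)
)

# The mean is 76, so only B (200) and E (120) survive the filter
toy.result <- top.above.average(toy.totals, n = 2)
print(toy.result)
```

Applying the mean filter before taking the top rows means that fewer than n rows are returned whenever fewer than n event types are above average.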

Data Processing

Let us load the data from the official URL mentioned on the Coursera site. The data is downloaded and stored in a variable named storm.data.

##### Clear the workspace
rm(list = ls())

##### URL of the bz2 file 
storm.data.url <- c("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2")
storm.data.file <- tempfile()
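##### setInternet2() exists only on Windows builds of R, so the call is guarded
if (exists("setInternet2")) try(setInternet2(TRUE), silent = TRUE)
##### mode = "wb" keeps the compressed file intact on Windows
download.file(storm.data.url, storm.data.file, mode = "wb")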

## Warning: downloaded length 34107392 != reported length 49177144

##### Read the file into a data frame
storm.data <- read.csv ( storm.data.file )

## Warning: EOF within quoted string

##### Load the libraries needed for the analysis
library(ggplot2)
library(plyr)

After the execution of this code chunk, the data will be present in the storm.data variable. This variable will not be modified and will be used as a data source for the analysis detailed below.
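
Because the file is saved to a tempfile(), every run of the report downloads it again. One possible refinement is to cache the archive locally and download it only when it is missing. The sketch below assumes a hypothetical helper named fetch.once and an illustrative local file name; neither is used by the code in this report.

```r
# Hypothetical helper: download `url` to `dest` only when `dest` is missing.
# `downloader` defaults to download.file; it is a parameter so that the
# helper can be exercised without network access.
fetch.once <- function(url, dest, downloader = download.file) {
    if (!file.exists(dest)) {
        downloader(url, dest, mode = "wb")  # binary mode for the .bz2 archive
    }
    invisible(dest)
}

# Assumed usage (the local file name is an illustrative choice):
# storm.data.file <- fetch.once(storm.data.url, "StormData.csv.bz2")
# storm.data <- read.csv(storm.data.file)
```

A fixed local path trades a little disk space for faster re-runs, and it makes it easier to check that the file on disk is complete before it is read.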

Results

The results for each question, together with the corresponding analysis, are detailed below.

Question 1: Across the United States, which types of events are most harmful with respect to population health?

The approach for this question is described above. Each step is carried out in turn below.

Step 1: Calculate the sum of fatalities grouped by event type

fatality.impact <- ddply(storm.data , c("EVTYPE"),
             summarise , count = sum(FATALITIES),
             type = "Fatality")

Step 2: Calculate the sum of injuries grouped by event type

injury.impact <- ddply(storm.data , c("EVTYPE"),
             summarise , count = sum(INJURIES),
             type = "Injury")
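
To make the shape of these grouped totals concrete, here is a minimal sketch on invented data. It swaps in base R's aggregate() for ddply() so that it runs without plyr; the toy data frame and its values are assumptions for illustration only.

```r
# Toy records, invented purely for illustration
toy.storms <- data.frame(
    EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
    FATALITIES = c(3, 1, 2, 4)
)

# Sum FATALITIES within each EVTYPE (base R stand-in for the ddply call)
toy.fatality <- aggregate(FATALITIES ~ EVTYPE, data = toy.storms, FUN = sum)

# Match the column layout used in this report: EVTYPE, count, type
names(toy.fatality)[2] <- "count"
toy.fatality$type <- "Fatality"
print(toy.fatality)
```

The result has one row per event type, which is the layout that the later subsetting and stacking steps rely on.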

Step 3: Subset both the data sets mentioned above to get the TOP 15 ABOVE AVERAGE values

##### Calculate the mean values
fatality.mean <- mean(fatality.impact$count , na.rm = TRUE)
injury.mean <- mean(injury.impact$count , na.rm = TRUE)

##### Get the above average values
fatality.impact.subset <- 
    fatality.impact[fatality.impact$count > fatality.mean , ]
##### Get the top 15 values
fatality.impact.subset <- 
    head ( fatality.impact.subset[order(- fatality.impact.subset$count), ] 
           , 15 )

##### Get the above average values
injury.impact.subset <- 
    injury.impact[injury.impact$count > injury.mean , ]
##### Get the top 15 values
injury.impact.subset <- 
    head ( injury.impact.subset[order(- injury.impact.subset$count), ] 
           , 15 )

Step 4: Stack the data sets

pop.health.impact <- rbind (fatality.impact 
                , injury.impact)

pop.health.impact.subset <- rbind (fatality.impact.subset 
                   , injury.impact.subset)

Step 5: Plot the data and draw appropriate inferences

ggplot(data = pop.health.impact.subset ,
       aes(x = EVTYPE , y= count ) ) +
    geom_bar(stat = "identity", width = 1 
         , fill = "red3" , color ="white") +
    facet_grid(type ~ . , scales="free_y") +
    
    labs(x = "Types of Events") +
    labs(y = "People Affected") +
    labs(title = "Impact of Events on Population Health") +
    theme(axis.text.x = element_text(angle = -90 , 
                     vjust = 0.5) ) +
    theme(axis.title.x = element_text(face = "bold" , size = 18)) +
    theme(axis.title.y = element_text(face = "bold" , size = 18 , 
                      vjust = 1)) +
    theme(plot.title = element_text(face = "bold" , size = 22 , vjust = 2)) +
    theme(strip.text.y = element_text(face = "bold" , size = 10))

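Figure: Impact of Events on Population Health (x-axis: Types of Events; y-axis: People Affected; panels: Fatality and Injury)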

As observed in the plot, we now have the top 15 events that caused above-average fatalities and injuries. The plot leads to the following answer to question 1.

Answer: The event type which causes the maximum harm to population health is a tornado. It causes the maximum number of fatalities (approximately 5633) and injuries (approximately 91346) compared to any other event type.

Following the tornado, there is no event which causes both the second highest number of fatalities AND the second highest number of injuries.

Hence, as far as fatalities are concerned, the events that cause the most harm to population health after Tornado are, in order, Excessive Heat, Flash Flood, Heat and Lightning.

On the other hand, the events that lead to the most injuries after Tornado are Excessive Heat, Flood, Lightning and TSTM Wind.

Question 2: Across the United States, which types of events have the greatest economic consequences?

The approach for this question is described above. Each step is carried out in turn below.

Step 1: Calculate the sum of property damage grouped by event type

property.damage <- ddply(storm.data , c("EVTYPE"),
             summarise , count = sum(PROPDMG),
             type = "Property")

Step 2: Calculate the sum of crop damage grouped by event type

crop.damage <- ddply(storm.data , c("EVTYPE"),
             summarise , count = sum(CROPDMG),
             type = "Crop")
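
The sums above add the raw PROPDMG and CROPDMG values. In the NOAA storm data these columns are accompanied by PROPDMGEXP and CROPDMGEXP, which hold magnitude codes such as K, M and B (thousands, millions and billions). A minimal sketch of how the codes might be applied before summing is given below. The helper name exp.multiplier, the toy data and the choice to treat every other code as a multiplier of 1 are assumptions made here, not part of this report's analysis.

```r
# Hypothetical helper: translate exponent codes into numeric multipliers.
# Codes other than K, M and B (including blanks) are treated as 1 here;
# that treatment is an assumption of this sketch.
exp.multiplier <- function(code) {
    lookup <- c(K = 1e3, M = 1e6, B = 1e9)
    mult <- lookup[toupper(as.character(code))]
    mult[is.na(mult)] <- 1
    unname(mult)
}

# Toy records, invented purely for illustration
toy.damage <- data.frame(
    EVTYPE     = c("FLOOD", "TORNADO", "HAIL"),
    PROPDMG    = c(2.5, 10, 500),
    PROPDMGEXP = c("M", "K", "")
)

# Scale each raw amount by its multiplier to obtain a dollar value
toy.damage$PROPDMG.USD <-
    toy.damage$PROPDMG * exp.multiplier(toy.damage$PROPDMGEXP)
print(toy.damage)
```

Scaling before summing matters because a single record coded in billions can outweigh thousands of records coded in thousands, so rankings based on scaled values may differ from rankings based on the raw sums.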

Step 3: Subset both the data sets mentioned above to get the TOP 15 ABOVE AVERAGE values

property.damage.mean <- mean(property.damage$count , na.rm = TRUE)
crop.damage.mean <- mean(crop.damage$count , na.rm = TRUE)

property.damage.subset <- 
    property.damage[property.damage$count > property.damage.mean , ]
property.damage.subset <- 
    head ( property.damage.subset[order(- property.damage.subset$count), ] 
           , 15 )

crop.damage.subset <- 
    crop.damage[crop.damage$count > crop.damage.mean , ]
crop.damage.subset <- 
    head ( crop.damage.subset[order(- crop.damage.subset$count), ] 
           , 15 )

Step 4: Stack the data sets

economic.impact <- rbind(property.damage,
             crop.damage)

economic.impact.subset <- rbind(property.damage.subset, 
                crop.damage.subset)

Step 5: Plot the data and draw appropriate inferences

ggplot(data = economic.impact.subset ,
       aes(x = EVTYPE , y= count )) +
    geom_bar(stat = "identity", width = 1 
         , fill = "red3" , color = "white") +
    facet_grid(type ~ . , scales="free_y") +
    
    labs(x = "Types of Events") +
    labs(y = "Damage") +
    labs(title = "Impact of Events on Economy") +
    theme(axis.text.x = element_text(angle = -90 , 
                     vjust = 0.5) ) +
    theme(axis.title.x = element_text(face = "bold" , size = 18)) +
    theme(axis.title.y = element_text(face = "bold" , size = 18 , 
                      vjust = 1)) +
    theme(plot.title = element_text(face = "bold" , size = 22 , vjust = 2)) +
    theme(strip.text.y = element_text(face = "bold" , size = 10))

Figure: Impact of Events on Economy (x-axis: Types of Events; y-axis: Damage; panels: Property and Crop)

As observed in the plot, we now have the top 15 events that caused above-average property and crop damage. The plot leads to the following answer to question 2.

Answer:

There is no event which causes maximum crop damage AND maximum property damage.

The event which causes the maximum property damage is Tornado (approximately 3212258 units, the sum of the raw PROPDMG values). This is followed by Flash Flood, TSTM Wind, Flood and Thunderstorm Wind.

The event which causes the maximum crop damage is Hail (approximately 579596 units, the sum of the raw CROPDMG values). This is followed by Flash Flood, Flood and TSTM Wind.
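
The rankings above list TSTM Wind and Thunderstorm Wind as separate event types, although TSTM is an abbreviation of thunderstorm. A possible refinement, sketched here under assumptions, is to normalise the EVTYPE labels before grouping. The helper name normalise.evtype and the toy labels are illustrative, and whether the two labels should be merged is a judgment call that this report does not make.

```r
# Hypothetical helper: upper-case the label, trim and collapse whitespace,
# and expand the TSTM abbreviation to THUNDERSTORM.
normalise.evtype <- function(evtype) {
    cleaned <- toupper(trimws(as.character(evtype)))
    cleaned <- gsub("\\s+", " ", cleaned, perl = TRUE)
    gsub("\\bTSTM\\b", "THUNDERSTORM", cleaned, perl = TRUE)
}

# Toy labels, invented purely for illustration
toy.labels <- c("TSTM WIND", " Thunderstorm Wind", "tstm  wind", "HAIL")
toy.clean <- unique(normalise.evtype(toy.labels))
print(toy.clean)
```

With labels normalised in this way, the grouped sums for the two thunderstorm wind labels would be combined into one row, which could change the ordering reported above.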