Synopsis

The following analysis is a cursory review of the data from the National Oceanic and Atmospheric Administration (NOAA) storm database.

The analysis looks at the impact of storms on human health and on the economy.

I looked at the events from 1989 to 2011.

The events with the highest impact on human health are tornado, heat, and flood, while the events with the greatest economic impact are flood, hurricane, and storm surge.


Data Processing

Load Required Libraries

library(dplyr)      # used to edit the data frames
library(lubridate)  # used to convert dates
library(lattice)    # used to plot the graphs

Data Set Source

The data set comes from the NOAA. An archived version of the data was used. NOAA Storm Data

# assign local file name
data_file <- file.path(getwd(), "noaa_data.bz2")

# Check if the file already exists; if it doesn't, download it
# https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2
if(!file.exists(data_file))
{
    download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
                  , destfile = data_file
                  , quiet = TRUE)
}

Load Data Set

# Load the NOAA data
# Note: 902,297 observations are in the data
noaa_data <- read.csv(data_file, as.is = TRUE)

Data Dictionary

Detailed documentation about the data set is available from the National Weather Service website.

Storm Data Documentation

Fields from the NOAA Storm database used in the analysis are:

Field       Type  Description
BGN_DATE    chr   Beginning date of the event
EVTYPE      chr   Event type
FATALITIES  num   Number of fatalities recorded during the event
INJURIES    num   Number of injuries recorded during the event
PROPDMG     num   Property damage amount, to be scaled by the factor in PROPDMGEXP
PROPDMGEXP  chr   Property damage factor; see appendix B in the Storm Data Documentation for details
CROPDMG     num   Crop damage amount, to be scaled by the factor in CROPDMGEXP
CROPDMGEXP  chr   Crop damage factor; see appendix B in the Storm Data Documentation for details
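
As a worked illustration of how a damage amount and its factor combine, the sketch below converts a few made-up records to US dollars. It covers only the letter codes H, K, M, and B; the sample values and the names `unit_multiplier` and `sample_damage` are hypothetical and are not taken from the storm database.

```r
# Hypothetical lookup table: letter code -> multiplier in USD
unit_multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Made-up records for illustration only
sample_damage <- data.frame(
  PROPDMG    = c(25, 2.5, 1.2),
  PROPDMGEXP = c("K", "M", "B"),
  stringsAsFactors = FALSE
)

# Look up each code by name and scale the recorded amount
sample_damage$damage_usd <- sample_damage$PROPDMG *
  unname(unit_multiplier[sample_damage$PROPDMGEXP])

print(sample_damage)
```

In this sketch, a PROPDMG of 25 with a factor of "K" becomes 25,000 USD.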

Subsetting the Data

First, determine which years contribute a meaningful share of the data set. The analysis is limited to years that each account for more than 1% of all recorded events.

The data is limited in this way because older records may be inconsistent or incomplete compared to more recent ones.

Correcting Event Type Data

The event type data showed spelling and reporting inconsistencies. Limiting the analysis to years that each account for more than 1% of total events removed many of those issues. Some event types could have been combined. However, I understood the assignment directions to require all event types as recorded. Additionally, combining the event type names that looked like candidates would not have changed the overall results. Without a clear and concise methodology for changing the event type names, I left them as is.
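
For reference, a minimal sketch of what a simple cleanup of event type names could look like is shown here. It was not applied in this analysis, the labels are made up for illustration, and trimming whitespace and converting to upper case is only one assumed approach that would not catch misspellings.

```r
# Made-up event labels showing typical case and whitespace differences
raw_types <- c("TSTM WIND", "tstm wind", " Flood", "FLOOD ", "Heat")

# Trim surrounding whitespace and convert to upper case
clean_types <- toupper(trimws(raw_types))

# Count distinct labels before and after the cleanup
n_raw   <- length(unique(raw_types))
n_clean <- length(unique(clean_types))

print(c(before = n_raw, after = n_clean))
print(table(clean_types))
```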

# create a summary table that groups by year of the event
event_count_by_year <-  noaa_data %>% 
                        group_by(year = year(as.Date(BGN_DATE,'%m/%d/%Y'))) %>% 
                        summarise(count_obs = n())

# calculate the percentage of the events by year
event_count_by_year$percent_of_events <- prop.table(event_count_by_year$count_obs)

Subset Data Details

# years that are > 1%
number_of_years <- sum(event_count_by_year$percent_of_events > 0.01)

# display the summary table
# Shows the years that have at least 1% of the total events in the storm database
print(event_count_by_year[event_count_by_year$percent_of_events > 0.01, ], n = nrow(event_count_by_year))
## # A tibble: 23 x 3
##     year count_obs percent_of_events
##    <dbl>     <int>             <dbl>
##  1  1989     10410        0.01153722
##  2  1990     10946        0.01213126
##  3  1991     12522        0.01387791
##  4  1992     13534        0.01499950
##  5  1993     12607        0.01397212
##  6  1994     20631        0.02286498
##  7  1995     27970        0.03099866
##  8  1996     32270        0.03576428
##  9  1997     28680        0.03178554
## 10  1998     38128        0.04225660
## 11  1999     31289        0.03467705
## 12  2000     34471        0.03820361
## 13  2001     34962        0.03874777
## 14  2002     36293        0.04022290
## 15  2003     39752        0.04405645
## 16  2004     39363        0.04362533
## 17  2005     39184        0.04342694
## 18  2006     44034        0.04880211
## 19  2007     43289        0.04797644
## 20  2008     55663        0.06169033
## 21  2009     45817        0.05077818
## 22  2010     48161        0.05337599
## 23  2011     62174        0.06890636
# Date range of events
first_year  <- min(event_count_by_year[event_count_by_year$percent_of_events > 0.01,]$year)
last_year   <- max(event_count_by_year[event_count_by_year$percent_of_events > 0.01,]$year) 


# What percentage of events for subset years
percent_max_events <- round(sum(event_count_by_year[event_count_by_year$percent_of_events > 0.01,]$percent_of_events) * 100,1)

Data Summary

The total number of years of data used in the analysis: 23

The date range of events: 1989 to 2011

Percentage of Events for the 23 years in the data subset: 84.5%


Results

Most Harmful Event Types

Across the United States, which types of events are most harmful with respect to population health?

The chart below shows the top 10 event types that are most harmful to population health. The most harmful are tornado, excessive heat, and flood.

Tornadoes are the most dangerous events because of how often they occur and how little time there is to respond, i.e. to take shelter. Heat and flood are the next most dangerous events because of their wide impact areas.

# Summarize fatalities and injuries by event type for selected years
health_data <-  noaa_data %>% 
                filter(year(as.Date(BGN_DATE,'%m/%d/%Y')) >= first_year) %>%
                group_by(Event_Type = EVTYPE) %>% 
                summarise(Fatalities = sum(FATALITIES)
                          ,Injuries = sum(INJURIES)
                          ,Incidents = sum(FATALITIES + INJURIES))  

# sort the data by total incidents, then by fatalities, then by injuries (all highest first)
health_data <-  health_data %>% 
                arrange(desc(Incidents), desc(Fatalities), desc(Injuries))

# create the bar chart
# sorted by total incidents; highest to lowest
barchart(Incidents ~ reorder(Event_Type, -Incidents) 
         , data = head(health_data,10)
         , mar = c(8,4,4,2)
         , col = 104
         , main = paste("Total Incidents for Top Ten Event Types from", first_year, "to", last_year)
         , xlab = "Event Type"
         , ylab = "Incidents (Fatalities + Injuries)"
         , scales=list(y=list(rot=0), x=list(rot=90, cex=0.7))
         )

Greatest Economic Consequences

Across the United States, which types of events have the greatest economic consequences?

The events with the greatest economic consequences are flood, hurricane, and storm surge. While tornadoes cause more injuries, their impact area is smaller than that of floods, hurricanes, and storm surges. The wide impact area of these types of events explains their large economic impact.

# Function to convert the damage factor code to a numeric multiplier
# if no unit is applied, or the code is not listed (for example lowercase 'k'),
# then the default multiplier is zero
convert_units <- function(unit_code)
                {   
                    switch(unit_code    ,'0' = 1        ,'1' = 10       ,'2' = 100
                                        ,'3' = 1000     ,'4' = 10000    ,'5' = 1e+05
                                        ,'6' = 1e+06    ,'7' = 1e+07    ,'8' = 1e+08
                                        ,'9' = 1e+09    ,'H' = 100      ,'K' = 1000
                                        ,'M' = 1e+06    ,'B' = 1e+09    , 0
                            )
                }

# subset the columns 
economic_data <-    noaa_data %>%
                    filter(year(as.Date(BGN_DATE,'%m/%d/%Y')) >= first_year) %>%
                    select(Event_Type = EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)

# calculate the total damage per event in common units
# convert the number into billions of USD
# round each event's value to 2 decimal places
# (note: an event with less than 5 million USD in damage rounds to 0)
economic_data$total_damage <-round((economic_data$PROPDMG * sapply(economic_data$PROPDMGEXP, convert_units) +
                              economic_data$CROPDMG * sapply(economic_data$CROPDMGEXP, convert_units)) / 1e+09 , 2)

# Summarize total damage (property + crop) by event type for selected years
economic_data <-    economic_data %>% 
                    group_by(Event_Type) %>%
                    summarise(Total_Damage = sum(total_damage)) %>%
                    arrange(desc(Total_Damage))

# plot the chart showing the total damage; property damage + crop damage
# show the amount in billions of USD
barchart(Total_Damage ~ reorder(Event_Type, -Total_Damage) 
         , data = head(economic_data,10)
         , mar = c(8,4,4,2)
         , col = 104
         , main = paste("Total Damage for Top Ten Event Types from", first_year, "to", last_year)
         , xlab = "Event Type"
         , ylab = "Total Damage in Billions of USD"
         , scales=list(y=list(rot=0), x=list(rot=90, cex=0.7))
         )
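
The code above rounds each event's damage to two decimals (in billions of USD) before summing by event type. The sketch below uses made-up damage values, not figures from the storm database, to show how the order of rounding and summing can change a total. It is an illustration of the arithmetic only and does not re-run the analysis.

```r
# Made-up per-event damages in USD (not from the storm database)
damage_usd <- c(2e6, 3e6, 4e6, 1.5e9)

# Round each event to 2 decimals in billions, then sum
# (the three events under 5 million USD each round to 0)
rounded_then_summed <- sum(round(damage_usd / 1e9, 2))

# Sum all events first, then round the total once
summed_then_rounded <- round(sum(damage_usd) / 1e9, 2)

print(rounded_then_summed)  # 1.5
print(summed_then_rounded)  # 1.51
```

Summing first and rounding once keeps the contribution of small events, which may matter for event types made up of many low-damage records.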


Appendix A

Assignment Description

The basic goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events. You must use the database to answer the questions below and show the code for your entire analysis. Your analysis can consist of tables, figures, or other summaries. You may use any R package you want to support your analysis.

Questions

  1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
  2. Across the United States, which types of events have the greatest economic consequences?

Review Criteria

  1. Has either a (1) valid RPubs URL pointing to a data analysis document for this assignment been submitted; or (2) a complete PDF file presenting the data analysis been uploaded?
    • Yes
  2. Is the document written in English?
    • Yes
  3. Does the analysis include description and justification for any data transformations?
    • Yes
  4. Does the document have a title that briefly summarizes the data analysis?
    • Yes
  5. Does the document have a synopsis that describes and summarizes the data analysis in less than 10 sentences?
    • Yes
  6. Is there a section titled “Data Processing” that describes how the data were loaded into R and processed for analysis?
    • Yes
  7. Is there a section titled “Results” where the main results are presented?
    • Yes
  8. Is there at least one figure in the document that contains a plot?
    • Yes
  9. Are there at most 3 figures in this document?
    • Yes
  10. Does the analysis start from the raw data file (i.e. the original .csv.bz2 file)?
    • Yes
  11. Does the analysis address the question of which types of events are most harmful to population health?
    • Yes; listed in bar plot; showing the top 10
  12. Does the analysis address the question of which types of events have the greatest economic consequences?
    • Yes; listed in bar plot; showing the top 10
  13. Do all the results of the analysis (i.e. figures, tables, numerical summaries) appear to be reproducible?
    • Yes
  14. Do the figure(s) have descriptive captions (i.e. there is a description near the figure of what is happening in the figure)?
    • Yes
  15. As far as you can determine, does it appear that the work submitted for this project is the work of the student who submitted it?
    • Yes; It is my own work.