The following analysis is a cursory review of data from the National Oceanic and Atmospheric Administration (NOAA) storm database.
The analysis looks at the impact of storm events on human health and on the economy.
I looked at events from 1989 to 2011.
The event types with the greatest impact on human health are tornado, excessive heat, and flood, while those with the greatest economic impact are flood, hurricane, and storm surge.
library(dplyr) # used to edit the data frames
library(lubridate) # used to convert dates
library(lattice) # used to plot the graphs
The data set comes from NOAA. An archived version of the data, NOAA Storm Data, was used.
# assign local file name
data_file <- file.path(getwd(), "noaa_data.bz2")
# Check if the file already exists; if it doesn't, download it
# https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2
if(!file.exists(data_file))
{
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
, destfile = data_file
, mode = "wb" # binary mode so the compressed file is not corrupted on Windows
, quiet = TRUE)
}
# Load the NOAA data
# Note: 902,297 observations are in the data
noaa_data <- read.csv(data_file, as.is = TRUE)
Detailed documentation about the data set is available from the National Weather Service website.
Fields from the NOAA Storm database used in the analysis are:
| field_names | field_types | field_description |
|---|---|---|
| BGN_DATE | chr | Beginning date of the event |
| EVTYPE | chr | Event Type |
| FATALITIES | num | Number of fatalities recorded during the event |
| INJURIES | num | Number of injuries recorded during the event |
| PROPDMG | num | Property damage recorded during the event, expressed in the unit given by PROPDMGEXP |
| PROPDMGEXP | chr | Unit (multiplier) code for PROPDMG; see appendix B in the Storm Data Documentation for details |
| CROPDMG | num | Crop damage recorded during the event, expressed in the unit given by CROPDMGEXP |
| CROPDMGEXP | chr | Unit (multiplier) code for CROPDMG; see appendix B in the Storm Data Documentation for details |
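Since only the fields listed above are used, the file could also be read with the unused columns skipped. The following is a minimal sketch of that idea, not the approach taken in this analysis. It runs on a small temporary CSV with made-up rows, and the names `keep`, `header`, `col_classes`, and `small` are illustrative.

```r
# Toy CSV standing in for the storm file (hypothetical rows and columns)
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "STATE,BGN_DATE,EVTYPE,FATALITIES,REMARKS",
  "AL,4/18/1950 0:00:00,TORNADO,0,none",
  "TX,6/1/1995 0:00:00,FLOOD,2,none"
), tmp)

# Columns to keep (a subset of the fields in the table above)
keep <- c("BGN_DATE", "EVTYPE", "FATALITIES")

# Read a single row just to learn the column names
header <- names(read.csv(tmp, nrows = 1))

# "NULL" tells read.csv to skip a column; NA lets it guess the type
col_classes <- ifelse(header %in% keep, NA, "NULL")

small <- read.csv(tmp, colClasses = col_classes, as.is = TRUE)
print(names(small))
```

On the full file, the same pattern would mainly reduce memory use, since the skipped columns are never stored.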
First, determine which years of data are most significant. The analysis is limited to years that each account for at least 1% of all recorded events.
The reason for limiting the data is that older records may be inconsistent or incomplete compared to more recent ones.
The event type data has spelling and reporting inconsistencies. Limiting the analysis to years with more than 1% of total events removed many of those issues. Some event types could have been combined. However, I understood the assignment directions to call for all event types as recorded. In addition, for the event type names that could plausibly be combined, doing so would not have changed the overall results. Without a clear and concise methodology for renaming event types, I left them as is.
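For reference, the kind of light-touch cleanup discussed above (and deliberately not applied in this analysis) could look like the following sketch. The function name `normalize_evtype` and the sample values are hypothetical. It only fixes case and whitespace and makes no attempt to merge differently named event types.

```r
# Hypothetical helper: normalize case and whitespace only
normalize_evtype <- function(evtype) {
  cleaned <- toupper(trimws(evtype))   # drop leading/trailing blanks, uppercase
  gsub("\\s+", " ", cleaned)           # collapse runs of whitespace to one space
}

# Made-up examples of inconsistent spellings
sample_types <- c(" Flood", "FLOOD ", "Heavy  Rain")
print(normalize_evtype(sample_types))
```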
# create a summary table that groups by year of the event
event_count_by_year <- noaa_data %>%
group_by(year = year(as.Date(BGN_DATE,'%m/%d/%Y'))) %>%
summarise(count_obs = n())
# calculate the percentage of the events by year
event_count_by_year$percent_of_events <- prop.table(event_count_by_year$count_obs)
# number of years that each hold more than 1% of the events
number_of_years <- sum(event_count_by_year$percent_of_events > 0.01)
# display the summary table
# Shows the years that have more than 1% of the total events in the storm database
# n = Inf prints every row of the tibble
print(event_count_by_year[event_count_by_year$percent_of_events > 0.01,], n = Inf)
## # A tibble: 23 x 3
## year count_obs percent_of_events
## <dbl> <int> <dbl>
## 1 1989 10410 0.01153722
## 2 1990 10946 0.01213126
## 3 1991 12522 0.01387791
## 4 1992 13534 0.01499950
## 5 1993 12607 0.01397212
## 6 1994 20631 0.02286498
## 7 1995 27970 0.03099866
## 8 1996 32270 0.03576428
## 9 1997 28680 0.03178554
## 10 1998 38128 0.04225660
## 11 1999 31289 0.03467705
## 12 2000 34471 0.03820361
## 13 2001 34962 0.03874777
## 14 2002 36293 0.04022290
## 15 2003 39752 0.04405645
## 16 2004 39363 0.04362533
## 17 2005 39184 0.04342694
## 18 2006 44034 0.04880211
## 19 2007 43289 0.04797644
## 20 2008 55663 0.06169033
## 21 2009 45817 0.05077818
## 22 2010 48161 0.05337599
## 23 2011 62174 0.06890636
#Date Range of Events
first_year <- min(event_count_by_year[event_count_by_year$percent_of_events > 0.01,]$year)
last_year <- max(event_count_by_year[event_count_by_year$percent_of_events > 0.01,]$year)
# What percentage of events for subset years
percent_max_events <- round(sum(event_count_by_year[event_count_by_year$percent_of_events > 0.01,]$percent_of_events) * 100,1)
Data Summary
The total number of years of data used in the analysis: 23
The Date range of events: 1989 to 2011
Percentage of Events for the 23 years in the data subset: 84.5%
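The year-selection rule can be illustrated on a handful of made-up dates. This is a base R sketch under stated assumptions: the `bgn_date` values are hypothetical but follow the "month/day/year time" layout of `BGN_DATE`, and the threshold is raised to 20% so that the toy data shows a year being dropped.

```r
# Hypothetical BGN_DATE values in the file's layout
bgn_date <- c("4/18/1950 0:00:00", "6/1/1995 0:00:00", "6/2/1995 0:00:00",
              "7/4/2011 0:00:00", "7/5/2011 0:00:00", "7/6/2011 0:00:00")

# as.Date() parses only what the format describes, so the time part is ignored
event_year <- as.integer(format(as.Date(bgn_date, "%m/%d/%Y"), "%Y"))

# Share of all events that falls in each year
year_counts <- table(event_year)
year_share <- prop.table(year_counts)

# Toy threshold of 20%; the analysis itself uses 1%
kept_years <- as.integer(names(year_share)[as.vector(year_share) > 0.20])
print(kept_years)
```

Here 1950 holds one of six events (about 17%) and falls below the toy threshold, while 1995 and 2011 are kept.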
Across the United States, which types of events are most harmful with respect to population health?
The chart below shows the ten event types most harmful to population health. The most harmful are tornado, excessive heat, and flood.
Tornadoes are the most dangerous events, likely because they occur frequently and leave little time to respond, i.e. to take shelter. Heat and flood are the next most dangerous, likely because of the wide areas they affect.
# Summarize fatalities and injuries by event type for selected years
health_data <- noaa_data %>%
filter(year(as.Date(BGN_DATE,'%m/%d/%Y')) >= first_year) %>%
group_by(Event_Type = EVTYPE) %>%
summarise(Fatalities = sum(FATALITIES)
,Injuries = sum(INJURIES)
,Incidents = sum(FATALITIES + INJURIES))
# sort the data by total incidents, then by fatalities, then by injuries
health_data <- health_data %>%
arrange(desc(Incidents), desc(Fatalities), desc(Injuries))
# create the bar chart
# sorted by total incidents; highest to lowest
barchart(Incidents ~ reorder(Event_Type, -Incidents)
, data = head(health_data,10)
, mar = c(8,4,4,2)
, col = 104
, main = paste("Total Incidents for Top Ten Event Types from", first_year, "to", last_year)
, xlab = "Event Type"
, ylab = "Incidents (Fatalities + Injuries)"
, scales=list(y=list(rot=0), x=list(rot=90, cex=0.7))
)
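The ranking above counts a fatality and an injury equally, since Incidents is their plain sum. The toy example below, in base R with made-up numbers, shows that the top event type can depend on that choice. The data frame `toy` and the variables derived from it are illustrative only and say nothing about the real totals.

```r
# Made-up event records, not values from the storm database
toy <- data.frame(
  EVTYPE = c("TORNADO", "TORNADO", "EXCESSIVE HEAT", "FLOOD"),
  FATALITIES = c(1, 2, 10, 4),
  INJURIES = c(50, 30, 5, 8),
  stringsAsFactors = FALSE
)

# Sum fatalities and injuries per event type
by_type <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
by_type$Incidents <- by_type$FATALITIES + by_type$INJURIES

# Top event type under each ranking criterion
top_by_incidents <- by_type$EVTYPE[which.max(by_type$Incidents)]
top_by_fatalities <- by_type$EVTYPE[which.max(by_type$FATALITIES)]
print(c(top_by_incidents, top_by_fatalities))
```

In this toy data the tornado rows lead on combined incidents, while excessive heat leads on fatalities alone.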
Across the United States, which types of events have the greatest economic consequences?
The events with the greatest economic consequences are flood, hurricane, and storm surge. While tornadoes cause more injuries, they affect a smaller area than floods, hurricanes, and storm surges. The wide impact area of these event types likely explains their large economic impact.
# Function to convert the unit code (PROPDMGEXP / CROPDMGEXP) to a numeric multiplier
# if no recognized unit code is given, the multiplier defaults to zero
convert_units <- function(unit_code)
{
switch(unit_code ,'0' = 1 ,'1' = 10 ,'2' = 100
,'3' = 1000 ,'4' = 10000 ,'5' = 1e+05
,'6' = 1e+06 ,'7' = 1e+07 ,'8' = 1e+08
,'9' = 1e+09 ,'H' = 100 ,'K' = 1000
,'M' = 1e+06 ,'B' = 1e+09 , 0
)
}
# subset the columns
economic_data <- noaa_data %>%
filter(year(as.Date(BGN_DATE,'%m/%d/%Y')) >= first_year) %>%
select(Event_Type = EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
# calculate the total damage in USD with consistent units
# keep full precision per event: rounding each row to 0.01 billion
# would turn every event below 5 million USD into zero
economic_data$total_damage <- economic_data$PROPDMG * sapply(economic_data$PROPDMGEXP, convert_units) +
economic_data$CROPDMG * sapply(economic_data$CROPDMGEXP, convert_units)
# Summarize total damage by event type for selected years
# convert the sum into billions of USD and round it to 2 decimal places
economic_data <- economic_data %>%
group_by(Event_Type) %>%
summarise(Total_Damage = round(sum(total_damage) / 1e+09, 2)) %>%
arrange(desc(Total_Damage))
# plot the chart showing the total damage; property damage + crop damage
# show the amount in billions of USD
barchart(Total_Damage ~ reorder(Event_Type, -Total_Damage)
, data = head(economic_data,10)
, mar = c(8,4,4,2)
, col = 104
, main = paste("Total Damage for Top Ten Event Types from", first_year, "to", last_year)
, xlab = "Event Type"
, ylab = "Total Damage in Billions of USD"
, scales=list(y=list(rot=0), x=list(rot=90, cex=0.7))
)
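The unit conversion can also be written as a vectorized lookup, which avoids calling `sapply` once per row. The sketch below is an alternative, not the code used for the chart above. The names `unit_multiplier` and `convert_units_vec` are hypothetical, and it makes one assumption that differs from `convert_units`: lowercase codes such as "k" or "m" are treated like their uppercase forms.

```r
# Multiplier for each recognized unit code (same values as convert_units)
unit_multiplier <- c(
  "0" = 1, "1" = 10, "2" = 100, "3" = 1e3, "4" = 1e4,
  "5" = 1e5, "6" = 1e6, "7" = 1e7, "8" = 1e8, "9" = 1e9,
  "H" = 100, "K" = 1e3, "M" = 1e6, "B" = 1e9
)

# Hypothetical vectorized variant of convert_units
convert_units_vec <- function(unit_code) {
  # Assumption: lowercase codes mean the same as uppercase ones
  multiplier <- unname(unit_multiplier[toupper(unit_code)])
  # Blank or unrecognized codes give NA in the lookup; default them to zero
  multiplier[is.na(multiplier)] <- 0
  multiplier
}

# Made-up damage amounts and unit codes
damage_usd <- c(25, 2.5, 10) * convert_units_vec(c("K", "m", "?"))
print(damage_usd)
```

Whether lowercase codes should be counted is a judgment call, so totals from this variant may differ slightly from those of the original function.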
The basic goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events. You must use the database to answer the questions below and show the code for your entire analysis. Your analysis can consist of tables, figures, or other summaries. You may use any R package you want to support your analysis.