Introduction or Synopsis

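The purpose of this report is to use the NOAA Storm Database to answer basic questions about severe weather events. This version of the NOAA Storm Database contains information about severe weather events from 1950 until November 2011. Data documentation indicates that records from earlier years may be less complete. The report focuses on two key questions: 1) Across the United States, which types of events are most harmful with respect to population health? and 2) Across the United States, which types of events are most harmful with respect to the economy?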

Data Processing

The code below illustrates how the data were transformed to produce these results. First, I read the data from the compressed file provided by NOAA, which contains events recorded from 1950 until November 2011. This is accomplished with the read.csv function in R, run through RStudio (see below). The data file was not manipulated directly, and the source data remain unchanged when I process them in RStudio. All code used to process and manipulate the data is included below with appropriate documentation.

Measures

The NOAA data include a variable called EVTYPE, which defines the type of event represented by each observation in the dataset. The NOAA data also include information about the injuries, fatalities, and damages caused by or associated with each event.

Analytical Approach

The analyses consisted of processing the data by type of event to create summary tables and figures that answer the two research questions mentioned in the Introduction. I used summary functions that aggregate data by type of event. To evaluate harm to population health, I tabulate the total number of fatalities and injuries by event type. To evaluate harm to the economy, I tabulate total property damage by event type for events with estimates above $1,000. Because the interest is in the most impactful event types, limiting the analysis to these events is unlikely to influence the top categories.
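The aggregate-then-rank pattern described above can be sketched on a small toy data frame. This is only an illustration in base R, not the report's own code: the values are made up, and only the column names EVTYPE, FATALITIES, and INJURIES are taken from the NOAA data.

```r
# Toy data with made-up values; column names mirror the NOAA fields.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "HAIL"),
  FATALITIES = c(3, 1, 2, 4, 0, 0),
  INJURIES   = c(10, 2, 5, 1, 3, 0)
)

# Sum fatalities and injuries within each event type.
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Rank event types by total fatalities, largest first, and keep the top 2.
top_fatal <- head(totals[order(-totals$FATALITIES), ], n = 2)
print(top_fatal)
```

In this toy example TORNADO (5 fatalities) and HEAT (4 fatalities) come out on top. The report itself applies the same idea with dplyr and keeps the top 10 rather than the top 2.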

The results are presented in three tables, each covering one focal variable. The report also includes two figures, one for fatalities and one for injuries. The data used for the figures are taken from the corresponding tables.

In the following code I read the data and determine the number of observations it includes.

Read the data

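library(dplyr) #Provides count(), group_by(), summarise(), and the %>% pipe used below
data <- read.csv("repdata_data_StormData.csv.bz2")
sample <- count(data) #One-row data frame whose column n holds the number of observations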

Quick Data Examination

I display the county name and event type of the first five records in the data. This allows me to confirm that the data were imported correctly.

head(data$COUNTYNAME,n=5)
## [1] "MOBILE"  "BALDWIN" "FAYETTE" "MADISON" "CULLMAN"
head(data$EVTYPE,n=5)
## [1] "TORNADO" "TORNADO" "TORNADO" "TORNADO" "TORNADO"

It seems the first five observations are from counties in Alabama. For instance, Mobile County is home to Bayou La Batre, where part of the movie Forrest Gump is set. All five observations are tornadoes recorded in those counties.

Results

Human cost of environmental disasters

The code below summarizes the number of fatalities and injuries from the NOAA Storm Data.

data$Category<-data$EVTYPE #New Categorical Variable for label purposes

#Summary table for fatalities by event type
table1 <- data %>%
  group_by(Category) %>%
  summarise(Fatalities = sum(FATALITIES))

#Reorder observations by most fatalities by event
table1_topf<-table1[with(table1,order(-Fatalities)),]
#Only keep top 10 categories or event types
table1_topf <- table1_topf[1:10,]

#This code produces the table for most fatal events
knitr::kable(table1_topf, align = "lcc", caption="Table 1. Top 10 most fatal events in the US (NOAA Data)")

Table 1. Top 10 most fatal events in the US (NOAA Data)

Category          Fatalities
TORNADO                 5633
EXCESSIVE HEAT          1903
FLASH FLOOD              978
HEAT                     937
LIGHTNING                816
TSTM WIND                504
FLOOD                    470
RIP CURRENT              368
HIGH WIND                248
AVALANCHE                224

#The following code summarizes injuries and follows the same logic as for fatalities
table1 <- data %>%
  group_by(Category) %>%
  summarise(Injuries = sum(INJURIES))

#Reorder by most injuries and keep the top 10 event types
table1_topi <- table1[with(table1, order(-Injuries)), ]
table1_topi <- table1_topi[1:10, ]

#Here, I produce the table
knitr::kable(table1_topi, align = "lcc",caption="Table 2. Top 10 most injuries caused in US (NOAA Data)")

Table 2. Top 10 most injuries caused in US (NOAA Data)

Category            Injuries
TORNADO                91346
TSTM WIND               6957
FLOOD                   6789
EXCESSIVE HEAT          6525
LIGHTNING               5230
HEAT                    2100
ICE STORM               1975
FLASH FLOOD             1777
THUNDERSTORM WIND       1488
HAIL                    1361

The type of event causing the most fatalities is Tornado, with 5,633 deaths. It is followed by Excessive Heat and Flash Flood, with 1,903 and 978 deaths, respectively. The total number of injuries shows a similar pattern, with Tornado the top cause at 91,346 injuries, followed by thunderstorm wind (TSTM WIND) and Flood, with 6,957 and 6,789 injuries, respectively. Excessive Heat and Flash Flood remain among the top-10 causes of injury but are less prominent than they were for fatalities: they rank fourth and eighth. There is another cause of injury, labeled Heat, that if added to Excessive Heat (6,525 + 2,100 = 8,625 injuries) would make heat the second leading cause. However, aggregating casualties across categories that may capture similar phenomena is beyond the scope of this report.
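The remark about combining Heat and Excessive Heat can be checked with a few lines of arithmetic on the Table 2 totals. This is a hypothetical sketch, not part of the report's analysis: it works only on the ten rows shown in Table 2, and the decision to merge exactly these two labels is an assumption made for illustration.

```r
# Injury totals copied from Table 2.
injuries <- c(
  "TORNADO" = 91346, "TSTM WIND" = 6957, "FLOOD" = 6789,
  "EXCESSIVE HEAT" = 6525, "LIGHTNING" = 5230, "HEAT" = 2100,
  "ICE STORM" = 1975, "FLASH FLOOD" = 1777,
  "THUNDERSTORM WIND" = 1488, "HAIL" = 1361
)

# Hypothetical merge: fold HEAT into EXCESSIVE HEAT and drop the HEAT row.
merged <- injuries
merged["EXCESSIVE HEAT"] <- merged["EXCESSIVE HEAT"] + merged["HEAT"]
merged <- merged[names(merged) != "HEAT"]

# Re-rank after the merge and show the three largest categories.
merged <- sort(merged, decreasing = TRUE)
print(head(merged, n = 3))
```

With the merged total of 8,625 injuries, the combined heat category moves ahead of TSTM WIND (6,957). A full recoding would have to start from the complete event-type table rather than from the top-10 rows, and would also need to decide how to treat other near-duplicate labels such as TSTM WIND and THUNDERSTORM WIND.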

Visual comparison of the events

The following visualizations seek to determine the level of congruence between the top-10 types of events and their ranking within fatalities and injuries. Each figure shows total casualties for the top-10 types of event.

library(ggplot2)
ggplot(table1_topf, aes(x=reorder(Category,-Fatalities), y=Fatalities)) +
  geom_bar(stat="identity",width=0.50)+
  theme(axis.text.x = element_text(angle = 45, vjust=1, hjust=1))+
  labs(x="Type of Event",title = "Fig 1. Fatalities by top-10 events in the US")

library(ggplot2)
ggplot(table1_topi, aes(x=reorder(Category,-Injuries), y=Injuries)) +
  geom_bar(stat="identity",width=0.50)+
  theme(axis.text.x = element_text(angle = 45, vjust=1, hjust=1))+
  labs(x="Type of Event",title = "Fig 2. Injuries by top-10 events in the US")

As expected, Tornado is the top cause. We knew this from the tables, but the difference in magnitude is more evident when the data are visualized this way.

Economic cost of environmental disasters in the US

The NOAA Storm Data include estimates of property damage for each event. Following the methodological guidelines, I used the variable PROPDMG, which contains the numeric value of the costs, and the variable PROPDMGEXP, which gives the magnitude (e.g. hundreds, thousands, millions, billions). I transform the raw numbers into the corresponding dollar amounts before analyzing the top-10 event types by property damage in the US.

Data processing for damage estimation

#I create a variable flagging observations whose damage is recorded in thousands (K), millions (M), or billions (B)
data$KEEP<-ifelse(data$PROPDMGEXP %in% c("K","M","B"),1,0)

#Produce final damage estimates on a common dollar scale
data$damage_est<-ifelse(data$PROPDMGEXP=="K",data$PROPDMG*1000,
             ifelse(data$PROPDMGEXP=="M",data$PROPDMG*1000000,
             ifelse(data$PROPDMGEXP=="B",data$PROPDMG*1000000000,0)))

data2<-subset(data,KEEP==1) #Keeps only the observations flagged above
sample2<-count(data2) #Gets the number of observations kept

For this section of the analysis, I concentrated on events with damage recorded in thousands of dollars or more (PROPDMGEXP of K, M, or B). This reduced the analytic sample from 902,297 to 436,035 observations. Because this section focuses on the top-10 event types, it is unlikely that limiting the analysis to these observations affects the conclusions derived from the analysis.
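The same exponent conversion can also be written with a named lookup vector instead of nested ifelse() calls. The sketch below is an alternative formulation shown on made-up toy rows, not the code used to produce the results. Only the PROPDMG and PROPDMGEXP column names and the K/M/B codes come from the analysis above, and every other code (including a lowercase "k") is treated as zero damage and not kept, which mirrors the ifelse() logic.

```r
# Toy rows with made-up values; column names mirror the NOAA fields.
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 10, 5),
  PROPDMGEXP = c("K", "M", "B", "", "k"),
  stringsAsFactors = FALSE
)

# Dollar multipliers for the three codes handled in this report.
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Look up each code by name; codes outside K/M/B come back as NA.
mult <- unname(multiplier[toy$PROPDMGEXP])
mult[is.na(mult)] <- 0  # unrecognized codes get zero damage

toy$damage_est <- toy$PROPDMG * mult
toy$KEEP <- as.integer(toy$PROPDMGEXP %in% names(multiplier))
print(toy)
```

A lookup vector keeps the code-to-multiplier mapping in one place, so extending it to further exponent codes would be a one-line change. Whether and how to handle those other codes is a separate analytical decision not made in this report.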

data2$Category<-data2$EVTYPE

table1d <- data2 %>%
  group_by(Category) %>%
  summarise(Damages = sum(damage_est))

table1_topd<-table1d[with(table1d,order(-Damages)),]
table1_topd <- table1_topd[1:10,]

table1_topd$Damages<-formatC(table1_topd$Damages, format="f", big.mark = ",", digits=0)

knitr::kable(table1_topd, align = "lr", caption="Table 3. 10 events that caused most cost damage in the US (NOAA Data)")

Table 3. 10 events that caused most cost damage in the US (NOAA Data)

Category                    Damages
FLOOD               144,657,709,800
HURRICANE/TYPHOON    69,305,840,000
TORNADO              56,925,660,480
STORM SURGE          43,323,536,000
FLASH FLOOD          16,140,811,510
HAIL                 15,727,366,720
HURRICANE            11,868,319,010
TROPICAL STORM        7,703,890,550
WINTER STORM          6,688,497,250
HIGH WIND             5,270,046,260

Flood, Hurricane/Typhoon, and Tornado are the three event types that have caused the most property damage from 1950 until November 2011. The damages from floods are more than double those observed for hurricanes/typhoons and about two and a half times the costs recorded for tornadoes. As in the previous section, some types of events could be aggregated, such as Floods and Flash Floods. This is also the case for Hurricanes and Tropical Storms. Again, such a nuanced examination is beyond the scope of this project, although an analysis of this type is warranted in the future.
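The relative sizes of the top three damage totals can be verified directly from the Table 3 figures. The short sketch below simply divides the Flood total by the other two; the variable names are illustrative.

```r
# Property damage totals (USD) copied from Table 3.
damages <- c(
  "FLOOD"             = 144657709800,
  "HURRICANE/TYPHOON" = 69305840000,
  "TORNADO"           = 56925660480
)

# Ratio of flood damage to each of the next two event types.
ratios <- round(
  damages[["FLOOD"]] / damages[c("HURRICANE/TYPHOON", "TORNADO")],
  2
)
print(ratios)
```

The ratios come out at roughly 2.09 for Hurricane/Typhoon and 2.54 for Tornado, so flood damage is a little more than twice the former and about two and a half times the latter.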

Conclusion

The research questions that guided this analysis were: 1) Across the United States, which types of events are most harmful with respect to population health? and 2) Across the United States, which types of events are most harmful with respect to the economy? According to the NOAA Storm Data, tornadoes are the most harmful events with respect to population health, as they have caused the most fatalities and injuries. Floods, on the other hand, are the most harmful with respect to the economy.

Authorship and Ethics Statement

The student certifies that this is their own work.