Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
This report aims to answer two questions:
Across the United States, which types of events are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?
The file containing the data is available at https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2.
The data are stored in a comma-separated file, and missing values are coded as blank fields. Each column contains one variable and each row represents one measurement of all variables. A header row gives the name of each column.
library(lubridate)
library(dplyr)
data<-read.csv("FStormData.csv.bz2", header = TRUE, na.strings = "")
After reading in the file, we check its dimensions and the first few rows (there are 902,297 rows in this dataset).
dim(data)
## [1] 902297 37
head(data[, 1:13])
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME
## 1 TORNADO 0 <NA> <NA> <NA> <NA>
## 2 TORNADO 0 <NA> <NA> <NA> <NA>
## 3 TORNADO 0 <NA> <NA> <NA> <NA>
## 4 TORNADO 0 <NA> <NA> <NA> <NA>
## 5 TORNADO 0 <NA> <NA> <NA> <NA>
## 6 TORNADO 0 <NA> <NA> <NA> <NA>
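The `<NA>` entries in the output above come from blank fields in the CSV, which `na.strings = ""` converts to missing values. A minimal sketch of that behaviour on a tiny inline CSV follows; the three rows and the name `toy` are made up for illustration and are not taken from the NOAA file.

```r
# Hypothetical three-row CSV; blank fields stand in for missing values.
csv_text <- "EVTYPE,BGN_LOCATI,FATALITIES
TORNADO,,0
HAIL,MOBILE,1
FLOOD,,2"

# na.strings = "" turns empty fields into NA, as in the read.csv() call above.
toy <- read.csv(text = csv_text, header = TRUE, na.strings = "",
                stringsAsFactors = FALSE)

is.na(toy$BGN_LOCATI)   # which rows had a blank location field
colSums(is.na(toy))     # number of missing values per column
```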
The column names are descriptive enough to understand what each one represents. For the questions of interest, the analysis below uses the following columns: EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP, and BGN_DATE.
Here we extract some basic information:
total.fatalities<-sum(data$FATALITIES)
The total number of fatalities due to severe weather between 1950 and 2011 is 15145.
total.injuries<-sum(data$INJURIES)
The total number of injuries due to severe weather between 1950 and 2011 is 140528.
library(dplyr)
## To compute the total property damage, create a variable equal to 6 if
## PROPDMGEXP is "M" and 3 otherwise.
data<-mutate(data, propexp = ifelse(PROPDMGEXP == "M", 6 ,3))
total.propdmg<-sum(data$PROPDMG*10^data$propexp/10^9, na.rm = TRUE)
Total property damage due to severe weather between 1950 and 2011 is 151.44 billion dollars.
library(dplyr)
## To compute the total crop damage, create a variable equal to 6 if
## CROPDMGEXP is "M" and 3 otherwise.
data<-mutate(data, cropexp = ifelse(CROPDMGEXP == "M", 6 ,3))
total.cropdmg<-sum(data$CROPDMG*10^data$cropexp/10^9, na.rm = TRUE)
Total crop damage due to severe weather between 1950 and 2011 is 35.48 billion dollars.
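The two chunks above treat every exponent code other than "M" as thousands, so entries coded "B" are counted as thousands, and rows with a missing code drop out of the sum through `na.rm = TRUE`. A hedged sketch of a fuller mapping follows. It assumes the usual reading of the codes (K = thousands, M = millions, B = billions), treats any other or missing code as a multiplier of 1, and uses a hypothetical helper `exp_to_multiplier()` on made-up rows. Applying it to the real data would likely change the totals reported here.

```r
# Hypothetical helper: map an exponent code to a numeric multiplier.
# Assumption: K = 1e3, M = 1e6, B = 1e9; anything else (or NA) counts as 1.
exp_to_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[toupper(as.character(code))])
  mult[is.na(mult)] <- 1
  mult
}

# Made-up rows for illustration only.
toy <- data.frame(PROPDMG    = c(25, 2.5, 1.2, 10),
                  PROPDMGEXP = c("K", "M", "B", NA),
                  stringsAsFactors = FALSE)

toy$damage_usd <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP)
sum(toy$damage_usd) / 1e9   # total in billions of dollars
```

Whether unknown codes should count as 1, 0, or be excluded is a judgment call; the choice of 1 here is only one option.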
As an illustration, we also show the single event with the largest number of fatalities.
library(lubridate)
most_fatal_event<-data[which(data$FATALITIES == max(data$FATALITIES)),]
most_fatal_event_year<-year(mdy_hms(as.character(most_fatal_event$BGN_DATE)))
This event occurred in Illinois (IL) in 1995 due to HEAT, resulting in 583 deaths.
To answer the first question, we evaluate the top 10 event types by accumulated deaths and by accumulated injuries.
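The lookup above uses `which(x == max(x))`, which returns every row tied for the maximum. A small sketch on made-up rows contrasts it with `which.max()`, which returns only the first such row:

```r
# Made-up rows for illustration; two events tie for the maximum.
toy <- data.frame(EVTYPE     = c("HEAT", "TORNADO", "FLOOD"),
                  FATALITIES = c(5, 9, 9),
                  stringsAsFactors = FALSE)

all_max   <- toy[which(toy$FATALITIES == max(toy$FATALITIES)), ]  # all ties
first_max <- toy[which.max(toy$FATALITIES), ]                     # first only

all_max$EVTYPE
first_max$EVTYPE
```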
library(dplyr)
fatalities<-data %>%
group_by(EVTYPE) %>%
summarise(FATALITIES = sum(FATALITIES))
fatal_arranged<-arrange(fatalities, desc(FATALITIES))
print(fatal_arranged[1:10,])
## # A tibble: 10 × 2
## EVTYPE FATALITIES
## <fctr> <dbl>
## 1 TORNADO 5633
## 2 EXCESSIVE HEAT 1903
## 3 FLASH FLOOD 978
## 4 HEAT 937
## 5 LIGHTNING 816
## 6 TSTM WIND 504
## 7 FLOOD 470
## 8 RIP CURRENT 368
## 9 HIGH WIND 248
## 10 AVALANCHE 224
We can see that TORNADO is the event type with the most accumulated deaths, with 5633 fatalities.
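The group-and-rank step above can also be written without dplyr. The following base R sketch runs on made-up rows; the `top_n_by_type()` helper is hypothetical and not part of the original analysis.

```r
# Hypothetical helper: total a numeric column per event type, return the top n.
top_n_by_type <- function(values, types, n = 10) {
  totals <- vapply(split(values, types), sum, numeric(1), na.rm = TRUE)
  head(sort(totals, decreasing = TRUE), n)
}

# Made-up rows for illustration only.
toy <- data.frame(EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT"),
                  FATALITIES = c(3, 2, 4, 1, 6),
                  stringsAsFactors = FALSE)

top2 <- top_n_by_type(toy$FATALITIES, toy$EVTYPE, n = 2)
top2
```

The result is a named numeric vector, so the event types are available through `names(top2)`.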
injuries<-data %>%
group_by(EVTYPE) %>%
summarise(INJURIES = sum(INJURIES))
injuries_arranged<-arrange(injuries, desc(INJURIES))
print(injuries_arranged[1:10,])
## # A tibble: 10 × 2
## EVTYPE INJURIES
## <fctr> <dbl>
## 1 TORNADO 91346
## 2 TSTM WIND 6957
## 3 FLOOD 6789
## 4 EXCESSIVE HEAT 6525
## 5 LIGHTNING 5230
## 6 HEAT 2100
## 7 ICE STORM 1975
## 8 FLASH FLOOD 1777
## 9 THUNDERSTORM WIND 1488
## 10 HAIL 1361
We can see that TORNADO is the event type with the most accumulated injuries, with 91346 injuries.
Now, we will check which event types appear in the top 10 for both deaths and injuries.
death_injuries<-merge(fatal_arranged[1:10,],injuries_arranged[1:10,])
print(arrange(death_injuries, desc(FATALITIES)))
## EVTYPE FATALITIES INJURIES
## 1 TORNADO 5633 91346
## 2 EXCESSIVE HEAT 1903 6525
## 3 FLASH FLOOD 978 1777
## 4 HEAT 937 2100
## 5 LIGHTNING 816 5230
## 6 TSTM WIND 504 6957
## 7 FLOOD 470 6789
To check that this list captures the most harmful events, let’s compute the share of total deaths and injuries that these selected events account for.
death_ratio<-sum(death_injuries$FATALITIES)/sum(data$FATALITIES)
injury_ratio<-sum(death_injuries$INJURIES)/sum(data$INJURIES)
So, these 7 event types account for 74.2% of total deaths and 85.9% of injuries reported.
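`merge()` with its default arguments performs an inner join on the shared `EVTYPE` column, which is why only event types present in both top-10 tables survive above. A minimal sketch on made-up totals (all numbers below are invented for illustration):

```r
# Made-up top lists; only TORNADO and FLOOD appear in both.
top_fatal  <- data.frame(EVTYPE = c("TORNADO", "HEAT", "FLOOD"),
                         FATALITIES = c(50, 30, 10),
                         stringsAsFactors = FALSE)
top_injury <- data.frame(EVTYPE = c("TORNADO", "FLOOD", "HAIL"),
                         INJURIES = c(400, 120, 60),
                         stringsAsFactors = FALSE)

# Default merge(): join on the common column name, keep matching rows only.
both <- merge(top_fatal, top_injury)

# Share of a (made-up) overall total covered by the events in both lists.
all_fatalities <- 100
share <- sum(both$FATALITIES) / all_fatalities
both
share
```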
The answer to the first question is: the most harmful event types with respect to population health are TORNADO, EXCESSIVE HEAT, FLASH FLOOD, HEAT, LIGHTNING, TSTM WIND, and FLOOD.
To answer the second question, we evaluate the top 10 event types by total property damage and by total crop damage.
library(dplyr)
propdmg<-data %>%
group_by(EVTYPE) %>%
summarise(PROPDMG = sum(PROPDMG*10^propexp/10^9, na.rm=TRUE))
propdmg_arranged<-arrange(propdmg, desc(PROPDMG))
print(propdmg_arranged[1:10,])
## # A tibble: 10 × 2
## EVTYPE PROPDMG
## <fctr> <dbl>
## 1 TORNADO 51.625973
## 2 FLOOD 22.157832
## 3 FLASH FLOOD 15.141163
## 4 HAIL 13.927644
## 5 HURRICANE 6.168325
## 6 TSTM WIND 4.484983
## 7 HIGH WIND 3.970083
## 8 ICE STORM 3.944978
## 9 HURRICANE/TYPHOON 3.805906
## 10 WILDFIRE 3.725115
We can see that TORNADO is the event type with the most accumulated property damage, at 51.63 billion dollars.
library(dplyr)
cropdmg<-data %>%
group_by(EVTYPE) %>%
summarise(CROPDMG = sum(CROPDMG*10^cropexp/10^9, na.rm = TRUE))
cropdmg_arranged<-arrange(cropdmg, desc(CROPDMG))
print(cropdmg_arranged[1:10,])
## # A tibble: 10 × 2
## EVTYPE CROPDMG
## <fctr> <dbl>
## 1 DROUGHT 12.4725675
## 2 FLOOD 5.6619684
## 3 HAIL 3.0259745
## 4 HURRICANE 2.7419100
## 5 FLASH FLOOD 1.4213171
## 6 EXTREME COLD 1.2929730
## 7 HURRICANE/TYPHOON 1.0978743
## 8 FROST/FREEZE 1.0940860
## 9 HEAVY RAIN 0.7333998
## 10 TROPICAL STORM 0.6783460
We can see that DROUGHT is the event type with the most accumulated crop damage, at 12.47 billion dollars.
Now, we will check which event types appear in the top 10 for both property and crop damage.
prop_crop<-merge(propdmg_arranged[1:10,],cropdmg_arranged[1:10,])
print(arrange(prop_crop, desc(PROPDMG)))
## EVTYPE PROPDMG CROPDMG
## 1 FLOOD 22.157832 5.661968
## 2 FLASH FLOOD 15.141163 1.421317
## 3 HAIL 13.927644 3.025974
## 4 HURRICANE 6.168325 2.741910
## 5 HURRICANE/TYPHOON 3.805906 1.097874
To check that this list captures the costliest events, let’s compute the share of total property and crop damage that these selected events account for.
propdmg_ratio<-sum(prop_crop$PROPDMG)/total.propdmg
cropdmg_ratio<-sum(prop_crop$CROPDMG)/total.cropdmg
So, these 5 event types account for only 40.4% of total property damage and 39.3% of crop damage reported.
This approach does not seem adequate, so we instead rank event types by the sum of property damage and crop damage.
library(dplyr)
total<-merge(propdmg, cropdmg, by = "EVTYPE")
total<-mutate(total, total = PROPDMG+CROPDMG)
totaldmg_arranged<-arrange(total, desc(total))
print(totaldmg_arranged[1:10,])
## EVTYPE PROPDMG CROPDMG total
## 1 TORNADO 51.625973 0.4151131 52.041086
## 2 FLOOD 22.157832 5.6619684 27.819801
## 3 HAIL 13.927644 3.0259745 16.953619
## 4 FLASH FLOOD 15.141163 1.4213171 16.562480
## 5 DROUGHT 1.046106 12.4725675 13.518674
## 6 HURRICANE 6.168325 2.7419100 8.910235
## 7 TSTM WIND 4.484983 0.5540074 5.038991
## 8 HURRICANE/TYPHOON 3.805906 1.0978743 4.903780
## 9 HIGH WIND 3.970083 0.6385713 4.608654
## 10 WILDFIRE 3.725115 0.2954728 4.020588
The answer to the second question is: the event types with the greatest economic consequences are TORNADO, FLOOD, HAIL, FLASH FLOOD, and DROUGHT.
As we can see, TORNADO is at the top of both lists, so we will analyse the effects of this kind of event through the years.
First, we add one more variable to the data set representing the year of each event.
data<-mutate(data, year = year(mdy_hms(as.character(BGN_DATE))))
head(data[1:10,35:38])
## LONGITUDE_ REMARKS REFNUM propexp
## 1 8806 <NA> 1 3
## 2 0 <NA> 2 3
## 3 0 <NA> 3 3
## 4 0 <NA> 4 3
## 5 0 <NA> 5 3
## 6 0 <NA> 6 3
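For readers without lubridate, the same year extraction can be sketched in base R. The snippet assumes that `BGN_DATE` strings follow the month/day/year hour:minute:second layout seen in the first `head()` output; the three dates are copied from that output.

```r
# Date strings in the same layout as BGN_DATE (taken from the head() output).
bgn_date <- c("4/18/1950 0:00:00", "2/20/1951 0:00:00", "11/15/1951 0:00:00")

# Parse only the leading date part, then pull out the year as an integer.
# as.Date() ignores text after the portion matched by the format string.
parsed <- as.Date(bgn_date, format = "%m/%d/%Y")
yr <- as.integer(format(parsed, "%Y"))
yr
```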
Now, we create a data frame containing only tornado events, summarised by year.
tornado<-data %>%
filter(EVTYPE == "TORNADO") %>%
group_by(EVTYPE, year) %>%
summarise(FATALITIES = sum(FATALITIES), INJURIES = sum(INJURIES), PROPDMG = sum(PROPDMG*10^propexp/10^9), CROPDMG = sum(CROPDMG*10^cropexp/10^6))
head(tornado)
## Source: local data frame [6 x 6]
## Groups: EVTYPE [1]
##
## EVTYPE year FATALITIES INJURIES PROPDMG CROPDMG
## <fctr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 TORNADO 1950 70 659 0.03448165 NA
## 2 TORNADO 1951 34 524 0.06550599 NA
## 3 TORNADO 1952 230 1915 0.09410224 NA
## 4 TORNADO 1953 519 5131 0.59610470 NA
## 5 TORNADO 1954 36 715 0.08580532 NA
## 6 TORNADO 1955 129 926 0.08266063 NA
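The `NA` values in the `CROPDMG` column above are consistent with `sum()` propagating missing values: `cropexp` is `NA` wherever `CROPDMGEXP` is missing, and the `summarise()` call does not set `na.rm = TRUE`. A small sketch on made-up rows shows the difference; base R `tapply()` is used instead of dplyr only to keep the example dependency-free.

```r
# Made-up rows; the second 1950 record has a missing damage value.
toy <- data.frame(year    = c(1950, 1950, 1951),
                  CROPDMG = c(2, NA, 5))

with_na    <- tapply(toy$CROPDMG, toy$year, sum)                # NA propagates
without_na <- tapply(toy$CROPDMG, toy$year, sum, na.rm = TRUE)  # NA dropped

with_na
without_na
```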
Finally, we create a plot showing how tornado consequences behave through the years.
par(mfrow=c(2,2), oma=c(0,0,2,0))
with(tornado, plot(year, FATALITIES, pch=20, xlab="Year", ylab = "Fatalities"))
with(tornado, plot(year, INJURIES, pch=20, xlab="Year", ylab = "Injuries"))
with(tornado, plot(year, PROPDMG, pch=20, xlab="Year", ylab = "Property damage (billion $)"))
with(tornado, plot(year, CROPDMG, pch=20, xlab="Year", ylab = "Crop damage (million $)"))
title("Tornado consequences through the years", outer = TRUE )