Storms and other severe weather events can cause both public health and economic problems for the communities and municipalities affected by them. Many severe events result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
## Data
## National Weather Service Storm Database
The National Climatic Data Center Storm Events database records the information collected on severe weather events in the United States from 1950 onward. This copy of the database ends in November 2011. Fewer events are recorded in the earlier years of the database, most likely because of a lack of good records. More recent years, as entered by NOAA’s National Weather Service (NWS), should be considered more complete. According to the NWS website, data collection and processing procedures have changed over time, so the available period of record depends on the event type. The time spans for each period of distinct data collection and processing procedures are as follows:
From 1950 through 1954, only tornado events were recorded. From 1955 through 1995, only tornado, thunderstorm wind and hail events were keyed from the paper publications into digital data. Since 1996, the number of recorded event types has increased to 48 (as per NWS directive 10-1605). These changes bias the data and may give the mistaken impression that such events suddenly became more frequent at these cutoff points. It is also unclear from the documentation whether the economic costs are adjusted for the “time value of money”. This matters because a 1950 US$ and a 2011 US$ differ significantly in value.
The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. You can download the file from the following source: https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2 A bzip2-compressed file can be read directly with the ‘read.csv()’ function, without a separate decompression step.
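As a minimal sketch of that behaviour, the snippet below writes a tiny bzip2-compressed CSV to a temporary file and reads it back with `read.csv()`. The toy rows and the temporary file are illustrative assumptions and are not part of the storm database.

```r
# Toy data standing in for the real file (illustrative values only).
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL"),
                  FATALITIES = c(1, 0))

# Write the CSV through a bzip2 connection to a temporary file.
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() is given the compressed file path directly;
# no explicit decompression call is made.
toy_back <- read.csv(tmp, stringsAsFactors = FALSE)
print(toy_back)
```

The same call pattern is what the report relies on when it passes the downloaded `.bz2` file straight to `read.csv()`.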
Further reading can be obtained from: “https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf”
The aims of this study are to address the following questions:
1. Which types of events are most harmful with respect to population health?
2. Which types of events have the greatest economic consequences?
The following R packages are used for the data analysis:
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(knitr)
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:dplyr':
##
## intersect, setdiff, union
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(ggplot2)
library(data.table)
##
## Attaching package: 'data.table'
## The following objects are masked from 'package:lubridate':
##
## hour, isoweek, mday, minute, month, quarter, second, wday, week,
## yday, year
## The following objects are masked from 'package:dplyr':
##
## between, first, last
library(stringr)
This step downloads the National Weather Service Storm Database from the link provided. Although the file is compressed via the bzip2 algorithm to reduce its size, it does not need to be unzipped separately: read.csv() reads the compressed CSV file, repdata_data_StormData.csv.bz2, directly into a data frame called stormData. The following R code performs this task:
if(!file.exists("~/data1")){dir.create("~/data1")}
fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(fileUrl,destfile="~/data1/repdata_data_StormData.csv.bz2")
setwd("~/data1")
stormData <- read.csv('repdata_data_StormData.csv.bz2', header = TRUE, sep = ",")
head(stormData)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE EVTYPE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL TORNADO
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL TORNADO
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL TORNADO
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL TORNADO
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL TORNADO
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL TORNADO
## BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1 0 0 NA
## 2 0 0 NA
## 3 0 0 NA
## 4 0 0 NA
## 5 0 0 NA
## 6 0 0 NA
## END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1 0 14.0 100 3 0 0 15 25.0
## 2 0 2.0 150 2 0 0 0 2.5
## 3 0 0.1 123 2 0 0 2 25.0
## 4 0 0.0 100 2 0 0 2 2.5
## 5 0 0.0 150 2 0 0 2 2.5
## 6 0 1.5 177 2 0 0 6 2.5
## PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1 K 0 3040 8812
## 2 K 0 3042 8755
## 3 K 0 3340 8742
## 4 K 0 3458 8626
## 5 K 0 3412 8642
## 6 K 0 3450 8748
## LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3051 8806 1
## 2 0 0 2
## 3 0 0 3
## 4 0 0 4
## 5 0 0 5
## 6 0 0 6
stormData$annual <- as.numeric (format(as.Date(stormData$BGN_DATE, format = "%m/%d/%Y %H:%M:%S"),"%Y"))
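The year extraction in the line above can be tried on a couple of date strings first. This is a small sketch using two `BGN_DATE` values copied from the output shown earlier; the variable names `bgn` and `yr` are only for illustration.

```r
# Two date strings in the same layout as the BGN_DATE column.
bgn <- c("4/18/1950 0:00:00", "11/15/1951 0:00:00")

# as.Date() parses the string with the given format;
# format(..., "%Y") then keeps only the four-digit year.
yr <- as.numeric(format(as.Date(bgn, format = "%m/%d/%Y %H:%M:%S"), "%Y"))
print(yr)
```

Storing the year as a number is what allows it to be passed to `hist()` later in the report.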
The data in the database require significant processing. This includes standardising the EVTYPE entries by removing surrounding whitespace and converting them to upper case:
stormData$EVTYPE <- str_trim(stormData$EVTYPE)
stormData$EVTYPE <- toupper(stormData$EVTYPE)
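The effect of this clean-up can be seen on a few made-up labels. The sketch below uses base R's `trimws()` in place of `str_trim()` so that it runs without extra packages; the input strings are invented for illustration.

```r
# Invented labels that differ only in case and surrounding whitespace.
raw <- c(" Tornado", "TORNADO ", "tstm wind", "TSTM WIND")

# Trim leading/trailing whitespace, then convert to upper case.
clean <- toupper(trimws(raw))

# Four raw spellings collapse into two distinct event types.
print(unique(clean))
```

Labels that differ in wording rather than in case or spacing, such as TSTM WIND and THUNDERSTORM WIND, remain separate categories after this step.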
The full dataset is very large, so it is more efficient to reduce it to the columns needed for the analysis. The code below also sums the fatalities and injuries caused by each type of weather event and stores the ranked totals in Events_harm2:
Analysis_data<-select(stormData, EVTYPE, FATALITIES,INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
head(Analysis_data)
## EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO 0 15 25.0 K 0
## 2 TORNADO 0 0 2.5 K 0
## 3 TORNADO 0 2 25.0 K 0
## 4 TORNADO 0 2 2.5 K 0
## 5 TORNADO 0 2 2.5 K 0
## 6 TORNADO 0 6 2.5 K 0
# Coerce the event types to character and the fatality and injury counts to integer.
# Note: these are standalone copies; the aggregation below reads the columns from Analysis_data directly.
Events<- as.character(Analysis_data$EVTYPE)
FATALITIES<-as.integer(Analysis_data$FATALITIES)
INJURIES<- as.integer(Analysis_data$INJURIES)
Events_harm<- aggregate(FATALITIES + INJURIES ~ EVTYPE, data=Analysis_data, sum, na.rm=TRUE)
names(Events_harm)[2]<-"total"
Events_harm2<- Events_harm[order(-Events_harm$total),]
fatal_EVT <- summarise(group_by(Analysis_data, EVTYPE), fatalities = sum(FATALITIES))
top10fatal <- head(arrange(fatal_EVT, desc(fatalities)), n = 10)
injuries_EVT <- summarise(group_by(Analysis_data, EVTYPE), injuries = sum(INJURIES))
Injuries<- injuries_EVT[order(-injuries_EVT$injuries), ]
Injuriestop10<- head(arrange(injuries_EVT, desc(injuries)), n=10)
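The aggregate-then-rank pattern used above can be followed more easily on a toy table. This sketch uses base R only, and the four rows are invented for illustration rather than taken from the database.

```r
# Invented records: one row per event.
toy <- data.frame(EVTYPE     = c("TORNADO", "HAIL", "TORNADO", "FLOOD"),
                  FATALITIES = c(2, 0, 1, 1),
                  INJURIES   = c(10, 3, 5, 0),
                  stringsAsFactors = FALSE)

# Sum fatalities plus injuries within each event type.
harm <- aggregate(FATALITIES + INJURIES ~ EVTYPE, data = toy, FUN = sum)
names(harm)[2] <- "total"

# Rank event types from most to least harmful.
harm <- harm[order(-harm$total), ]
print(harm)
```

Here the two TORNADO rows are combined into a single total before ranking, which mirrors how Events_harm2 is built.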
The last preprocessing step is to calculate the economic impact of each weather event by combining the property damage and crop damage entries. Each damage value is stored together with a magnitude code that acts as a multiplier: hundred (H), thousand (K), million (M) and billion (B). The following code converts these codes to powers of ten so that the dollar amounts can be computed and compared:
Analysis_data$PROPDMGEXP<- as.character(Analysis_data$PROPDMGEXP)
Analysis_data<- mutate(Analysis_data, PROPDMGEXP = ifelse (PROPDMGEXP == "B",9,
ifelse (PROPDMGEXP %in% c("M","m"),6,
ifelse (PROPDMGEXP %in% c("K","k"), 3,
ifelse (PROPDMGEXP %in% c("H", "h"), 2,
ifelse(PROPDMGEXP %in% c("+","?","-"),0,
ifelse(PROPDMGEXP == "",1,
PROPDMGEXP)))))))
Analysis_data$PROPDMGEXP<- as.numeric(Analysis_data$PROPDMGEXP)
Analysis_data$PROPDMG1<- Analysis_data$PROPDMG*10^Analysis_data$PROPDMGEXP
Analysis_data$CROPDMGEXP<- as.character(Analysis_data$CROPDMGEXP)
Analysis_data<- mutate(Analysis_data, CROPDMGEXP = ifelse (CROPDMGEXP == "B",9,
ifelse (CROPDMGEXP %in% c("M","m"),6,
ifelse (CROPDMGEXP %in% c("K","k"), 3,
ifelse (CROPDMGEXP %in% c("H", "h"), 2,
ifelse(CROPDMGEXP %in% c("+","?","-", "Inf"),0,
ifelse(CROPDMGEXP == "",1,
CROPDMGEXP)))))))
Analysis_data$CROPDMGEXP<- as.numeric(Analysis_data$CROPDMGEXP)
Analysis_data$CROPDMG1<- Analysis_data$CROPDMG*10^Analysis_data$CROPDMGEXP
Analysis_data$DMG<- Analysis_data$PROPDMG1 + Analysis_data$CROPDMG1
propDamg <- summarise(group_by(Analysis_data, EVTYPE), PROPDAM = sum(PROPDMG1))
top10PropDamg <- head(arrange(propDamg, desc(PROPDAM)), n = 10)
cropDamg <- summarise(group_by(Analysis_data, EVTYPE), cropDAMG = sum(CROPDMG1))
top10CropDamg<- head(arrange(cropDamg, desc(cropDAMG)), n=10)
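The same conversion idea can be expressed with a named lookup vector instead of nested `ifelse()` calls. The following is only a sketch: `exp_lookup` and `to_dollars()` are hypothetical names, and treating blank or unrecognised codes as a multiplier of 1 is an assumption of this sketch. The code above handles blank, symbol and digit codes differently, so this version will not reproduce the report's totals exactly.

```r
# Hypothetical lookup table: magnitude code -> power of ten.
exp_lookup <- c(H = 2, K = 3, M = 6, B = 9)

# Hypothetical helper: convert a damage value and its code to US dollars.
to_dollars <- function(value, code) {
  code <- toupper(as.character(code))
  power <- unname(exp_lookup[code])
  # Assumption of this sketch: blank or unrecognised codes get power 0,
  # i.e. the value is taken at face value.
  power[is.na(power)] <- 0
  value * 10^power
}

# Toy values: 25 K, 2.5 M, 1.2 B and 5 with a blank code.
dollars <- to_dollars(c(25, 2.5, 1.2, 5), c("K", "M", "B", ""))
print(dollars)
```

A lookup vector keeps the mapping in one place, which makes it easier to audit how each code is treated.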
The following histogram illustrates the annual frequency of recorded severe weather events between 1950 and 2011:
stormData$annual <- as.numeric (format(as.Date(stormData$BGN_DATE, format = "%m/%d/%Y %H:%M:%S"),"%Y"))
hist(stormData$annual, breaks = 60, main="Frequency of Extreme Weather Events per Year", xlab = "Year", ylab = "Number of events")
As the histogram shows, the yearly frequency of recorded extreme weather events rises sharply from the mid-1990s. This is expected, because up to 48 types of weather event have been recorded since 1996 under NWS directive 10-1605. There is also an underlying trend of gradually increasing frequency of recorded events.
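One way to make the effect of the recording periods explicit is to count events per period rather than per year. The sketch below does this with `cut()` on a handful of invented years; with the real data, the `annual` column would take the place of `yrs`. The period boundaries follow the three recording periods described earlier.

```r
# Invented event years (illustrative only).
yrs <- c(1950, 1953, 1960, 1994, 1996, 2005, 2011)

# Assign each year to one of the three recording periods.
period <- cut(yrs,
              breaks = c(1949, 1954, 1995, 2011),
              labels = c("1950-1954", "1955-1995", "1996-2011"))

# Count the events that fall into each period.
counts <- table(period)
print(counts)
```

Comparing counts within a single period avoids mistaking a change in recording rules for a change in the weather.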
The following are the top 10 event types with the largest impact on human health (i.e. combined fatalities and injuries):
head(Events_harm2, 10)
## EVTYPE total
## 750 TORNADO 96979
## 108 EXCESSIVE HEAT 8428
## 771 TSTM WIND 7461
## 146 FLOOD 7259
## 410 LIGHTNING 6046
## 235 HEAT 3037
## 130 FLASH FLOOD 2755
## 379 ICE STORM 2064
## 677 THUNDERSTORM WIND 1621
## 880 WINTER STORM 1527
The following two tables list the top 10 event types responsible for fatalities and for injuries, respectively:
print(top10fatal)
## # A tibble: 10 x 2
## EVTYPE fatalities
## <chr> <dbl>
## 1 TORNADO 5633
## 2 EXCESSIVE HEAT 1903
## 3 FLASH FLOOD 978
## 4 HEAT 937
## 5 LIGHTNING 816
## 6 TSTM WIND 504
## 7 FLOOD 470
## 8 RIP CURRENT 368
## 9 HIGH WIND 248
## 10 AVALANCHE 224
head(Injuries, 10)
## # A tibble: 10 x 2
## EVTYPE injuries
## <chr> <dbl>
## 1 TORNADO 91346
## 2 TSTM WIND 6957
## 3 FLOOD 6789
## 4 EXCESSIVE HEAT 6525
## 5 LIGHTNING 5230
## 6 HEAT 2100
## 7 ICE STORM 1975
## 8 FLASH FLOOD 1777
## 9 THUNDERSTORM WIND 1488
## 10 HAIL 1361
This is the graphical representation of the above findings:
library(cowplot)
##
## ********************************************************
## Note: As of version 1.0.0, cowplot does not change the
## default ggplot2 theme anymore. To recover the previous
## behavior, execute:
## theme_set(theme_cowplot())
## ********************************************************
##
## Attaching package: 'cowplot'
## The following object is masked from 'package:lubridate':
##
## stamp
fatalitiesPlot <- ggplot(top10fatal, aes(x = reorder(EVTYPE,-fatalities), y = fatalities)) + geom_bar(stat = "identity", fill = "blue") +
theme(axis.text.x = element_text(angle = 35,hjust = 1, size = 8)) +
xlab("Event Type") + ylab("Number of Fatalities") +
ggtitle("Top 10 Severe Weather Events\n causing Fatalities in US\n from 1950 to 2011")
InjuriesPlot <- ggplot(Injuriestop10, aes(x = reorder(EVTYPE,-injuries), y = injuries)) + geom_bar(stat = "identity", fill = "blue") +
theme(axis.text.x = element_text(angle = 35,hjust = 1, size = 8)) +
xlab("Event Type") + ylab("Number of Injuries") +
ggtitle("Top 10 Severe Weather Events\n causing Injuries from 1950 to 2011")
cowplot::plot_grid(fatalitiesPlot, InjuriesPlot, align = "v")
Figure 2: The human impact of adverse weather events.
The economic impact of adverse weather events between 1950 and 2011 has been considerable. The total economic impact in the United States, as recorded in the National Weather Service Storm Database, was US$ 477,329,065,794, of which US$ 428,224,873,514 was property damage and US$ 49,104,192,280 was damage to crops.
#total economic impact
sum(Analysis_data$DMG)
## [1] 477329065794
#Property damage
sum(Analysis_data$PROPDMG1)
## [1] 428224873514
#Crop damage
sum(Analysis_data$CROPDMG1)
## [1] 49104192280
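The totals printed above can be cross-checked with a few lines of arithmetic. This sketch simply re-enters the two component figures reported above and derives the overall total and each component's share; the variable names are illustrative.

```r
# Component totals as reported above (US$).
prop_total <- 428224873514
crop_total <- 49104192280

# The overall total should equal the sum of the two components.
total <- prop_total + crop_total

# Share of each component in the overall total, in percent.
prop_share <- round(100 * prop_total / total, 1)
crop_share <- round(100 * crop_total / total, 1)
print(c(property = prop_share, crops = crop_share))
```

On these figures, property damage accounts for roughly nine tenths of the recorded economic impact.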
The following is a list of the top 10 causes of property damage as recorded by the National Weather Service Storm Database:
print(top10PropDamg)
## # A tibble: 10 x 2
## EVTYPE PROPDAM
## <chr> <dbl>
## 1 FLOOD 144657709870
## 2 HURRICANE/TYPHOON 69305840000
## 3 TORNADO 56947380704.
## 4 STORM SURGE 43323536000
## 5 FLASH FLOOD 16822725842.
## 6 HAIL 15735268026.
## 7 HURRICANE 11868319010
## 8 TROPICAL STORM 7703890550
## 9 WINTER STORM 6688497251
## 10 HIGH WIND 5270046295
print(top10CropDamg)
## # A tibble: 10 x 2
## EVTYPE cropDAMG
## <chr> <dbl>
## 1 DROUGHT 13972566000
## 2 FLOOD 5661968450
## 3 RIVER FLOOD 5029459000
## 4 ICE STORM 5022113500
## 5 HAIL 3025954500
## 6 HURRICANE 2741910000
## 7 HURRICANE/TYPHOON 2607872800
## 8 FLASH FLOOD 1421317100
## 9 EXTREME COLD 1312973000
## 10 FROST/FREEZE 1094186000
This is the graphical representation of the above findings:
library(cowplot)
PROPDMGPlot <- ggplot(top10PropDamg, aes(x = reorder(EVTYPE,-PROPDAM), y = PROPDAM)) + geom_bar(stat = "identity", fill = "blue") +
theme(axis.text.x = element_text(angle = 35,hjust = 1, size = 8)) +
xlab("Event Type") + ylab("Property damage (US$)") +
ggtitle("Top 10 Severe Weather Events causing property damage in the US")
CROPDMGPlot <- ggplot(top10CropDamg, aes(x = reorder(EVTYPE,-cropDAMG), y = cropDAMG)) + geom_bar(stat = "identity", fill = "blue") +
theme(axis.text.x = element_text(angle = 35,hjust = 1, size = 8)) +
xlab("Event Type") + ylab("Crop damage (US$)") +
ggtitle("Top 10 Severe Weather Events causing crop damage in the US")
cowplot::plot_grid(PROPDMGPlot, CROPDMGPlot, align = "v")