This document analyzes the impact of severe weather events on public health and the economy. Weather events such as storms, tornadoes, rain, floods, hail, wind, and heat cause fatalities, injuries, and crop and property damage. Understanding and preventing such damage is therefore important for government authorities at multiple levels.
One way to understand the impact of severe weather on public health and the economy is to analyze data on historical events. For this analysis I am using the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including estimates of fatalities, injuries, and crop and property damage, along with other attributes of each event.
The data show that severe weather events have affected people's lives and damaged their property. Among all types of weather events, tornadoes had the greatest health impact, with the highest combined number of fatalities and injuries, and floods had the greatest economic impact, with the most combined property and crop damage.
This assignment is part of Reproducible Research Course Project 2. Its basic goal is to explore the NOAA Storm Database and answer some basic questions about severe weather events.
Load the R libraries used for data manipulation and plotting.
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(plyr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:plyr':
##
## arrange, count, desc, failwith, id, mutate, rename, summarise,
## summarize
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
Load the file that contains the historical storm event data.
if(!exists("storm_data_all")) {
storm_data_all <- read.csv(bzfile("repdata_data_StormData.csv.bz2"),header=TRUE)
}
The code below runs some basic checks to confirm that the file has been loaded properly. These R commands show the size of the data, the column headers, and the internal structure of the R object.
dim(storm_data_all)
## [1] 902297 37
colnames(storm_data_all)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
str(storm_data_all)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr "" "" "" "" ...
## $ BGN_LOCATI: chr "" "" "" "" ...
## $ END_DATE : chr "" "" "" "" ...
## $ END_TIME : chr "" "" "" "" ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr "" "" "" "" ...
## $ END_LOCATI: chr "" "" "" "" ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## $ WFO : chr "" "" "" "" ...
## $ STATEOFFIC: chr "" "" "" "" ...
## $ ZONENAMES : chr "" "" "" "" ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr "" "" "" "" ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
The input database has 902297 rows and 37 columns.
With the data loaded successfully, the next steps are data pre-processing, which is an important step before beginning any analysis.
For this analysis I am keeping only the columns of the raw data that relate to health and economic impacts.
vars <- c( "BGN_DATE", "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
storm_data <- storm_data_all[, vars]
The code below removes records with incomplete information: those with an unknown event type ("?") and those that report no fatalities, injuries, property damage, or crop damage.
storm_data <- filter(storm_data, (EVTYPE != "?" & (FATALITIES > 0 | INJURIES > 0 | PROPDMG > 0 | CROPDMG > 0)))
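The same condition can be checked on a small example. The sketch below uses base R and a made-up data frame named `toy` (not part of the storm data) to show which rows the filter would keep.

```r
# Hypothetical toy data with the same column names as the storm data
toy <- data.frame(
  EVTYPE     = c("TORNADO", "?", "HAIL", "FLOOD"),
  FATALITIES = c(0, 1, 0, 0),
  INJURIES   = c(2, 0, 0, 0),
  PROPDMG    = c(0, 5, 0, 0),
  CROPDMG    = c(0, 0, 0, 1.5)
)

# Same logical condition as the filter() call above, written in base R:
# a known event type AND at least one non-zero impact column
keep <- toy$EVTYPE != "?" &
  (toy$FATALITIES > 0 | toy$INJURIES > 0 | toy$PROPDMG > 0 | toy$CROPDMG > 0)

print(toy[keep, ])
```

In this toy example the "?" row is dropped even though it reports a fatality, and the HAIL row is dropped because all four impact columns are zero.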
The input database starts in 1950, but the earlier years contain relatively few events, most likely because record keeping was less complete. I am therefore considering only records from 1991 onward. To restrict the data by year, I convert BGN_DATE from its mm/dd/yyyy text format to a Date, extract the year of the begin date, and keep the rows where that year is 1991 or later.
storm_data$BGN_DATE <- as.Date(storm_data$BGN_DATE, "%m/%d/%Y")
storm_data$YR <- year(storm_data$BGN_DATE)
storm_data <- filter(storm_data, YR >= 1991)
sort(table(storm_data$YR))
##
## 1991 1992 1993 1994 2005 1996 2001 1997 2002 1995 2004 1999 2003
## 879 990 5838 9643 10014 10040 10298 10322 10432 10457 10484 10609 11015
## 2000 2007 2006 1998 2009 2010 2008 2011
## 11508 11953 11974 14013 14434 16019 17633 20570
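As a small illustration of the date handling, the sketch below parses a few made-up strings in the same format as BGN_DATE. It uses only base R, with `format(x, "%Y")` standing in for `lubridate::year()`; the vector names are chosen just for this example.

```r
# Made-up inputs in the same "m/d/Y H:M:S" text format as BGN_DATE
raw_dates <- c("4/18/1950 0:00:00", "2/20/1991 0:00:00", "6/8/2011 0:00:00")

# as.Date() consumes only what the format string describes, so the
# trailing " 0:00:00" is ignored
parsed <- as.Date(raw_dates, format = "%m/%d/%Y")

# Base R equivalent of lubridate::year()
yr <- as.integer(format(parsed, "%Y"))

# Keep only dates from 1991 onward
kept <- parsed[yr >= 1991]
print(kept)
```

Only the 1991 and 2011 dates survive the filter in this example.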
Let’s take a quick look at the converted data set by checking basic parameters.
dim(storm_data)
## [1] 229125 9
colnames(storm_data)
## [1] "BGN_DATE" "EVTYPE" "FATALITIES" "INJURIES" "PROPDMG"
## [6] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "YR"
The next step in data pre-processing is to standardize the storm event names. The following code groups the data into main weather events: Hail, Heat, Flood, Wind, Storm, Snow, Tornado, Winter, and Rain. All remaining events are grouped as ‘Other’. The patterns are applied in sequence, so an event name that matches several patterns takes the label of the last one that matches (for example, THUNDERSTORM WIND ends up as Storm).
storm_data$EVENT_TYPE <- "Other"
storm_data$EVENT_TYPE[grep("HAIL", storm_data$EVTYPE, ignore.case = TRUE)] <- "Hail"
storm_data$EVENT_TYPE[grep("HEAT", storm_data$EVTYPE, ignore.case = TRUE)] <- "Heat"
storm_data$EVENT_TYPE[grep("FLOOD", storm_data$EVTYPE, ignore.case = TRUE)] <- "Flood"
storm_data$EVENT_TYPE[grep("WIND", storm_data$EVTYPE, ignore.case = TRUE)] <- "Wind"
storm_data$EVENT_TYPE[grep("STORM", storm_data$EVTYPE, ignore.case = TRUE)] <- "Storm"
storm_data$EVENT_TYPE[grep("SNOW", storm_data$EVTYPE, ignore.case = TRUE)] <- "Snow"
storm_data$EVENT_TYPE[grep("TORNADO", storm_data$EVTYPE, ignore.case = TRUE)] <- "Tornado"
storm_data$EVENT_TYPE[grep("WINTER", storm_data$EVTYPE, ignore.case = TRUE)] <- "Winter"
storm_data$EVENT_TYPE[grep("RAIN", storm_data$EVTYPE, ignore.case = TRUE)] <- "Rain"
sort(table(storm_data$EVENT_TYPE), decreasing = TRUE)
##
## Wind Storm Flood Hail Other Tornado Winter Snow Rain Heat
## 72935 57434 32455 26102 18507 15524 2060 1878 1250 980
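Because each assignment can overwrite an earlier one, the order of the patterns decides the final group. The sketch below shows this on a few hypothetical event names, using a subset of the patterns in the same order as the code above. The loop and the `patterns` vector are only an illustration, not the code used in this analysis.

```r
# Hypothetical sample of raw event names
evtype <- c("THUNDERSTORM WIND", "WINTER STORM", "HEAVY RAIN/HIGH WIND", "LIGHTNING")

# Subset of the patterns, in the same order as in the grouping code above
patterns <- c(Wind = "WIND", Storm = "STORM", Winter = "WINTER", Rain = "RAIN")

event_type <- rep("Other", length(evtype))
for (label in names(patterns)) {
  hits <- grepl(patterns[[label]], evtype, ignore.case = TRUE)
  event_type[hits] <- label  # a later pattern overwrites an earlier match
}

print(data.frame(evtype, event_type))
```

Here THUNDERSTORM WIND is first labeled Wind and then relabeled Storm, WINTER STORM ends as Winter, and HEAVY RAIN/HIGH WIND ends as Rain. Changing the order of the patterns would change these groups and therefore the totals reported later.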
Property damage is recorded as a value plus an exponent code. To analyze the economic impact, I convert these values to actual dollar amounts: K multiplies the value by 1,000, M by 1,000,000, and B by 1,000,000,000. Any other code is treated as a multiplier of 1.
storm_data$PROPDMGEXP <- as.character(storm_data$PROPDMGEXP)
storm_data$PROPDMGEXP[is.na(storm_data$PROPDMGEXP)] <- 0
storm_data$PROPDMGEXP[!grepl("K|M|B", storm_data$PROPDMGEXP, ignore.case = TRUE)] <- 0
storm_data$PROPDMGEXP[grep("K", storm_data$PROPDMGEXP, ignore.case = TRUE)] <- "3"
storm_data$PROPDMGEXP[grep("M", storm_data$PROPDMGEXP, ignore.case = TRUE)] <- "6"
storm_data$PROPDMGEXP[grep("B", storm_data$PROPDMGEXP, ignore.case = TRUE)] <- "9"
storm_data$PROPDMGEXP <- as.numeric(as.character(storm_data$PROPDMGEXP))
storm_data$PROPERTY_DAMAGE <- storm_data$PROPDMG * 10^storm_data$PROPDMGEXP
sort(table(storm_data$PROPERTY_DAMAGE), decreasing = TRUE)[1:10]
##
## 5000 10000 1000 2000 0 50000 3000 20000 25000 15000
## 31730 21787 17544 17186 14066 13596 10364 9179 8919 8617
Crop damage is recorded the same way, as a value plus an exponent code, and I convert it to dollar amounts with the same rules: K multiplies the value by 1,000, M by 1,000,000, and B by 1,000,000,000. Any other code is treated as a multiplier of 1.
storm_data$CROPDMGEXP <- as.character(storm_data$CROPDMGEXP)
storm_data$CROPDMGEXP[is.na(storm_data$CROPDMGEXP)] <- 0
storm_data$CROPDMGEXP[!grepl("K|M|B", storm_data$CROPDMGEXP, ignore.case = TRUE)] <- 0
storm_data$CROPDMGEXP[grep("K", storm_data$CROPDMGEXP, ignore.case = TRUE)] <- "3"
storm_data$CROPDMGEXP[grep("M", storm_data$CROPDMGEXP, ignore.case = TRUE)] <- "6"
storm_data$CROPDMGEXP[grep("B", storm_data$CROPDMGEXP, ignore.case = TRUE)] <- "9"
storm_data$CROPDMGEXP <- as.numeric(as.character(storm_data$CROPDMGEXP))
storm_data$CROP_DAMAGE <- storm_data$CROPDMG * 10^storm_data$CROPDMGEXP
sort(table(storm_data$CROP_DAMAGE), decreasing = TRUE)[1:10]
##
## 0 5000 10000 50000 1e+05 1000 2000 25000 20000 5e+05
## 207026 4097 2349 1984 1233 956 951 830 758 721
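The two conversions above follow the same rule, which can be sketched as one small function. The helper name `exp_to_power` is hypothetical and is not used elsewhere in this analysis. It assumes each exponent code is a single character, which is how K, M, and B appear in the data.

```r
# Hypothetical helper: map an exponent code to a power of ten.
# K, M, B (any case) map to 3, 6, 9; every other code maps to 0,
# matching the rule applied in the code above.
exp_to_power <- function(code) {
  lookup <- c(K = 3, M = 6, B = 9)
  power <- unname(lookup[toupper(as.character(code))])
  power[is.na(power)] <- 0
  power
}

# Made-up damage values and exponent codes
dmg  <- c(25, 2.5, 1.5, 10, 5)
code <- c("K", "m", "B", "", "+")

damage <- dmg * 10^exp_to_power(code)
print(damage)
```

With these inputs, 25 with code K becomes 25,000 dollars, 2.5 with code m becomes 2,500,000 dollars, and the values with an empty or unrecognized code are left unchanged.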
To analyze the health impact of weather events, I total the number of fatalities and injuries for each weather event type.
agg.fatalities_injuries <- ddply(storm_data, .(EVENT_TYPE), summarize,
Total=sum(FATALITIES + INJURIES, na.rm=TRUE))
agg.fatalities_injuries$Type <- "Fatalities and Injuries"
agg.fatalities <- ddply(storm_data, .(EVENT_TYPE), summarize, Total = sum(FATALITIES, na.rm = TRUE))
agg.fatalities$Type <- "Fatalities"
agg.injuries <- ddply(storm_data, .(EVENT_TYPE), summarize, Total = sum(INJURIES, na.rm = TRUE))
agg.injuries$Type <- "Injuries"
agg.health_impact <- rbind(agg.fatalities, agg.injuries)
health_impact <- join (agg.fatalities, agg.injuries, by="EVENT_TYPE", type="inner")
health_impact
## EVENT_TYPE Total Type Total Type
## 1 Flood 1524 Fatalities 8602 Injuries
## 2 Hail 15 Fatalities 1082 Injuries
## 3 Heat 3138 Fatalities 9224 Injuries
## 4 Other 2626 Fatalities 12224 Injuries
## 5 Rain 114 Fatalities 305 Injuries
## 6 Snow 164 Fatalities 1164 Injuries
## 7 Storm 416 Fatalities 5339 Injuries
## 8 Tornado 1727 Fatalities 25558 Injuries
## 9 Wind 990 Fatalities 6485 Injuries
## 10 Winter 278 Fatalities 1891 Injuries
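The joined table above has two columns named Total and two named Type, which makes it awkward to refer to either one. One possible alternative, sketched here with base R `merge()` on three rows copied from the table, is to rename the totals before joining and add a combined column. The data frame names are chosen for this example only.

```r
# Three rows copied from the table above
fatalities <- data.frame(EVENT_TYPE = c("Flood", "Heat", "Hail"),
                         Total = c(1524, 3138, 15))
injuries   <- data.frame(EVENT_TYPE = c("Flood", "Heat", "Hail"),
                         Total = c(8602, 9224, 1082))

# Give each total a distinct, descriptive column name before joining
names(fatalities)[names(fatalities) == "Total"] <- "Fatalities"
names(injuries)[names(injuries) == "Total"] <- "Injuries"

health_wide <- merge(fatalities, injuries, by = "EVENT_TYPE")
health_wide$Combined <- health_wide$Fatalities + health_wide$Injuries

# Rank event types by combined fatalities and injuries
ranked <- health_wide[order(-health_wide$Combined), ]
print(ranked)
```

With distinct column names, a ranking by fatalities, injuries, or their sum can be read directly from one table.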
Plot the results for a quick visual comparison.
agg.health_impact$EVENT_TYPE <- as.factor(agg.health_impact$EVENT_TYPE)
health_impact_plot <- ggplot(agg.health_impact, aes(x = reorder(EVENT_TYPE, -Total), y = Total, fill = Type)) +
theme_classic() +
geom_bar(stat = "identity", position = 'dodge', alpha=0.75) +
xlab("Weather Event") +
ylab("Number of Fatalities and Injuries") +
ggtitle("Impact of Severe Weather Events On Public Health 1991-2011") +
theme(axis.text = element_text(face="bold")) +
theme(axis.text.x = element_text(angle=90)) +
theme(plot.title = element_text(hjust = 0.5))
print(health_impact_plot)
The graph shows that tornadoes had the greatest impact on public health: they caused by far the most injuries and the highest combined number of fatalities and injuries. Heat caused the most fatalities. Among the named weather events, heat and floods rank next after tornadoes in combined fatalities and injuries.
To analyze the economic impact of weather events, I total the cost of property damage and crop damage for each weather event type.
agg.propdmg_cropdmg <- ddply(storm_data, .(EVENT_TYPE), summarize, Total = sum(PROPERTY_DAMAGE + CROP_DAMAGE, na.rm = TRUE))
agg.propdmg_cropdmg$Type <- "Property and Crop Damage"
agg.prop <- ddply(storm_data, .(EVENT_TYPE), summarize, Total = sum(PROPERTY_DAMAGE, na.rm = TRUE))
agg.prop$Type <- "Property"
agg.crop <- ddply(storm_data, .(EVENT_TYPE), summarize, Total = sum(CROP_DAMAGE, na.rm = TRUE))
agg.crop$Type <- "Crop"
agg.economic_impact <- rbind(agg.prop, agg.crop)
economic_impact <- join (agg.prop, agg.crop, by="EVENT_TYPE", type="inner")
economic_impact
## EVENT_TYPE Total Type Total Type
## 1 Flood 167502193929 Property 12266906100 Crop
## 2 Hail 15733043048 Property 3046837473 Crop
## 3 Heat 20325750 Property 904469280 Crop
## 4 Other 97246707337 Property 23588880870 Crop
## 5 Rain 3270230192 Property 919315800 Crop
## 6 Snow 1024169752 Property 134683100 Crop
## 7 Storm 66304415393 Property 6374474888 Crop
## 8 Tornado 30553884789 Property 417461520 Crop
## 9 Wind 10847166618 Property 1403719150 Crop
## 10 Winter 6777295251 Property 47444000 Crop
Plot the results for a quick visual comparison.
agg.economic_impact$EVENT_TYPE <- as.factor(agg.economic_impact$EVENT_TYPE)
economic_impact_plot <- ggplot(agg.economic_impact, aes(x = reorder(EVENT_TYPE, -Total), y = Total/1e9, fill = Type)) +
theme_classic() +
geom_bar(stat = "identity", position = 'dodge', alpha=0.75) +
xlab("Weather Event") +
ylab("Property and Crop Damage (in billion USD)") +
ggtitle("Impact of Severe Weather Events On Economy 1991-2011") +
theme(axis.text = element_text(face="bold")) +
theme(axis.text.x = element_text(angle=90)) +
theme(plot.title = element_text(hjust = 0.5))
print(economic_impact_plot)
The graph shows that floods caused the greatest economic damage, with the most property damage and, among the named weather events, the most crop damage. Among the other named events, storms and tornadoes rank next in property damage, and storms and hail rank next in crop damage.