This study examines the effects of severe weather on public health and the economic damage it causes. The data come from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which tracks major storms and weather events across the US. The data are read into R and analysed. The main objective is to identify the top 20 weather events that cause the greatest health hazards and economic destruction, and to present the findings to the concerned authorities.
The first step is to download the compressed CSV file from the link provided in the course (https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2). We then use bzfile() to open a connection that decompresses the file, and read.csv() to read it into memory. We also load the required libraries.
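The download itself is not part of the code in this report. A minimal sketch of that step follows, assuming the archive is saved under the file name that read.csv() expects; the helper name fetch_if_missing is hypothetical, and the actual download call is left commented out because the file is large.

```r
# Hypothetical helper: download a file only if it is not already on disk.
fetch_if_missing <- function(url, dest) {
  if (!file.exists(dest)) {
    # mode = "wb" keeps the compressed archive intact on Windows
    download.file(url, destfile = dest, mode = "wb")
  }
  invisible(dest)
}

storm_url  <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
storm_file <- "repdata-data-StormData.csv.bz2"
# fetch_if_missing(storm_url, storm_file)  # uncomment to download
```

Skipping the download when the file already exists keeps repeated runs of the report fast.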
# Set working directories
setwd("C:/Users/pneupane/Documents/study topics/coursera/rep_research/peer_assn2/indata")
getwd()
## [1] "C:/Users/pneupane/Documents/study topics/coursera/rep_research/peer_assn2/indata"
# Set necessary libraries
library(data.table)
library(ggplot2)
library(reshape)
# Read the zipped file
storm <- read.csv(bzfile("repdata-data-StormData.csv.bz2"))
# Convert to Data Table
storm.dt <- as.data.table(storm)
dim(storm.dt)
## [1] 902297 37
This dataset has 902297 records and 37 variables; each record describes one weather event.
To determine the effects on public health, the variables FATALITIES and INJURIES provide the number of fatalities and injuries recorded for each event.
To determine the effects on the economy, the variables PROPDMG and CROPDMG provide the dollar amounts of property damage and crop damage for each event. Additionally, the variables PROPDMGEXP and CROPDMGEXP provide the units of measurement (hundreds (H), thousands (K), millions (M) and billions (B)) for these dollar amounts.
Next we standardize the dollar amounts for property damage and crop damage.
# Standardize the unit of measurements for Property Damages and Crop Damages
# Convert property damage to dollars using the unit codes (H, K, M, B)
# Note: := adds the column by reference, so storm.dt1 and storm.dt are the same table
storm.dt1 <- storm.dt[, PROPDMG1 := ifelse(PROPDMGEXP %in% c('H','h'), PROPDMG*10^2,
                                    ifelse(PROPDMGEXP %in% c('K','k'), PROPDMG*10^3,
                                    ifelse(PROPDMGEXP %in% c('M','m'), PROPDMG*10^6,
                                    ifelse(PROPDMGEXP %in% c('B','b'), PROPDMG*10^9,
                                           PROPDMG))))]
# Convert crop damage to dollars using the unit codes (H, K, M, B)
storm.dt1 <- storm.dt[, CROPDMG1 := ifelse(CROPDMGEXP %in% c('H','h'), CROPDMG*10^2,
                                    ifelse(CROPDMGEXP %in% c('K','k'), CROPDMG*10^3,
                                    ifelse(CROPDMGEXP %in% c('M','m'), CROPDMG*10^6,
                                    ifelse(CROPDMGEXP %in% c('B','b'), CROPDMG*10^9,
                                           CROPDMG))))]
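The same unit conversion can also be expressed with a lookup table instead of nested ifelse() calls. The following is only a sketch of that alternative in base R, not the code used for the results in this report; the names unit_multiplier and to_dollars are hypothetical, and the input values are toy numbers rather than records from the dataset.

```r
# Multipliers for the unit codes H, K, M and B. Any other code maps to 1,
# matching the fall-through branch of the nested ifelse() calls.
unit_multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: convert an amount and its unit code to dollars.
to_dollars <- function(amount, unit) {
  mult <- unit_multiplier[toupper(as.character(unit))]
  mult[is.na(mult)] <- 1   # unknown or empty codes: keep the raw amount
  unname(amount * mult)
}

# Toy values (not taken from the dataset)
toy_dollars <- to_dollars(c(25, 2.5, 10, 3), c("K", "m", "", "B"))
```

A lookup table keeps the mapping in one place, which makes it easier to review or extend than four nested conditions.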
We aggregate the counts of FATALITIES and INJURIES by event type. Then we plot the top 20 events for both types of public health hazard.
# Aggregate total injuries and fatalities by event type,
# keeping only event types where both totals are non-zero
evtype.pubhealth <- storm.dt1[, .(Injuries = sum(INJURIES), Fatalities = sum(FATALITIES)), by = EVTYPE][Injuries != 0 & Fatalities != 0]
# Draw a chart to show injuries for the top 20 events
injuries <- tail(evtype.pubhealth[order(Injuries)], 20)
injuries$EVTYPE <- factor(injuries$EVTYPE, levels = injuries$EVTYPE)
a <- ggplot(data = injuries, aes(x = EVTYPE, y = Injuries))
a <- a + geom_bar(stat = "identity", fill = "red") +
    coord_flip() +
    xlab("Event Type") +
    ylab("Number of Injuries") +
    ggtitle("Number of injuries for top 20 Event Types (Fig. 1)")
a
# Draw a chart to show fatalities for the top 20 events
fatalities <- tail(evtype.pubhealth[order(Fatalities)], 20)
fatalities$EVTYPE <- factor(fatalities$EVTYPE, levels = fatalities$EVTYPE)
a <- ggplot(data = fatalities, aes(x = EVTYPE, y = Fatalities))
a <- a + geom_bar(stat = "identity", fill = "red") +
    coord_flip() +
    xlab("Event Type") +
    ylab("Number of Fatalities") +
    ggtitle("Number of fatalities for top 20 Event Types (Fig. 2)")
a
The plots above show that tornadoes cause the most injuries and fatalities. Other events that pose public health hazards include Excessive-Heat, Flash-Flood, Thunderstorm wind, and Flood.
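The aggregate-then-rank step behind Fig. 1 and Fig. 2 can be checked on a small table. The sketch below uses base R's aggregate() rather than the data.table syntax of the report, so that it runs without extra packages; the records are made up for illustration, and the helper top_n_by is hypothetical.

```r
# Toy records (made up for illustration, not taken from the storm data)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "HAIL"),
  INJURIES   = c(10, 2, 5, 4, 1, 0),
  FATALITIES = c(1, 1, 2, 3, 0, 0)
)

# Sum both measures per event type
totals <- aggregate(cbind(INJURIES, FATALITIES) ~ EVTYPE, data = toy, FUN = sum)

# Hypothetical helper: the n rows with the largest value of `column`
top_n_by <- function(df, column, n) {
  head(df[order(df[[column]], decreasing = TRUE), ], n)
}

top_injuries <- top_n_by(totals, "INJURIES", 2)
```

Ranking each measure separately, as the report does, matters because the order of event types by injuries need not match the order by fatalities.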
We aggregate the standardized dollar amounts for property damage and crop damage by event type. Then we plot the top 20 events for both types of economic consequence.
# Aggregate total property and crop damage (in dollars) by event type,
# keeping only event types where both totals are non-zero
evtype.econ <- storm.dt1[, .(propdmg = sum(PROPDMG1), cropdmg = sum(CROPDMG1)), by = EVTYPE][propdmg != 0 & cropdmg != 0]
# Draw a chart of property and crop damage for the top 20 events (ranked by property damage)
damages <- head(evtype.econ[order(-propdmg)], 20)
# Melt the dataset to long format (one row per event type and damage type)
damages <- melt(damages, id.vars = c("EVTYPE"))
a <- ggplot(data = damages, aes(x = EVTYPE, y = value, fill = variable))
a <- a + geom_bar(stat = "identity") +
    facet_grid(variable ~ ., scales = "free_y") +
    xlab("Event Type") +
    ylab("Damage (US dollars)") +
    ggtitle("Property and Crop Damage for top 20 Event Types (Fig. 3)") +
    theme(axis.text.x = element_text(angle = 90)) +
    scale_fill_discrete(name = "Damage Type",
                        breaks = c("propdmg", "cropdmg"),
                        labels = c("Property Damage", "Crop Damage"))
a
We can see from the plot above that Flood causes the most economic damage. Events such as Hurricane/Typhoon, Tornado and Storm-Surge cause more property damage, whereas events such as Hail, Hurricane, River-Flood and Ice-storm are notable for crop damage.
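The melt() call used before Fig. 3 reshapes the damage table from one column per damage type into one row per event type and damage type, which is the layout that facet_grid(variable ~ .) draws from. The sketch below shows the same reshaping in base R on made-up numbers; the helper to_long is hypothetical and is not a replacement for melt().

```r
# Toy damage totals in wide format (made-up numbers)
wide <- data.frame(
  EVTYPE  = c("FLOOD", "HAIL"),
  propdmg = c(100, 20),
  cropdmg = c(5, 30)
)

# Hypothetical helper: stack the measure columns into variable/value pairs
to_long <- function(df, id, measures) {
  do.call(rbind, lapply(measures, function(m) {
    out <- df[, id, drop = FALSE]   # keep the identifier column(s)
    out$variable <- m               # name of the measure being stacked
    out$value <- df[[m]]            # values of that measure
    out
  }))
}

long <- to_long(wide, "EVTYPE", c("propdmg", "cropdmg"))
```

Each event type now appears once per damage type, so a single value column can be plotted and split into panels by the variable column.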