Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. This analysis explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database to determine, across the United States, (a) which types of events are most harmful with respect to population health and (b) which types of events have the greatest economic consequences. The findings suggest that tornadoes are most harmful to population health, while floods have the greatest economic consequences.
The raw data file is downloaded to the working directory as “weather.csv.bz2” and then read into the data frame “weather_raw”.
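# Download the compressed CSV; download.file() returns only a status code, so its result is not assigned
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile = 'weather.csv.bz2', mode = "wb")
# read.csv() can read the bz2-compressed file directly
weather_raw <- read.csv('weather.csv.bz2')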
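Re-running the report repeats the download each time. A minimal sketch of one way to avoid that, assuming a hypothetical helper named `fetch_if_missing` (not part of the original analysis), is to download only when the local file is absent:

```r
# Hypothetical helper (not in the original analysis): download a file only
# if it is not already present, then return the local path.
fetch_if_missing <- function(url, destfile) {
  if (!file.exists(destfile)) {
    # mode = "wb" keeps the compressed file intact on Windows
    download.file(url, destfile = destfile, mode = "wb")
  }
  destfile
}

# Usage with the URL and file name from this report:
# weather_raw <- read.csv(fetch_if_missing(
#   "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
#   "weather.csv.bz2"))
```

The helper returns the path so that it can be passed straight to `read.csv()`.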
Irrelevant columns in the dataset are deleted and the new dataset is saved as “weather”.
weather <- weather_raw[,c(8,23:28)]
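Selecting by position depends on the column order of the raw file. As a sketch, the same subset can be requested by name. The column names below are the ones listed in the structure output of this report, and the toy data frame is made up so that the snippet runs on its own:

```r
# Toy stand-in for weather_raw (made-up values, plus one unrelated column)
toy_raw <- data.frame(
  STATE = c("AL", "TX"),
  EVTYPE = c("TORNADO", "FLOOD"),
  FATALITIES = c(0, 1),
  INJURIES = c(15, 0),
  PROPDMG = c(25, 2.5),
  PROPDMGEXP = c("K", "M"),
  CROPDMG = c(0, 0),
  CROPDMGEXP = c("", ""),
  stringsAsFactors = FALSE
)

# Keep only the columns used in the analysis, selected by name
keep <- c("EVTYPE", "FATALITIES", "INJURIES",
          "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
toy <- toy_raw[, keep]
names(toy)
```

Selecting by name fails with an error if a column is missing, whereas a positional index would silently pick up whatever column sits in that position.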
The structure of the reduced dataset is inspected.
str(weather)
## 'data.frame': 902297 obs. of 7 variables:
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
The dplyr package is loaded.
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.2.4
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
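The damage exponent columns (PROPDMGEXP and CROPDMGEXP) contain invalid characters, so two new columns, ‘PROPMAGNITUDE’ and ‘CROPMAGNITUDE’, are created. They map the valid codes (K for thousands, M for millions, B for billions) to multipliers and leave every other code at 0, which excludes those records from the damage totals. The variable ‘ECONOMIC_CONSEQUENCE’ is then created by adding the dollar amounts of property and crop damage.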
weather$PROPMAGNITUDE <- 0
weather$CROPMAGNITUDE <- 0
weather[which(weather$PROPDMGEXP == "K"),]$PROPMAGNITUDE <- 1000
weather[which(weather$PROPDMGEXP == "m"),]$PROPMAGNITUDE <- 1000000
weather[which(weather$PROPDMGEXP == "M"),]$PROPMAGNITUDE <- 1000000
weather[which(weather$PROPDMGEXP == "B"),]$PROPMAGNITUDE <- 1000000000
weather[which(weather$CROPDMGEXP == "K"),]$CROPMAGNITUDE <- 1000
weather[which(weather$CROPDMGEXP == "k"),]$CROPMAGNITUDE <- 1000
weather[which(weather$CROPDMGEXP == "m"),]$CROPMAGNITUDE <- 1000000
weather[which(weather$CROPDMGEXP == "M"),]$CROPMAGNITUDE <- 1000000
weather[which(weather$CROPDMGEXP == "B"),]$CROPMAGNITUDE <- 1000000000
weather$ECONOMIC_CONSEQUENCE <- (weather$PROPDMG * weather$PROPMAGNITUDE) + (weather$CROPDMG * weather$CROPMAGNITUDE)
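The exponent handling can also be written as a single lookup table. The following is a minimal sketch on made-up rows, not the code this analysis ran; `exp_lookup` and `to_multiplier` are hypothetical names introduced for illustration, and the sketch accepts lowercase `k` for both columns:

```r
# Made-up rows for illustration; "?" and "" stand in for invalid or missing codes
toy <- data.frame(
  PROPDMG = c(25, 2.5, 1, 3),
  PROPDMGEXP = c("K", "M", "B", "?"),
  CROPDMG = c(0, 4, 0, 7),
  CROPDMGEXP = c("", "k", "", "M"),
  stringsAsFactors = FALSE
)

# Valid exponent codes and their multipliers
exp_lookup <- c(K = 1e3, k = 1e3, M = 1e6, m = 1e6, B = 1e9)

# Hypothetical helper: translate codes to multipliers, 0 for anything else
to_multiplier <- function(code) {
  mult <- unname(exp_lookup[as.character(code)])
  mult[is.na(mult)] <- 0
  mult
}

toy$ECONOMIC_CONSEQUENCE <-
  toy$PROPDMG * to_multiplier(toy$PROPDMGEXP) +
  toy$CROPDMG * to_multiplier(toy$CROPDMGEXP)
toy
```

Codes that are not in the table come back as `NA` from the lookup and are set to 0, which mirrors the default of 0 used for the magnitude columns.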
A new data frame ‘weather_event’ is created by summing the fatalities, injuries, and economic consequences for each event type.
weather_event <- group_by(weather, EVTYPE)
weather_event <- summarise(weather_event, FATALITIES = sum(FATALITIES), INJURIES = sum(INJURIES), ECONOMIC = sum(ECONOMIC_CONSEQUENCE))
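To make the grouping step concrete, here is a small sketch on made-up rows using base R's `aggregate()` rather than dplyr. It illustrates the same per-event-type sums and is a cross-check of the idea, not the code this analysis ran:

```r
# Made-up rows for illustration only
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(1, 0, 2, 1, 0),
  INJURIES = c(15, 0, 2, 3, 1),
  ECONOMIC_CONSEQUENCE = c(25000, 2.5e6, 1000, 5e5, 0),
  stringsAsFactors = FALSE
)

# Sum every measure within each event type (one row per EVTYPE)
toy_event <- aggregate(
  cbind(FATALITIES, INJURIES, ECONOMIC_CONSEQUENCE) ~ EVTYPE,
  data = toy, FUN = sum
)
toy_event
```

Each event type collapses to a single row, which is the shape the ranking steps that follow rely on.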
The ggplot2 package is loaded.
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.2.4
A bar chart is plotted showing the top ten events resulting in the highest number of fatalities.
weather_final_fatalities <- arrange(weather_event, desc(FATALITIES))
weather_10_fatalities <- head(weather_final_fatalities,10)
ggplot(data=weather_10_fatalities, aes(x = factor(EVTYPE), y = FATALITIES)) +
geom_bar(stat = "identity", fill="white", colour="darkblue") +
coord_flip() + labs(title = "Top 10 Events Resulting in Fatalities", x = "Event Types", y = "Number of Fatalities")
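Because `factor(EVTYPE)` keeps the event types in alphabetical level order, the bars are not drawn in order of size. A hedged sketch of one option, using base R's `reorder()` on made-up totals (the values are not from the storm data):

```r
# Made-up totals for illustration only
toy_top <- data.frame(
  EVTYPE = c("FLOOD", "TORNADO", "HEAT"),
  FATALITIES = c(5, 20, 9),
  stringsAsFactors = FALSE
)

# reorder() sorts the factor levels by FATALITIES (ascending), so after
# coord_flip() the largest bar is drawn at the top
toy_top$EVTYPE <- reorder(toy_top$EVTYPE, toy_top$FATALITIES)
levels(toy_top$EVTYPE)

# Draw the chart only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(toy_top, aes(x = EVTYPE, y = FATALITIES)) +
    geom_bar(stat = "identity", fill = "white", colour = "darkblue") +
    coord_flip()
  print(p)
}
```

The same reordering could be applied to the injuries and economic charts by swapping in the corresponding column.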
A bar chart is plotted showing the top ten events resulting in the highest number of injuries.
weather_final_injuries <- arrange(weather_event, desc(INJURIES))
weather_10_injuries <- head(weather_final_injuries,10)
ggplot(data=weather_10_injuries, aes(x = factor(EVTYPE), y = INJURIES)) +
geom_bar(stat = "identity", fill="white", colour="darkred") +
coord_flip() + labs(title = "Top 10 Events Resulting in Injuries", x = "Event Types", y = "Number of Injuries")
From the two charts, we can see that tornadoes have resulted in the highest numbers of fatalities and injuries. Hence, we conclude that tornadoes are the most harmful to population health.
A bar chart is plotted showing the top ten events resulting in the greatest economic consequences in terms of value.
weather_final_economic <- arrange(weather_event, desc(ECONOMIC))
weather_10_economic <- head(weather_final_economic,10)
ggplot(data=weather_10_economic, aes(x = factor(EVTYPE), y = ECONOMIC/1.0e9)) +
geom_bar(stat = "identity", fill="white", colour="darkgreen") +
coord_flip() + labs(title = "Top 10 Events with Worst Economic Consequences", x = "Event Types", y = "Economic Consequences ($ in billions)")
The chart shows that floods have the greatest economic consequences.