Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
This report analyses the NOAA storm database and addresses the following questions:
Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?
Both questions were answered by aggregating the data by storm event type:
Response 1: Tornadoes are the most harmful events for population health.
Response 2: Floods are responsible for the most economic damage.
To process the data, a file was downloaded from the National Oceanic and Atmospheric Administration's (NOAA) storm database.
if (!file.exists("StormData.csv.bz2")) {
  NOAA_File_URL <- "http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
  download.file(NOAA_File_URL, destfile = "StormData.csv.bz2", method = "curl")
}
Once the compressed file is in our working directory, we load it into R. read.csv reads the bz2 file directly, so no separate decompression step is needed. Caching is turned on for better performance.
storm_data <- read.csv("StormData.csv.bz2", stringsAsFactors = FALSE)
dim(storm_data)
## [1] 902297 37
Now that the data is loaded, we reduce the data set to the variables relevant to our analysis.
selected_variables <- c("EVTYPE","FATALITIES","INJURIES","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")
# reduce the data variables
storm_data_clean <-storm_data[,selected_variables]
head(storm_data_clean)
## EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO 0 15 25.0 K 0
## 2 TORNADO 0 0 2.5 K 0
## 3 TORNADO 0 2 25.0 K 0
## 4 TORNADO 0 2 2.5 K 0
## 5 TORNADO 0 2 2.5 K 0
## 6 TORNADO 0 6 2.5 K 0
To answer the first question (population health), we look at the injuries and fatalities recorded for each event type.
# Load reshape for melt() (install the package first if required)
library(reshape)
## Warning: package 'reshape' was built under R version 3.1.3
# Load ggplot2 for plotting (install the package first if required)
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.1.3
# Load tidyr (install the package first if required)
library(tidyr)
## Warning: package 'tidyr' was built under R version 3.1.3
##
## Attaching package: 'tidyr'
##
## The following object is masked from 'package:reshape':
##
## expand
fatal_aggr <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, storm_data_clean, sum)
data_aggr <- melt(head(fatal_aggr[order(-fatal_aggr$FATALITIES, -fatal_aggr$INJURIES), ], 10))
## Using EVTYPE as id variables
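To make the aggregation step concrete, here is a minimal sketch on a toy data frame. The `toy` data and its values are made up for illustration, and the long format is built with base R instead of `melt()` so the example runs without extra packages.

```r
# Toy stand-in for storm_data_clean (values are invented for illustration)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(2, 1, 3, 4, 0),
  INJURIES   = c(10, 0, 5, 1, 2),
  stringsAsFactors = FALSE
)

# Sum both outcome columns within each event type
toy_aggr <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, toy, sum)

# Sort by fatalities, then injuries, both descending
toy_top <- toy_aggr[order(-toy_aggr$FATALITIES, -toy_aggr$INJURIES), ]

# Long format: one row per (event type, measure) pair,
# similar in shape to what melt() returns
toy_long <- data.frame(
  EVTYPE   = rep(as.character(toy_top$EVTYPE), times = 2),
  variable = rep(c("FATALITIES", "INJURIES"), each = nrow(toy_top)),
  value    = c(toy_top$FATALITIES, toy_top$INJURIES)
)
print(toy_long)
```

The long format is what lets ggplot2 stack fatalities and injuries in a single bar per event type.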
Using a bar chart from the ggplot2 library, we can identify the event type that harms the largest number of people.
ggplot(data_aggr, aes(x = EVTYPE, y = value, fill = variable)) +
geom_bar(stat = "identity") +
coord_flip() +
ggtitle("Harmful events with respect to population health") +
labs(x = "Event Types", y = "# people impacted") +
scale_fill_manual(values = c("red", "blue"), labels = c("Fatalities", "Injuries"))
We can see that tornadoes are the events that affect people the most.
# Load car for Recode() (install the package first if required)
library(car)
## Warning: package 'car' was built under R version 3.1.3
# Blank exponent codes are mapped to 0, as in the crop recode; left unmapped they
# become NA and aggregate() would silently drop those rows
storm_data_clean$PROPDMG <- storm_data_clean$PROPDMG * as.numeric(Recode(storm_data_clean$PROPDMGEXP,
"'0'=1;'1'=10;'2'=100;'3'=1000;'4'=10000;'5'=100000;'6'=1000000;'7'=10000000;'8'=100000000;'B'=1000000000;'h'=100;'H'=100;'K'=1000;'m'=1000000;'M'=1000000;'-'=0;'?'=0;'+'=0;''=0",
as.factor.result = FALSE))
storm_data_clean$CROPDMG <- storm_data_clean$CROPDMG * as.numeric(Recode(storm_data_clean$CROPDMGEXP,
"'0'=1;'2'=100;'B'=1000000000;'k'=1000;'K'=1000;'m'=1000000;'M'=1000000;''=0;'?'=0",
as.factor.result = FALSE))
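The recoding above turns each exponent code into a multiplier and applies it to the base amount. As a rough sketch of the same idea in base R, the mapping can be written as a named lookup vector. The names `exp_multiplier` and `to_dollars` are hypothetical, and treating any unmapped code (including the blank one) as 0 is an assumption of this sketch.

```r
# Multipliers for the exponent codes; letters are stored in upper case only
exp_multiplier <- c(
  "0" = 1, "1" = 10, "2" = 100, "3" = 1e3, "4" = 1e4, "5" = 1e5,
  "6" = 1e6, "7" = 1e7, "8" = 1e8,
  "H" = 100, "K" = 1e3, "M" = 1e6, "B" = 1e9,
  "-" = 0, "?" = 0, "+" = 0
)

# Hypothetical helper: convert a base amount and an exponent code to dollars
to_dollars <- function(amount, exp_code) {
  code <- toupper(exp_code)               # 'k' and 'K' share one entry
  mult <- unname(exp_multiplier[code])    # NA for codes not in the table
  mult[is.na(mult)] <- 0                  # assumption: unmapped codes count as 0
  amount * mult
}

print(to_dollars(c(25, 2.5, 1, 3), c("K", "m", "B", "")))
```

A lookup vector keeps the mapping in one place, so the property and crop columns cannot drift apart the way two separate recode strings can.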
eco_aggr <- aggregate(cbind(PROPDMG, CROPDMG) ~ EVTYPE, storm_data_clean, sum)
data_aggr <- melt(head(eco_aggr[order(-eco_aggr$PROPDMG, -eco_aggr$CROPDMG), ], 10))
## Using EVTYPE as id variables
By using a bar chart from the ggplot2 library we can identify the event that impacts properties and crops the most.
ggplot(data_aggr, aes(x = EVTYPE, y = value, fill = variable)) +
geom_bar(stat = "identity") +
coord_flip() + ggtitle("Economic consequences") +
labs(x = "Event Types", y = "cost of damages ($)") +
scale_fill_manual(values = c("red", "blue"), labels = c("Property Damage",
"Crop Damage"))
We can see that floods are the events that cause the most damage to property and crops.
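The top-ten selection orders by property damage first, so crop damage only breaks ties. As a hedged alternative, the sketch below ranks by combined damage instead. The `toy_eco` data and the `TOTAL` column are made up for illustration and are not part of the original analysis.

```r
# Toy stand-in for eco_aggr (values are invented for illustration)
toy_eco <- data.frame(
  EVTYPE  = c("FLOOD", "DROUGHT", "HAIL"),
  PROPDMG = c(100, 10, 40),
  CROPDMG = c(5, 80, 30),
  stringsAsFactors = FALSE
)

# Combined damage per event type
toy_eco$TOTAL <- toy_eco$PROPDMG + toy_eco$CROPDMG

# Ranking by property damage first (the approach used in this report)
by_property <- toy_eco[order(-toy_eco$PROPDMG, -toy_eco$CROPDMG), ]

# Ranking by combined damage
by_total <- toy_eco[order(-toy_eco$TOTAL), ]

print(by_property$EVTYPE)
print(by_total$EVTYPE)
```

In this toy data the two rankings agree on the top event but disagree below it, which is the situation where the choice of sort key matters.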