The aim of this document is to understand the impact that severe weather events have on our society. Using NOAA historical data, we identify the events that have caused the most casualties and those that tend to have the greatest economic impact.
This document is organized in four steps:
1. Data loading: import the NOAA data into R
2. Data cleaning: prepare a tidy dataset
3. Public health impact analysis: identify the severe weather events most dangerous to people
4. Economic impact analysis: identify the most expensive severe weather events for the federal government
Below are the steps followed to import NOAA data:
# Load all the libraries required across the analysis
library(ggplot2)
library(dplyr)
library(data.table)
library(reshape2)
#Load dataset to data
data <- read.csv("repdata-data-StormData.csv.bz2")
#Select all relevant variables
data <- select(data,
EVTYPE,
FATALITIES,
INJURIES,
PROPDMG,
PROPDMGEXP,
CROPDMG,
CROPDMGEXP)
The next step is to clean the data to simplify later calculations. Two columns, PROPDMGEXP and CROPDMGEXP, encode the magnitude of the damage figures as K, M or B and need to be converted to 1,000, 1,000,000 or 1,000,000,000. To do so we apply a small function with sapply:
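# Function to replace the codes in the *DMGEXP columns with numeric multipliers
f1 <- function(x) {
  x <- toupper(as.character(x)) # accept lower case codes and factor input
  if (x == "K") {
    1000 # Thousands
  } else if (x == "M") {
    1000000 # Millions
  } else if (x == "B") {
    1000000000 # Billions
  } else {
    0 # Any other code (blank, digit, symbol) counts as zero damage
  }
}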
data$PROPDMGEXP <- sapply(data$PROPDMGEXP, f1) #Replace PROPDMGEXP values
data$CROPDMGEXP <- sapply(data$CROPDMGEXP, f1) #Replace CROPDMGEXP values
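Calling f1 once per row through sapply can be slow on a table of this size. As a minimal sketch of a vectorized alternative, the same mapping can be written as a named lookup vector. The helper name exp_multiplier is hypothetical (it is not part of the original analysis), and the sketch assumes the same rule as f1: any code other than K, M or B becomes 0.

```r
# Hypothetical helper (not in the original analysis): map exponent codes
# to multipliers in a single vectorized step.
exp_multiplier <- function(codes) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  # as.character() guards against factor input; unname() drops lookup names
  mult <- unname(lookup[toupper(as.character(codes))])
  # Codes absent from the lookup come back as NA; treat them as 0, like f1
  mult[is.na(mult)] <- 0
  mult
}

# Made-up sample codes, including lower case, blank and unexpected values
codes <- c("K", "m", "B", "", "5", "?")
multipliers <- exp_multiplier(codes)

# Worked example: a PROPDMG of 25 with exponent "K" is 25 * 1000 dollars
example_damage <- 25 * exp_multiplier("K")
```

Under these assumptions, data$PROPDMGEXP <- exp_multiplier(data$PROPDMGEXP) would replace the sapply call without changing the resulting values.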
We still need to trim a lot of fat from the data table. Using mutate we calculate the total damage and the total casualties generated by each severe weather event, then split the result into two smaller datasets:
data <- mutate(data,
Property = PROPDMG * PROPDMGEXP,
Crops = CROPDMG * CROPDMGEXP,
Total = Property + Crops,
Population = FATALITIES + INJURIES)
Health <- select(data,
EVTYPE,
FATALITIES,
INJURIES,
Population)
Damage <- select(data,
EVTYPE,
Property,
Crops,
Total)
To answer the public health question we use the Health dataset created above, as it includes the required variables:
1. EVTYPE
2. FATALITIES
3. INJURIES
4. Population (new variable created by adding 2 and 3)
First, we select the top 10 events ranked by the total of fatalities plus injuries:
q1 <- Health %>%
group_by(EVTYPE) %>%
summarize(TFatalities = sum(FATALITIES),
TInjuries = sum(INJURIES),
TPopulation = sum(Population)) %>%
arrange(desc(TPopulation))
q1 <- q1[1:10,] #Limiting the table to only the top 10 events
q1
## Source: local data frame [10 x 4]
##
##               EVTYPE TFatalities TInjuries TPopulation
## 1            TORNADO        5633     91346       96979
## 2     EXCESSIVE HEAT        1903      6525        8428
## 3          TSTM WIND         504      6957        7461
## 4              FLOOD         470      6789        7259
## 5          LIGHTNING         816      5230        6046
## 6               HEAT         937      2100        3037
## 7        FLASH FLOOD         978      1777        2755
## 8          ICE STORM          89      1975        2064
## 9  THUNDERSTORM WIND         133      1488        1621
## 10      WINTER STORM         206      1321        1527
The previous table can be easily visualized with a bar chart:
question1<-melt(q1[,1:3], id.vars="EVTYPE")
plot1 <- ggplot(question1, aes(x=EVTYPE,
y=value,
fill=variable)) +
geom_bar(stat="identity") +
theme_minimal() +
ggtitle("Severe weather consequences to Population Health (by type of event)") +
xlab("Event Type") +
ylab("Total casualties (both Fatalities and Injuries)") +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
plot1
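The chart shows the impact that tornadoes have on public health: with 5,633 fatalities and 91,346 injuries, they caused far more casualties than any other event type.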
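To put a number on how far tornadoes stand out, the share of casualties can be computed from the TPopulation column of the table above. This is a minimal sketch that copies the ten totals by hand, so the percentage is a share of the top 10 events only, not of all events in the database. The variable names are illustrative.

```r
# TPopulation values copied from the top 10 table above
casualties <- c("TORNADO" = 96979, "EXCESSIVE HEAT" = 8428,
                "TSTM WIND" = 7461, "FLOOD" = 7259,
                "LIGHTNING" = 6046, "HEAT" = 3037,
                "FLASH FLOOD" = 2755, "ICE STORM" = 2064,
                "THUNDERSTORM WIND" = 1621, "WINTER STORM" = 1527)

# Share of each event within the top 10 (not within the full database)
share <- casualties / sum(casualties)

# Tornado share, as a percentage rounded to one decimal place
tornado_share_pct <- round(100 * share[["TORNADO"]], 1)
tornado_share_pct
```

With these figures, tornadoes account for about 70.7% of the casualties recorded for the top 10 events.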
To answer the economic impact question we use the Damage dataset created above, as it includes the required variables:
1. EVTYPE
2. Property (new variable created by multiplying PROPDMG and PROPDMGEXP)
3. Crops (new variable created by multiplying CROPDMG and CROPDMGEXP)
4. Total (new variable created by adding 2 and 3)
Similar to question 1, we select the top 10 events ranked by total dollar impact:
q2 <- Damage %>%
group_by(EVTYPE) %>%
summarize(TProperty = sum(Property),
TCrops = sum(Crops),
TTotal = sum(Total)) %>%
arrange(desc(TTotal))
q2 <- q2[1:10,] #Limiting the table to only the top 10 events
q2
## Source: local data frame [10 x 4]
##
##               EVTYPE    TProperty      TCrops       TTotal
## 1              FLOOD 144657709800  5661968450 150319678250
## 2  HURRICANE/TYPHOON  69305840000  2607872800  71913712800
## 3            TORNADO  56937160480   414953110  57352113590
## 4        STORM SURGE  43323536000        5000  43323541000
## 5               HAIL  15732266720  3025954450  18758221170
## 6        FLASH FLOOD  16140811510  1421317100  17562128610
## 7            DROUGHT   1046106000 13972566000  15018672000
## 8          HURRICANE  11868319010  2741910000  14610229010
## 9        RIVER FLOOD   5118945500  5029459000  10148404500
## 10         ICE STORM   3944927810  5022113500   8967041310
Visualizing the previous table we can see the following:
question2<-melt(q2[,1:3], id.vars="EVTYPE")
question2$value <- question2$value/1000000000
plot2 <- ggplot(question2, aes(x=EVTYPE,
y=value,
fill=variable)) +
geom_bar(stat="identity") +
theme_minimal() +
ggtitle("Severe weather consequences in dollars (by type of event)") +
xlab("Event Type") +
ylab("Total damage (both property and crops) in billions of dollars") +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
plot2
The chart shows that floods are the most expensive severe weather event, with roughly 150 billion dollars in combined property and crop damage.
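Both top 10 tables contain labels that look like variants of the same event (TSTM WIND and THUNDERSTORM WIND, HURRICANE and HURRICANE/TYPHOON), so some totals are split across rows. The sketch below shows one possible way to merge such labels before grouping. The recoding table and the helper normalize_evtype are hypothetical, and treating these pairs as the same event is an assumption rather than something the original analysis does.

```r
# Hypothetical recoding table: variant label -> canonical label (an assumption)
canonical <- c("TSTM WIND" = "THUNDERSTORM WIND",
               "HURRICANE" = "HURRICANE/TYPHOON")

# Hypothetical helper: upper-case, trim and recode event labels
normalize_evtype <- function(evtype) {
  evtype <- toupper(trimws(as.character(evtype)))
  hit <- evtype %in% names(canonical)
  evtype[hit] <- unname(canonical[evtype[hit]])
  evtype
}

# TTotal values copied from the damage table above for three of its rows
totals <- data.frame(
  EVTYPE = c("HURRICANE/TYPHOON", "HURRICANE", "TORNADO"),
  TTotal = c(71913712800, 14610229010, 57352113590),
  stringsAsFactors = FALSE
)

# Recode the labels, then add up the rows that now share a label
totals$EVTYPE <- normalize_evtype(totals$EVTYPE)
merged <- aggregate(TTotal ~ EVTYPE, data = totals, FUN = sum)
merged <- merged[order(-merged$TTotal), ]
merged
```

Under this assumption the two hurricane rows combine to 86,523,941,810 dollars, which is still below the FLOOD total. In the full analysis the same helper could be applied to data$EVTYPE before the group_by step.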
Note that, according to the NOAA Storm Events Database, only tornadoes were recorded from 1950 to 1954. From 1955 to 1992 only tornado, thunderstorm wind and hail events were recorded, and from 1993 onward all event types were logged in the database. Totals for tornadoes therefore cover more years than those of most other event types.
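Because coverage differs by period, one option is to restrict the comparison to the years in which all event types were logged. The sketch below uses made-up rows and assumes the raw table has a BGN_DATE column holding month/day/year strings followed by a time. That column is dropped by the select call earlier in this document, so it would have to be kept for this to work on the real data.

```r
# Toy rows standing in for the raw NOAA table (values are made up).
# Assumption: BGN_DATE looks like "4/18/1950 0:00:00".
raw <- data.frame(
  EVTYPE   = c("TORNADO", "HAIL", "FLOOD", "TORNADO"),
  BGN_DATE = c("4/18/1950 0:00:00", "6/1/1975 0:00:00",
               "3/12/1993 0:00:00", "5/3/1999 0:00:00"),
  stringsAsFactors = FALSE
)

# Parse the date part (the trailing time is ignored) and extract the year
raw$year <- as.integer(format(as.Date(raw$BGN_DATE, format = "%m/%d/%Y"), "%Y"))

# Keep only events from 1993 onward, when all event types were recorded
recent <- raw[raw$year >= 1993, ]
recent
```

Running the two top 10 summaries on a table filtered this way would compare all event types over the same period.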