This analysis aims to answer the two main questions of the project. For each question we present a table and a chart so that the data can be examined in two different ways. At the end we summarize the results with specific answers to the questions.
We download the data from the given URL.
fileurl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(fileurl, destfile="data.bz2")
# read.csv() uses quoting and comment defaults suited to CSV files and reads the .bz2 archive directly
data <- read.csv("data.bz2", header = TRUE)
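Re-running the report fetches the archive again each time. The following is a minimal sketch of a cached load. It assumes a hypothetical helper named load_storm_data, which is not part of the original analysis and only wraps the same download and read steps.

```r
# Hypothetical helper (an assumption, not from the original report):
# download the archive only when no local copy exists, then read it.
load_storm_data <- function(url, destfile = "data.bz2") {
  if (!file.exists(destfile)) {
    # mode = "wb" keeps the compressed file intact on Windows
    download.file(url, destfile = destfile, mode = "wb")
  }
  # read.csv() detects bzip2 compression and decompresses on the fly
  read.csv(destfile, header = TRUE)
}
```

It could be called as data <- load_storm_data(fileurl), which skips the download whenever data.bz2 is already present.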
Using dplyr, we group the data by event type (EVTYPE) and summarize it into a new table that keeps only the variables needed for the analysis: FATALITIES, INJURIES, PROPDMG and CROPDMG.
library(dplyr)
g_data <- group_by(data, EVTYPE)
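# Note the naming: CDamages holds the PROPDMG (property) totals and PDamages holds the CROPDMG (crop) totals
s_table <- summarize(g_data, Fatalities=sum(FATALITIES), Injuries=sum(INJURIES), CDamages = sum(PROPDMG), PDamages = sum(CROPDMG))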
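To make the grouping step concrete, here is a small sketch on made-up data. The toy data frame and its values are assumptions for illustration only; the column names mirror the ones used above.

```r
library(dplyr)

# Toy stand-in for the storm data (values are invented)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "TORNADO", "HAIL", "FLOOD"),
  FATALITIES = c(2, 0, 3, 0, 1),
  INJURIES   = c(10, 1, 5, 0, 0)
)

# One row per event type, with totals over all of its records
toy_totals <- toy %>%
  group_by(EVTYPE) %>%
  summarise(Fatalities = sum(FATALITIES), Injuries = sum(INJURIES))
```

Each event type collapses to a single row, so the two TORNADO records become one row with 5 fatalities and 15 injuries.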
Next we filter the data, keeping only the events with at least one injury or one fatality, and sort them by fatalities in descending order.
harm_t <- s_table[s_table$Fatalities != 0 | s_table$Injuries != 0, c('EVTYPE', 'Fatalities', 'Injuries')]
# desc() takes a single vector, so each sort key gets its own call; Injuries breaks ties
harm_t <- arrange(harm_t, desc(Fatalities), desc(Injuries))
head(harm_t)
## Source: local data frame [6 x 3]
##
##           EVTYPE Fatalities Injuries
##           (fctr)      (dbl)    (dbl)
## 1        TORNADO       5633    91346
## 2 EXCESSIVE HEAT       1903     6525
## 3    FLASH FLOOD        978     1777
## 4           HEAT        937     2100
## 5      LIGHTNING        816     5230
## 6      TSTM WIND        504     6957
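The filter and sort logic can be checked on a tiny example. This is a sketch with invented totals; toy_totals and toy_harm are hypothetical names, and the dplyr verbs are used in place of the bracket indexing above.

```r
library(dplyr)

# Invented per-event totals, including a tie on Fatalities
toy_totals <- data.frame(
  EVTYPE     = c("A", "B", "C", "D"),
  Fatalities = c(5, 0, 5, 0),
  Injuries   = c(1, 0, 7, 3),
  stringsAsFactors = FALSE
)

toy_harm <- toy_totals %>%
  # Drop events with neither fatalities nor injuries (event B)
  filter(Fatalities != 0 | Injuries != 0) %>%
  # Sort by fatalities first, then by injuries to break ties
  arrange(desc(Fatalities), desc(Injuries))
```

Events A and C tie on fatalities, so C comes first because it has more injuries; D follows with no fatalities.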
We also plot Fatalities against Injuries to visualize the harm caused by each event type.
library(ggplot2)
qplot(Fatalities, Injuries, data = harm_t, label = EVTYPE) + geom_text(check_overlap = TRUE, nudge_y = 2000, size = 2.5)
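Recent ggplot2 releases deprecate qplot(), so the same chart can also be written with ggplot(). The sketch below uses invented data; toy_harm and the nudge value are assumptions chosen only to fit the toy scale.

```r
library(ggplot2)

# Invented totals, for illustration only
toy_harm <- data.frame(
  EVTYPE     = c("A", "B", "C"),
  Fatalities = c(10, 200, 50),
  Injuries   = c(100, 5000, 300)
)

# Scatter plot with one label per event type, placed slightly above each point
p <- ggplot(toy_harm, aes(x = Fatalities, y = Injuries, label = EVTYPE)) +
  geom_point() +
  geom_text(check_overlap = TRUE, nudge_y = 200, size = 2.5)
```

Printing p draws the chart; with the real harm_t table, the original nudge_y = 2000 would match the larger injury counts.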
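In the same way that we analyzed the harmful events, we use the DMG variables (PROPDMG and CROPDMG) to analyze the economic consequences. The totals are raw sums of these two columns; the magnitude codes in PROPDMGEXP and CROPDMGEXP are not applied.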
economic_t <- s_table[s_table$CDamages != 0 | s_table$PDamages != 0, c('EVTYPE', 'CDamages', 'PDamages')]
# desc() takes a single vector, so each sort key gets its own call; PDamages breaks ties
economic_t <- arrange(economic_t, desc(CDamages), desc(PDamages))
head(economic_t)
## Source: local data frame [6 x 3]
##
##              EVTYPE  CDamages  PDamages
##              (fctr)     (dbl)     (dbl)
## 1           TORNADO 3212258.2 100018.52
## 2       FLASH FLOOD 1420124.6 179200.46
## 3         TSTM WIND 1335965.6 109202.60
## 4             FLOOD  899938.5 168037.88
## 5 THUNDERSTORM WIND  876844.2  66791.45
## 6              HAIL  688693.4 579596.28
We plot the variables in the same way.
qplot(CDamages, PDamages, data = economic_t, label = EVTYPE) + geom_text(check_overlap = TRUE, nudge_y = 15000, size = 2.5)
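The damage totals used here sum PROPDMG and CROPDMG as they are. The dataset also carries magnitude codes in PROPDMGEXP and CROPDMGEXP, and a possible refinement is to scale each amount before summing. The sketch below is not part of the original analysis: exp_multiplier is a hypothetical helper, the data are invented, and treating every code other than K, M and B as a multiplier of 1 is an assumption.

```r
# Hypothetical helper: map a magnitude code to a multiplier.
# Assumption: K = thousands, M = millions, B = billions; any other code counts as 1.
exp_multiplier <- function(code) {
  mult <- c(K = 1e3, M = 1e6, B = 1e9)[toupper(as.character(code))]
  mult[is.na(mult)] <- 1
  unname(mult)
}

# Invented records, for illustration only
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 10),
  PROPDMGEXP = c("K", "M", "B", ""),
  stringsAsFactors = FALSE
)

# Property damage expressed in dollars
toy$PropDollars <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP)
```

Summing a scaled column such as PropDollars per EVTYPE could change the ranking, because a single record coded B outweighs many records coded K.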
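From the first table and plot we can conclude that tornadoes are by far the most harmful events, with 5633 fatalities and 91346 injuries. Ranked by fatalities, they are followed by excessive heat and flash floods, with heat in fourth place.
As the second table and plot show, the two events with the greatest economic consequences are tornadoes, which have the largest property damage total (CDamages), and hail, which has the largest crop damage total (PDamages) among the events listed.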