Effect of Weather Events on Public Health and the Economy

Author: Jawad Rashid

Synopsis

This analysis examines which weather events across the United States have the greatest impact on public health and the economy. The data come from weather records; for each type of event there is information about its impact in different areas, including crop damage, property damage, fatalities, and injuries. We try to find which event type has affected health and the economy the most by looking at their frequency and their impact. For health we use fatalities and injuries as the measure, and for economic loss we analyze crop and property damage. These attributes seem to be the most relevant for this analysis.

Data Processing

This analysis uses the bzip2-compressed CSV file downloaded from the source:

if (!file.exists("data")) {
    dir.create("data")
}
download.file(url = "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", 
    destfile = "data/stormdata.csv.bz2", method = "curl")
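The download step runs every time the report is knitted. A minimal sketch of a guard that skips the download when the file is already present is shown below; `download_if_missing` is a hypothetical helper name, not part of the original analysis.

```r
# Hypothetical helper: download a file only if it is not already on disk.
download_if_missing <- function(url, destfile) {
    if (!file.exists(destfile)) {
        # Create the target directory if needed, then fetch in binary mode
        dir.create(dirname(destfile), showWarnings = FALSE, recursive = TRUE)
        download.file(url, destfile, mode = "wb")
    }
    invisible(destfile)
}

# Demonstration with a file that already exists, so no network call is made
existing <- tempfile(fileext = ".csv")
writeLines("a,b", existing)
result <- download_if_missing("https://example.invalid/file.csv", existing)
```

In the report itself the call would take the URL and the `data/stormdata.csv.bz2` path used above.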

The compressed file is read in directly for analysis.

data <- read.csv("data/stormdata.csv.bz2", stringsAsFactors = FALSE, strip.white = TRUE, 
    na.strings = c("NA", ""))

Only the required fields are kept and all other columns are removed. We are interested in fatalities, injuries, property damage, and crop damage by event type, so only those columns are retained. The retained columns are renamed to make them easier to understand.

processedData <- data[, c(8, 23, 24, 25, 27)]
names(processedData) <- c("eventtype", "fatalaties", "injuries", "propertydamage", 
    "cropdamage")

The first few rows of the dataset are shown below:

head(processedData)
##   eventtype fatalaties injuries propertydamage cropdamage
## 1   TORNADO          0       15           25.0          0
## 2   TORNADO          0        0            2.5          0
## 3   TORNADO          0        2           25.0          0
## 4   TORNADO          0        2            2.5          0
## 5   TORNADO          0        2            2.5          0
## 6   TORNADO          0        6            2.5          0

The structure of the dataset is shown below.

str(processedData)
## 'data.frame':    902297 obs. of  5 variables:
##  $ eventtype     : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ fatalaties    : num  0 0 0 0 0 0 0 0 1 0 ...
##  $ injuries      : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ propertydamage: num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ cropdamage    : num  0 0 0 0 0 0 0 0 0 0 ...
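The column selection above relies on numeric positions, which silently break if the file layout changes. A sketch of the same selection by name follows, run on a toy stand-in for the raw data. The source column names used here (`EVTYPE`, `FATALITIES`, `INJURIES`, `PROPDMG`, `CROPDMG`) are assumptions about the file header and should be checked against `names(data)`.

```r
# Toy stand-in for the raw data; the column names are assumed, not confirmed
raw <- data.frame(
    EVTYPE = c("TORNADO", "HAIL"),
    FATALITIES = c(0, 1),
    INJURIES = c(15, 0),
    PROPDMG = c(25, 2.5),
    CROPDMG = c(0, 5),
    OTHER = c("x", "y"),
    stringsAsFactors = FALSE
)

# Map each new, readable name to its (assumed) source column
keep <- c(eventtype = "EVTYPE", fatalaties = "FATALITIES", injuries = "INJURIES",
    propertydamage = "PROPDMG", cropdamage = "CROPDMG")

# Select by name, then rename to the readable names
selected <- raw[, unname(keep)]
names(selected) <- names(keep)
```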

Results

Event Type Impact On Public Health

To answer the question of which event type is most harmful to public health, we calculate the total fatalities and total injuries for each event type. First we sum the fatalities for each event type and sort the totals in descending order. Here are the top 10 events in terms of fatalities:

library(plyr)
fatalatiesSummary <- aggregate(processedData$fatalaties, list(event = processedData$eventtype), 
    sum)
fatalatiesSummary <- arrange(fatalatiesSummary, desc(x))
names(fatalatiesSummary) <- c("event", "fatalaties")
head(fatalatiesSummary, n = 10)
##             event fatalaties
## 1         TORNADO       5633
## 2  EXCESSIVE HEAT       1903
## 3     FLASH FLOOD        978
## 4            HEAT        937
## 5       LIGHTNING        816
## 6       TSTM WIND        504
## 7           FLOOD        470
## 8     RIP CURRENT        368
## 9       HIGH WIND        248
## 10      AVALANCHE        224
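The sum-then-sort step can also be written in base R without `plyr`. The sketch below uses made-up toy numbers, not the storm data, and the formula interface of `aggregate`.

```r
# Toy data: a few events with made-up fatality counts
toy <- data.frame(
    eventtype = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT"),
    fatalities = c(3, 2, 4, 1, 0),
    stringsAsFactors = FALSE
)

# Sum fatalities per event type
totals <- aggregate(fatalities ~ eventtype, data = toy, FUN = sum)

# Sort by total, largest first
totals <- totals[order(-totals$fatalities), ]
rownames(totals) <- NULL
```

Here `totals` lists TORNADO first with 7, then HEAT with 2 and FLOOD with 1.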

We will do the same for injuries.

injurySummary <- aggregate(processedData$injuries, list(event = processedData$eventtype), 
    sum)
injurySummary <- arrange(injurySummary, desc(x))
names(injurySummary) <- c("event", "injuries")
head(injurySummary, n = 10)
##                event injuries
## 1            TORNADO    91346
## 2          TSTM WIND     6957
## 3              FLOOD     6789
## 4     EXCESSIVE HEAT     6525
## 5          LIGHTNING     5230
## 6               HEAT     2100
## 7          ICE STORM     1975
## 8        FLASH FLOOD     1777
## 9  THUNDERSTORM WIND     1488
## 10              HAIL     1361
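The table above lists TSTM WIND and THUNDERSTORM WIND as separate rows, which suggests the event labels are not fully consistent. This analysis keeps the labels as recorded. A minimal sketch of how such variants could be merged before aggregating is given below; `normalize_event` is a hypothetical helper, and the two rules in it are illustrative assumptions rather than a complete cleaning scheme.

```r
# Hypothetical helper: collapse a few obvious label variants
normalize_event <- function(x) {
    x <- toupper(trimws(x))                           # unify case and whitespace
    x <- gsub("TSTM", "THUNDERSTORM", x, fixed = TRUE) # expand abbreviation
    x <- sub("WINDS$", "WIND", x)                     # drop trailing plural
    x
}

events <- c("TSTM WIND", "Thunderstorm Wind", "THUNDERSTORM WINDS", " HAIL")
cleaned <- normalize_event(events)
```

Merging labels this way would change the totals and rankings reported here, so it should be treated as a separate analysis choice.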

We can visualize the results so far with bar plots of the top 10 event types by fatalities and by injuries.

library(ggplot2)
par(mfrow = c(1, 2))
# Values are divided by 1000, so the axis is labelled in thousands
barplot(fatalatiesSummary$fatalaties[1:10]/1000, main = "Top 10 Fatalities/Event", 
    ylab = "Count (thousands)", names.arg = fatalatiesSummary$event[1:10], 
    cex.names = 0.6, las = 2, cex.axis = 0.7, col = 3)
barplot(injurySummary$injuries[1:10]/1000, main = "Top 10 Injuries/Event", ylab = "Count (thousands)", 
    names.arg = injurySummary$event[1:10], cex.names = 0.6, las = 2, cex.axis = 0.7, 
    col = 2)

[Figure: bar plots of the top 10 event types by fatalities (left) and by injuries (right); chunk HealthPlot]

# These stats are used in the paragraphs below
firstFatalityPercentage <- fatalatiesSummary[1, 2]/sum(fatalatiesSummary[, 2]) * 
    100
secondFatalityPercentage <- fatalatiesSummary[2, 2]/sum(fatalatiesSummary[, 
    2]) * 100
firstInjuryPercentage <- injurySummary[1, 2]/sum(injurySummary[, 2]) * 100
secondInjuryPercentage <- injurySummary[2, 2]/sum(injurySummary[, 2]) * 100

As the figures and the table show, tornadoes are the most harmful event type to public health in terms of fatalities. They are responsible for 37.1938% of total fatalities, followed by excessive heat with 12.5652%.

The same analysis for injuries shows that tornadoes are also the most harmful event type in terms of injuries. They are responsible for 65.002% of total injuries, followed by TSTM WIND with 4.9506%.
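The percentages quoted above are each event type's total divided by the grand total, times 100. A small sketch of that calculation on made-up numbers follows; `top_share` is a hypothetical helper, not a function used in the report.

```r
# Hypothetical helper: percentage share of the value at a given rank
top_share <- function(values, rank = 1) {
    sorted <- sort(values, decreasing = TRUE)
    sorted[rank] / sum(values) * 100
}

# Made-up totals for four event types
toy_totals <- c(60, 25, 10, 5)
first_share <- top_share(toy_totals, 1)   # 60 out of 100
second_share <- top_share(toy_totals, 2)  # 25 out of 100

# Rounding for presentation, e.g. one decimal place
first_label <- paste0(round(first_share, 1), "%")
```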

Event Type Impact On the Economy

To answer the question of which event type is most harmful to the economy, we calculate the total crop damage and total property damage for each event type. First we sum the crop damage for each event type and sort the totals in descending order. Here are the top 10 events in terms of crop damage:

cropSummary <- aggregate(processedData$cropdamage, list(event = processedData$eventtype), 
    sum)
cropSummary <- arrange(cropSummary, desc(x))
names(cropSummary) <- c("event", "crop")
head(cropSummary, n = 10)
##                 event   crop
## 1                HAIL 579596
## 2         FLASH FLOOD 179200
## 3               FLOOD 168038
## 4           TSTM WIND 109203
## 5             TORNADO 100019
## 6   THUNDERSTORM WIND  66791
## 7             DROUGHT  33899
## 8  THUNDERSTORM WINDS  18685
## 9           HIGH WIND  17283
## 10         HEAVY RAIN  11123

We will do the same for property damage.

propertySummary <- aggregate(processedData$propertydamage, list(event = processedData$eventtype), 
    sum)
propertySummary <- arrange(propertySummary, desc(x))
names(propertySummary) <- c("event", "property")
head(propertySummary, n = 10)
##                 event property
## 1             TORNADO  3212258
## 2         FLASH FLOOD  1420125
## 3           TSTM WIND  1335966
## 4               FLOOD   899938
## 5   THUNDERSTORM WIND   876844
## 6                HAIL   688693
## 7           LIGHTNING   603352
## 8  THUNDERSTORM WINDS   446293
## 9           HIGH WIND   324732
## 10       WINTER STORM   132721
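The crop and property totals above are sums of the raw damage columns as selected during data processing. If the source file also carries a unit code for each damage figure (assumed here to be letters such as K, M, and B for thousands, millions, and billions), the raw values would need scaling before they are comparable in dollars. The sketch below shows one way to do that on toy values; `exp_multiplier` is a hypothetical helper and the code set it handles is an assumption.

```r
# Hypothetical helper: turn a unit code into a numeric multiplier.
# Only K, M and B (any case) are handled; anything else, including NA,
# is treated as a multiplier of 1.
exp_multiplier <- function(code) {
    lookup <- c(K = 1e3, M = 1e6, B = 1e9)
    mult <- lookup[match(toupper(code), names(lookup))]
    mult[is.na(mult)] <- 1
    unname(mult)
}

# Toy values: raw damage figures and their assumed unit codes
damage <- c(25, 2.5, 1)
code <- c("K", "m", NA)
scaled <- damage * exp_multiplier(code)
```

Applying such scaling would change the totals and possibly the ranking, so the figures in this report should be read as sums of unscaled values.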

We can visualize the results so far with bar plots of the top 10 event types by crop damage and by property damage.

par(mfrow = c(1, 2))
# Values are divided by 10^5, so the axis is labelled in hundred thousands
barplot(cropSummary$crop[1:10]/10^5, main = "Top 10 Crop Damage/Event", ylab = "Damage (hundred thousands)", 
    names.arg = cropSummary$event[1:10], cex.names = 0.6, las = 2, cex.axis = 0.7, 
    col = 3)
barplot(propertySummary$property[1:10]/10^5, main = "Top 10 Property Damage/Event", 
    ylab = "Damage (hundred thousands)", names.arg = propertySummary$event[1:10], 
    cex.names = 0.6, las = 2, cex.axis = 0.7, col = 2)

[Figure: bar plots of the top 10 event types by crop damage (left) and by property damage (right); chunk EconomicPlots]

# These stats are used in the paragraphs below
firstCropPercentage <- cropSummary[1, 2]/sum(cropSummary[, 2]) * 100
secondCropPercentage <- cropSummary[2, 2]/sum(cropSummary[, 2]) * 100
firstPropertyPercentage <- propertySummary[1, 2]/sum(propertySummary[, 2]) * 
    100
secondPropertyPercentage <- propertySummary[2, 2]/sum(propertySummary[, 2]) * 
    100

As the figures and the table show, hail is the most harmful event type to the economy in terms of crop damage. It is responsible for 42.066% of total crop damage, followed by flash floods with 13.006%.

The same analysis for property damage shows that tornadoes are the most harmful event type in terms of property damage. They are responsible for 29.5122% of total property damage, followed by flash floods with 13.0472%.