This study was made for the Reproducible Research course on Coursera.
Synopsis: The data for this project were obtained from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This study briefly analyzes the impact of major storms and weather events on the population of the United States, concentrating on injuries and economic damage. It examines the differences between types of harmful events, with the goal of showing which types caused the most damage across the United States. The two main questions I want to answer by the end of this study are:
1. Across the United States, which types of events are most harmful with respect to population health?
2. Across the United States, which types of events have the greatest economic consequences?
Below I illustrate my results with numbers and figures. Finally, I describe my conclusions and answer the two main questions.
The data come as a bzip2-compressed CSV file (not a zip file). I download it with download.file() and then load it into R, using the comma as the field separator.
# Working directory for the downloaded file (the author's local Windows path)
.exdir <- "c:\\DIANA\\Coursera\\Reproducible Research\\temp_new"
dir.create(.exdir, showWarnings = FALSE)  # no warning if the folder already exists
.file <- file.path(.exdir, "repdata%2Fdata%2FStormData.csv.bz2")
url <- "http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(url, .file, mode = "wb")  # binary mode, since the file is compressed
# bzfile() decompresses the archive on the fly; read.csv() reads it with
# sep = ",", header = TRUE and only double quotes as quote characters
data <- read.csv(bzfile(.file))
head(data)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO 0 0
## 2 TORNADO 0 0
## 3 TORNADO 0 0
## 4 TORNADO 0 0
## 5 TORNADO 0 0
## 6 TORNADO 0 0
## COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1 NA 0 14.0 100 3 0 0
## 2 NA 0 2.0 150 2 0 0
## 3 NA 0 0.1 123 2 0 0
## 4 NA 0 0.0 100 2 0 0
## 5 NA 0 0.0 150 2 0 0
## 6 NA 0 1.5 177 2 0 0
## INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1 15 25.0 K 0
## 2 0 2.5 K 0
## 3 2 25.0 K 0
## 4 2 2.5 K 0
## 5 2 2.5 K 0
## 6 6 2.5 K 0
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3040 8812 3051 8806 1
## 2 3042 8755 0 0 2
## 3 3340 8742 0 0 3
## 4 3458 8626 0 0 4
## 5 3412 8642 0 0 5
## 6 3450 8748 0 0 6
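Because the compressed file is large, re-running the report downloads it again every time. A minimal sketch, assuming the same `url` and `.file` variables as above, that downloads only when no local copy exists; the helper name `fetch_if_missing` is hypothetical and not part of the original analysis.

```r
# Hypothetical helper: download `url` to `destfile` only if the file is not already there
fetch_if_missing <- function(url, destfile) {
  if (!file.exists(destfile)) {
    download.file(url, destfile, mode = "wb")  # binary mode for the .bz2 archive
  }
  invisible(destfile)
}

# Usage with the variables defined above:
# fetch_if_missing(url, .file)
# data <- read.csv(bzfile(.file))
```

The check only looks at whether the file exists, so a partially downloaded file would have to be deleted by hand before re-running.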
I sum the number of injuries for each type of event and drop the event types that caused no injuries (i.e., INJURIES = 0).
I then count how many event types caused at least one injury and find the type that was most harmful to the population.
# Total injuries per event type
Injuries_data <- aggregate(INJURIES ~ EVTYPE, data, sum)
# Keep only event types that caused at least one injury
Main_Injuries_data <- Injuries_data[Injuries_data$INJURIES != 0, ]
head(Main_Injuries_data)
## EVTYPE INJURIES
## 19 AVALANCHE 170
## 29 BLACK ICE 24
## 30 BLIZZARD 805
## 42 blowing snow 1
## 44 BLOWING SNOW 13
## 49 BRUSH FIRE 2
nrow(Main_Injuries_data)  # number of event types with at least one injury
## [1] 158
# Event type with the largest injury total
M <- max(Main_Injuries_data$INJURIES)
Max_injury_type <- Main_Injuries_data[Main_Injuries_data$INJURIES == M, ]
Max_injury_type
## EVTYPE INJURIES
## 834 TORNADO 91346
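The EVTYPE labels are not normalized: the output above lists both `blowing snow` and `BLOWING SNOW`, so the count of 158 event types includes some duplicates. A minimal sketch of a cleanup step, assuming that upper-casing, trimming whitespace and mapping the `TSTM` abbreviation to `THUNDERSTORM` is acceptable (this is not part of the original analysis); `normalize_evtype` is a hypothetical helper, and a toy data frame stands in for the storm data.

```r
# Hypothetical cleanup of event-type labels before aggregating
normalize_evtype <- function(x) {
  x <- toupper(trimws(as.character(x)))  # "blowing snow " -> "BLOWING SNOW"
  x <- gsub("\\s+", " ", x)              # collapse repeated spaces
  sub("^TSTM ", "THUNDERSTORM ", x)      # "TSTM WIND" -> "THUNDERSTORM WIND"
}

# Toy data standing in for the storm data set
toy <- data.frame(
  EVTYPE   = c("blowing snow", "BLOWING SNOW", "TSTM WIND", "THUNDERSTORM WIND", "FOG"),
  INJURIES = c(1, 13, 5, 7, 0)
)
toy$EVTYPE <- normalize_evtype(toy$EVTYPE)
clean <- aggregate(INJURIES ~ EVTYPE, toy, sum)
clean <- clean[clean$INJURIES != 0, ]  # same filtering as in the analysis
clean
```

On the real data the equivalent step would be `data$EVTYPE <- normalize_evtype(data$EVTYPE)` before the `aggregate()` call; which synonyms to merge is a judgment call that can change the counts and rankings.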
Next, I create a bar chart of the six event types that caused the most injuries in the United States.
library(ggplot2)
# Sort by injury total and keep the six largest
sortdata <- Main_Injuries_data[order(Main_Injuries_data$INJURIES), ]
Majorevents <- tail(sortdata, 6)
Majorevents
## EVTYPE INJURIES
## 275 HEAT 2100
## 464 LIGHTNING 5230
## 130 EXCESSIVE HEAT 6525
## 170 FLOOD 6789
## 856 TSTM WIND 6957
## 834 TORNADO 91346
# Bar chart of the six event types, ordered by injury total;
# geom_col() takes bar heights directly from the data (no binning)
d <- ggplot(Majorevents, aes(x = reorder(EVTYPE, INJURIES), y = INJURIES))
d <- d + geom_col(color = "red", fill = "red")
d <- d + labs(title = "Injuries caused by the six most harmful event types in the US")
d <- d + labs(x = "Type of event", y = "Number of injuries")
d
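To put a number on how dominant tornadoes are, the injury totals can be expressed as shares. A sketch, assuming `Main_Injuries_data` as computed above; here a toy data frame built from the six totals printed in `Majorevents` stands in for it, so the shares are relative to those six event types only, not to all 158.

```r
# Toy stand-in for Main_Injuries_data, using the six totals printed above
injuries <- data.frame(
  EVTYPE   = c("HEAT", "LIGHTNING", "EXCESSIVE HEAT", "FLOOD", "TSTM WIND", "TORNADO"),
  INJURIES = c(2100, 5230, 6525, 6789, 6957, 91346)
)
# Share of injuries per event type, largest first
injuries$SHARE <- injuries$INJURIES / sum(injuries$INJURIES)
injuries <- injuries[order(injuries$SHARE, decreasing = TRUE), ]
injuries$PERCENT <- round(100 * injuries$SHARE, 1)
injuries
```

Running the same lines on `Main_Injuries_data` instead of `injuries` would give each type's share of all recorded injuries.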
I sum the property damage (PROPDMG) and the crop damage (CROPDMG) for each type of event, again dropping event types with no damage (i.e., PROPDMG = 0 or CROPDMG = 0, respectively). These sums use the raw PROPDMG and CROPDMG values without the magnitude codes in PROPDMGEXP and CROPDMGEXP (e.g., K), so they are not dollar amounts.
For each kind of damage I then keep the six event types with the largest totals and combine them in one data frame.
# Total property damage per event type, dropping types with no damage
Property_data <- aggregate(PROPDMG ~ EVTYPE, data, sum)
Main_Property_data <- Property_data[Property_data$PROPDMG != 0, ]
# Six largest totals, then sorted alphabetically by event type
sortdata1 <- Main_Property_data[order(Main_Property_data$PROPDMG), ]
Majorevents_P <- tail(sortdata1, 6)
Majorevents_P_New <- Majorevents_P[order(Majorevents_P$EVTYPE), ]
Majorevents_P_New
## EVTYPE PROPDMG
## 153 FLASH FLOOD 1420125
## 170 FLOOD 899938
## 244 HAIL 688693
## 760 THUNDERSTORM WIND 876844
## 834 TORNADO 3212258
## 856 TSTM WIND 1335966
# Total crop damage per event type, dropping types with no damage
Crop_data <- aggregate(CROPDMG ~ EVTYPE, data, sum)
Main_Crop_data <- Crop_data[Crop_data$CROPDMG != 0, ]
# Six largest totals, then sorted alphabetically by event type
sortdata2 <- Main_Crop_data[order(Main_Crop_data$CROPDMG), ]
Majorevents_C <- tail(sortdata2, 6)
Majorevents_C_New <- Majorevents_C[order(Majorevents_C$EVTYPE), ]
Majorevents_C_New
## EVTYPE CROPDMG
## 153 FLASH FLOOD 179200
## 170 FLOOD 168038
## 244 HAIL 579596
## 760 THUNDERSTORM WIND 66791
## 834 TORNADO 100019
## 856 TSTM WIND 109203
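Combining the two top-six tables side by side relies on both containing the same event types in the same order, which happens to hold here. A sketch of a more robust alternative using base R's `merge()` joined on EVTYPE; small toy tables with deliberately different row orders stand in for `Majorevents_P_New` and `Majorevents_C_New`.

```r
# Toy stand-ins for the two top-six tables, in different row orders
prop <- data.frame(EVTYPE  = c("TORNADO", "HAIL", "FLOOD"),
                   PROPDMG = c(3212258, 688693, 899938))
crop <- data.frame(EVTYPE  = c("FLOOD", "TORNADO", "HAIL"),
                   CROPDMG = c(168038, 100019, 579596))
# Join on the event type; all = TRUE keeps types found in only one table (NA in the other)
combined <- merge(prop, crop, by = "EVTYPE", all = TRUE)
combined
```

With `all = TRUE` an event type that ranks in the top six for only one kind of damage would still appear, with `NA` for the other column, instead of being silently paired with the wrong row.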
# Both top-six lists contain the same event types sorted alphabetically,
# so their rows line up; check this before combining them by position
stopifnot(identical(as.character(Majorevents_P_New$EVTYPE), as.character(Majorevents_C_New$EVTYPE)))
df <- data.frame(EVTYPE  = Majorevents_P_New$EVTYPE,
                 PROPDMG = Majorevents_P_New$PROPDMG,
                 CROPDMG = Majorevents_C_New$CROPDMG)
df
##              EVTYPE PROPDMG CROPDMG
## 1       FLASH FLOOD 1420125  179200
## 2             FLOOD  899938  168038
## 3              HAIL  688693  579596
## 4 THUNDERSTORM WIND  876844   66791
## 5           TORNADO 3212258  100019
## 6         TSTM WIND 1335966  109203
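The PROPDMGEXP and CROPDMGEXP columns (for example the `K` in the first rows of the data) hold a magnitude code that the sums above do not apply. A minimal sketch of a conversion to dollars, assuming the common reading H = hundreds, K = thousands, M = millions, B = billions and treating every other code as a multiplier of 1; `to_dollars` is a hypothetical helper, and the mapping should be checked against the NOAA storm data documentation before relying on it.

```r
# Hypothetical conversion of a damage value and its magnitude code into dollars
to_dollars <- function(value, exp_code) {
  multipliers <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  code <- toupper(trimws(as.character(exp_code)))
  mult <- multipliers[code]   # NA for codes not in the table ("", "+", digits, NA)
  mult[is.na(mult)] <- 1      # assumption: unknown codes are left unscaled
  value * unname(mult)
}

# Toy values: 25 K = 25,000; 2.5 M = 2,500,000; 10 with no code stays 10
to_dollars(c(25, 2.5, 10), c("K", "m", ""))
```

On the real data, `data$PROP_USD <- to_dollars(data$PROPDMG, data$PROPDMGEXP)` followed by `aggregate(PROP_USD ~ EVTYPE, data, sum)` would give dollar totals, which may rank the event types differently from the raw sums.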
Once I have the data frame, I draw a plot that compares the property and crop damage for these event types.
library(ggplot2)
library(scales)
# Long format: one row per event type and damage category, so that colour
# and group are mapped to a column (stat is not an aesthetic, so it is dropped)
plot_df <- rbind(
  data.frame(EVTYPE = Majorevents_P_New$EVTYPE, Damage = "Properties", Value = Majorevents_P_New$PROPDMG),
  data.frame(EVTYPE = Majorevents_C_New$EVTYPE, Damage = "Crop", Value = Majorevents_C_New$CROPDMG)
)
E <- ggplot(plot_df, aes(x = EVTYPE, y = Value, colour = Damage, group = Damage))
E <- E + geom_point(size = 3) + geom_line(linewidth = 1)
E <- E + labs(title = "Damage in Property and Crop")
E <- E + labs(x = "Type of event", y = "Total damage (raw values, multipliers not applied)")
E <- E + scale_colour_manual("Damage", values = c("Properties" = "green", "Crop" = "red"))
E <- E + scale_y_continuous(labels = comma)
E <- E + theme(axis.text.x = element_text(colour = "black", angle = 30, size = 8, hjust = 1, vjust = 1))
E
Across the United States, which types of events are most harmful with respect to population health?
In the first part of my analysis I showed that 158 event types (as labeled in EVTYPE) caused at least one injury. The most harmful of them was TORNADO, with 91,346 injuries. The bar chart of the six most harmful event types shows that tornadoes were by far the most harmful to population health, as measured by injuries.
Across the United States, which types of events have the greatest economic consequences?
In the second part of my analysis I calculated the total property and crop damage for each event type. The same six event types had the largest totals for both properties and crop, so I could show both kinds of damage in a common plot.
The plot shows that these events caused more damage to properties than to crop.
From the common figure, the following can be seen:
TORNADO has caused the greatest damage in properties.
HAIL has caused the greatest damage in crop.
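The two statements above, and the observation that property damage exceeds crop damage for every event shown, can also be checked directly from the combined table instead of being read off the figure. A sketch using the six totals printed in `df` (raw values, magnitude codes not applied).

```r
# The six totals printed in df above
totals <- data.frame(
  EVTYPE  = c("FLASH FLOOD", "FLOOD", "HAIL", "THUNDERSTORM WIND", "TORNADO", "TSTM WIND"),
  PROPDMG = c(1420125, 899938, 688693, 876844, 3212258, 1335966),
  CROPDMG = c(179200, 168038, 579596, 66791, 100019, 109203)
)
# Event type with the largest total in each column
top_property <- totals$EVTYPE[which.max(totals$PROPDMG)]
top_crop     <- totals$EVTYPE[which.max(totals$CROPDMG)]
# TRUE if property damage is larger than crop damage for every event type
prop_exceeds_crop <- all(totals$PROPDMG > totals$CROPDMG)
c(top_property = top_property, top_crop = top_crop)
prop_exceeds_crop
```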