This study was made for the Reproducible Research course on Coursera.

Synopsis: The data for this project were obtained from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This study briefly analyzes the impact of major storms and weather events on the population of the United States, concentrating on injuries and economic damage. It compares the types of harmful events, with the goal of showing which types caused the most damage across the United States. The study sets out to answer two main questions:
1. Across the United States, which types of events are most harmful with respect to population health?
2. Across the United States, which types of events have the greatest economic consequences?
Below I illustrate my results with numbers and figures. Finally, I describe my conclusions and answer the two main questions.

Loading the data

The data come in a bz2-compressed CSV file. I download the file with download.file() and then load it with read.table(), using the comma as the field separator.

.exdir = "c:\\DIANA\\Coursera\\Reproducible Research\\temp_new"
dir.create(.exdir)
## Warning: 'c:\DIANA\Coursera\Reproducible Research\temp_new' already exists
.file = file.path(.exdir, "repdata%2Fdata%2FStormData.csv.bz2")

url = "http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(url, .file)

# bzfile() takes the open mode as its second argument, so only the path is passed
data <- read.table(bzfile(.file), header = TRUE, sep = ',')
head(data)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
## 4        2     2.5          K       0                                    
## 5        2     2.5          K       0                                    
## 6        6     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6
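
The dir.create() warning above shows that the chunk re-creates the directory, and re-downloads the archive, on every run. A minimal sketch of one way to skip both when the file is already present; `fetch_once` and its `downloader` argument are hypothetical names introduced here for illustration, not part of the original analysis.

```r
# Hypothetical helper: download `url` to `dest` only if `dest` is missing.
# `downloader` defaults to download.file; it is a parameter so that the
# logic can be exercised without network access.
fetch_once <- function(url, dest, downloader = utils::download.file) {
  # Create the target directory quietly if it does not exist yet.
  dir.create(dirname(dest), showWarnings = FALSE, recursive = TRUE)
  if (!file.exists(dest)) {
    downloader(url, dest, mode = "wb")  # binary mode keeps the bz2 archive intact
    return(TRUE)                        # a download happened
  }
  FALSE                                 # file already present, nothing downloaded
}
```

Under these assumptions, `fetch_once(url, .file)` could stand in for the separate dir.create() and download.file() calls.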

Examining the number of injuries caused by major events

I sum the number of injuries for each event type and drop the types with no injuries (i.e. INJURIES = 0).
I then count how many event types caused at least one injury and find which type was the most harmful to the population.

Injuries_data <- aggregate(INJURIES ~ EVTYPE, data, sum)
Main_Injuries_data <- Injuries_data[Injuries_data$INJURIES != 0,]
head(Main_Injuries_data)
##          EVTYPE INJURIES
## 19    AVALANCHE      170
## 29    BLACK ICE       24
## 30     BLIZZARD      805
## 42 blowing snow        1
## 44 BLOWING SNOW       13
## 49   BRUSH FIRE        2
dim(Main_Injuries_data)[1]
## [1] 158
M <- max(Main_Injuries_data$INJURIES)
Max_injury_type <- Main_Injuries_data[Main_Injuries_data$INJURIES == M,]
Max_injury_type
##      EVTYPE INJURIES
## 834 TORNADO    91346
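
The head() output above lists "blowing snow" and "BLOWING SNOW" as separate rows, so aggregate() treats labels that differ only in case as different event types. The sketch below shows, on a toy data frame built from three rows of that output, how the labels could be normalized before summing. Whether such labels should be merged is an analysis choice; the original analysis does not do this.

```r
# Toy data frame reusing three rows from the output above.
toy <- data.frame(
  EVTYPE   = c("blowing snow", "BLOWING SNOW", "BLIZZARD"),
  INJURIES = c(1, 13, 805),
  stringsAsFactors = FALSE
)

# Upper-case and trim the labels so that case variants fall into one group.
toy$EVTYPE <- toupper(trimws(toy$EVTYPE))

# Sum injuries per normalized label; the two snow rows are now combined.
injuries_by_type <- aggregate(INJURIES ~ EVTYPE, toy, sum)
injuries_by_type
```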

Next I create a bar chart of the six most harmful event types in the United States.

library(ggplot2)
sortdata <- Main_Injuries_data[order(Main_Injuries_data$INJURIES),]
Majorevents <- tail(sortdata)
Majorevents
##             EVTYPE INJURIES
## 275           HEAT     2100
## 464      LIGHTNING     5230
## 130 EXCESSIVE HEAT     6525
## 170          FLOOD     6789
## 856      TSTM WIND     6957
## 834        TORNADO    91346
d <- ggplot(Majorevents, aes(x = EVTYPE, y = INJURIES))
# The totals are already computed, so draw bars of those heights directly
d <- d + geom_bar(stat = "identity", color = "red", fill = "red")
d <- d + labs(title = "The six most harmful event types in the US")
d <- d + labs(x = "Type of Events", y = "Number of Injuries")
d

[Figure: bar chart of the number of injuries for the six most harmful event types]
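
Because EVTYPE is categorical, ggplot2 places the bars in alphabetical (or factor-level) order rather than by height. The sketch below uses reorder() to sort the event types by their injury totals. The data frame re-types the six rows printed in Majorevents so that the block runs on its own, and the plotting part assumes a ggplot2 version that provides geom_col().

```r
# The six rows printed in Majorevents above, re-typed for a standalone example.
major <- data.frame(
  EVTYPE   = c("HEAT", "LIGHTNING", "EXCESSIVE HEAT", "FLOOD", "TSTM WIND", "TORNADO"),
  INJURIES = c(2100, 5230, 6525, 6789, 6957, 91346)
)

# reorder() returns a factor whose levels follow ascending INJURIES.
major$EVTYPE <- reorder(major$EVTYPE, major$INJURIES)
levels(major$EVTYPE)

# Draw the chart only if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(major, aes(x = EVTYPE, y = INJURIES)) +
    geom_col(fill = "red") +
    labs(x = "Type of event", y = "Number of injuries")
  print(p)
}
```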

Examining economic damage

I sum the property damage (PROPDMG) and crop damage (CROPDMG) values for each event type and drop the types with no damage (i.e. PROPDMG = 0 or CROPDMG = 0, respectively). The sums use the raw PROPDMG and CROPDMG values; the magnitude columns PROPDMGEXP and CROPDMGEXP are not applied.
I then find which event types caused the most damage and which type was the most harmful to the economy, and collect the most harmful events and their damage totals in a data frame.
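
The head(data) output earlier shows a PROPDMGEXP column holding K next to each PROPDMG value. In the storm data documentation these letters are magnitude codes (K for thousands, M for millions, B for billions). The sketch below shows one way the raw values could be converted to dollars before aggregating. `to_dollars` is a hypothetical helper, and treating blank or unrecognized codes as a multiplier of 1 is an assumption; the analysis that follows does not apply this step.

```r
# Hypothetical helper: scale a damage figure by its magnitude code.
# K/M/B follow the storm data documentation; any other code (including
# blank) is treated as a multiplier of 1, which is an assumption.
to_dollars <- function(amount, exp_code) {
  multipliers <- c(K = 1e3, M = 1e6, B = 1e9)
  m <- multipliers[toupper(as.character(exp_code))]
  m[is.na(m)] <- 1
  amount * unname(m)
}

# Toy rows: the first two PROPDMG values from head(data),
# plus two made-up rows with an M code and a blank code.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "FLOOD"),
  PROPDMG    = c(25.0, 2.5, 1.5, 10),
  PROPDMGEXP = c("K", "K", "M", "")
)
toy$PROPDMG_USD <- to_dollars(toy$PROPDMG, toy$PROPDMGEXP)
aggregate(PROPDMG_USD ~ EVTYPE, toy, sum)
```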

Property_data <- aggregate(PROPDMG ~ EVTYPE, data, sum)
Main_Property_data <- Property_data[Property_data$PROPDMG != 0,]
sortdata1 <- Main_Property_data[order(Main_Property_data$PROPDMG),]
Majorevents_P <- tail(sortdata1)
Majorevents_P_New <- Majorevents_P[order(Majorevents_P$EVTYPE),]
Majorevents_P_New
##                EVTYPE PROPDMG
## 153       FLASH FLOOD 1420125
## 170             FLOOD  899938
## 244              HAIL  688693
## 760 THUNDERSTORM WIND  876844
## 834           TORNADO 3212258
## 856         TSTM WIND 1335966
Crop_data <- aggregate(CROPDMG ~ EVTYPE, data, sum)
Main_Crop_data <- Crop_data[Crop_data$CROPDMG != 0,]
sortdata2 <- Main_Crop_data[order(Main_Crop_data$CROPDMG),]
Majorevents_C <- tail(sortdata2)
Majorevents_C_New <- Majorevents_C[order(Majorevents_C$EVTYPE),]
Majorevents_C_New
##                EVTYPE CROPDMG
## 153       FLASH FLOOD  179200
## 170             FLOOD  168038
## 244              HAIL  579596
## 760 THUNDERSTORM WIND   66791
## 834           TORNADO  100019
## 856         TSTM WIND  109203
df <- data.frame(Majorevents_P_New$EVTYPE, Majorevents_P_New$PROPDMG, Majorevents_C_New$CROPDMG)
df
##   Majorevents_P_New.EVTYPE Majorevents_P_New.PROPDMG
## 1              FLASH FLOOD                   1420125
## 2                    FLOOD                    899938
## 3                     HAIL                    688693
## 4        THUNDERSTORM WIND                    876844
## 5                  TORNADO                   3212258
## 6                TSTM WIND                   1335966
##   Majorevents_C_New.CROPDMG
## 1                    179200
## 2                    168038
## 3                    579596
## 4                     66791
## 5                    100019
## 6                    109203
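
The data.frame() call above pairs the property and crop columns by row position, which is correct here only because both tables contain the same six event types in the same order. A minimal sketch of joining on EVTYPE with merge() instead, so the pairing does not depend on row order. The values are copied from three rows of the two tables above, with the crop rows deliberately shuffled.

```r
# Three rows from the property table above.
prop <- data.frame(
  EVTYPE  = c("FLASH FLOOD", "HAIL", "TORNADO"),
  PROPDMG = c(1420125, 688693, 3212258)
)

# The matching crop rows, in a different order on purpose.
crop <- data.frame(
  EVTYPE  = c("TORNADO", "FLASH FLOOD", "HAIL"),
  CROPDMG = c(100019, 179200, 579596)
)

# merge() matches rows by the EVTYPE key rather than by position.
damage <- merge(prop, crop, by = "EVTYPE")
damage
```

With merge(), an event type that appears in only one of the two tables is dropped by default instead of being silently paired with the wrong row.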

With the data frame in hand, I draw a plot that shows the difference between property damage and crop damage.

library(ggplot2)
library(scales)
E <- ggplot()
# stat is not an aesthetic, so it is dropped from aes(); columns are taken from df
E <- E + geom_point(data = df, aes(x = Majorevents_P_New.EVTYPE, y = Majorevents_P_New.PROPDMG, color = "Properties", group = 2), size = 3)
E <- E + geom_line(data = df, aes(x = Majorevents_P_New.EVTYPE, y = Majorevents_P_New.PROPDMG, color = "Properties", group = 2), size = 1)
E <- E + geom_point(data = df, aes(x = Majorevents_P_New.EVTYPE, y = Majorevents_C_New.CROPDMG, color = "Crop", group = 3), size = 3)
E <- E + geom_line(data = df, aes(x = Majorevents_P_New.EVTYPE, y = Majorevents_C_New.CROPDMG, color = "Crop", group = 3), size = 1)
E <- E + labs(title = "Damage in Property and Crop")
E <- E + labs(x = "Type of events", y = "Numbers of Damage")
E <- E + scale_colour_manual("Damage", values = c("Properties" = "green","Crop" = "red"))
E <- E + scale_y_continuous(labels = comma)
E <- E + theme(axis.text.x = element_text(colour = 'black', angle = 30, size = 8, hjust = 1, vjust = 1))
E

[Figure: property and crop damage for the six most damaging event types]

Conclusions and Answers

  1. Across the United States, which types of events are most harmful with respect to population health?
    In the first part of my analysis I showed that 158 event types caused at least one injury. Of these, TORNADO was the most harmful. The bar chart of the six most harmful event types shows that tornadoes were by far the most harmful to population health.

  2. Across the United States, which types of events have the greatest economic consequences?
    In the second part of my analysis I calculated the damage to properties and crops. The same six event types ranked highest in both categories, so I could draw a single plot for both kinds of damage.
    The plot shows that these events caused more damage to properties than to crops.
    The figure also shows that:
    TORNADO caused the greatest damage to properties.
    HAIL caused the greatest damage to crops.