Peer assignment 2
Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
This document assumes the user has the NOAA storm database saved as a raw CSV file. This is a very large data file, so reading it into memory can take a couple of minutes; the call below limits the read to the first 100,000 rows (nrows = 1e+05). The following code chunk is run with the echo option set to TRUE so that the analysis can be easily replicated by the user.
library(ggplot2)
library(lattice)
library(knitr)
library(plyr)
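# Read the first 100,000 rows of the raw CSV file
data2 <- read.csv("repdata_data_StormData.csv", nrows = 1e+05, header = TRUE, sep = ",")
data <- data.frame(data2)
# Keep the event start dates and the event types as standalone vectors for plotting
year <- data$BGN_DATE
newdata <- data$EVTYPE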
In this analysis, health consequences refer to fatalities due to weather-related events. This study focuses on the 10 events associated with the largest number of fatalities. Identifying them involves sorting the data frame by the number of fatalities in descending order, and then subsetting the sorted data frame.
health.events <- data[order(-data$FATALITIES), ]
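The sort-then-subset step described above can be sketched on a small toy data frame. The values below are made up for illustration and are not taken from the NOAA file; `toy`, `toy.sorted`, and `toy.top10` are hypothetical names, and `head()` is one possible way to keep the first 10 rows.

```r
# Toy stand-in for the storm data; all values are invented for illustration.
toy <- data.frame(
  EVTYPE = c("TORNADO", "HEAT", "FLOOD", "TORNADO", "HAIL", "HEAT",
             "TSTM WIND", "FLOOD", "TORNADO", "LIGHTNING", "HAIL", "HEAT"),
  FATALITIES = c(7, 25, 3, 12, 0, 40, 1, 5, 9, 2, 0, 15),
  stringsAsFactors = FALSE
)

# Sort by fatalities, largest first.
toy.sorted <- toy[order(-toy$FATALITIES), ]

# Keep only the first 10 rows of the sorted data frame.
toy.top10 <- head(toy.sorted, 10)
print(toy.top10)
```

Using `head()` rather than `[1:10, ]` avoids rows of NA when fewer than 10 records are available.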
This analysis also examines the types of wind events, together with the injuries and fatalities they cause.
In this analysis, economic consequences were measured using the PROPDMG and PROPDMGEXP variables. PROPDMG holds the numeric estimate of property damage, and PROPDMGEXP is a factor variable that gives the scale of that estimate. More specifically, the factor levels “K”, “M”, and “B” denote estimates in thousands, millions, and billions of dollars, respectively. PROPDMGEXP contains other factor levels, but these represent only a small percentage of the data points and their meaning is not explained in the documentation. Therefore, these other levels are excluded from the current analysis.
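As a minimal sketch of how the scale codes relate to dollar amounts, the example below converts a numeric estimate and its scale code into dollars and drops the undocumented levels. The toy values and the names `toy`, `multiplier`, `known`, and `damage.usd` are assumptions for illustration; this conversion is not part of the original analysis.

```r
# Toy records; the values are invented and only mimic the column layout.
toy <- data.frame(
  PROPDMG = c(250, 2.5, 1.2, 10, 5),
  PROPDMGEXP = c("K", "M", "B", "", "+"),
  stringsAsFactors = FALSE
)

# Multipliers for the three documented levels only.
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Exclude rows whose level is not K, M, or B.
known <- toy[toy$PROPDMGEXP %in% names(multiplier), ]

# Look up each row's multiplier by name; as.character() guards against
# factor columns, which would otherwise be indexed by their integer codes.
known$damage.usd <- known$PROPDMG * unname(multiplier[as.character(known$PROPDMGEXP)])
print(known)
```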
In this study, events associated with damages estimated in billions of dollars are considered to have the greatest economic consequences. The data frame is subset to include only those records for weather events whose damage is measured in billions of dollars. The ten costliest events are then identified by ordering this subset by the numeric estimate of property damage (PROPDMG) and keeping the first ten rows.
The following figures show the top 10 weather events associated with the greatest health and economic impacts. As indicated in the following figure, tornadoes were the most frequent weather-related event, representing over half of the top 10 weather-related events associated with fatalities. Heat and excessive heat were the weather-related events that accounted for the most fatalities. However, it is unclear whether a single heat event actually accounted for nearly 600 deaths. This issue cannot be resolved with the currently available documentation for the data source.
qplot(as.numeric(year), newdata, data = data, group = data$FATALITIES, color = data$FATALITIES,
geom = c("point", "line"), ylab = "type of wind", xlab = "frequency", main = "FATALITIES BY THE TYPE OF WIND")
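The distinction drawn above between the most frequent event type, the deadliest event type, and the deadliest single record can be checked with a short sketch. The toy values are invented, and `toy`, `counts`, `totals`, and `largest` are hypothetical names; the original analysis does not necessarily compute these summaries.

```r
# Toy records; the values are made up and do not come from the NOAA file.
toy <- data.frame(
  EVTYPE = c("TORNADO", "TORNADO", "HEAT", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(5, 4, 60, 3, 2, 1),
  stringsAsFactors = FALSE
)

# How often each event type appears.
counts <- table(toy$EVTYPE)

# Total fatalities per event type, largest first.
totals <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)
totals <- totals[order(-totals$FATALITIES), ]

# The single record with the most fatalities.
largest <- toy[which.max(toy$FATALITIES), ]

print(counts)
print(totals)
print(largest)
```

In this toy data the most frequent type and the type with the most deaths differ, which is the same pattern the text reports for tornadoes and heat.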
data1 <- data[order(data$FATALITIES, data$INJURIES, decreasing = TRUE), ]
data3 <- data[order(data$PROPDMG, data$CROPDMG, decreasing = TRUE), ]
head(data3)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 13 1 2/13/1952 0:00:00 2130 CST 73 JEFFERSON AL
## 36 1 5/1/1953 0:00:00 1930 CST 27 CLAY AL
## 48 1 12/5/1954 0:00:00 1200 CST 81 LEE AL
## 49 1 12/5/1954 0:00:00 1330 CST 15 CALHOUN AL
## 121 1 4/8/1957 0:00:00 946 CST 93 MARION AL
## 124 1 4/8/1957 0:00:00 1030 CST 43 CULLMAN AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 13 TORNADO 0 NA NA NA NA 0
## 36 TORNADO 0 NA NA NA NA 0
## 48 TORNADO 0 NA NA NA NA 0
## 49 TORNADO 0 NA NA NA NA 0
## 121 TORNADO 0 NA NA NA NA 0
## 124 TORNADO 0 NA NA NA NA 0
## COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 13 NA 0 NA NA 0.0 200 3 0 1
## 36 NA 0 NA NA 12.1 440 4 0 7
## 48 NA 0 NA NA 19.4 100 3 0 0
## 49 NA 0 NA NA 24.7 100 3 0 0
## 121 NA 0 NA NA 51.4 100 3 0 0
## 124 NA 0 NA NA 16.3 33 3 0 0
## INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC
## 13 26 250 K 0 NA NA
## 36 12 250 K 0 NA NA
## 48 4 250 K 0 NA NA
## 49 26 250 K 0 NA NA
## 121 0 250 K 0 NA NA
## 124 0 250 K 0 NA NA
## ZONENAMES LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 13 NA 3336 8656 0 0 NA 13
## 36 NA 3313 8556 3318 8545 NA 36
## 48 NA 3241 8525 3240 8505 NA 48
## 49 NA 3347 8600 3355 8536 NA 49
## 121 NA 3407 8759 3419 8707 NA 121
## 124 NA 3418 8636 3423 8620 NA 124
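# Keep only events whose property damage is recorded in billions of dollars
economic.events <- subset(data, subset = PROPDMGEXP == "B")
# Order by the numeric damage estimate, largest first, and keep the ten costliest
economic.events <- economic.events[order(-economic.events$PROPDMG), ]
economic.events <- head(economic.events, 10)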
dotplot(EVTYPE ~ FATALITIES, data = health.events, xlab = "Weather-related Fatalities",
main = "Weather-related Fatalities", ylab = "Event type")
health.events$event.type <- as.character(health.events$EVTYPE)
economic.events$event.type <- as.character(economic.events$EVTYPE)
The following figure shows the amount of property damage in billions of dollars. It should be noted that the dollar values were log transformed due to high skewness. These data suggest that warm-weather storms with both wind and rain are associated with the greatest amount of property damage. Only one of the events was a winter storm.
qplot(as.numeric(year), data$INJURIES, data = data, group = data$FATALITIES,
color = data$FATALITIES, geom = c("point", "line"), ylab = "injuries",
xlab = "frequency", main = "FATALITIES BY THE TYPE OF WIND")
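The log transformation mentioned above for the property-damage values can be sketched as follows. The document does not state which base was used, so base 10 is an assumption here, and the damage values and the names `damage.billions` and `log.damage` are invented for illustration.

```r
# Made-up, highly skewed damage estimates in billions of dollars.
damage.billions <- c(0.5, 1, 2.1, 5, 31.3, 115)

# Base-10 log transform (the base is an assumption, not stated in the source).
log.damage <- log10(damage.billions)

# The raw values span more than two orders of magnitude;
# the transformed values sit on a much narrower scale.
print(range(damage.billions))
print(range(log.damage))

# The transform is invertible, so the original values can be recovered.
recovered <- 10^log.damage
```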