In this report we analyze storms and other severe weather events that occurred in the United States. We identify the types of events most harmful to population health and to the economy. The data come from the NOAA Storm Database, which records events from 1950 to 2011. In this analysis we ignore differences between years and total each indicator over the whole period. We find that tornadoes and excessive heat are the most harmful to population health, while floods, hurricanes/typhoons and tornadoes have the greatest economic impact.
descr.url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf"
data.url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
data.fn <- "repdata-data-StormData.csv.bz2"
We used the storm data and the dataset description. Each observation is supposed to correspond to one severe weather event. The variables describe the event type and different aspects of the damage caused by the event.
The first six events follow.
download.file(data.url, destfile = data.fn, method = "curl")
df <- read.csv(bzfile(data.fn))
head(df[, c(8,23:28)])
##    EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO          0       15    25.0          K       0
## 2 TORNADO          0        0     2.5          K       0
## 3 TORNADO          0        2    25.0          K       0
## 4 TORNADO          0        2     2.5          K       0
## 5 TORNADO          0        2     2.5          K       0
## 6 TORNADO          0        6     2.5          K       0
The EVTYPE variable indicates the type of event. The other variables are used to determine the impact of the event.
We choose the FATALITIES and INJURIES variables as indicators of the event's impact on population health. They count the fatalities and the injuries, respectively, attributed to the event.
sum(is.na(df$FATALITIES))
## [1] 0
sum(is.na(df$INJURIES))
## [1] 0
Neither variable has missing values, so no imputation is needed.
The PROPDMG and CROPDMG variables are suitable as indicators of damage value. They give the property damage and the crop damage in U.S. dollars once scaled by the magnitude codes in PROPDMGEXP and CROPDMGEXP. The magnitude codes are K for thousands, M for millions, and B for billions.
sum(is.na(df$PROPDMG))
## [1] 0
sum(is.na(df$CROPDMG))
## [1] 0
Neither variable has missing values, so no imputation is needed.
We calculate the damage values by multiplying PROPDMG and CROPDMG by the corresponding magnitude; any code other than K, M or B is treated as a multiplier of 1:
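The four separate checks above can also be done in one call. The following is a minimal sketch on a small invented data frame (`toy` and its values are hypothetical, not taken from the storm data); it counts missing values per column with `colSums(is.na(...))`.

```r
# Hypothetical toy rows; one NA is planted on purpose to show the count.
toy <- data.frame(
  FATALITIES = c(0, 1, NA),
  INJURIES   = c(15, 0, 2),
  PROPDMG    = c(25, 2.5, 2.5),
  CROPDMG    = c(0, 0, 0)
)

# is.na() gives a logical matrix; colSums() counts the TRUEs per column.
indicators <- c("FATALITIES", "INJURIES", "PROPDMG", "CROPDMG")
na.counts <- colSums(is.na(toy[, indicators]))
print(na.counts)
```

On the real data the same call would be applied to `df` instead of `toy`.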
df$PROPV <- df$PROPDMG *
    ifelse(df$PROPDMGEXP == "B", 1e9,
    ifelse(df$PROPDMGEXP == "M", 1e6,
    ifelse(df$PROPDMGEXP == "K", 1e3, 1)))
df$CROPV <- df$CROPDMG *
    ifelse(df$CROPDMGEXP == "B", 1e9,
    ifelse(df$CROPDMGEXP == "M", 1e6,
    ifelse(df$CROPDMGEXP == "K", 1e3, 1)))
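As an alternative to nested `ifelse()` calls, the same mapping could be written with a named lookup vector. This is only a sketch on invented rows (`toy`, `magnitude` and `mult` are hypothetical names, and the values are made up); it assumes, like the code above, that any code other than K, M or B means a multiplier of 1.

```r
# Hypothetical toy rows standing in for the storm data.
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1.2, 7),
  PROPDMGEXP = c("K", "M", "B", ""),
  stringsAsFactors = FALSE
)

# Named lookup vector: magnitude code -> multiplier.
magnitude <- c(K = 1e3, M = 1e6, B = 1e9)

# Index by code; codes missing from the table yield NA, which is mapped
# to 1 to match the fallback of the nested ifelse() version.
mult <- unname(magnitude[as.character(toy$PROPDMGEXP)])
mult[is.na(mult)] <- 1

toy$PROPV <- toy$PROPDMG * mult
print(toy)
```

A lookup vector keeps the code-to-multiplier table in one place, which makes it easier to extend if further magnitude codes need handling.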
We use the calculated values in our analysis.
We total the chosen indicators for each type of event:
library(data.table)
dt <- data.table(df)
by.type <- dt[, list(FATALITIES = sum(FATALITIES), INJURIES = sum(INJURIES),
                     PROPDMG = sum(PROPV), CROPDMG = sum(CROPV)),
              by = EVTYPE]
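To illustrate what the per-type totals compute, here is a small base R sketch using `aggregate()` on invented rows (`toy` and `totals` are hypothetical names and the numbers are made up). It is an equivalent formulation for illustration, not the method used in this report.

```r
# Hypothetical toy events: two tornado rows and one flood row.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO"),
  FATALITIES = c(1, 2, 3),
  INJURIES   = c(10, 0, 5),
  stringsAsFactors = FALSE
)

# Sum both indicators within each event type.
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
print(totals)
```

Each row of `totals` holds the sums over all events of one type, which is the same shape as `by.type`.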
The totals correspond to the whole period of observation, from 1950 to 2011. The maximum values for our indicators among all the event types are:
max(by.type$FATALITIES)
## [1] 5633
max(by.type$INJURIES)
## [1] 91346
max(by.type$PROPDMG)
## [1] 144657709807
max(by.type$CROPDMG)
## [1] 13972566000
The maximum values help in choosing thresholds for selecting the most severe types of events.
We select the types of events most harmful to population health: those with more than 100 fatalities or more than 1000 injuries. Each type is plotted by its fatalities and injuries on logarithmic scales because the values are highly skewed.
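Besides the maxima, ranking the event types by an indicator shows which types sit near the top. The sketch below uses an invented table (`toy.by.type` and its event names and values are hypothetical) and base R `order()`.

```r
# Hypothetical per-type totals; names and numbers are invented.
toy.by.type <- data.frame(
  EVTYPE     = c("A", "B", "C"),
  FATALITIES = c(10, 300, 50),
  INJURIES   = c(2000, 100, 5),
  stringsAsFactors = FALSE
)

# Sort by fatalities, largest first, and keep the top two types.
top.fatal <- toy.by.type[order(-toy.by.type$FATALITIES), ]
print(head(top.fatal, 2))
```

The same ordering applied to `by.type` would list the real event types from most to least harmful for the chosen indicator.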
library(ggplot2)
grave <- by.type$FATALITIES > 1e2 | by.type$INJURIES > 1e3
qplot(FATALITIES, INJURIES, data = by.type[grave, ], log = "xy") +
    geom_text(data = by.type[grave, ],
              mapping = aes(x = FATALITIES, y = INJURIES, label = EVTYPE),
              size = 2, vjust = 3) +
    labs(x = "Fatalities score", y = "Injuries score")
From this plot one can see that tornadoes have the highest fatality score, followed by excessive heat. The injuries score is roughly proportional to the fatality score.
We select the types of events with the greatest economic consequences: those with property damage above 1e10 USD or crop damage above 1e8 USD. The values are again plotted on logarithmic scales.
grave <- by.type$PROPDMG > 1e10 | by.type$CROPDMG > 1e8
qplot(CROPDMG, PROPDMG, data = by.type[grave, ], log = "xy") +
    geom_text(data = by.type[grave, ],
              mapping = aes(x = CROPDMG, y = PROPDMG, label = EVTYPE),
              size = 2, vjust = 3) +
    labs(x = "Crop damage (USD)", y = "Property damage (USD)")
From this plot one can see that the greatest property damage is due to floods, followed by hurricanes/typhoons and tornadoes. Even the maximum crop damage total does not exceed the property damage of any of these three event types.