The following analysis uses the National Oceanic and Atmospheric Administration (NOAA) storm data set to find out which weather events have the most severe effects on human health and property.
First, we cleaned the data set. We matched event type names to the official list and discarded records from before 1996, when the current event naming standard was introduced.
We analysed four variables: fatalities, injuries, property damage and crop damage. For each of the 48 official event types, we computed total and average values for each variable.
Creating general settings and loading libraries.
library(knitr)
opts_chunk$set(echo = TRUE,
               message = TRUE,
               cache = TRUE)
library(dplyr)
library(tools)
library(stringdist)
library(VennDiagram)
Downloading data.
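URL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
name <- "StormData.csv.bz2"
# Download only once; mode = "wb" keeps the compressed file intact on Windows
if (!file.exists(name)) {
    download.file(URL, destfile = name, mode = "wb")
}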
To make sure the data have not been altered in any way, we verify the file's integrity with a checksum. Using the md5sum() function from the tools package, we generated an MD5 hash of the raw file: df4aa61fff89427db6b7f7b1113b5553. Let’s check whether the hashes match.
original.hash <- "df4aa61fff89427db6b7f7b1113b5553"
raw.hash <- md5sum(name)[[1]]
if (original.hash == raw.hash) {
    print("The checksums match. File integrity verified.")
} else {
    print("ATTENTION! Hashes do not match. The file you downloaded may not be identical to the one used when this analysis was originally done. It might have been corrupted or altered in some way.")
}
## [1] "The checksums match. File integrity verified."
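The same check can be wrapped in a small reusable function that stops the analysis on a mismatch instead of only printing a warning. This is a minimal sketch: the function name verify_md5() is hypothetical and not part of the original analysis, and it relies only on tools::md5sum(). The usage line checks a temporary file holding the bytes "abc", whose MD5 hash is well known.

```r
library(tools)

# Hypothetical helper: stop with an error if the file's MD5 hash differs
# from the expected one; return TRUE invisibly otherwise
verify_md5 <- function(path, expected) {
    actual <- unname(md5sum(path))
    if (!identical(actual, expected)) {
        stop("MD5 mismatch for ", path, ": expected ", expected,
             ", got ", actual)
    }
    invisible(TRUE)
}

# Usage on a small temporary file containing the three bytes "abc"
tmp <- tempfile()
writeBin(charToRaw("abc"), tmp)
verify_md5(tmp, "900150983cd24fb0d6963f7d28e17f72")
```

Stopping with an error means a knitted report cannot silently proceed with a corrupted download.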
Loading data.
DataStorm <- read.csv(name)
str(DataStorm)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
## $ BGN_TIME : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
## $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
## $ STATE : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : Factor w/ 35 levels ""," N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_DATE : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_TIME : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ WFO : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ ZONENAMES : Factor w/ 25112 levels ""," "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : Factor w/ 436781 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
The NOAA website states that, although data collection began in 1950, only tornadoes were recorded in the first years, and only a few other event types were added over the following decades. From 1996 onwards, the database has recorded a set of 48 event types. To make comparisons across event types fairer, we’ll work with a subset of the original database comprising the events that began in 1996 or later.
Storm96 <- DataStorm %>%
    # BGN_DATE looks like "1/1/1966 0:00:00"; the time part is ignored
    mutate(Date = as.Date(BGN_DATE, format = "%m/%d/%Y")) %>%
    filter(Date >= as.Date("1996-01-01"))
The original data set has 902,297 rows, while our subset has 653,530 (72.43%).
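The date filter works because as.Date() parses only the part of each string that matches the format and ignores the trailing time component of BGN_DATE. A minimal check on two strings in the BGN_DATE format (the first appears in the str() output above; the second is made up for illustration):

```r
# Two strings in the BGN_DATE format; the second one is made up
bgn <- c("1/1/1966 0:00:00", "1/6/1996 0:00:00")

# "%m/%d/%Y" reads the date part; the trailing " 0:00:00" is ignored
dates <- as.Date(bgn, format = "%m/%d/%Y")
print(dates)

# Compare against a Date object rather than a character string
keep <- dates >= as.Date("1996-01-01")
print(keep)
```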
Working from the previous subset (all events that happened after January 1, 1996), we will standardize the names of the event types.
From 1996 onwards, the database has used an official list of 48 event types. We will store this list, in lower case, as an object in our environment.
list.of.events <- tolower(c("Astronomical Low Tide", "Avalanche", "Blizzard", "Coastal Flood", "Cold/Wind Chill", "Debris Flow", "Dense Fog", "Dense Smoke", "Drought", "Dust Devil", "Dust Storm", "Excessive Heat", "Extreme Cold/Wind Chill", "Flash Flood", "Flood", "Frost/Freeze", "Funnel Cloud", "Freezing Fog", "Hail", "Heat", "Heavy Rain", "Heavy Snow", "High Surf", "High Wind", "Hurricane Typhoon", "Ice Storm", "Lake-Effect Snow", "Lakeshore Flood", "Lightning", "Marine Hail", "Marine High Wind", "Marine Strong Wind", "Marine Thunderstorm Wind", "Rip Current", "Seiche", "Sleet", "Storm Surge/Tide", "Strong Wind", "Thunderstorm Wind", "Tornado", "Tropical Depression", "Tropical Storm", "Tsunami", "Volcanic Ash", "Waterspout", "Wildfire", "Winter Storm", "Winter Weather"))
The number of unique event types in the data set does not match the official list. There are 48 events in the official list and 516 different event names in the data set.
Let’s work with the event type names in our subset to try to match them to the official list. First, we’ll standardize the names by:
1. converting everything to lower case;
2. removing leading, trailing and repeated spaces;
3. deleting parentheses;
4. replacing the common abbreviation “tstm” with “thunderstorm”.
before <- length(unique(Storm96$EVTYPE))
Storm96$EVTYPE <- Storm96$EVTYPE %>%
    as.character %>%
    tolower %>%
    gsub(pattern = "^\\s+|\\s+$", replacement = "") %>%  # trim leading/trailing spaces
    gsub(pattern = "\\s+", replacement = " ") %>%        # collapse repeated spaces
    gsub(pattern = "\\(|\\)", replacement = "") %>%      # drop parentheses
    gsub(pattern = "tstm", replacement = "thunderstorm")
after <- length(unique(Storm96$EVTYPE))
There were 516 different events before standardization and 421 after, still a lot more than the 48 from the official list.
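The four standardization steps can also be wrapped in a single function and checked on a few names. This is a base-R sketch; the function name standardize_evtype() and the example strings are hypothetical, chosen only to exercise each step.

```r
# Hypothetical helper applying the four standardization steps in order
standardize_evtype <- function(x) {
    x <- tolower(as.character(x))
    x <- gsub("^\\s+|\\s+$", "", x)   # trim leading/trailing spaces
    x <- gsub("\\s+", " ", x)         # collapse repeated spaces
    x <- gsub("\\(|\\)", "", x)       # drop parentheses
    gsub("tstm", "thunderstorm", x)   # expand the "tstm" abbreviation
}

examples <- c("  TSTM WIND (G45) ", "Heavy   Snow", "Hurricane (Typhoon)")
print(standardize_evtype(examples))
```

Keeping the steps in one function makes it easy to apply exactly the same cleaning to any other list of names.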
Let’s work on matching the different names list.
Storm96.events <- Storm96$EVTYPE %>%
    table %>%
    as.data.frame %>%
    `names<-`(c("Original_name", "Frequency"))
# For each name, look for an exact (anchored) match in the official list
match.names <- paste("^",
                     Storm96.events$Original_name,
                     "$",
                     sep = "") %>%
    lapply(grep,
           list.of.events,
           ignore.case = TRUE,
           value = TRUE)
how.many.matches <- match.names %>%
    lapply(length) %>%
    unlist %>%
    table %>%
    data.frame %>%
    `colnames<-`(c("Number.of.matches", "Frequency")) %>%
    print
## Number.of.matches Frequency
## 1 0 375
## 2 1 46
There were only 46 exact matches, but the official list has 48 names, so some official event types have no equivalent in our data set. They are: debris flow and hurricane typhoon.
There are still 375 event names with no match in the official list. Let’s create a table with summary information on each event type name and see how the rows are distributed across matched and unmatched names.
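Because each pattern above is anchored with ^ and $, the search amounts to an exact comparison. A sketch of the same check with match() avoids regular expressions altogether, so names containing regex metacharacters (such as "?" or "+") are compared literally. The short official and observed lists below are illustrative only, not taken from the data set.

```r
official <- c("flood", "flash flood", "hail", "thunderstorm wind")
observed <- c("hail", "flood", "thunderstorm wind/hail", "?", "flash flood")

# Position of each observed name in the official list (NA = no exact match)
pos <- match(observed, official)
n.matches <- ifelse(is.na(pos), 0, 1)
print(data.frame(observed, n.matches))

# Official names that never appear among the observed names
print(setdiff(official, observed))
```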
match.number <- vector("list", length = length(match.names))
match.factor <- vector("list", length = length(match.names))
# Classify each name by how many official names it matched
for (i in seq_along(match.names)) {
    match.number[[i]] <- length(match.names[[i]])
    if (match.number[[i]] == 0) {
        match.factor[[i]] <- "No match"
    } else if (match.number[[i]] == 1) {
        match.factor[[i]] <- "Exact match"
    } else if (match.number[[i]] > 1) {
        match.factor[[i]] <- "Multiple matches"
    } else {
        match.factor[[i]] <- "Other"
    }
}
match.table <- cbind(as.character(Storm96.events$Original_name),
                     Storm96.events$Frequency,
                     match.number,
                     Match_factor = match.factor) %>%
    as.data.frame %>%
    rename(Original_name = V1,
           Frequency = V2,
           Number_of_matches = match.number)
class(match.table$Original_name) <- "character"
class(match.table$Frequency) <- "numeric"
match.table$Match_factor <- match.table$Match_factor %>%
    unlist %>%
    as.factor
table.of.match.groups <- match.table %>%
    group_by(Match_factor) %>%
    # Frequency below already refers to the summed value for the group
    summarise(Frequency = sum(Frequency),
              Percentage = Frequency * 100 / nrow(Storm96)) %>%
    print
## # A tibble: 2 × 3
## Match_factor Frequency Percentage
## <fctr> <dbl> <dbl>
## 1 Exact match 639474 97.849219
## 2 No match 14056 2.150781
Only 2.15% of rows correspond to unmatched names (although they amount to 375 different entries).
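The percentages above weight each name by its number of rows, which is why 375 unmatched names amount to only about 2% of the data. A base-R sketch of the same computation, using made-up names and frequencies:

```r
# Made-up name frequencies; in.official marks names found in the official list
names.freq <- data.frame(name = c("hail", "flood", "fog", "tstm wnd"),
                         Frequency = c(900, 80, 15, 5),
                         in.official = c(TRUE, TRUE, FALSE, FALSE),
                         stringsAsFactors = FALSE)
group <- ifelse(names.freq$in.official, "Exact match", "No match")

# Rows per group and their share of all rows
rows.per.group <- tapply(names.freq$Frequency, group, sum)
percentage <- 100 * rows.per.group / sum(names.freq$Frequency)
print(round(percentage, 2))
```

Here two of four names are unmatched, yet they cover only 2% of the rows.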
To minimize data loss, we’ll try to manually match the unmatched event names that account for 500 rows or more.
maybe.match <- match.table %>%
    filter(Match_factor == "No match") %>%
    filter(Frequency >= 500) %>%
    print
## Original_name Frequency Number_of_matches Match_factor
## 1 extreme cold 617 0 No match
## 2 fog 532 0 No match
## 3 landslide 588 0 No match
## 4 thunderstorm wind/hail 1028 0 No match
## 5 urban/sml stream fld 3392 0 No match
## 6 wild/forest fire 1443 0 No match
## 7 winter weather/mix 1104 0 No match
There are 7 unmatched event names with 500 rows or more. With the aid of the Storm Data Preparation document, we matched them one by one to official event types. Then, the modified names were inserted back into the subset Storm96.
# Explicit mapping from each unmatched name (500+ rows) to an official event type
manual.match <- c("extreme cold"           = "extreme cold/wind chill",
                  "fog"                    = "dense fog",
                  "landslide"              = "debris flow",
                  "thunderstorm wind/hail" = "thunderstorm wind",
                  "urban/sml stream fld"   = "heavy rain",
                  "wild/forest fire"       = "wildfire",
                  "winter weather/mix"     = "winter weather")
# Exact replacement, so no regex characters in the names can interfere
for (old.name in names(manual.match)) {
    Storm96$EVTYPE[Storm96$EVTYPE == old.name] <- manual.match[[old.name]]
}
Let’s rerun part of the previous code to check the impact of the manual match of data.
Storm96.events.2 <- Storm96$EVTYPE %>%
    table %>%
    as.data.frame %>%
    `names<-`(c("Original_name", "Frequency"))
match.names.2 <- paste("^",
                       Storm96.events.2$Original_name,
                       "$",
                       sep = "") %>%
    lapply(grep,
           list.of.events,
           ignore.case = TRUE,
           value = TRUE)
how.many.matches.2 <- match.names.2 %>%
    lapply(length) %>%
    unlist %>%
    table %>%
    data.frame %>%
    `colnames<-`(c("Number.of.matches", "Frequency")) %>%
    print
## Number.of.matches Frequency
## 1 0 368
## 2 1 47
match.number.2 <- vector("list", length = length(match.names.2))
match.factor.2 <- vector("list", length = length(match.names.2))
# Classify each name by how many official names it matched
for (i in seq_along(match.names.2)) {
    match.number.2[[i]] <- length(match.names.2[[i]])
    if (match.number.2[[i]] == 0) {
        match.factor.2[[i]] <- "No match"
    } else if (match.number.2[[i]] == 1) {
        match.factor.2[[i]] <- "Exact match"
    } else if (match.number.2[[i]] > 1) {
        match.factor.2[[i]] <- "Multiple matches"
    } else {
        match.factor.2[[i]] <- "Other"
    }
}
match.table.2 <- cbind(as.character(Storm96.events.2$Original_name),
                       Storm96.events.2$Frequency,
                       match.number.2,
                       Match_factor = match.factor.2) %>%
    as.data.frame %>%
    rename(Original_name = V1,
           Frequency = V2,
           Number_of_matches = match.number.2)
class(match.table.2$Original_name) <- "character"
class(match.table.2$Frequency) <- "numeric"
match.table.2$Match_factor <- match.table.2$Match_factor %>%
    unlist %>%
    as.factor
table.of.match.groups.2 <- match.table.2 %>%
    group_by(Match_factor) %>%
    # Frequency below already refers to the summed value for the group
    summarise(Frequency = sum(Frequency),
              Percentage = Frequency * 100 / nrow(Storm96)) %>%
    print
## # A tibble: 2 x 3
## Match_factor Frequency Percentage
## <fctr> <dbl> <dbl>
## 1 Exact match 648178 99.1810628
## 2 No match 5352 0.8189372
More than 99% of the rows in our subset now match an entry in the official list.
There are 47 matched event types in the data set but 48 in the official list, so exactly one official event type still has no match. Let’s find out which.
unmatched <- setdiff(list.of.events,
                     match.table.2 %>%
                         filter(Match_factor == "Exact match") %>%
                         `[`(, 1)) %>%
    print
## [1] "hurricane typhoon"
Let’s make a bit more of an effort to find matches for it.
# Names containing "typhoon" or "hurricane"
hur.typ.match <- lapply(c("typhoon", "hurricane"),
                        grep,
                        match.table.2$Original_name,
                        value = TRUE) %>%
    unlist %>%
    unique
hur.typ.table <- match.table.2 %>%
    filter(Original_name %in% hur.typ.match) %>%
    print
## Original_name Frequency Number_of_matches Match_factor
## 1 hurricane 170 0 No match
## 2 hurricane edouard 2 0 No match
## 3 hurricane/typhoon 88 0 No match
## 4 typhoon 11 0 No match
Let’s change those names to the standard format.
for (i in 1:nrow(hur.typ.table)) {
    Storm96$EVTYPE <- gsub(pattern = paste("^",
                                           hur.typ.table[[i, 1]],
                                           "$",
                                           sep = ""),
                           replacement = unmatched,
                           x = Storm96$EVTYPE)
}
Finally, let’s discard the unmatched rows.
Storm.96.std <- Storm96 %>%
    filter(EVTYPE %in% list.of.events)
The data set is now ready for analysis. Before we start, let’s visualize how row selection affected the size of our final data set.
The following Venn diagram shows the difference between the number of rows in the original data set and in our final subset.
no.color <- rgb(red = 0, blue = 0, green = 0, alpha = 0)
Dataset.rows <- nrow(DataStorm)
Subset.rows <- nrow(Storm.96.std)
plot.new()
# The final subset is entirely contained in the original data set,
# so the overlap equals the size of the subset
venn.plot <- draw.pairwise.venn(area1 = Dataset.rows,
                                area2 = Subset.rows,
                                cross.area = Subset.rows,
                                lty = 0,
                                fill = c("blue", "red"),
                                alpha = c(0.5, 0.5),
                                category = c(paste0(" Original data set,\n",
                                                    format(Dataset.rows, big.mark = ","),
                                                    " rows"),
                                             paste0(" Final subset,\n",
                                                    format(Subset.rows, big.mark = ","),
                                                    " rows")),
                                cat.pos = c(50, 250),
                                cat.dist = c(-0.038, -0.02),
                                cat.cex = 1.2,
                                cat.col = c("black", "white"),
                                label.col = rep(no.color, 3),
                                ext.text = FALSE)
grid.draw(venn.plot)
title("Proportion of data sets", outer = TRUE, line = -1)
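The row counts reported in this section can also be turned into retention percentages for each filtering step. A short calculation using the numbers printed in this report (902,297 original rows, 653,530 rows from 1996 onwards, 648,449 rows in the final subset):

```r
original <- 902297   # rows in the full data set
since96  <- 653530   # rows with BGN_DATE in 1996 or later
final    <- 648449   # rows whose EVTYPE matches the official list

pct.date    <- round(100 * since96 / original, 2)  # kept by the date filter
pct.names   <- round(100 * final / since96, 2)     # kept by name matching
pct.overall <- round(100 * final / original, 2)    # kept overall
print(c(pct.date, pct.names, pct.overall))
```

Almost all of the reduction comes from the date filter; name matching discards less than 1% of the post-1996 rows.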
Which types of events are most harmful to population health?
Let’s create a summary table from the standardized subset showing health-related statistics for each event type.
We selected only the variables EVTYPE, FATALITIES and INJURIES from Storm.96.std and grouped the rows by EVTYPE.
The summary table includes: the total number of occurrences of each event type; the total and mean number of fatalities; and the total and mean number of injured people.
storm.health <- Storm.96.std %>%
    select(EVTYPE, FATALITIES, INJURIES) %>%
    group_by(EVTYPE) %>%
    summarise(Occurrences = n(),
              Total.deaths = sum(FATALITIES),
              Average.deaths = FATALITIES %>%
                  mean %>%
                  round(3),
              Total.injured = sum(INJURIES),
              Average.injured = INJURIES %>%
                  mean %>%
                  round(3)) %>%
    print
## # A tibble: 48 x 6
## EVTYPE Occurrences Total.deaths Average.deaths
## <chr> <int> <dbl> <dbl>
## 1 astronomical low tide 174 0 0.000
## 2 avalanche 378 223 0.590
## 3 blizzard 2633 70 0.027
## 4 coastal flood 596 3 0.005
## 5 cold/wind chill 539 95 0.176
## 6 debris flow 588 37 0.063
## 7 dense fog 1725 69 0.040
## 8 dense smoke 10 0 0.000
## 9 drought 2433 0 0.000
## 10 dust devil 136 2 0.015
## # ... with 38 more rows, and 2 more variables: Total.injured <dbl>,
## # Average.injured <dbl>
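For readers without dplyr, a base-R sketch of the same per-event summary using aggregate(). It runs on a small made-up data frame standing in for Storm.96.std; the event values are invented for illustration.

```r
# Made-up records standing in for Storm.96.std
events <- data.frame(EVTYPE = c("hail", "hail", "flood", "flood", "flood"),
                     FATALITIES = c(0, 1, 2, 0, 1),
                     INJURIES = c(3, 0, 5, 1, 0),
                     stringsAsFactors = FALSE)

# Totals and means per event type (aggregate() sorts the groups)
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = events, FUN = sum)
means  <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = events, FUN = mean)
types  <- as.character(totals$EVTYPE)

summary.df <- data.frame(EVTYPE = types,
                         Occurrences = as.vector(table(events$EVTYPE)[types]),
                         Total.deaths = totals$FATALITIES,
                         Average.deaths = round(means$FATALITIES, 3),
                         Total.injured = totals$INJURIES,
                         Average.injured = round(means$INJURIES, 3),
                         stringsAsFactors = FALSE)
print(summary.df)
```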
The five event types with the highest average number of deaths per event are:
storm.dead <- storm.health %>%
    select(EVTYPE, Average.deaths, Total.deaths, Occurrences) %>%
    arrange(desc(Average.deaths)) %>%
    head(5) %>%
    print
## # A tibble: 5 x 4
## EVTYPE Average.deaths Total.deaths Occurrences
## <chr> <dbl> <dbl> <int>
## 1 tsunami 1.650 33 20
## 2 excessive heat 1.085 1797 1656
## 3 rip current 0.787 340 432
## 4 avalanche 0.590 223 378
## 5 hurricane typhoon 0.461 125 271
The five event types with the highest average number of injured people per event are:
storm.inj <- storm.health %>%
    select(EVTYPE, Average.injured, Total.injured, Occurrences) %>%
    arrange(desc(Average.injured)) %>%
    head(5) %>%
    print
## # A tibble: 5 x 4
## EVTYPE Average.injured Total.injured Occurrences
## <chr> <dbl> <dbl> <int>
## 1 tsunami 6.450 129 20
## 2 hurricane typhoon 4.900 1328 271
## 3 excessive heat 3.859 6391 1656
## 4 heat 1.707 1222 716
## 5 dust storm 0.902 376 417
Combining the top five event types from both lists (average number of deaths and average number of injured people), we get this list of the event types most harmful to human health:
harmful.index <- storm.dead$EVTYPE %>%
    c(storm.inj$EVTYPE) %>%
    unique %>%
    match(storm.health$EVTYPE)
# Columns 1, 4 and 6: EVTYPE, Average.deaths, Average.injured
most.harmful <- storm.health[harmful.index, c(1, 4, 6)] %>%
    arrange(desc(Average.deaths)) %>%
    print
## # A tibble: 7 x 3
## EVTYPE Average.deaths Average.injured
## <chr> <dbl> <dbl>
## 1 tsunami 1.650 6.450
## 2 excessive heat 1.085 3.859
## 3 rip current 0.787 0.484
## 4 avalanche 0.590 0.413
## 5 hurricane typhoon 0.461 4.900
## 6 heat 0.331 1.707
## 7 dust storm 0.026 0.902
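The combination step above takes the union of two top-five rankings. A base-R sketch of the same idea, with a hypothetical helper top_k() and a made-up summary table, shows that an event type ranking high on both measures is listed only once:

```r
# Hypothetical helper: event types with the k largest values of `column`
top_k <- function(summary.df, column, k) {
    ordered <- summary.df[order(summary.df[[column]], decreasing = TRUE), ]
    head(ordered$EVTYPE, k)
}

# Made-up per-event averages for four event types "a" to "d"
health <- data.frame(EVTYPE = c("a", "b", "c", "d"),
                     Average.deaths = c(1.5, 0.2, 0.9, 0.1),
                     Average.injured = c(6.0, 2.0, 0.5, 0.3),
                     stringsAsFactors = FALSE)

most.harmful.types <- union(top_k(health, "Average.deaths", 2),
                            top_k(health, "Average.injured", 2))
print(most.harmful.types)
```

With two top-two lists the result has three entries, just as the two top-five lists in this report yield seven event types.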
Which types of events are responsible for most material damage?
Let’s create summary statistics from the standardized subset showing material damage for each event type. We will treat property damage and crop damage separately.
First, we have to convert the values in the exponent columns (PROPDMGEXP and CROPDMGEXP) from the codes stored as factors (such as “K”, “M” and “B”) to the multipliers they represent.
storm.damage <- Storm.96.std %>%
    select(EVTYPE, PROPDMG,
           PROPDMGEXP, CROPDMG, CROPDMGEXP)
# Multiplier for each exponent code
exps <- c("B", "b", "K", "k", "M", "m", "H", "h", "+", "-", "?", "", 0:9)
values <- c(10^9, 10^9, 10^3, 10^3, 10^6, 10^6, 10^2, 10^2, 1, 0, 0, 0, rep(10, 10))
# Exact lookup with match(): replacing the codes one after another with gsub()
# would re-translate earlier results (e.g. "" -> "0" -> "10")
storm.damage <- storm.damage %>%
    mutate(PROPDMGEXP = values[match(as.character(PROPDMGEXP), exps)],
           CROPDMGEXP = values[match(as.character(CROPDMGEXP), exps)])
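Why the order of replacements matters: a small sketch with three example codes comparing step-by-step gsub() replacement, as in a loop over a code table, with a single exact lookup via match(). The three-entry code table here is a reduced, illustrative version of the one used for the exponent columns.

```r
codes <- c("", "K", "0")   # example exponent codes

# Step-by-step replacement, as in a gsub() loop over a code table:
# "" is first turned into "0", which the next step turns into "10"
sequential <- codes
sequential <- gsub("^$", "0", sequential)
sequential <- gsub("^K$", "1000", sequential)
sequential <- gsub("^0$", "10", sequential)
print(sequential)

# Exact lookup with match(): every code is translated exactly once
from <- c("", "K", "0")
to   <- c(0, 1000, 10)
lookup <- to[match(codes, from)]
print(lookup)
```

The blank code ends up as 10 with chained replacements but stays 0 with the lookup, which is what the code table intends.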
Now, let’s group by event type and compute summary statistics for property damage: the total number of occurrences of each event type, and the total and mean property damage (in US dollars).
storm.damage.prop <- storm.damage %>%
    group_by(EVTYPE) %>%
    summarise(Average.damage = PROPDMGEXP %>%
                  as.numeric %>%
                  `*`(PROPDMG) %>%
                  mean %>%
                  round,
              Property.damage = sum(PROPDMG * as.numeric(PROPDMGEXP)) %>%
                  format(big.mark = ","),
              Occurrences = n()) %>%
    arrange(desc(Average.damage))
# Add thousands separators only after sorting on the numeric values
storm.damage.prop$Average.damage <- storm.damage.prop$Average.damage %>%
    format(big.mark = ",")
The six event types that cause the highest average property damage per occurrence are:
head(storm.damage.prop, 6)
## # A tibble: 6 x 4
## EVTYPE Average.damage Property.damage Occurrences
## <chr> <chr> <chr> <int>
## 1 hurricane typhoon 301,545,716 81,718,889,010 271
## 2 storm surge/tide 31,359,378 4,641,188,000 148
## 3 tropical storm 11,205,976 7,642,475,550 682
## 4 tsunami 7,203,100 144,062,000 20
## 5 flood 5,936,359 143,944,833,550 24248
## 6 ice storm 1,938,397 3,642,248,810 1879
Now, let’s do the same with crop damage.
storm.damage.crop <- storm.damage %>%
    group_by(EVTYPE) %>%
    summarise(Average.damage = CROPDMGEXP %>%
                  as.numeric %>%
                  `*`(CROPDMG) %>%
                  mean %>%
                  round,
              # Note: despite its name, this column holds total crop damage
              Property.damage = sum(CROPDMG * as.numeric(CROPDMGEXP)) %>%
                  format(big.mark = ","),
              Occurrences = n()) %>%
    arrange(desc(Average.damage))
storm.damage.crop$Average.damage <- storm.damage.crop$Average.damage %>%
    format(big.mark = ",")
The six event types with the highest average crop damage per occurrence are:
head(storm.damage.crop, 6)
## # A tibble: 6 x 4
## EVTYPE Average.damage Property.damage Occurrences
## <chr> <chr> <chr> <int>
## 1 hurricane typhoon 19,742,095 5,350,107,800 271
## 2 drought 5,494,273 13,367,566,000 2433
## 3 tropical storm 993,711 677,711,000 682
## 4 frost/freeze 814,733 1,094,186,000 1343
## 5 extreme cold/wind chill 808,538 1,309,023,000 1619
## 6 excessive heat 297,344 492,402,000 1656
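As a quick consistency check, each average is the total divided by the number of occurrences, rounded to whole dollars. Using the hurricane typhoon figures from the two tables above:

```r
total.prop  <- 81718889010   # total property damage, hurricane typhoon (USD)
total.crop  <- 5350107800    # total crop damage, hurricane typhoon (USD)
occurrences <- 271

avg.prop <- round(total.prop / occurrences)
avg.crop <- round(total.crop / occurrences)
print(format(c(avg.prop, avg.crop), big.mark = ","))
```

Both results agree with the Average.damage values printed in the tables (301,545,716 and 19,742,095).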
Out of the 48 weather event types identified by NOAA, these 7 are the most harmful to human health (by average number of fatalities and of injuries per event):
print(most.harmful)
## # A tibble: 7 x 3
## EVTYPE Average.deaths Average.injured
## <chr> <dbl> <dbl>
## 1 tsunami 1.650 6.450
## 2 excessive heat 1.085 3.859
## 3 rip current 0.787 0.484
## 4 avalanche 0.590 0.413
## 5 hurricane typhoon 0.461 4.900
## 6 heat 0.331 1.707
## 7 dust storm 0.026 0.902
Ranking all 48 NOAA weather event types by average property damage per event gives the following (first 10 rows printed):
print(storm.damage.prop)
## # A tibble: 48 x 4
## EVTYPE Average.damage Property.damage Occurrences
## <chr> <chr> <chr> <int>
## 1 hurricane typhoon 301,545,716 81,718,889,010 271
## 2 storm surge/tide 31,359,378 4,641,188,000 148
## 3 tropical storm 11,205,976 7,642,475,550 682
## 4 tsunami 7,203,100 144,062,000 20
## 5 flood 5,936,359 143,944,833,550 24248
## 6 ice storm 1,938,397 3,642,248,810 1879
## 7 wildfire 1,858,790 7,760,449,500 4175
## 8 tornado 1,063,183 24,616,945,710 23154
## 9 debris flow 552,003 324,578,000 588
## 10 drought 429,963 1,046,101,000 2433
## # ... with 38 more rows
Ranking all 48 NOAA weather event types by average crop damage per event gives the following (first 10 rows printed):
print(storm.damage.crop)
## # A tibble: 48 x 4
## EVTYPE Average.damage Property.damage Occurrences
## <chr> <chr> <chr> <int>
## 1 hurricane typhoon 19,742,095 5,350,107,800 271
## 2 drought 5,494,273 13,367,566,000 2433
## 3 tropical storm 993,711 677,711,000 682
## 4 frost/freeze 814,733 1,094,186,000 1343
## 5 extreme cold/wind chill 808,538 1,309,023,000 1619
## 6 excessive heat 297,344 492,402,000 1656
## 7 flood 205,162 4,974,778,400 24248
## 8 wildfire 96,349 402,255,130 4175
## 9 heavy rain 49,374 736,657,900 14920
## 10 debris flow 34,043 20,017,000 588
## # ... with 38 more rows
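Put together, these are the types of weather events that cause the most harm (top six by average crop damage, top six by average property damage, and the seven most harmful to human health):
c(storm.damage.crop$EVTYPE %>% head(6),
  storm.damage.prop$EVTYPE %>% head(6),
  most.harmful$EVTYPE) %>%
    unique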
## [1] "hurricane typhoon" "drought"
## [3] "tropical storm" "frost/freeze"
## [5] "extreme cold/wind chill" "excessive heat"
## [7] "storm surge/tide" "tsunami"
## [9] "flood" "ice storm"
## [11] "rip current" "avalanche"
## [13] "heat" "dust storm"
This report was written as part of the second project assignment of the Reproducible Research course, taught by Johns Hopkins University through Coursera. It is the fifth course in the Data Science Specialization.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
More information about the data set can be found in:
1. Storm Data Preparation, a documentation of the data set.
2. Storm Data FAQ Page.
3. The NOAA website.
sessionInfo()
## R version 3.4.1 (2017-06-30)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 10240)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=Portuguese_Brazil.1252 LC_CTYPE=Portuguese_Brazil.1252
## [3] LC_MONETARY=Portuguese_Brazil.1252 LC_NUMERIC=C
## [5] LC_TIME=Portuguese_Brazil.1252
##
## attached base packages:
## [1] grid tools stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] bindrcpp_0.2 VennDiagram_1.6.17 futile.logger_1.4.3
## [4] stringdist_0.9.4.6 dplyr_0.7.2 knitr_1.16
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.12 bindr_0.1 magrittr_1.5
## [4] R6_2.2.2 rlang_0.1.1 stringr_1.2.0
## [7] parallel_3.4.1 lambda.r_1.1.9 htmltools_0.3.6
## [10] yaml_2.1.14 assertthat_0.2.0 rprojroot_1.2
## [13] digest_0.6.12 tibble_1.3.3 codetools_0.2-15
## [16] futile.options_1.0.0 glue_1.1.1 evaluate_0.10.1
## [19] rmarkdown_1.6 stringi_1.1.5 compiler_3.4.1
## [22] backports_1.1.0 pkgconfig_2.0.1