Weather phenomena can cause both public health and economic problems for our society and result in fatalities, injuries, and property damage. A better understanding of weather phenomena and their effects allows us to approach these problems more efficiently.
This data analysis uses the storm database of the U.S. National Oceanic and Atmospheric Administration (NOAA). The database records major storms and weather events in the U.S. from 1950 to November 2011, including when and where they occurred as well as estimates of the resulting fatalities, injuries and property damage.
In this report we want to answer the following two questions for the United States as a whole:
Which types of events are most harmful with respect to population health?
Which types of events have the greatest economic consequences?
The database is downloaded from cloudfront.net as a .bz2 compressed CSV file. The documentation, which includes the definitions of the variables of the storm database, is downloaded separately as a .pdf file. Both files are saved to the current working directory.
fileUrl <- "http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
#mode = "wb" keeps the binary files intact on Windows
download.file(fileUrl, destfile = "stormdatabase.csv.bz2", mode = "wb")
downloadedTimeData <- Sys.Date()
fileUrlDocumentation <- "http://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf"
download.file(fileUrlDocumentation, destfile = "documentation.pdf", mode = "wb")
downloadedTimeDocumentation <- Sys.Date()
data <- read.csv(bzfile("stormdatabase.csv.bz2"), stringsAsFactors = FALSE)
We look at the structure of the downloaded data to identify the variables needed for our analysis.
str(data)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr "" "" "" "" ...
## $ BGN_LOCATI: chr "" "" "" "" ...
## $ END_DATE : chr "" "" "" "" ...
## $ END_TIME : chr "" "" "" "" ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr "" "" "" "" ...
## $ END_LOCATI: chr "" "" "" "" ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## $ WFO : chr "" "" "" "" ...
## $ STATEOFFIC: chr "" "" "" "" ...
## $ ZONENAMES : chr "" "" "" "" ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr "" "" "" "" ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
For this analysis we are interested in the event types and their respective damage to the economy and public health. Consequently the storm database is reduced to the following variables: EVTYPE (event type), FATALITIES, INJURIES, PROPDMG and PROPDMGEXP (property damage and its multiplier), CROPDMG and CROPDMGEXP (crop damage and its multiplier).
The property damage and crop damage values must be adjusted, because each comes with a supplementary column, PROPDMGEXP and CROPDMGEXP respectively, that describes the multiplier to apply. H and h represent 100, K and k 1,000, M and m a million, B and b a billion, and a digit from 0 to 9 represents the corresponding power of ten. Unfortunately the documentation does not describe all of these values, so I had to do some research on Google and Coursera to figure out their meaning.
I decided to split the data into four subsets: property damage, crop damage, injuries and fatalities.
We create a subset by selecting only the event type, the property damage and its supplementary column PROPDMGEXP. First we inspect the distinct values of PROPDMGEXP so that every one of them is covered by the transformation. The transformation is done separately for each value with vectorized assignments, because a for loop over all rows took too long. Some transformations are done with the help of the dplyr package.
library(dplyr)
property <- data %>% select(EVTYPE, PROPDMGEXP, PROPDMG)
#range of values in PROPDMGEXP
unique(property$PROPDMGEXP)
## [1] "K" "M" "" "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-"
## [18] "1" "8"
#each value of PROPDMGEXP is mapped to a numeric multiplier in a separate numeric vector;
#overwriting the character column in place would turn replaced values into strings such
#as "0" and "1", which the following rules would then match again
propExp <- property$PROPDMGEXP
propMultiplier <- rep(0, length(propExp)) #"", "?" and "-" keep the multiplier 0
propMultiplier[propExp %in% c("0", "+")] <- 1
#the digits 1 to 8 are read as powers of ten
propDigits <- propExp %in% as.character(1:8)
propMultiplier[propDigits] <- 10^as.numeric(propExp[propDigits])
propMultiplier[propExp %in% c("h", "H")] <- 100
propMultiplier[propExp == "K"] <- 1000
propMultiplier[propExp %in% c("m", "M")] <- 1000000
propMultiplier[propExp == "B"] <- 1000000000
property$PROPDMGEXP <- propMultiplier
#multiplication of the values property damage and its multiplier PROPDMGEXP
property$PROPDMG <- property[, "PROPDMG"] * property[,"PROPDMGEXP"]
propertyFinished <- property %>%
select(EVTYPE, PROPDMG) %>%
group_by(EVTYPE) %>%
summarize(total = sum(PROPDMG))
propertyOrdered <- propertyFinished[order(-propertyFinished$total),]
#any missing values
sum(!complete.cases(propertyOrdered))
## [1] 0
For the crop damage we proceed similarly.
crop <- data %>% select(EVTYPE, CROPDMGEXP, CROPDMG)
unique(crop$CROPDMGEXP)
## [1] "" "M" "K" "m" "B" "?" "0" "k" "2"
#mapping into a separate numeric vector, so that replaced values are not matched again
cropExp <- crop$CROPDMGEXP
cropMultiplier <- rep(0, length(cropExp)) #"" and "?" keep the multiplier 0
cropMultiplier[cropExp == "0"] <- 1
cropMultiplier[cropExp == "2"] <- 100
cropMultiplier[cropExp %in% c("k", "K")] <- 1000
cropMultiplier[cropExp %in% c("m", "M")] <- 1000000
cropMultiplier[cropExp == "B"] <- 1000000000
crop$CROPDMGEXP <- cropMultiplier
#multiplication of the values crop damage and its multiplier CROPDMGEXP
crop$CROPDMG <- crop[,"CROPDMG"] * crop[,"CROPDMGEXP"]
cropFinished <- crop %>%
select(EVTYPE, CROPDMG) %>%
group_by(EVTYPE) %>%
summarize(total = sum(CROPDMG))
cropOrdered <- cropFinished[order(-cropFinished$total),]
sum(!complete.cases(cropOrdered))
## [1] 0
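Since the property and crop conversions follow the same rule, they could share one helper. The following is only a minimal sketch under the mapping described above; the function name expToMultiplier and the toy data frame are made up for illustration and are not part of the original analysis.

```r
# Hypothetical helper (an assumption, not part of the original analysis):
# converts the exponent codes of PROPDMGEXP / CROPDMGEXP into multipliers.
expToMultiplier <- function(exponent) {
  exponent <- toupper(exponent)
  # everything not covered below ("", "?", "-") keeps the multiplier 0
  multiplier <- rep(0, length(exponent))
  multiplier[exponent %in% c("0", "+")] <- 1
  # the digits 1 to 9 are read as powers of ten
  isDigit <- exponent %in% as.character(1:9)
  multiplier[isDigit] <- 10^as.numeric(exponent[isDigit])
  # letter codes: hundred, thousand, million, billion
  letterCodes <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  isLetter <- exponent %in% names(letterCodes)
  multiplier[isLetter] <- unname(letterCodes[exponent[isLetter]])
  multiplier
}

# toy data, values made up for illustration
toy <- data.frame(DMG = c(25, 2.5, 1, 3, 7),
                  EXP = c("K", "m", "", "2", "?"),
                  stringsAsFactors = FALSE)
toyDamage <- toy$DMG * expToMultiplier(toy$EXP)
toyDamage
```

Because the multipliers are written into a new numeric vector, no rule can accidentally match a value that an earlier rule has already replaced.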
For injuries and fatalities we take two separate subsets. The counts are grouped by event type and summarized by taking the sum of injuries or fatalities per type.
injuries <- data %>%
select(EVTYPE, INJURIES) %>%
group_by(EVTYPE) %>%
summarize(total = sum(INJURIES))
injuriesOrdered <- injuries[order(-injuries$total),]
sum(!complete.cases(injuriesOrdered))
## [1] 0
fatalities <- data %>%
select(EVTYPE, FATALITIES) %>%
group_by(EVTYPE) %>%
summarize(total = sum(FATALITIES))
fatalitiesOrdered <- fatalities[order(-fatalities$total),]
sum(!complete.cases(fatalitiesOrdered))
## [1] 0
The four subsets contain many different event types. To keep the results readable, only the 10 most devastating weather phenomena in each category across the U.S. are plotted and explained below.
top10injuries <- injuriesOrdered[1:10,]
top10fatalities <- fatalitiesOrdered[1:10,]
par(mfrow = c(1,2), mar = c(12,4,2,1), oma = c(0,0,2,0), cex = 0.7)
barplot(top10injuries$total, las = 3, names.arg = top10injuries$EVTYPE, col = "orange",
main = "Related To Injuries" , ylab = "Number Of Injuries Caused")
barplot(top10fatalities$total, las = 3, names.arg = top10fatalities$EVTYPE, col = "red",
main = "Related To Fatalities", ylab = "Number Of Fatalities Caused")
title(main = "THE 10 MOST DANGEROUS WEATHER PHENOMENA", outer = TRUE)
This figure shows that tornadoes are by far the most dangerous weather phenomena and cause the largest number of injuries and deaths of all event types. Tornadoes are difficult to control, and it remains unclear whether countermeasures that could prevent or weaken their effects will become available within the next decade. Another approach consists in better informing the population on how to react in such events in order to prevent further injuries or deaths. If injuries and deaths were caused by defects or deficiencies in construction, establishing better building standards would be more appropriate. Finally, it is interesting to see that excessive heat accounts for quite a high number of deaths. Further investigation is needed to determine whether this phenomenon has been increasing in recent years and what the exact causes of death were, in particular whether the victims had pre-existing conditions.
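The question whether heat-related deaths have been increasing could be approached by summing the fatalities per year. The following is only a sketch on made-up toy rows in the layout of the storm database; it uses base R instead of dplyr so that it runs on its own, and a trend found this way could also reflect changes in reporting over time.

```r
# toy rows in the layout of the storm database (all values are made up)
toy <- data.frame(
  BGN_DATE   = c("7/12/1995 0:00:00", "7/13/1995 0:00:00", "8/1/2006 0:00:00",
                 "6/20/2006 0:00:00", "7/4/2011 0:00:00"),
  EVTYPE     = c("EXCESSIVE HEAT", "EXCESSIVE HEAT", "EXCESSIVE HEAT",
                 "TORNADO", "EXCESSIVE HEAT"),
  FATALITIES = c(10, 5, 3, 2, 4),
  stringsAsFactors = FALSE
)

# extract the year; the trailing " 0:00:00" is ignored when parsing the date
toy$year <- as.numeric(format(as.Date(toy$BGN_DATE, format = "%m/%d/%Y"), "%Y"))

# keep only the excessive heat events and sum the fatalities per year
heat <- toy[toy$EVTYPE == "EXCESSIVE HEAT", ]
heatPerYear <- aggregate(FATALITIES ~ year, data = heat, FUN = sum)
heatPerYear
```

Applied to the full data set, heatPerYear could be plotted over the years to see whether the number of deaths is rising.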
top10property <- propertyOrdered[1:10,]
top10crop <- cropOrdered[1:10,]
par(mfrow=c(1,2), mar = c(12,4,2,1), oma = c(0,0,3,0), cex = 0.6)
barplot(top10property$total, las = 3, names.arg = top10property$EVTYPE, col = "brown",
main = "Property Damage", ylab = "US Dollars")
barplot(top10crop$total, las = 3, names.arg = top10crop$EVTYPE, col = "darkgreen",
main = "Crop Damage", ylab = "US Dollars")
title(main = "THE 10 MOST DEVASTATING WEATHER PHENOMENA", outer = TRUE)
In this figure on property and crop damage, floods have the biggest impact on property damage. Floods account for more damage than hurricanes/typhoons and tornadoes together. Appropriate countermeasures consist in constructing efficient dams in critical hotspots where floods happen regularly. Where floods occur along unregulated rivers, river regulation would be another option. Unfortunately, it remains unclear from this analysis whether such events happen frequently or whether the bulk of the damage is caused by rare extreme events such as Hurricane Katrina.
From an agricultural perspective, droughts have the biggest impact on crop damage. Droughts are progressively becoming an issue, which is attributed to climate change as well as deforestation. Countermeasures are more difficult to establish, as usable water becomes increasingly scarce and rising demand from other consumers restricts the possibilities of water redistribution. One approach would consist in afforestation and more efficient irrigation techniques. Weather control could be an appropriate measure to reduce agricultural damage, but it remains highly experimental and its long-term effects are unknown.
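Whether the damage of an event type comes from many events or from a few extreme ones could be checked by comparing the largest single event with the total per type. The following is only a sketch on made-up toy values; it assumes that PROPDMG already contains the damage in US dollars after applying the multiplier, and the name concentration is hypothetical.

```r
# toy data (values are made up); PROPDMG is assumed to be in US dollars already
toy <- data.frame(
  EVTYPE  = c("FLOOD", "FLOOD", "FLOOD", "TORNADO", "TORNADO", "TORNADO"),
  PROPDMG = c(9e9, 5e8, 5e8, 2e8, 3e8, 5e8),
  stringsAsFactors = FALSE
)

# number of events, total damage and largest single event per event type
events  <- tapply(toy$PROPDMG, toy$EVTYPE, length)
total   <- tapply(toy$PROPDMG, toy$EVTYPE, sum)
largest <- tapply(toy$PROPDMG, toy$EVTYPE, max)

# share of the total damage that is caused by the single largest event
concentration <- data.frame(
  events       = as.vector(events),
  total        = as.vector(total),
  largestShare = as.vector(largest / total),
  row.names    = names(total)
)
concentration
```

A largestShare close to 1 would indicate that a single extreme event dominates the total, while a small share spread over many events would point to frequent, smaller events.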