This report shows how storms and other severe weather events affect the economy and public health of communities and municipalities. The research considers two aspects, economic damage and harm to public health, and quantifies the damage done for each.
Several steps rely on functions from the following R packages, which must be loaded first.
library(R.utils)
library(plyr)    # loaded before dplyr so that plyr does not mask dplyr's functions
library(dplyr)
library(ggplot2)
Note the following about the working environment:
In the same directory as the .Rmd file there is a folder called data.
The CSV file required for the analysis is downloaded and extracted into the data folder.
# Directory of this .Rmd (requires RStudio) and its data subfolder, created if missing
fileDirectory <- dirname(rstudioapi::getSourceEditorContext()$path)
dataDirectory <- file.path(fileDirectory, "data")
dir.create(dataDirectory, showWarnings = FALSE)
bz2File <- file.path(dataDirectory, "repdata_data_StormData.csv.bz2")
# Binary mode (mode = "wb") keeps the compressed file intact on Windows
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile = bz2File, mode = "wb")
bunzip2(bz2File, overwrite = TRUE, remove = FALSE)
storms <- read.csv(file.path(dataDirectory, "repdata_data_StormData.csv"))
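As an alternative to the bunzip2() step, R's file connections detect bzip2 compression automatically when a file is opened for reading, so read.csv() can usually read the .bz2 archive directly. The sketch below demonstrates this on a small temporary file written with bzfile(); the toy columns and values are illustrative only and are not taken from StormData.

```r
# Write a tiny bzip2-compressed CSV to a temporary file (toy data, not StormData)
bz2Path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(bz2Path, open = "w")
write.csv(data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(5, 0)),
          con, row.names = FALSE)
close(con)

# read.csv() opens the file with file(), which decompresses bzip2 transparently
toyStorms <- read.csv(bz2Path)
print(toyStorms)
```

Applied to this report, `read.csv(file.path(dataDirectory, "repdata_data_StormData.csv.bz2"))` would load the downloaded archive without extracting it, at the cost of decompressing it on every read.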
The first rows of the loaded data give a quick look at its content.
head(storms)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE EVTYPE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL TORNADO
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL TORNADO
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL TORNADO
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL TORNADO
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL TORNADO
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL TORNADO
## BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1 0 0 NA
## 2 0 0 NA
## 3 0 0 NA
## 4 0 0 NA
## 5 0 0 NA
## 6 0 0 NA
## END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1 0 14.0 100 3 0 0 15 25.0
## 2 0 2.0 150 2 0 0 0 2.5
## 3 0 0.1 123 2 0 0 2 25.0
## 4 0 0.0 100 2 0 0 2 2.5
## 5 0 0.0 150 2 0 0 2 2.5
## 6 0 1.5 177 2 0 0 6 2.5
## PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1 K 0 3040 8812
## 2 K 0 3042 8755
## 3 K 0 3340 8742
## 4 K 0 3458 8626
## 5 K 0 3412 8642
## 6 K 0 3450 8748
## LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3051 8806 1
## 2 0 0 2
## 3 0 0 3
## 4 0 0 4
## 5 0 0 5
## 6 0 0 6
The first six rows show the structure of the dataset. Although many variables are empty in these rows, they give an overview of how some columns are formatted and how they relate to each other.
dim(storms)
## [1] 902297 37
The dataset therefore has 902,297 rows (observations) and 37 columns (variables).
summary(storms)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE
## Min. : 1.0 Length:902297 Length:902297 Length:902297
## 1st Qu.:19.0 Class :character Class :character Class :character
## Median :30.0 Mode :character Mode :character Mode :character
## Mean :31.2
## 3rd Qu.:45.0
## Max. :95.0
##
## COUNTY COUNTYNAME STATE EVTYPE
## Min. : 0.0 Length:902297 Length:902297 Length:902297
## 1st Qu.: 31.0 Class :character Class :character Class :character
## Median : 75.0 Mode :character Mode :character Mode :character
## Mean :100.6
## 3rd Qu.:131.0
## Max. :873.0
##
## BGN_RANGE BGN_AZI BGN_LOCATI END_DATE
## Min. : 0.000 Length:902297 Length:902297 Length:902297
## 1st Qu.: 0.000 Class :character Class :character Class :character
## Median : 0.000 Mode :character Mode :character Mode :character
## Mean : 1.484
## 3rd Qu.: 1.000
## Max. :3749.000
##
## END_TIME COUNTY_END COUNTYENDN END_RANGE
## Length:902297 Min. :0 Mode:logical Min. : 0.0000
## Class :character 1st Qu.:0 NA's:902297 1st Qu.: 0.0000
## Mode :character Median :0 Median : 0.0000
## Mean :0 Mean : 0.9862
## 3rd Qu.:0 3rd Qu.: 0.0000
## Max. :0 Max. :925.0000
##
## END_AZI END_LOCATI LENGTH WIDTH
## Length:902297 Length:902297 Min. : 0.0000 Min. : 0.000
## Class :character Class :character 1st Qu.: 0.0000 1st Qu.: 0.000
## Mode :character Mode :character Median : 0.0000 Median : 0.000
## Mean : 0.2301 Mean : 7.503
## 3rd Qu.: 0.0000 3rd Qu.: 0.000
## Max. :2315.0000 Max. :4400.000
##
## F MAG FATALITIES INJURIES
## Min. :0.0 Min. : 0.0 Min. : 0.0000 Min. : 0.0000
## 1st Qu.:0.0 1st Qu.: 0.0 1st Qu.: 0.0000 1st Qu.: 0.0000
## Median :1.0 Median : 50.0 Median : 0.0000 Median : 0.0000
## Mean :0.9 Mean : 46.9 Mean : 0.0168 Mean : 0.1557
## 3rd Qu.:1.0 3rd Qu.: 75.0 3rd Qu.: 0.0000 3rd Qu.: 0.0000
## Max. :5.0 Max. :22000.0 Max. :583.0000 Max. :1700.0000
## NA's :843563
## PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## Min. : 0.00 Length:902297 Min. : 0.000 Length:902297
## 1st Qu.: 0.00 Class :character 1st Qu.: 0.000 Class :character
## Median : 0.00 Mode :character Median : 0.000 Mode :character
## Mean : 12.06 Mean : 1.527
## 3rd Qu.: 0.50 3rd Qu.: 0.000
## Max. :5000.00 Max. :990.000
##
## WFO STATEOFFIC ZONENAMES LATITUDE
## Length:902297 Length:902297 Length:902297 Min. : 0
## Class :character Class :character Class :character 1st Qu.:2802
## Mode :character Mode :character Mode :character Median :3540
## Mean :2875
## 3rd Qu.:4019
## Max. :9706
## NA's :47
## LONGITUDE LATITUDE_E LONGITUDE_ REMARKS
## Min. :-14451 Min. : 0 Min. :-14455 Length:902297
## 1st Qu.: 7247 1st Qu.: 0 1st Qu.: 0 Class :character
## Median : 8707 Median : 0 Median : 0 Mode :character
## Mean : 6940 Mean :1452 Mean : 3509
## 3rd Qu.: 9605 3rd Qu.:3549 3rd Qu.: 8735
## Max. : 17124 Max. :9706 Max. :106220
## NA's :40
## REFNUM
## Min. : 1
## 1st Qu.:225575
## Median :451149
## Mean :451149
## 3rd Qu.:676723
## Max. :902297
##
Grouped by type, the 37 variables comprise 18 character, 18 numeric and 1 logical column. Missing values (NAs) also deserve attention: COUNTYENDN is entirely NA, F has 843,563 NAs, and a few dozen more appear in the latitude columns.
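summary() reports NAs only for numeric and logical columns, so a per-column count is a useful complement. The sketch below uses a made-up three-column data frame named after StormData variables; on the real data the same call would be `colSums(is.na(storms))`. Note that empty strings in character columns (such as blank exponent codes) are not NA and are not counted.

```r
# Toy data frame mimicking three StormData columns (values are made up)
toyNA <- data.frame(
  F = c(3, NA, NA, 1),
  COUNTYENDN = c(NA, NA, NA, NA),
  LATITUDE = c(3040, 3042, NA, 3458)
)

# Number and percentage of missing values per column
naCounts <- colSums(is.na(toyNA))
naShare <- round(100 * naCounts / nrow(toyNA), 1)
print(naCounts)
print(naShare)
```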
The storm dataset is used to answer two specific questions:
Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?
Answering these questions requires only a subset of the variables. The columns needed are (in order of appearance):
EVTYPE
FATALITIES
INJURIES
PROPDMG
PROPDMGEXP
CROPDMG
CROPDMGEXP
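Selecting these seven columns up front reduces the working data from 37 columns to 7. The rest of this report keeps using the full `storms` data frame, so this is an optional step; the sketch below shows it on a hypothetical two-row frame that includes one unneeded column (STATE). On the real data it would be `storms[, neededColumns]`.

```r
# Hypothetical toy frame with the needed columns plus one unneeded column
toyCols <- data.frame(
  STATE = c("AL", "TX"), EVTYPE = c("TORNADO", "FLOOD"),
  FATALITIES = c(2, 0), INJURIES = c(10, 1),
  PROPDMG = c(25, 5), PROPDMGEXP = c("K", "M"),
  CROPDMG = c(0, 1), CROPDMGEXP = c("", "K")
)
neededColumns <- c("EVTYPE", "FATALITIES", "INJURIES",
                   "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")

# Keep only the columns required by the two questions
toySubset <- toyCols[, neededColumns]
str(toySubset)
```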
Each record is classified by event type in the EVTYPE variable. The number of distinct event types is:
summary(unique(storms$EVTYPE))
## Length Class Mode
## 985 character character
The dataset contains 985 distinct event types, far too many to compare individually, so the analysis focuses on the top 10 event types for each measure of harm.
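`length(unique(storms$EVTYPE))` returns the same count directly as a number. The raw labels may also include variants that differ only in capitalization or surrounding spaces; normalizing them with `toupper(trimws(...))` is an optional cleaning step that this report does not apply, so the tables below use the raw labels. The sketch uses made-up labels to show the effect.

```r
# Toy event labels that differ only in case or surrounding spaces (made up)
evtype <- c("TORNADO", "Tornado", " TORNADO", "HAIL", "hail", "FLOOD")

# length(unique()) gives the number of distinct labels directly
print(length(unique(evtype)))

# Normalizing case and whitespace merges labels that name the same event
normalized <- toupper(trimws(evtype))
print(length(unique(normalized)))
```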
The first question concerns population health. Two variables record harm to people:
FATALITIES
INJURIES
To compare event types, the total number of fatalities and injuries is calculated for each one:
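The report ranks fatalities and injuries separately. If a single ranking is wanted, one possible measure is their sum per event type; this weights a death the same as an injury, which is a simplifying assumption rather than part of the original analysis. The sketch below computes both totals in one `aggregate()` call on made-up records.

```r
# Toy records (made-up numbers) with the two health-related columns
toyHealth <- data.frame(
  EVTYPE = c("TORNADO", "HEAT", "TORNADO", "LIGHTNING", "HEAT"),
  FATALITIES = c(3, 2, 1, 0, 4),
  INJURIES = c(20, 5, 7, 2, 1)
)

# Sum both columns per event type with the formula interface of aggregate()
harm <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toyHealth, FUN = sum)

# Combined count of people harmed, sorted from most to least harmful
harm$TOTAL <- harm$FATALITIES + harm$INJURIES
harm <- harm[order(harm$TOTAL, decreasing = TRUE), ]
print(harm)
```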
# Total fatalities per event type, then the ten largest totals
totalFatalities <- aggregate(storms$FATALITIES, by = list(Events = storms$EVTYPE), FUN = sum)
top10TotalFatalities <- head(arrange(totalFatalities, desc(x)), n = 10)
top10TotalFatalities
## Events x
## 1 TORNADO 5633
## 2 EXCESSIVE HEAT 1903
## 3 FLASH FLOOD 978
## 4 HEAT 937
## 5 LIGHTNING 816
## 6 TSTM WIND 504
## 7 FLOOD 470
## 8 RIP CURRENT 368
## 9 HIGH WIND 248
## 10 AVALANCHE 224
Tornadoes (5,633) and excessive heat (1,903) are the only event types with more than a thousand recorded fatalities each; adding the separate HEAT category (937) brings heat-related deaths to 2,840.
totalInjuries <- aggregate(storms$INJURIES, by = list(Events = storms$EVTYPE), FUN = sum)
top10TotalInjuries <- head(arrange(totalInjuries, desc(x)), n = 10)
top10TotalInjuries
## Events x
## 1 TORNADO 91346
## 2 TSTM WIND 6957
## 3 FLOOD 6789
## 4 EXCESSIVE HEAT 6525
## 5 LIGHTNING 5230
## 6 HEAT 2100
## 7 ICE STORM 1975
## 8 FLASH FLOOD 1777
## 9 THUNDERSTORM WIND 1488
## 10 HAIL 1361
On the other hand, the top 10 event types for injuries differ somewhat from those for fatalities, but tornadoes remain by far the most harmful event, with 91,346 injuries.
Overall, tornadoes pose the greatest threat to population health, followed by excessive heat. Although these two events arise under very different weather conditions, both deserve close monitoring for anomalies during the seasons in which they are most likely to occur.
For the economic impact, PROPDMG and CROPDMG record the damage done to property and crops. The actual amounts also depend on the annotations in the EXP columns (PROPDMGEXP and CROPDMGEXP), which encode the following multipliers:
H, h = hundreds = 100
K, k = thousands = 1,000
M, m = millions = 1,000,000
B, b = billions = 1,000,000,000
(+) = 1
(-) = 0
(?) = 0
blank/empty character = 0
numeric 0..8 = 10
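The mapping above can also be written as a small lookup-based helper in base R. This is a sketch, not the report's own method: the function name `expToMultiplier` is hypothetical, it reproduces the list exactly as given (including mapping the digits 0 to 8 to 10) rather than validating it, it upper-cases codes so "k" and "K" are treated alike, and it returns NA for any code not in the list so that unexpected values are easy to spot.

```r
# Multiplier for each exponent code, following the list above
expToMultiplier <- function(exp) {
  code <- toupper(trimws(exp))                      # treat "k" like "K", etc.
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9, "+" = 1, "-" = 0, "?" = 0)
  multiplier <- unname(lookup[code])                # NA for codes not in lookup
  multiplier[code %in% as.character(0:8)] <- 10     # digits 0..8 map to 10
  multiplier[code == ""] <- 0                       # blank exponent maps to 0
  multiplier
}

# Worked example: the first StormData row has PROPDMG = 25.0 and PROPDMGEXP = "K"
damage <- c(25, 2.5, 5, 3)
codes <- c("K", "m", "", "5")
dollars <- damage * expToMultiplier(codes)   # 25 x 1,000 = 25,000 dollars, etc.
billions <- dollars / 1e9                    # same scale as the report's tables
print(dollars)
print(billions)
```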
To obtain the actual damage values, each exponent code is replaced by its multiplier, which is then multiplied by the corresponding damage value for properties and crops.
storms$PROPDMGEXP <- mapvalues(storms$PROPDMGEXP, from = c("H", "h", "K", "k", "M", "m", "B", "b", "+", "-", "?", "", "0", "1", "2", "3", "4", "5", "6", "7", "8"), to = c(10^2, 10^2, 10^3, 10^3, 10^6, 10^6, 10^9, 10^9, 1, 0, 0, 0, 10, 10, 10, 10, 10, 10, 10, 10, 10))
## The following `from` values were not present in `x`: k, b
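# mapvalues() leaves the multipliers as character strings, so convert them to numbers
storms$PROPDMGEXP <- as.numeric(as.character(storms$PROPDMGEXP))
# Damage in dollars = value x multiplier; dividing by 10^9 expresses it in billions
storms$PROPTOTALDMG <- (storms$PROPDMG * storms$PROPDMGEXP) / 1000000000
totalPropertyDamage <- aggregate(storms$PROPTOTALDMG, by = list(Events = storms$EVTYPE), FUN = sum)
top10TotalPropertyDamage <- head(arrange(totalPropertyDamage, desc(x)), n = 10)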
top10TotalPropertyDamage
## Events x
## 1 FLOOD 144.657710
## 2 HURRICANE/TYPHOON 69.305840
## 3 TORNADO 56.937163
## 4 STORM SURGE 43.323536
## 5 FLASH FLOOD 16.140815
## 6 HAIL 15.732270
## 7 HURRICANE 11.868319
## 8 TROPICAL STORM 7.703891
## 9 WINTER STORM 6.688497
## 10 HIGH WIND 5.270046
The property damage table shows that floods cause by far the largest monetary loss to structures, about 144.7 billion dollars, roughly twice the 69.3 billion caused by hurricanes/typhoons.
storms$CROPDMGEXP <- mapvalues(storms$CROPDMGEXP, from = c("H", "h", "K", "k", "M", "m", "B", "b", "+", "-", "?", "", "0", "1", "2", "3", "4", "5", "6", "7", "8"), to = c(10^2, 10^2, 10^3, 10^3, 10^6, 10^6, 10^9, 10^9, 1, 0, 0, 0, 10, 10, 10, 10, 10, 10, 10, 10, 10))
## The following `from` values were not present in `x`: H, h, b, +, -, 1, 3, 4, 5, 6, 7, 8
storms$CROPDMGEXP <- as.numeric(as.character(storms$CROPDMGEXP))
storms$CROPTOTALDMG <- (storms$CROPDMG * storms$CROPDMGEXP) / 1000000000
totalCropDamage <- aggregate(storms$CROPTOTALDMG, by = list(Events = storms$EVTYPE), FUN = sum)
top10TotalCropDamage <- head(arrange(totalCropDamage, desc(x)), n = 10)
top10TotalCropDamage
## Events x
## 1 DROUGHT 13.972566
## 2 FLOOD 5.661968
## 3 RIVER FLOOD 5.029459
## 4 ICE STORM 5.022113
## 5 HAIL 3.025955
## 6 HURRICANE 2.741910
## 7 HURRICANE/TYPHOON 2.607873
## 8 FLASH FLOOD 1.421317
## 9 EXTREME COLD 1.292973
## 10 FROST/FREEZE 1.094086
As expected, crops are most susceptible to drought (about 14.0 billion dollars), which does not appear at all among the top 10 events for property damage. Crops and properties differ physically, so different events affect each of them most. Flood, however, ranks near the top of both lists, making it one of the most damaging events in economic terms.
Plotting the top 10 tables makes the differences in the magnitude of damage easier to see.
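The second question asks about economic consequences overall, while the report ranks property and crop damage separately. One way to get a single ranking is to add the two totals per event type; the sketch below does this on made-up values already expressed in billions, using the report's column names PROPTOTALDMG and CROPTOTALDMG. On the real data the same code would run on `storms`. Note that the formula interface of `aggregate()` drops rows containing NA by default.

```r
# Toy per-record damage in billions of dollars (made-up values)
toyEcon <- data.frame(
  EVTYPE = c("FLOOD", "DROUGHT", "FLOOD", "HAIL"),
  PROPTOTALDMG = c(5, 0, 1, 0.2),
  CROPTOTALDMG = c(0.1, 3, 0, 0.1)
)

# Sum property and crop damage per event type
econ <- aggregate(cbind(PROPTOTALDMG, CROPTOTALDMG) ~ EVTYPE, data = toyEcon, FUN = sum)

# Combined economic damage, sorted from largest to smallest
econ$TOTALDMG <- econ$PROPTOTALDMG + econ$CROPTOTALDMG
econ <- econ[order(econ$TOTALDMG, decreasing = TRUE), ]
print(econ)
```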
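# Bars are ordered from largest to smallest, with rotated labels so all ten event names stay legible
fatalityPlot <- ggplot(top10TotalFatalities, aes(x = reorder(Events, -x), y = x)) + geom_col() + theme(axis.text.x = element_text(size = 8, angle = 45, hjust = 1)) + xlab("Weather events") + ylab("Total fatalities") + labs(title = "Fatalities of weather events")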
fatalityPlot
injuryPlot <- ggplot(top10TotalInjuries, aes(x = reorder(Events, -x), y = x)) + geom_col() + theme(axis.text.x = element_text(size = 8, angle = 45, hjust = 1)) + xlab("Weather events") + ylab("Total injuries") + labs(title = "Injuries of weather events")
injuryPlot
As the plots show, tornadoes are by far the most devastating weather events for the population. Safety strategies should be planned for tornado events, although this does not mean that other natural disasters should be dismissed as minor.
propertyPlot <- ggplot(top10TotalPropertyDamage, aes(x = reorder(Events, -x), y = x)) + geom_col() + theme(axis.text.x = element_text(size = 8, angle = 45, hjust = 1)) + xlab("Weather events") + ylab("Monetary property damage (Billions)") + labs(title = "Monetary property damage by weather events")
propertyPlot
cropPlot <- ggplot(top10TotalCropDamage, aes(x = reorder(Events, -x), y = x)) + geom_col() + theme(axis.text.x = element_text(size = 8, angle = 45, hjust = 1)) + xlab("Weather events") + ylab("Monetary crop damage (Billions)") + labs(title = "Monetary crop damage by weather events")
cropPlot
In terms of economic loss, floods and droughts are the most damaging natural disasters: floods through water damage to property and infrastructure, and droughts through the loss of crops during dry weather.