This paper analyzes the effects of storms and various weather events on health and the economy in communities.
To study this we have the NOAA (U.S. National Oceanic and Atmospheric Administration’s) storm database that contains data of when and where the phenomena occur , as well as the estimate of deaths, people injured and urban destruction.
We will study the types of events (variable EVTYPE) that are most harmful to the health of the population and which types of events have the most serious economic consequences
I load the stormData.csv.bz2 file from https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2
library(knitr)
library(ggplot2)
library(tidyverse)
## -- Attaching packages ------------------------------------------------ tidyverse 1.3.0 --
## v tibble 3.0.0 v dplyr 1.0.1
## v tidyr 1.0.2 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.5.0
## v purrr 0.3.4
## -- Conflicts --------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
if(!file.exists("stormData.csv.bz2")) {
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
destfile = "stormData.csv.bz2", method = "curl")
}
# Loading data
data <- read.csv(bzfile("stormData.csv.bz2"), sep=",", header=T)
#Summary of data
str(data)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr "" "" "" "" ...
## $ BGN_LOCATI: chr "" "" "" "" ...
## $ END_DATE : chr "" "" "" "" ...
## $ END_TIME : chr "" "" "" "" ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr "" "" "" "" ...
## $ END_LOCATI: chr "" "" "" "" ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## $ WFO : chr "" "" "" "" ...
## $ STATEOFFIC: chr "" "" "" "" ...
## $ ZONENAMES : chr "" "" "" "" ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr "" "" "" "" ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
The fields necessary to perform the study are obtained from the data which are:
BGNDATE: Begin date
EVTYPE: Type of event ej. tornado, storm, flood,etc
FATALITIES: number of fatalities
INJURIES: number of injuries
PROPDMG: Property damage in dollars
PROPDMGEXP: K=thousands, M=millions, B=billions
CROPDMG: Crop damage in dollars
CROPDMGEXP: H=hundreds, K=thousands, M=millions, B=billions
I proceed to calculate total values according to variables PROPDMG and PROPDMGEXP in new variable PROPDMGTOTAL
data <- data %>% mutate (
PROPDMGTOTAL = case_when(
PROPDMGEXP=='H' ~ PROPDMG * 10^2 ,
PROPDMGEXP=='K' ~ PROPDMG * 10^3 ,
PROPDMGEXP=='M' ~ PROPDMG * 10^6 ,
PROPDMGEXP=='B' ~ PROPDMG * 10^9 ,
TRUE ~ 0
)
)
I proceed to calculate total values according to variables CROPDMG and CROPDMGEXP in new variable CROPDMGTOTAL and I look at the new columns created
data <- data %>% mutate (
CROPDMGTOTAL = case_when(
CROPDMGEXP=='H' ~ CROPDMG * 10^2 ,
CROPDMGEXP=='K' ~ CROPDMG * 10^3 ,
CROPDMGEXP=='M' ~ CROPDMG * 10^6 ,
CROPDMGEXP=='B' ~ CROPDMG * 10^9 ,
TRUE ~ 0
)
)
str(data)
## 'data.frame': 902297 obs. of 39 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME : chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr "" "" "" "" ...
## $ BGN_LOCATI : chr "" "" "" "" ...
## $ END_DATE : chr "" "" "" "" ...
## $ END_TIME : chr "" "" "" "" ...
## $ COUNTY_END : num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN : logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr "" "" "" "" ...
## $ END_LOCATI : chr "" "" "" "" ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES : num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP : chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP : chr "" "" "" "" ...
## $ WFO : chr "" "" "" "" ...
## $ STATEOFFIC : chr "" "" "" "" ...
## $ ZONENAMES : chr "" "" "" "" ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E : num 3051 0 0 0 0 ...
## $ LONGITUDE_ : num 8806 0 0 0 0 ...
## $ REMARKS : chr "" "" "" "" ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
## $ PROPDMGTOTAL: num 25000 2500 25000 2500 2500 2500 2500 2500 25000 25000 ...
## $ CROPDMGTOTAL: num 0 0 0 0 0 0 0 0 0 0 ...
head(data)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE EVTYPE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL TORNADO
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL TORNADO
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL TORNADO
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL TORNADO
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL TORNADO
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL TORNADO
## BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1 0 0 NA
## 2 0 0 NA
## 3 0 0 NA
## 4 0 0 NA
## 5 0 0 NA
## 6 0 0 NA
## END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1 0 14.0 100 3 0 0 15 25.0
## 2 0 2.0 150 2 0 0 0 2.5
## 3 0 0.1 123 2 0 0 2 25.0
## 4 0 0.0 100 2 0 0 2 2.5
## 5 0 0.0 150 2 0 0 2 2.5
## 6 0 1.5 177 2 0 0 6 2.5
## PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1 K 0 3040 8812
## 2 K 0 3042 8755
## 3 K 0 3340 8742
## 4 K 0 3458 8626
## 5 K 0 3412 8642
## 6 K 0 3450 8748
## LATITUDE_E LONGITUDE_ REMARKS REFNUM PROPDMGTOTAL CROPDMGTOTAL
## 1 3051 8806 1 25000 0
## 2 0 0 2 2500 0
## 3 0 0 3 25000 0
## 4 0 0 4 2500 0
## 5 0 0 5 2500 0
## 6 0 0 6 2500 0
I count total Fatalities by event type and graph on a histogram
totalFatalities <- aggregate(data$FATALITIES, by = list(data$EVTYPE), "sum")
names(totalFatalities) <- c("Event", "Fatalities")
totalFatalitiesSorted <- totalFatalities[order(-totalFatalities$Fatalities), ][1:20, ]
totalFatalitiesSorted
## Event Fatalities
## 834 TORNADO 5633
## 130 EXCESSIVE HEAT 1903
## 153 FLASH FLOOD 978
## 275 HEAT 937
## 464 LIGHTNING 816
## 856 TSTM WIND 504
## 170 FLOOD 470
## 585 RIP CURRENT 368
## 359 HIGH WIND 248
## 19 AVALANCHE 224
## 972 WINTER STORM 206
## 586 RIP CURRENTS 204
## 278 HEAT WAVE 172
## 140 EXTREME COLD 160
## 760 THUNDERSTORM WIND 133
## 310 HEAVY SNOW 127
## 141 EXTREME COLD/WIND CHILL 125
## 676 STRONG WIND 103
## 30 BLIZZARD 101
## 350 HIGH SURF 101
I count total Injuries by event type and graph on a histogram
totalInjuries <- aggregate(data$INJURIES, by = list(data$EVTYPE), "sum")
names(totalInjuries) <- c("Event", "Injuries")
totalInjuriesSorted <- totalInjuries[order(-totalInjuries$Injuries), ][1:20, ]
totalInjuriesSorted
## Event Injuries
## 834 TORNADO 91346
## 856 TSTM WIND 6957
## 170 FLOOD 6789
## 130 EXCESSIVE HEAT 6525
## 464 LIGHTNING 5230
## 275 HEAT 2100
## 427 ICE STORM 1975
## 153 FLASH FLOOD 1777
## 760 THUNDERSTORM WIND 1488
## 244 HAIL 1361
## 972 WINTER STORM 1321
## 411 HURRICANE/TYPHOON 1275
## 359 HIGH WIND 1137
## 310 HEAVY SNOW 1021
## 957 WILDFIRE 911
## 786 THUNDERSTORM WINDS 908
## 30 BLIZZARD 805
## 188 FOG 734
## 955 WILD/FOREST FIRE 545
## 117 DUST STORM 440
par(mfrow = c(1, 2), mar = c(10, 4, 2, 2), las = 3, cex = 0.7, cex.main = 1.4, cex.lab = 1.2)
barplot(totalFatalitiesSorted$Fatalities, names.arg = totalFatalitiesSorted$Event, col = 'blue',
main = 'Top 20 Weather Events for Fatalities', ylab = 'Number of Fatalities')
barplot(totalInjuriesSorted$Injuries, names.arg = totalInjuriesSorted$Event, col = 'orange',
main = 'Top 20 Weather Events for Injuries', ylab = 'Number of Injuries')
It is observed tornadoes, excess heat, floods , avalanches, cause more deaths and injuries, there are more than 5.000 deaths and more han 8.000 injuries in the last 60 years in US, due to tornadoes.
I calculate the cost of property and crop damage separately using the variables created PROPDMGTOTAL and CROPDMGTOTAL
totalProperty <- aggregate(data$PROPDMGTOTAL/1000000, by = list(data$EVTYPE), "sum")
names(totalProperty) <- c("Event", "Property")
totalPropertySorted <- totalProperty[order(-totalProperty$Property), ][1:20, ]
totalPropertySorted
## Event Property
## 170 FLOOD 144657.710
## 411 HURRICANE/TYPHOON 69305.840
## 834 TORNADO 56925.660
## 670 STORM SURGE 43323.536
## 153 FLASH FLOOD 16140.812
## 244 HAIL 15727.367
## 402 HURRICANE 11868.319
## 848 TROPICAL STORM 7703.891
## 972 WINTER STORM 6688.497
## 359 HIGH WIND 5270.046
## 590 RIVER FLOOD 5118.945
## 957 WILDFIRE 4765.114
## 671 STORM SURGE/TIDE 4641.188
## 856 TSTM WIND 4484.928
## 427 ICE STORM 3944.928
## 760 THUNDERSTORM WIND 3483.121
## 409 HURRICANE OPAL 3152.846
## 955 WILD/FOREST FIRE 3001.829
## 298 HEAVY RAIN/SEVERE WEATHER 2500.000
## 786 THUNDERSTORM WINDS 1733.455
totalCrop <- aggregate(data$CROPDMGTOTAL/1000000, by = list(data$EVTYPE), "sum")
names(totalCrop) <- c("Event", "Crop")
totalCropSorted <- totalCrop[order(-totalCrop$Crop), ][1:20, ]
totalCropSorted
## Event Crop
## 95 DROUGHT 13972.5660
## 170 FLOOD 5661.9685
## 590 RIVER FLOOD 5029.4590
## 427 ICE STORM 5022.1135
## 244 HAIL 3025.5374
## 402 HURRICANE 2741.9100
## 411 HURRICANE/TYPHOON 2607.8728
## 153 FLASH FLOOD 1421.3171
## 140 EXTREME COLD 1292.9730
## 212 FROST/FREEZE 1094.0860
## 290 HEAVY RAIN 733.3998
## 848 TROPICAL STORM 678.3460
## 359 HIGH WIND 638.5713
## 856 TSTM WIND 554.0073
## 130 EXCESSIVE HEAT 492.4020
## 192 FREEZE 446.2250
## 834 TORNADO 414.9531
## 760 THUNDERSTORM WIND 414.8431
## 275 HEAT 401.4615
## 957 WILDFIRE 295.4728
We visualize property damage costs and crop damage costs in the following graphs
par(mfrow = c(1, 2), mar = c(10, 4, 2, 2), las = 3, cex = 0.7, cex.main = 1.4, cex.lab = 1.2)
barplot(totalPropertySorted$Property, names.arg = totalPropertySorted$Event , col = 'Blue',
main = 'Property Damage in Million US dollars from 1995 - 2011 ', ylab = 'Cost of Property Damage', ylim = c(0, 150000))
barplot(totalCropSorted$Crop, names.arg = totalCropSorted$Event, col = 'Orange',
main = 'Crop Damage in million US dollars', ylab = 'Cost of Crop Damage', ylim = c(0, 20000))
Based on histograms we observed that floods, typhoons and hurricanes and tornadoes cause higher costs for property damage above DOL 56 billion, and we observed that floods, overflowing rivers, and seguias extremely affect crops over valores 5 billion