Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This report involves exploring the U.S. National Oceanic and Atmospheric Administration’s storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.This report will determine the events with maximum population hazard (Injuries and fatalaties) as well as the events with maximum economic hazard(property and crop damage).
From the National Oceanic and Atmospheric Administration website we obtained the data for storms starting from the year 1950 and ending in 2011.
The data is read from its csv.bz2 file and stored in the working directory as storm.csv file then read in R using the read,csv command.
#Reading the data
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile = "C:/Users/Mohsen/Documents/storm.csv.bz2")
storm_data<-read.csv("C:/Users/Mohsen/Documents/storm.csv.bz2")
We first activate the necessary package and viewing the first few rows and its structure.
#Activating the necessary libraries
library(tidyverse)
## -- Attaching packages ----------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.2.1 v purrr 0.3.2
## v tibble 2.1.3 v dplyr 0.8.3
## v tidyr 0.8.3 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.4.0
## -- Conflicts -------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
head(storm_data)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO 0 0
## 2 TORNADO 0 0
## 3 TORNADO 0 0
## 4 TORNADO 0 0
## 5 TORNADO 0 0
## 6 TORNADO 0 0
## COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1 NA 0 14.0 100 3 0 0
## 2 NA 0 2.0 150 2 0 0
## 3 NA 0 0.1 123 2 0 0
## 4 NA 0 0.0 100 2 0 0
## 5 NA 0 0.0 150 2 0 0
## 6 NA 0 1.5 177 2 0 0
## INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1 15 25.0 K 0
## 2 0 2.5 K 0
## 3 2 25.0 K 0
## 4 2 2.5 K 0
## 5 2 2.5 K 0
## 6 6 2.5 K 0
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3040 8812 3051 8806 1
## 2 3042 8755 0 0 2
## 3 3340 8742 0 0 3
## 4 3458 8626 0 0 4
## 5 3412 8642 0 0 5
## 6 3450 8748 0 0 6
str(storm_data)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
## $ BGN_TIME : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
## $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
## $ STATE : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : Factor w/ 35 levels ""," N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_DATE : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_TIME : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ WFO : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ ZONENAMES : Factor w/ 25112 levels ""," "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : Factor w/ 436781 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
Selecting the necessary columns for our analysis
#To select only the columns associated with population health
population_health<-select(storm_data, EVTYPE , FATALITIES, INJURIES)
summary(population_health)
## EVTYPE FATALITIES INJURIES
## HAIL :288661 Min. : 0.0000 Min. : 0.0000
## TSTM WIND :219940 1st Qu.: 0.0000 1st Qu.: 0.0000
## THUNDERSTORM WIND: 82563 Median : 0.0000 Median : 0.0000
## TORNADO : 60652 Mean : 0.0168 Mean : 0.1557
## FLASH FLOOD : 54277 3rd Qu.: 0.0000 3rd Qu.: 0.0000
## FLOOD : 25326 Max. :583.0000 Max. :1700.0000
## (Other) :170878
head(population_health)
## EVTYPE FATALITIES INJURIES
## 1 TORNADO 0 15
## 2 TORNADO 0 0
## 3 TORNADO 0 2
## 4 TORNADO 0 2
## 5 TORNADO 0 2
## 6 TORNADO 0 6
#To select only the columns associated with economic hazard
economic_hazard<-select(storm_data, EVTYPE, PROPDMG,PROPDMGEXP,CROPDMG,CROPDMGEXP)
head(economic_hazard)
## EVTYPE PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO 25.0 K 0
## 2 TORNADO 2.5 K 0
## 3 TORNADO 25.0 K 0
## 4 TORNADO 2.5 K 0
## 5 TORNADO 2.5 K 0
## 6 TORNADO 2.5 K 0
We then examine the levels of EXP or expression to transform the numbers to actual numbers
#To examine the levels of PROPDMGEXP
levels(economic_hazard$PROPDMGEXP)
## [1] "" "-" "?" "+" "0" "1" "2" "3" "4" "5" "6" "7" "8" "B" "h" "H" "K"
## [18] "m" "M"
#The documentation said H or h for hundreds, K for thousands, M or m for millions, and B for Billions, other symbols are negligible
economic_filtered_prop <-filter(economic_hazard, economic_hazard$PROPDMGEXP %in% c("B","h","H", "K","m", "M")) %>% select(EVTYPE, PROPDMG, PROPDMGEXP)
head(economic_filtered_prop)
## EVTYPE PROPDMG PROPDMGEXP
## 1 TORNADO 25.0 K
## 2 TORNADO 2.5 K
## 3 TORNADO 25.0 K
## 4 TORNADO 2.5 K
## 5 TORNADO 2.5 K
## 6 TORNADO 2.5 K
table(economic_filtered_prop$PROPDMGEXP)
##
## - ? + 0 1 2 3 4 5
## 0 0 0 0 0 0 0 0 0 0
## 6 7 8 B h H K m M
## 0 0 0 40 1 6 424665 7 11330
#To calculate the actual property damage
economic_filtered_prop$PROPDMGEXP<-replace(as.character(economic_filtered_prop$PROPDMGEXP),economic_filtered_prop$PROPDMGEXP=="K",1000)
economic_filtered_prop$PROPDMGEXP<-replace(as.character(economic_filtered_prop$PROPDMGEXP),economic_filtered_prop$PROPDMGEXP=="B",1000000000)
economic_filtered_prop$PROPDMGEXP<-replace(as.character(economic_filtered_prop$PROPDMGEXP),economic_filtered_prop$PROPDMGEXP=="h"|economic_filtered_prop$PROPDMGEXP=="H",100)
economic_filtered_prop$PROPDMGEXP<-replace(as.character(economic_filtered_prop$PROPDMGEXP),economic_filtered_prop$PROPDMGEXP=="m"|economic_filtered_prop$PROPDMGEXP=="M",1000000)
#To see that all variables are changed
table(economic_filtered_prop$PROPDMGEXP)
##
## 100 1000 1e+06 1e+09
## 7 424665 11337 40
str(economic_filtered_prop)
## 'data.frame': 436049 obs. of 3 variables:
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "1000" "1000" "1000" "1000" ...
#economic_filtered_prop$PROPDMGEXP is a character vector
economic_filtered_prop$PROPDMGEXP<-as.numeric(economic_filtered_prop$PROPDMGEXP)
str(economic_filtered_prop)
## 'data.frame': 436049 obs. of 3 variables:
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: num 1000 1000 1000 1000 1000 1000 1000 1000 1000 1000 ...
economic_filtered_prop$prop_damage<-economic_filtered_prop$PROPDMG*economic_filtered_prop$PROPDMGEXP
head(economic_filtered_prop)
## EVTYPE PROPDMG PROPDMGEXP prop_damage
## 1 TORNADO 25.0 1000 25000
## 2 TORNADO 2.5 1000 2500
## 3 TORNADO 25.0 1000 25000
## 4 TORNADO 2.5 1000 2500
## 5 TORNADO 2.5 1000 2500
## 6 TORNADO 2.5 1000 2500
#To examine the levels of CROPDMGEXP
levels(economic_hazard$CROPDMGEXP)
## [1] "" "?" "0" "2" "B" "k" "K" "m" "M"
#The documentation said H or h for hundreds, K for thousands, M or m for millions, and B for Billions, othersymbols are negligible
economic_filtered_crop <-filter(economic_hazard, economic_hazard$CROPDMGEXP %in% c("B","k","K","m","M")) %>% select(EVTYPE, CROPDMG, CROPDMGEXP)
head(economic_filtered_crop)
## EVTYPE CROPDMG CROPDMGEXP
## 1 HURRICANE OPAL/HIGH WINDS 10 M
## 2 THUNDERSTORM WINDS 500 K
## 3 HURRICANE ERIN 1 M
## 4 HURRICANE OPAL 4 M
## 5 HURRICANE OPAL 10 m
## 6 THUNDERSTORM WINDS 50 K
#Replace all expressions with their values
economic_filtered_crop$CROPDMGEXP<-replace(as.character(economic_filtered_crop$CROPDMGEXP),economic_filtered_crop$CROPDMGEXP=="K" | economic_filtered_crop$CROPDMGEXP=="k",1000)
table(economic_filtered_crop$CROPDMGEXP)
##
## 1000 B m M
## 281853 9 1 1994
economic_filtered_crop$CROPDMGEXP<-replace(as.character(economic_filtered_crop$CROPDMGEXP),economic_filtered_crop$CROPDMGEXP=="B",1000000000)
table(economic_filtered_crop$CROPDMGEXP)
##
## 1000 1e+09 m M
## 281853 9 1 1994
economic_filtered_crop$CROPDMGEXP<-replace(as.character(economic_filtered_crop$CROPDMGEXP),economic_filtered_crop$CROPDMGEXP=="m"|economic_filtered_crop$CROPDMGEXP=="M",1000000)
table(economic_filtered_crop$CROPDMGEXP)
##
## 1000 1e+06 1e+09
## 281853 1995 9
#Convert economic_filtered_crop$CROPDMGEXP to numeric vector
economic_filtered_crop$CROPDMGEXP<-as.numeric(economic_filtered_crop$CROPDMGEXP)
str(economic_filtered_crop)
## 'data.frame': 283857 obs. of 3 variables:
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 410 786 406 409 409 786 786 834 834 812 ...
## $ CROPDMG : num 10 500 1 4 10 50 50 5 50 15 ...
## $ CROPDMGEXP: num 1e+06 1e+03 1e+06 1e+06 1e+06 1e+03 1e+03 1e+03 1e+03 1e+03 ...
#To obtain the actual value of crop damage
economic_filtered_crop$crop_damage<-economic_filtered_crop$CROPDMG*economic_filtered_crop$CROPDMGEXP
head(economic_filtered_crop)
## EVTYPE CROPDMG CROPDMGEXP crop_damage
## 1 HURRICANE OPAL/HIGH WINDS 10 1e+06 1e+07
## 2 THUNDERSTORM WINDS 500 1e+03 5e+05
## 3 HURRICANE ERIN 1 1e+06 1e+06
## 4 HURRICANE OPAL 4 1e+06 4e+06
## 5 HURRICANE OPAL 10 1e+06 1e+07
## 6 THUNDERSTORM WINDS 50 1e+03 5e+04
We view the worst events for injuries and fatalities and then plotting the data
#To view the top 10 worst events regarding to fatalities
population_health %>% group_by(EVTYPE) %>% summarise (max_fatality=max(FATALITIES)) %>% arrange(desc(max_fatality)) %>% head(10)
## # A tibble: 10 x 2
## EVTYPE max_fatality
## <fct> <dbl>
## 1 HEAT 583
## 2 TORNADO 158
## 3 EXCESSIVE HEAT 99
## 4 EXTREME HEAT 57
## 5 HEAT WAVE 33
## 6 TSUNAMI 32
## 7 UNSEASONABLY WARM AND DRY 29
## 8 TORNADOES, TSTM WIND, HAIL 25
## 9 TROPICAL STORM 22
## 10 FLASH FLOOD 20
#To view the top 10 worst events regarding to injuiries
population_health %>% group_by(EVTYPE) %>% summarise (max_injury=max(INJURIES)) %>% arrange(desc(max_injury)) %>% head(10)
## # A tibble: 10 x 2
## EVTYPE max_injury
## <fct> <dbl>
## 1 TORNADO 1700
## 2 ICE STORM 1568
## 3 FLOOD 800
## 4 HURRICANE/TYPHOON 780
## 5 EXCESSIVE HEAT 519
## 6 BLIZZARD 385
## 7 HEAT 230
## 8 HEAT WAVE 200
## 9 TROPICAL STORM 200
## 10 HEAVY SNOW 185
#To view the top 10 events regarding injuries and fatalities
maximum_hazard<-population_health %>% group_by(EVTYPE) %>% summarise (max_fatality=max(FATALITIES), max_injury=max(INJURIES)) %>% arrange(desc(max_fatality,max_injury)) %>% head(10)
# To plot the data
data<-gather(maximum_hazard,type,value,-EVTYPE)
ggplot(data, aes(EVTYPE, value)) + geom_bar(aes(fill = type), stat = "identity")+theme(axis.text.x = element_text(angle = 45,hjust = 1))+facet_grid(type~.)+scale_y_log10()+xlab("Event type")+ylab("Health damage value")
## Warning: Transformation introduced infinite values in continuous y-axis
## Warning: Removed 2 rows containing missing values (geom_bar).
so the worst event for fatality is Heat amd the worst for injuries is Tornado.
We view the worst events for property and crop damage and then plotting the data
#To determine the top events with maximum property damage
max_prop_damage<-economic_filtered_prop %>% group_by(EVTYPE) %>% summarise(max_propdamage=max(prop_damage)) %>% arrange(desc(max_propdamage)) %>% head(10)
max_prop_damage
## # A tibble: 10 x 2
## EVTYPE max_propdamage
## <fct> <dbl>
## 1 FLOOD 115000000000
## 2 STORM SURGE 31300000000
## 3 HURRICANE/TYPHOON 16930000000
## 4 TROPICAL STORM 5150000000
## 5 RIVER FLOOD 5000000000
## 6 WINTER STORM 5000000000
## 7 STORM SURGE/TIDE 4000000000
## 8 HURRICANE 3000000000
## 9 TORNADO 2800000000
## 10 HEAVY RAIN/SEVERE WEATHER 2500000000
#To determine the top events with maximum crop damage
max_crop_damage<-economic_filtered_crop %>% group_by(EVTYPE) %>% summarise(max_cropdamage=max(crop_damage)) %>% arrange(desc(max_cropdamage)) %>% head(10)
max_crop_damage
## # A tibble: 10 x 2
## EVTYPE max_cropdamage
## <fct> <dbl>
## 1 ICE STORM 5000000000
## 2 RIVER FLOOD 5000000000
## 3 HURRICANE/TYPHOON 1510000000
## 4 DROUGHT 1000000000
## 5 EXTREME COLD 596000000
## 6 FLOOD 500000000
## 7 HURRICANE 500000000
## 8 EXCESSIVE HEAT 492400000
## 9 HEAT 400000000
## 10 FROST/FREEZE 286000000
max_eco_damage<- full_join(max_prop_damage,max_crop_damage,by="EVTYPE")
max_eco_damage
## # A tibble: 16 x 3
## EVTYPE max_propdamage max_cropdamage
## <fct> <dbl> <dbl>
## 1 FLOOD 115000000000 500000000
## 2 STORM SURGE 31300000000 NA
## 3 HURRICANE/TYPHOON 16930000000 1510000000
## 4 TROPICAL STORM 5150000000 NA
## 5 RIVER FLOOD 5000000000 5000000000
## 6 WINTER STORM 5000000000 NA
## 7 STORM SURGE/TIDE 4000000000 NA
## 8 HURRICANE 3000000000 500000000
## 9 TORNADO 2800000000 NA
## 10 HEAVY RAIN/SEVERE WEATHER 2500000000 NA
## 11 ICE STORM NA 5000000000
## 12 DROUGHT NA 1000000000
## 13 EXTREME COLD NA 596000000
## 14 EXCESSIVE HEAT NA 492400000
## 15 HEAT NA 400000000
## 16 FROST/FREEZE NA 286000000
#To plot the results
eco_data<-gather(max_eco_damage,type,value,-EVTYPE)
ggplot(eco_data, aes(EVTYPE, value)) + geom_bar(aes(fill = type), stat = "identity", position="dodge") +theme(axis.text.x = element_text(angle = 45,hjust = 1))+facet_grid(type~.)+scale_y_log10()+xlab("Event type")+ylab("Economic damage value")
## Warning: Removed 12 rows containing missing values (geom_bar).
so Ice storm is the worst cause of crop damage and Flood is the worst cause of property damage.