L.M. 15-10-2024
Storms and other severe weather events can cause both public health and economic problems for communities and municipalities.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
In particular, we address two questions: across the United States, which types of events are most harmful with respect to population health, and which types of events have the greatest economic consequences?
data <- read.csv("repdata_data_StormData.csv", header = TRUE)
head(data)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE EVTYPE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL TORNADO
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL TORNADO
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL TORNADO
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL TORNADO
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL TORNADO
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL TORNADO
## BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1 0 0 NA
## 2 0 0 NA
## 3 0 0 NA
## 4 0 0 NA
## 5 0 0 NA
## 6 0 0 NA
## END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1 0 14.0 100 3 0 0 15 25.0
## 2 0 2.0 150 2 0 0 0 2.5
## 3 0 0.1 123 2 0 0 2 25.0
## 4 0 0.0 100 2 0 0 2 2.5
## 5 0 0.0 150 2 0 0 2 2.5
## 6 0 1.5 177 2 0 0 6 2.5
## PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1 K 0 3040 8812
## 2 K 0 3042 8755
## 3 K 0 3340 8742
## 4 K 0 3458 8626
## 5 K 0 3412 8642
## 6 K 0 3450 8748
## LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3051 8806 1
## 2 0 0 2
## 3 0 0 3
## 4 0 0 4
## 5 0 0 5
## 6 0 0 6
We list the column names so that we can select the relevant ones with the dplyr package for further analysis.
colnames(data)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
library(dplyr)
sub_data <- data %>% select(EVTYPE, FATALITIES, INJURIES, contains("DMG"))
head(sub_data)
## EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO 0 15 25.0 K 0
## 2 TORNADO 0 0 2.5 K 0
## 3 TORNADO 0 2 25.0 K 0
## 4 TORNADO 0 2 2.5 K 0
## 5 TORNADO 0 2 2.5 K 0
## 6 TORNADO 0 6 2.5 K 0
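To make the column selection concrete, the sketch below applies the same select() call to a small hypothetical data frame (toy, invented for illustration) that reuses a few StormData column names. contains("DMG") keeps every column whose name includes "DMG", in the order in which they appear, so the four damage columns follow EVTYPE, FATALITIES and INJURIES.

```r
library(dplyr)

# Hypothetical two-row data frame reusing a subset of the StormData column names
toy <- data.frame(
  STATE = c("AL", "AL"),
  EVTYPE = c("TORNADO", "FLOOD"),
  FATALITIES = c(0, 1),
  INJURIES = c(15, 0),
  PROPDMG = c(25, 2.5),
  PROPDMGEXP = c("K", "M"),
  CROPDMG = c(0, 1),
  CROPDMGEXP = c("", "K"),
  REFNUM = c(1, 2)
)

# contains("DMG") matches every column whose name includes "DMG"
toy_sub <- toy %>% select(EVTYPE, FATALITIES, INJURIES, contains("DMG"))
names(toy_sub)
# returns EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP
```

STATE and REFNUM are dropped because they are neither listed explicitly nor matched by contains("DMG").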
For each event type we compute the total fatalities (Tot_fat), the total injuries (Tot_inj) and their sum (TOT). We then sort the event types by TOT in descending order, so that the most harmful events for population health come first, and finally select the top 10.
sub_data2 <- sub_data %>% group_by(EVTYPE) %>%
  summarize(Tot_fat = sum(FATALITIES), Tot_inj = sum(INJURIES), TOT = Tot_fat + Tot_inj) %>%
  filter(TOT > 0) %>%  # keep event types that caused at least one fatality or injury
  arrange(desc(TOT))
head(sub_data2)
## # A tibble: 6 × 4
## EVTYPE Tot_fat Tot_inj TOT
## <chr> <dbl> <dbl> <dbl>
## 1 TORNADO 5633 91346 96979
## 2 EXCESSIVE HEAT 1903 6525 8428
## 3 TSTM WIND 504 6957 7461
## 4 FLOOD 470 6789 7259
## 5 LIGHTNING 816 5230 6046
## 6 HEAT 937 2100 3037
sub_data2 <- sub_data2[, c("EVTYPE", "TOT", "Tot_inj", "Tot_fat")]
head(sub_data2)
## # A tibble: 6 × 4
## EVTYPE TOT Tot_inj Tot_fat
## <chr> <dbl> <dbl> <dbl>
## 1 TORNADO 96979 91346 5633
## 2 EXCESSIVE HEAT 8428 6525 1903
## 3 TSTM WIND 7461 6957 504
## 4 FLOOD 7259 6789 470
## 5 LIGHTNING 6046 5230 816
## 6 HEAT 3037 2100 937
TOP10_ph <- sub_data2[1:10, ]
TOP10_ph
## # A tibble: 10 × 4
## EVTYPE TOT Tot_inj Tot_fat
## <chr> <dbl> <dbl> <dbl>
## 1 TORNADO 96979 91346 5633
## 2 EXCESSIVE HEAT 8428 6525 1903
## 3 TSTM WIND 7461 6957 504
## 4 FLOOD 7259 6789 470
## 5 LIGHTNING 6046 5230 816
## 6 HEAT 3037 2100 937
## 7 FLASH FLOOD 2755 1777 978
## 8 ICE STORM 2064 1975 89
## 9 THUNDERSTORM WIND 1621 1488 133
## 10 WINTER STORM 1527 1321 206
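The aggregation and top-N selection can be checked on a tiny hypothetical dataset (events, with invented counts). This sketch also swaps the arrange() plus [1:10, ] indexing for dplyr's slice_max() (available from dplyr 1.0.0), which keeps the n rows with the largest TOT in descending order; n = 2 is used here only because the toy data is small.

```r
library(dplyr)

# Hypothetical toy records: several rows per event type
events <- data.frame(
  EVTYPE = c("TORNADO", "TORNADO", "HEAT", "FLOOD", "FLOOD", "HAIL"),
  FATALITIES = c(2, 1, 3, 0, 1, 0),
  INJURIES = c(10, 5, 1, 4, 2, 0)
)

top_harm <- events %>%
  group_by(EVTYPE) %>%
  summarize(Tot_fat = sum(FATALITIES),
            Tot_inj = sum(INJURIES),
            TOT = Tot_fat + Tot_inj) %>%
  filter(TOT > 0) %>%                        # HAIL harmed nobody and is dropped
  slice_max(TOT, n = 2, with_ties = FALSE)   # keep the 2 most harmful event types
top_harm
# TORNADO (TOT = 18) first, then FLOOD (TOT = 7)
```

With the real data, slice_max(TOT, n = 10) would return the same rows as TOP10_ph, except possibly in how ties at the tenth place are handled.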
We first use the reshape2 package to melt the table from wide to long format, so that each row holds one event type, one measure (TOT, Tot_inj or Tot_fat) and its value. We then plot the ten event types most harmful to population health as a grouped bar chart built with ggplot2, showing total fatalities, total injuries and their sum (TOT) side by side.
library(ggplot2)
library(RColorBrewer)
library(reshape2)
# Wide to long: one row per (EVTYPE, measure) pair
public_health <- melt(TOP10_ph, id.vars = "EVTYPE")
BP <- ggplot(public_health, aes(reorder(EVTYPE, -value), value, fill = variable))
BP + geom_col(position = "dodge") +
  labs(x = "Event Type", y = "Number of People Killed or Injured", title = "Top 10 Harmful Events for Population Health") +
  scale_fill_brewer(palette = "Accent") + theme_light() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
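The reshaping step is easiest to see on a small hypothetical table (wide, with invented values) that has the same columns as TOP10_ph. melt() keeps EVTYPE as the identifier and stacks the remaining columns into a variable column (the measure name) and a value column; that variable column is what fill = variable maps to, which is why each event type gets three side-by-side bars.

```r
library(reshape2)

# Hypothetical wide table: one row per event type, one column per measure
wide <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD"),
  TOT = c(18, 7),
  Tot_inj = c(15, 6),
  Tot_fat = c(3, 1)
)

# melt() stacks every non-id column into a (variable, value) pair
long <- melt(wide, id.vars = "EVTYPE")
long
# 6 rows: TOT for both events, then Tot_inj for both, then Tot_fat for both
```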
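We analyse property damage and crop damage separately. Each damage amount is stored as a number (PROPDMG, CROPDMG) plus a magnitude code (PROPDMGEXP, CROPDMGEXP): "K" for thousands, "M" for millions and "B" for billions of US dollars. We convert each amount to dollars and sum the results by event type. We then merge the property and crop totals by event type, add them into a total (TOT_ECO), sort by that total and select the top 10.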
PROP.STORM_1 <- sub_data %>% select(EVTYPE, starts_with("PROP")) %>%
  group_by(EVTYPE, PROPDMGEXP) %>% summarize(DAMAGE = sum(PROPDMG), .groups = "drop_last")
# Convert to US dollars; toupper() also catches lowercase codes (e.g. "m"), any other code keeps a multiplier of 1
PROP.STORM_2 <- PROP.STORM_1 %>% mutate(PROP_DAMAGE = case_when(
  toupper(PROPDMGEXP) == "K" ~ DAMAGE * 10^3,
  toupper(PROPDMGEXP) == "M" ~ DAMAGE * 10^6,
  toupper(PROPDMGEXP) == "B" ~ DAMAGE * 10^9,
  TRUE ~ DAMAGE))
PROP_STORM <- summarise(PROP.STORM_2, TOT_PROP_DMG = sum(PROP_DAMAGE))
CROP.STORM_1 <- sub_data %>% select(EVTYPE, starts_with("CROP")) %>%
  group_by(EVTYPE, CROPDMGEXP) %>% summarize(DAMAGE = sum(CROPDMG), .groups = "drop_last")
# Same conversion for crop damage, again matching the magnitude codes case-insensitively
CROP.STORM_2 <- CROP.STORM_1 %>% mutate(CROP_DAMAGE = case_when(
  toupper(CROPDMGEXP) == "K" ~ DAMAGE * 10^3,
  toupper(CROPDMGEXP) == "M" ~ DAMAGE * 10^6,
  toupper(CROPDMGEXP) == "B" ~ DAMAGE * 10^9,
  TRUE ~ DAMAGE))
CROP_STORM <- summarise(CROP.STORM_2, TOT_CROP_DMG = sum(CROP_DAMAGE))
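The magnitude-code conversion can also be written as a small vectorised lookup. The function below, to_dollars(), is a hypothetical helper and not part of the original analysis; it assumes the same mapping ("K", "M", "B" as 10^3, 10^6, 10^9), matches the codes case-insensitively and gives every other code a multiplier of 1.

```r
# Hypothetical helper: convert a damage amount plus its magnitude code to US dollars.
# Assumes "K" = 10^3, "M" = 10^6, "B" = 10^9 (matched case-insensitively);
# any other code (blank, digits, "+", "-", "?") gets a multiplier of 1.
to_dollars <- function(amount, exp_code) {
  multipliers <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(multipliers[toupper(exp_code)])  # NA where the code is not K/M/B
  mult[is.na(mult)] <- 1
  amount * mult
}

to_dollars(c(25, 2.5, 1.5, 3), c("K", "m", "B", ""))
# returns 25000, 2500000, 1500000000 and 3
```

Applied row by row before grouping, for example sub_data %>% mutate(PROP_DAMAGE = to_dollars(PROPDMG, PROPDMGEXP)) %>% group_by(EVTYPE) %>% summarize(TOT_PROP_DMG = sum(PROP_DAMAGE)), it would avoid grouping by the magnitude code altogether.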
# Each EVTYPE appears once in both tables, so the total is a row-wise sum
Economic_Cons <- merge(PROP_STORM, CROP_STORM, by = "EVTYPE") %>%
  mutate(TOT_ECO = TOT_PROP_DMG + TOT_CROP_DMG) %>%
  select(EVTYPE, TOT_ECO, TOT_PROP_DMG, TOT_CROP_DMG) %>%
  arrange(desc(TOT_ECO))
TOP10_ec <- Economic_Cons[1:10, ]
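PROP_STORM and CROP_STORM are both summarised by EVTYPE from the same sub_data, so they contain exactly the same event types, and merge(), which performs an inner join by default (all = FALSE), drops nothing here. The sketch below uses hypothetical toy tables (prop and crop, with invented totals) to show what would happen otherwise: an event type present in only one table disappears from the inner join but is kept, with NA for the missing total, when all = TRUE.

```r
# Hypothetical toy totals in US dollars; "DROUGHT" has no property-damage row
prop <- data.frame(EVTYPE = c("FLOOD", "HAIL"), TOT_PROP_DMG = c(300, 50))
crop <- data.frame(EVTYPE = c("HAIL", "FLOOD", "DROUGHT"), TOT_CROP_DMG = c(40, 10, 70))

# Inner join (the default): only event types found in both tables survive
inner <- merge(prop, crop, by = "EVTYPE")
# Full outer join: unmatched event types are kept, with NA for the missing column
outer <- merge(prop, crop, by = "EVTYPE", all = TRUE)

inner$EVTYPE  # "FLOOD" "HAIL" (rows are matched by key, not by input order)
outer$EVTYPE  # "DROUGHT" "FLOOD" "HAIL"
```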
As before, we melt the table into long format with reshape2 and plot the ten event types with the greatest economic consequences as a grouped bar chart built with ggplot2, showing total property damage (TOT_PROP_DMG), total crop damage (TOT_CROP_DMG) and their sum (TOT_ECO) side by side.
economic_consequences <- melt(TOP10_ec, id.vars = "EVTYPE")
EC <- ggplot(economic_consequences, aes(reorder(EVTYPE, -value), value, fill = variable))
EC + geom_col(position = "dodge") +
  labs(x = "Event Type", y = "Damage (US dollars)", title = "Top 10 Events for Economic Consequences") +
  scale_fill_brewer(palette = "Set2") + theme_light() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
From the analysis above, tornadoes are the most harmful event type for population health, while floods have the greatest economic consequences.