This report explores the NOAA Storm Database and answers some basic questions about severe weather events. In what follows we present two basic results. The first is the harm to human health (fatalities and injuries) caused by weather events. The second is the economic consequences of the same events over the whole time series, which starts in 1950.
This section describes the loading and data processing in RStudio.
The first step is to download and load the data.
#required libraries
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.3 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
#Define the URL and destination of the file
fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
destfile <- "storm.bz2"
#Download the bz2-compressed file
download.file(fileUrl, destfile, method = "curl")
#Read the bz2 file as a data.frame
storm <- read.csv(bzfile(destfile))
sd <- storm
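Because the compressed file is large, re-running the report downloads it again every time. A minimal sketch of a cached download follows. `fetch_once` and its `downloader` argument are hypothetical names introduced here for illustration; they are not part of the original analysis.

```r
# Hypothetical helper: download `url` to `dest` only if `dest` is missing.
# `downloader` defaults to base R's download.file; it is a parameter only so
# the function can be exercised without network access.
fetch_once <- function(url, dest, downloader = download.file) {
  if (!file.exists(dest)) {
    downloader(url, dest, mode = "wb")  # binary mode keeps the bz2 archive intact
  }
  invisible(dest)
}

# Intended use with the names defined above (commented out to avoid a download):
# storm <- read.csv(bzfile(fetch_once(fileUrl, destfile)))
```

With this pattern the download would run once, and later runs would read the local copy.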
I now convert the date variable (BGN_DATE) to a date-time format using lubridate.
#converting date (month/day/year) and time to respective formats using lubridate
sd <- sd |>
mutate(BGN_DATE = mdy_hms(BGN_DATE))
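For comparison, the same month/day/year conversion can be sketched in base R without lubridate. The sample strings below are illustrative values in that layout, and the UTC time zone is an assumption made only to keep the example deterministic.

```r
# Base R parsing of the "month/day/year hour:minute:second" layout.
# The sample strings are illustrative values in that layout.
sample_dates <- c("4/18/1950 0:00:00", "11/15/2011 0:00:00")
parsed <- as.POSIXct(sample_dates, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# Extract the year, e.g. to count events per year later on
years <- as.integer(format(parsed, "%Y"))
print(years)
```

Strings that do not match the format come back as `NA`, so checking `any(is.na(parsed))` after the conversion is a cheap sanity test.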
The first question is which events are most harmful to population health. I use the number of fatalities and the number of injuries as proxies for the harmfulness of an event. The basic quantity to look at is then the total number of injuries and fatalities across all years.
harmful <- sd |>
group_by(EVTYPE) |>
summarize(total_harm = sum(FATALITIES+INJURIES)) |>
arrange(desc(total_harm))
print(harmful, n=20)
## # A tibble: 985 × 2
## EVTYPE total_harm
## <chr> <dbl>
## 1 TORNADO 96979
## 2 EXCESSIVE HEAT 8428
## 3 TSTM WIND 7461
## 4 FLOOD 7259
## 5 LIGHTNING 6046
## 6 HEAT 3037
## 7 FLASH FLOOD 2755
## 8 ICE STORM 2064
## 9 THUNDERSTORM WIND 1621
## 10 WINTER STORM 1527
## 11 HIGH WIND 1385
## 12 HAIL 1376
## 13 HURRICANE/TYPHOON 1339
## 14 HEAVY SNOW 1148
## 15 WILDFIRE 986
## 16 THUNDERSTORM WINDS 972
## 17 BLIZZARD 906
## 18 FOG 796
## 19 RIP CURRENT 600
## 20 WILD/FOREST FIRE 557
## # ℹ 965 more rows
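The table lists TSTM WIND, THUNDERSTORM WIND and THUNDERSTORM WINDS as separate event types, so their totals are split across three rows. The analysis here keeps the labels as recorded. As a minimal sketch, assuming one wanted to merge such spelling variants first, a small normalizing function could look like this; `normalize_evtype` and the toy data frame are hypothetical and cover only the variants named above.

```r
# Hypothetical helper: harmonize a few spelling variants of EVTYPE.
normalize_evtype <- function(x) {
  x <- toupper(trimws(x))                                   # case and outer spaces
  x <- gsub("[[:space:]]+", " ", x)                         # collapse inner spaces
  x <- sub("^TSTM", "THUNDERSTORM", x)                      # expand abbreviation
  x <- sub("^THUNDERSTORM WINDS$", "THUNDERSTORM WIND", x)  # plural to singular
  x
}

# Toy data (made-up counts) to show the effect on the aggregation
toy <- data.frame(
  EVTYPE = c("TSTM WIND", "Thunderstorm Winds", " THUNDERSTORM WIND", "TORNADO"),
  FATALITIES = c(1, 2, 3, 4),
  INJURIES = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)
toy$EVTYPE <- normalize_evtype(toy$EVTYPE)
toy$total_harm <- toy$FATALITIES + toy$INJURIES
merged <- aggregate(total_harm ~ EVTYPE, data = toy, FUN = sum)
print(merged)
```

In the toy data the three thunderstorm variants collapse into one row. Whether such merging is appropriate for the real data depends on how the event types are meant to be defined.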
We can graph the results to see that tornadoes are by far the most harmful weather event for the health of the US population. For ease of visualization we show only the 15 most harmful event types.
#Create a bar plot to make visualization easier
#selecting only the top 15 events
top_harm <- harmful |>
slice_max(total_harm, n = 15)
#creating the bar plot
plot1 <-ggplot(top_harm, aes(x = reorder(EVTYPE, total_harm), y = total_harm)) +
geom_bar(stat = "identity") +
coord_flip() + # Flip coordinates for better readability
labs(title = "Total Harm by Event Type",
x = "Event Type",
y = "Total Harm") +
theme_minimal()
print(plot1)
As the table below shows, tornadoes are responsible for more than 60% of the total human harm.
#calculating the impact of each event in %
harmful_perc <- harmful |>
mutate(percentage = total_harm/sum(total_harm)*100) |>
mutate(percentage = scales::percent(percentage/100, accuracy = 0.1))
print(harmful_perc, n=15)
## # A tibble: 985 × 3
## EVTYPE total_harm percentage
## <chr> <dbl> <chr>
## 1 TORNADO 96979 62.3%
## 2 EXCESSIVE HEAT 8428 5.4%
## 3 TSTM WIND 7461 4.8%
## 4 FLOOD 7259 4.7%
## 5 LIGHTNING 6046 3.9%
## 6 HEAT 3037 2.0%
## 7 FLASH FLOOD 2755 1.8%
## 8 ICE STORM 2064 1.3%
## 9 THUNDERSTORM WIND 1621 1.0%
## 10 WINTER STORM 1527 1.0%
## 11 HIGH WIND 1385 0.9%
## 12 HAIL 1376 0.9%
## 13 HURRICANE/TYPHOON 1339 0.9%
## 14 HEAVY SNOW 1148 0.7%
## 15 WILDFIRE 986 0.6%
## # ℹ 970 more rows
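The percentage column is each event's total divided by the grand total. The sketch below repeats that arithmetic in base R on made-up totals and adds a cumulative share, which shows how quickly the top events account for most of the harm. The labels A, B, C and their values are hypothetical.

```r
# Hypothetical totals, already sorted in descending order
harm <- c(A = 60, B = 30, C = 10)

share <- harm / sum(harm) * 100   # share of each event in percent
cum_share <- cumsum(share)        # running total of the shares

toy_table <- data.frame(
  event = names(harm),
  share = sprintf("%.1f%%", share),
  cumulative = sprintf("%.1f%%", cum_share),
  stringsAsFactors = FALSE
)
print(toy_table)
```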
Let’s look at fatalities and injuries separately, with a plot of the top 10 event types for each.
## plot graphs showing the top 10 fatalities and injuries
## Procedure = aggregate the top 10 fatalities by the event type and sort the output in descending order
fatalities <- aggregate(FATALITIES ~ EVTYPE, data = sd, FUN = sum)
Top10_Fatalities <- fatalities[order(-fatalities$FATALITIES), ][1:10, ]
Top10_Fatalities
## EVTYPE FATALITIES
## 834 TORNADO 5633
## 130 EXCESSIVE HEAT 1903
## 153 FLASH FLOOD 978
## 275 HEAT 937
## 464 LIGHTNING 816
## 856 TSTM WIND 504
## 170 FLOOD 470
## 585 RIP CURRENT 368
## 359 HIGH WIND 248
## 19 AVALANCHE 224
#Reviewing events that cause the most injuries ( The Top-10 Injuries by Weather Event )
## Procedure = aggregate the top 10 injuries by the event type and sort the output in descending order
Injuries <- aggregate(INJURIES ~ EVTYPE, data = sd, FUN = sum)
Top10_Injuries <- Injuries[order(-Injuries$INJURIES), ][1:10, ]
Top10_Injuries
## EVTYPE INJURIES
## 834 TORNADO 91346
## 856 TSTM WIND 6957
## 170 FLOOD 6789
## 130 EXCESSIVE HEAT 6525
## 464 LIGHTNING 5230
## 275 HEAT 2100
## 427 ICE STORM 1975
## 153 FLASH FLOOD 1777
## 760 THUNDERSTORM WIND 1488
## 244 HAIL 1361
#now the plot
par(mfrow=c(1,2),mar=c(10,2,2,3))
barplot(Top10_Fatalities$FATALITIES,names.arg=Top10_Fatalities$EVTYPE,las=2,col="sienna",ylab="fatalities",main="Top 10 fatalities")
barplot(Top10_Injuries$INJURIES,names.arg=Top10_Injuries$EVTYPE,las=2,col="sienna",ylab="injuries",main="Top 10 Injuries")
The procedure here calculates the total economic damage caused to two different groups: crops and property. To do so, we need to translate the exponent codes of the property and crop damage fields into numeric multipliers.
# Assigning values for the property exponent Storm Data
unique(sd$PROPDMGEXP)
## [1] "K" "M" "" "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"
strmdata <- storm
strmdata$PROPEXP[strmdata$PROPDMGEXP == "K"] <- 1000
strmdata$PROPEXP[strmdata$PROPDMGEXP == "M"] <- 1e+06
strmdata$PROPEXP[strmdata$PROPDMGEXP == ""] <- 1
strmdata$PROPEXP[strmdata$PROPDMGEXP == "B"] <- 1e+09
strmdata$PROPEXP[strmdata$PROPDMGEXP == "m"] <- 1e+06
strmdata$PROPEXP[strmdata$PROPDMGEXP == "0"] <- 1
strmdata$PROPEXP[strmdata$PROPDMGEXP == "5"] <- 1e+05
strmdata$PROPEXP[strmdata$PROPDMGEXP == "6"] <- 1e+06
strmdata$PROPEXP[strmdata$PROPDMGEXP == "4"] <- 10000
strmdata$PROPEXP[strmdata$PROPDMGEXP == "2"] <- 100
strmdata$PROPEXP[strmdata$PROPDMGEXP == "3"] <- 1000
strmdata$PROPEXP[strmdata$PROPDMGEXP == "h"] <- 100
strmdata$PROPEXP[strmdata$PROPDMGEXP == "7"] <- 1e+07
strmdata$PROPEXP[strmdata$PROPDMGEXP == "H"] <- 100
strmdata$PROPEXP[strmdata$PROPDMGEXP == "1"] <- 10
strmdata$PROPEXP[strmdata$PROPDMGEXP == "8"] <- 1e+08
#there are some invalid symbols
strmdata$PROPEXP[strmdata$PROPDMGEXP == "+"] <- 0
strmdata$PROPEXP[strmdata$PROPDMGEXP == "-"] <- 0
strmdata$PROPEXP[strmdata$PROPDMGEXP == "?"] <- 0
#Now I calculate the property damage value
strmdata$PROPDMGVAL <- strmdata$PROPDMG * strmdata$PROPEXP
unique(strmdata$CROPDMGEXP)
## [1] "" "M" "K" "m" "B" "?" "0" "k" "2"
# Assigning values for the crop exponent strmdata
strmdata$CROPEXP[strmdata$CROPDMGEXP == "M"] <- 1e+06
strmdata$CROPEXP[strmdata$CROPDMGEXP == "K"] <- 1000
strmdata$CROPEXP[strmdata$CROPDMGEXP == "m"] <- 1e+06
strmdata$CROPEXP[strmdata$CROPDMGEXP == "B"] <- 1e+09
strmdata$CROPEXP[strmdata$CROPDMGEXP == "0"] <- 1
strmdata$CROPEXP[strmdata$CROPDMGEXP == "k"] <- 1000
strmdata$CROPEXP[strmdata$CROPDMGEXP == "2"] <- 100
strmdata$CROPEXP[strmdata$CROPDMGEXP == ""] <- 1
# Assigning '0' to invalid exponent strmdata
strmdata$CROPEXP[strmdata$CROPDMGEXP == "?"] <- 0
# calculating the crop damage
strmdata$CROPDMGVAL <- strmdata$CROPDMG * strmdata$CROPEXP
#Procedure = aggregate the property damage by the event type and sort the output in descending order
prop <- aggregate(PROPDMGVAL~EVTYPE,data=strmdata,FUN=sum,na.rm=TRUE)
prop <- prop[with(prop,order(-PROPDMGVAL)),]
prop <- head(prop,10)
print(prop)
## EVTYPE PROPDMGVAL
## 170 FLOOD 144657709807
## 411 HURRICANE/TYPHOON 69305840000
## 834 TORNADO 56947380616
## 670 STORM SURGE 43323536000
## 153 FLASH FLOOD 16822673978
## 244 HAIL 15735267513
## 402 HURRICANE 11868319010
## 848 TROPICAL STORM 7703890550
## 972 WINTER STORM 6688497251
## 359 HIGH WIND 5270046260
# Procedure = aggregate the crop damage by the event type and sort the output in descending order
crop <- aggregate(CROPDMGVAL~EVTYPE,data=strmdata,FUN=sum,na.rm=TRUE)
crop <- crop[with(crop,order(-CROPDMGVAL)),]
crop <- head(crop,10)
print(crop)
## EVTYPE CROPDMGVAL
## 95 DROUGHT 13972566000
## 170 FLOOD 5661968450
## 590 RIVER FLOOD 5029459000
## 427 ICE STORM 5022113500
## 244 HAIL 3025954473
## 402 HURRICANE 2741910000
## 411 HURRICANE/TYPHOON 2607872800
## 153 FLASH FLOOD 1421317100
## 140 EXTREME COLD 1292973000
## 212 FROST/FREEZE 1094086000
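The exponent recoding above assigns one multiplier per code, line by line. The same mapping can be sketched as a single function: letters map to powers of ten regardless of case, a digit d maps to 10^d, an empty code maps to 1, and any other symbol maps to 0. `exp_multiplier` is a hypothetical helper name; the mapping itself follows the assignments used in this report.

```r
# Hypothetical helper reproducing the exponent mapping used in this report.
exp_multiplier <- function(code) {
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  code <- toupper(code)                    # treat "k"/"K", "m"/"M", "h"/"H" alike
  out <- rep(0, length(code))              # invalid symbols ("+", "-", "?") -> 0
  out[code == ""] <- 1                     # no exponent: value taken as is
  is_digit <- grepl("^[0-9]$", code)
  out[is_digit] <- 10^as.numeric(code[is_digit])    # "5" -> 1e5, "0" -> 1
  is_letter <- code %in% names(lookup)
  out[is_letter] <- unname(lookup[code[is_letter]])
  out
}

# Toy usage with made-up damage figures
toy <- data.frame(PROPDMG = c(25, 2.5, 3, 10),
                  PROPDMGEXP = c("K", "m", "", "?"),
                  stringsAsFactors = FALSE)
toy$PROPDMGVAL <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP)
print(toy)
```

One function then serves both PROPDMGEXP and CROPDMGEXP, and every code receives a value, so no row is left as `NA`.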
The plot below shows the final results: the top 10 weather event types by property damage and by crop damage, that is, their economic consequences.
#Now plotting the result
# Plot of Top 10 Property & Crop damages by Weather Event Types ( Economic Consequences )
##plot the graph showing the top 10 property and crop damages
par(mfrow=c(1,2),mar=c(11,3,3,2))
barplot(prop$PROPDMGVAL/(10^9),names.arg=prop$EVTYPE,las=2,col="blue",ylab="Prop.damage(billions)",main="Top10 Prop.Damages")
barplot(crop$CROPDMGVAL/(10^9),names.arg=crop$EVTYPE,las=2,col="blue",ylab="Crop damage(billions)",main="Top10 Crop.Damages")
We can see that floods are the main cause of property damage, while droughts cause the most crop damage. Hurricanes/typhoons are the second largest cause of property damage, and floods are the second largest cause of crop damage.
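Property and crop damage can also be added into a single figure per event type. The sketch below does this only for the five event types that appear in both printed top-10 tables, using the values shown there. It is therefore not a full ranking: events such as TORNADO or DROUGHT appear in only one of the tables and are left out. A complete ranking would sum the two columns over all event types in `strmdata`. The column name `TOTALDMGVAL` is an assumption introduced here.

```r
# Values copied from the printed top-10 tables; only event types that appear
# in both tables are included, so this is not a full ranking.
damage <- data.frame(
  EVTYPE = c("FLOOD", "HURRICANE/TYPHOON", "FLASH FLOOD", "HAIL", "HURRICANE"),
  PROPDMGVAL = c(144657709807, 69305840000, 16822673978, 15735267513, 11868319010),
  CROPDMGVAL = c(5661968450, 2607872800, 1421317100, 3025954473, 2741910000),
  stringsAsFactors = FALSE
)

# Combined damage per event type, sorted in descending order
damage$TOTALDMGVAL <- damage$PROPDMGVAL + damage$CROPDMGVAL
damage <- damage[order(-damage$TOTALDMGVAL), ]
print(damage[, c("EVTYPE", "TOTALDMGVAL")])
```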