Severe weather events, such as storms, have significant implications for both public health and local economies. These events can lead to fatalities, injuries, and property damage, making prevention a top priority.
This project involves investigating the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which tracks major weather events in the United States, recording when and where they occur, along with estimates of fatalities, injuries, and property damage.
The analysis will focus on identifying the main causes of public health issues, including injuries and fatalities, and determining which weather events have the greatest economic impact, particularly in terms of property and crop damage.
We first download the data set, which is compressed with bzip2 to reduce its size, and then read it into R as a CSV file. read.csv can read the .bz2 file directly, so no separate decompression step is needed.
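# Download the bzip2-compressed storm data (binary mode keeps the archive intact)
fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(fileUrl, "StormData.csv.bz2", mode = "wb")
# read.csv decompresses the .bz2 file transparently while reading
df <- read.csv("StormData.csv.bz2")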
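As a quick check that reading a bzip2-compressed CSV needs no manual decompression, the following minimal sketch writes a toy data frame through a bzip2 connection and reads it back with read.csv. The temporary file and the toy values are assumptions made for illustration and are not part of the NOAA data.

```r
# Toy data standing in for the storm data (invented values).
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL"), INJURIES = c(15, 0),
                  stringsAsFactors = FALSE)

# Write the CSV through a bzip2-compressing connection.
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv opens the path with file(), which detects bzip2 compression.
roundtrip <- read.csv(tmp, stringsAsFactors = FALSE)
print(roundtrip)

# Remove the temporary file.
unlink(tmp)
```

The same call pattern applies to the downloaded StormData.csv.bz2 file, which is simply much larger and therefore slower to read.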
Viewing the head of the data frame
head(df)
Checking the columns, their types, and sample values in the data frame
str(df)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr "" "" "" "" ...
## $ BGN_LOCATI: chr "" "" "" "" ...
## $ END_DATE : chr "" "" "" "" ...
## $ END_TIME : chr "" "" "" "" ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr "" "" "" "" ...
## $ END_LOCATI: chr "" "" "" "" ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## $ WFO : chr "" "" "" "" ...
## $ STATEOFFIC: chr "" "" "" "" ...
## $ ZONENAMES : chr "" "" "" "" ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr "" "" "" "" ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
We will group the columns related to public health (injuries and fatalities) and the columns related to economic impact (property damage and crop damage) by event type, in order to see which specific events have the most harmful effects.
To keep the results manageable, we summarize each event type by its mean: the average number of injuries, fatalities, or damage units per recorded event. The mean describes the typical severity of a single event of that type, regardless of how often the type occurs. It is, however, sensitive to extreme values, so an event type recorded only a handful of times with one severe occurrence can rank above much more frequent types.
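The grouping step can be sketched on invented toy data (the data frame and its values are assumptions for illustration only). The sketch computes a per-event mean with tapply and compares it with the per-event total, which shows how an event type recorded once can outrank a frequent one when ranked by mean.

```r
# Toy records: five tornado events and one single-record event type.
toy <- data.frame(
  EVTYPE   = c(rep("TORNADO", 5), "NAMED STORM"),
  INJURIES = c(0, 2, 15, 3, 30, 40),
  stringsAsFactors = FALSE
)

# Mean and total injuries per event type.
mean_by_event  <- tapply(toy$INJURIES, toy$EVTYPE, FUN = mean, na.rm = TRUE)
total_by_event <- tapply(toy$INJURIES, toy$EVTYPE, FUN = sum, na.rm = TRUE)

# Rank by each statistic, largest first.
print(sort(mean_by_event, decreasing = TRUE))
print(sort(total_by_event, decreasing = TRUE))
```

In this toy case the single "NAMED STORM" record leads the ranking by mean, while "TORNADO" leads the ranking by total. Which statistic is appropriate depends on whether the question concerns the severity of a typical event or the overall burden.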
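# Mean of each health and damage measure per event type (EVTYPE)
injuries_byEvent <- tapply(df$INJURIES, df$EVTYPE, FUN = mean, na.rm = TRUE)
fatalities_byEvent <- tapply(df$FATALITIES, df$EVTYPE, FUN = mean, na.rm = TRUE)
# PROPDMG and CROPDMG are used as recorded, without the PROPDMGEXP/CROPDMGEXP magnitude codes
propertyDamage_byEvent <- tapply(df$PROPDMG, df$EVTYPE, FUN = mean, na.rm = TRUE)
CropDatage_byEvent <- tapply(df$CROPDMG, df$EVTYPE, FUN = mean, na.rm = TRUE)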
print(names(tail(sort(injuries_byEvent), 10)))
## [1] "HURRICANE/TYPHOON" "GLAZE/ICE STORM"
## [3] "HEAT WAVE DROUGHT" "WINTER STORM HIGH WINDS"
## [5] "SNOW/HIGH WINDS" "HIGH WIND AND SEAS"
## [7] "THUNDERSTORMW" "WILD FIRES"
## [9] "TROPICAL STORM GORDON" "Heat Wave"
print(names(tail(sort(fatalities_byEvent), 10)))
## [1] "HIGH WIND AND SEAS" "WINTER STORMS"
## [3] "MARINE MISHAP" "HEAT WAVE DROUGHT"
## [5] "HIGH WIND/SEAS" "EXTREME HEAT"
## [7] "RECORD/EXCESSIVE HEAT" "TROPICAL STORM GORDON"
## [9] "COLD AND SNOW" "TORNADOES, TSTM WIND, HAIL"
print(names(tail(sort(propertyDamage_byEvent), 10)))
## [1] "HURRICANE GORDON" "SLEET/ICE STORM" "SNOW AND ICE STORM"
## [4] "SNOW/COLD" "SNOW/HEAVY SNOW" "TROPICAL STORM GORDON"
## [7] "Landslump" "HEAVY RAIN AND FLOOD" "RIVER AND STREAM FLOOD"
## [10] "COASTAL EROSION"
print(names(tail(sort(CropDatage_byEvent), 10)))
## [1] "TYPHOON" "Frost/Freeze" "EXCESSIVE WETNESS"
## [4] "WINTER STORMS" "River Flooding" "HURRICANE FELIX"
## [7] "HIGH WINDS/COLD" "DUST STORM/HIGH WINDS" "FOREST FIRES"
## [10] "TROPICAL STORM GORDON"
Ranked by mean injuries per recorded event, the ten highest event types are, in ascending order, “hurricane/typhoon,” “glaze/ice storm,” “heat wave drought,” “winter storm high winds,” “snow/high winds,” “high wind and seas,” “thunderstormw,” “wild fires,” “tropical storm gordon,” and “heat wave,” which has the highest mean. Ranked by mean fatalities per recorded event, the ten highest are, again in ascending order, “high wind and seas,” “winter storms,” “marine mishap,” “heat wave drought,” “high wind/seas,” “extreme heat,” “record/excessive heat,” “tropical storm gordon,” “cold and snow,” and “tornadoes, tstm wind, hail.” Heat-related and wind-related event types appear in both lists.
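# Two panels side by side; the large bottom margin leaves room for rotated event labels
par(mfrow = c(1, 2), mar = c(14, 4, 4, 2))
# xaxt = "n" suppresses the default numeric axis so only the event names are drawn
plot(x = 1:10, y = tail(sort(injuries_byEvent), 10), xaxt = "n", xlab = "", ylab = "Mean injuries per event", main = "Injuries", pch = 19, col = "blue")
axis(1, at = 1:10, labels = names(tail(sort(injuries_byEvent), 10)), las = 2)
plot(x = 1:10, y = tail(sort(fatalities_byEvent), 10), xaxt = "n", xlab = "", ylab = "Mean fatalities per event", main = "Fatalities", pch = 19, col = "red")
axis(1, at = 1:10, labels = names(tail(sort(fatalities_byEvent), 10)), las = 2)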
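The lists and plots above repeat the expression tail(sort(...), 10) several times. One way to avoid the repetition is a small helper, sketched below under the hypothetical name top_events, that returns the n largest entries with the largest first. The toy values are invented for illustration.

```r
# Hypothetical helper: the n largest entries of a named vector, largest first.
top_events <- function(stat_by_event, n = 10) {
  head(sort(stat_by_event, decreasing = TRUE), n)
}

# Toy per-event means (invented values).
toy_means <- c(HAIL = 0.1, TORNADO = 1.5, "HEAT WAVE" = 4.2, FLOOD = 0.3)

top2 <- top_events(toy_means, n = 2)
print(top2)
```

Applied to the real data, top_events(injuries_byEvent) would return the same ten event types as tail(sort(injuries_byEvent), 10), listed in the opposite order (apart from how ties are ordered).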
The same ranking applied to the economic measures shows which event types have the highest mean property and crop damage per recorded event. For property damage, the ten highest are, in ascending order, “hurricane gordon,” “sleet/ice storm,” “snow and ice storm,” “snow/cold,” “snow/heavy snow,” “tropical storm gordon,” “landslump,” “heavy rain and flood,” “river and stream flood,” and “coastal erosion.” For crop damage, the ten highest are “typhoon,” “frost/freeze,” “excessive wetness,” “winter storms,” “river flooding,” “hurricane felix,” “high winds/cold,” “dust storm/high winds,” “forest fires,” and “tropical storm gordon.” Both rankings use PROPDMG and CROPDMG as recorded, without applying the magnitude codes in PROPDMGEXP and CROPDMGEXP.
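# Two panels side by side; the large bottom margin leaves room for rotated event labels
par(mfrow = c(1, 2), mar = c(14, 4, 4, 2) + 0.1)
# xaxt = "n" suppresses the default numeric axis so only the event names are drawn
plot(x = 1:10, y = tail(sort(propertyDamage_byEvent), 10), xaxt = "n", xlab = "", ylab = "Mean PROPDMG per event", main = "Property damage", pch = 19, col = "orange")
axis(1, at = 1:10, labels = names(tail(sort(propertyDamage_byEvent), 10)), las = 2)
plot(x = 1:10, y = tail(sort(CropDatage_byEvent), 10), xaxt = "n", xlab = "", ylab = "Mean CROPDMG per event", main = "Crop damage", pch = 19, col = "green")
axis(1, at = 1:10, labels = names(tail(sort(CropDatage_byEvent), 10)), las = 2)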
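The damage means above are computed on PROPDMG and CROPDMG as recorded, while the str output shows a separate magnitude column (PROPDMGEXP holds values such as "K"). The following sketch shows one way the two columns could be combined before averaging. It assumes that K, M, and B stand for thousands, millions, and billions, and that any other code (including an empty string) means a multiplier of 1. The helper name scale_damage and the toy values are hypothetical.

```r
# Hypothetical helper: combine a damage value with its magnitude code.
# Assumes K = 1e3, M = 1e6, B = 1e9; any other code is treated as 1,
# which is a simplifying assumption.
scale_damage <- function(value, exp_code) {
  multipliers <- c(K = 1e3, M = 1e6, B = 1e9)
  m <- multipliers[toupper(exp_code)]
  m[is.na(m)] <- 1          # unknown or empty codes: leave value unchanged
  value * unname(m)
}

# Toy records (invented values).
toy <- data.frame(
  EVTYPE     = c("FLOOD", "FLOOD", "HAIL"),
  PROPDMG    = c(25, 2, 500),
  PROPDMGEXP = c("K", "M", ""),
  stringsAsFactors = FALSE
)
toy$PROPDMG_USD <- scale_damage(toy$PROPDMG, toy$PROPDMGEXP)

# Means per event type, without and with the magnitude codes applied.
raw_mean    <- tapply(toy$PROPDMG, toy$EVTYPE, FUN = mean, na.rm = TRUE)
scaled_mean <- tapply(toy$PROPDMG_USD, toy$EVTYPE, FUN = mean, na.rm = TRUE)
print(raw_mean)
print(scaled_mean)
```

In the toy data the raw mean ranks "HAIL" above "FLOOD", while the scaled mean reverses the order, so the treatment of the magnitude codes can change which event types appear at the top.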