This project identifies which weather events caused the greatest economic damage and harm to population health in the United States. The analysis is based on Storm Data from the National Weather Service, which covers events from 1950 to November 2011. We present the top 10 most harmful weather conditions. Among them, two stand out for their damage to both the economy and population health: Tropical Storm Gordon, and High Wind and Seas.
The data were downloaded from the following URL:
# store the data URL in an R object
URL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
The download date was:
DateDownload <- date()
DateDownload
## [1] "Sun Jan 24 22:46:19 2021"
First, we download the data to our computer:
# download.file() returns only a status code, so its result is not assigned to `data`
download.file(URL, destfile = "./data.bz2")
Then we read the data into our R workspace:
data <- read.csv("data.bz2")
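Because the download is large, re-running the report fetches the same file again each time. A minimal sketch of one way to avoid that, assuming a hypothetical helper named `download_if_missing` (it is not part of the original analysis), is:

```r
# Hypothetical helper (an assumption, not part of the original analysis):
# download the file only when it is not already present on disk.
download_if_missing <- function(url, destfile) {
  if (!file.exists(destfile)) {
    # mode = "wb" keeps the compressed file byte-for-byte intact on all platforms
    download.file(url, destfile = destfile, mode = "wb")
  }
  invisible(destfile)
}

# Intended usage with the objects defined above (left commented out so the
# sketch itself does not trigger a large download):
# download_if_missing(URL, "./data.bz2")
# data <- read.csv("data.bz2")
```

Note that `read.csv` can read the `.bz2` file directly, so no separate decompression step is required.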
The analysis requires two packages, dplyr and ggplot2:
packages <- c("dplyr","ggplot2")
We install any of them that have not been installed yet:
# TRUE for each package that is already installed
installed_packages <- packages %in% rownames(installed.packages())
if (any(!installed_packages)) {
  install.packages(packages[!installed_packages])
}
Finally, we load them:
invisible(lapply(packages, library, character.only = TRUE))
We can start to understand this dataset by inspecting its structure:
str(data)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr "" "" "" "" ...
## $ BGN_LOCATI: chr "" "" "" "" ...
## $ END_DATE : chr "" "" "" "" ...
## $ END_TIME : chr "" "" "" "" ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr "" "" "" "" ...
## $ END_LOCATI: chr "" "" "" "" ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## $ WFO : chr "" "" "" "" ...
## $ STATEOFFIC: chr "" "" "" "" ...
## $ ZONENAMES : chr "" "" "" "" ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr "" "" "" "" ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
The National Weather Service Storm Data documentation lists 48 types of severe weather events (the Event Type variable): Astronomical Low Tide; Avalanche; Blizzard; Coastal Flood; Cold/Wind Chill; Debris Flow; Dense Fog; Dense Smoke; Drought; Dust Devil; Dust Storm; Excessive Heat; Extreme Cold/Wind Chill; Flash Flood; Flood; Freezing Fog; Frost/Freeze; Funnel Cloud; Hail; Heat; Heavy Rain; Heavy Snow; High Surf; High Wind; Hurricane/Typhoon; Ice Storm; Lakeshore Flood; Lake-Effect Snow; Lightning; Marine Hail; Marine High Wind; Marine Strong Wind; Marine Thunderstorm Wind; Rip Current; Seiche; Sleet; Storm Tide; Strong Wind; Thunderstorm Wind; Tornado; Tropical Depression; Tropical Storm; Tsunami; Volcanic Ash; Waterspout; Wildfire; Winter Storm; Winter Weather.
Let’s see how many distinct values the Event Type variable (EVTYPE) has in our dataset:
nrow(data.frame(table(data$EVTYPE, exclude=NULL)))
## [1] 985
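So, we have a problem here: the documentation lists 48 severe weather event types, but this dataset contains 985 distinct labels. We do not clean or consolidate the labels in this analysis, so all results below are reported for the raw EVTYPE values.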
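Part of the gap between 48 and 985 comes from labels that differ only in case or spacing (the dataset contains both upper-case and mixed-case labels). Although that cleanup is outside the scope of this analysis, a minimal sketch of a first normalisation pass is shown below. The helper `normalize_evtype` and the example labels are assumptions made for illustration only.

```r
# Hypothetical helper: normalise event-type labels before counting them.
normalize_evtype <- function(x) {
  x <- toupper(x)            # ignore case differences
  x <- gsub("\\s+", " ", x)  # collapse runs of whitespace into one space
  trimws(x)                  # drop leading and trailing spaces
}

# Made-up labels, chosen only to illustrate the idea
raw_labels <- c("Winter Weather", "WINTER WEATHER", " winter  weather ", "HAIL")

length(unique(raw_labels))                    # distinct raw labels
length(unique(normalize_evtype(raw_labels)))  # distinct labels after normalisation
```

A pass like this would not map misspellings or compound labels onto the 48 official types; that would need a manual lookup table.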
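The aim of this project is to answer two questions: which types of weather events are most harmful to population health, and which have the greatest economic consequences?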
In our view, population health is related to fatalities and injuries. So, for each weather condition, we will calculate the average: 1) number of fatalities that it caused; 2) number of injuries that it caused; 3) number of fatalities plus injuries (called LIVES here).
Similarly, economic consequences are related to property and crop damage. So, for each weather condition, we will calculate the average of: 1) each variable separately; 2) the sum of both variables (called COST here).
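The structure printed earlier shows that each damage figure comes with a magnitude code (PROPDMGEXP, CROPDMGEXP), for example "K". The analysis here uses PROPDMG and CROPDMG as recorded. As a hedged sketch only, the code below shows how the figures could be converted to dollars, assuming "K", "M" and "B" mean thousands, millions and billions. Treating lower-case codes like upper-case ones, and every other code as a multiplier of 1, are assumptions of this sketch; `damage_multiplier` is a hypothetical helper.

```r
# Hypothetical helper: translate a magnitude code into a numeric multiplier.
# Assumptions: K/M/B = thousands/millions/billions, case-insensitive;
# any other code (including "") is treated as a multiplier of 1.
damage_multiplier <- function(exp_code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(exp_code)]  # unknown codes yield NA
  mult[is.na(mult)] <- 1             # fall back to 1 for unknown codes
  unname(mult)
}

# Made-up rows with the same column names as the dataset
toy <- data.frame(PROPDMG    = c(25, 2.5, 1),
                  PROPDMGEXP = c("K", "M", ""))
toy$PROPERTY_USD <- toy$PROPDMG * damage_multiplier(toy$PROPDMGEXP)
toy
```

With this scaling applied before averaging, an entry of 25 with code "K" and an entry of 25 with code "M" would no longer count as equal.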
Let’s prepare a new dataset with all these calculations by grouping the original one by Event Type:
event <- data %>%
  group_by(EVTYPE) %>%
  summarize(PROPERTY = mean(PROPDMG),
            CROP = mean(CROPDMG),
            COST = mean(PROPDMG + CROPDMG),
            FATALITY = mean(FATALITIES),
            INJURY = mean(INJURIES),
            LIVES = mean(FATALITIES + INJURIES))
sample_n(event, 10)
## # A tibble: 10 x 7
## EVTYPE PROPERTY CROP COST FATALITY INJURY LIVES
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 URBAN AND SMALL STREAM FLOODIN 0 0 0 0.167 0 0.167
## 2 Wet Year 0 0 0 0 0 0
## 3 Winter Weather 0 0 0 0 0 0
## 4 VOG 0 0 0 0 0 0
## 5 Tidal Flooding 2 0 2 0 0 0
## 6 ICE PELLETS 0 0 0 0 0 0
## 7 URBAN FLOODS 0.5 0 0.5 0 0 0
## 8 BLACK ICE 0 0 0 0.0714 1.71 1.79
## 9 WINTER STORM/HIGH WINDS 0 0 0 0 0 0
## 10 HAIL/WINDS 250 25.0 275. 0 0 0
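Because the summary uses per-event averages, an event type recorded only once can outrank a type recorded many times. The base R sketch below illustrates this on made-up numbers (the toy values and the label "RARE EVENT" are assumptions, not taken from Storm Data) by reporting the record count and total next to the mean.

```r
# Made-up data: three tornado records and a single record of a rare type
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "TORNADO", "RARE EVENT"),
  FATALITIES = c(0, 1, 0, 8),
  INJURIES   = c(15, 14, 40, 43)
)
toy$LIVES <- toy$FATALITIES + toy$INJURIES

# Per-type record count, mean and total of LIVES
n_events    <- tapply(toy$LIVES, toy$EVTYPE, length)
mean_lives  <- tapply(toy$LIVES, toy$EVTYPE, mean)
total_lives <- tapply(toy$LIVES, toy$EVTYPE, sum)

summary_by_type <- data.frame(
  EVTYPE = names(n_events),
  N      = as.vector(n_events),
  MEAN   = as.vector(mean_lives),
  TOTAL  = as.vector(total_lives)
)
summary_by_type
```

In this toy example the single rare record has the highest mean, while tornadoes have the highest total. Which measure is appropriate depends on the question being asked, so showing the count alongside either one helps when reading the rankings.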
Now we can plot the ten weather events most harmful to population health, ranked by the average number of fatalities plus injuries per recorded event:
# keep the 10 event types with the highest average LIVES
g <- ggplot(data = event %>% slice_max(LIVES, n = 10),
            aes(x = reorder(EVTYPE, LIVES), y = LIVES))
g +
  geom_bar(stat = "identity", width = .7, fill = "#FC717F") +
  coord_flip() +
  labs(title = "Most Harmful Weather Events to Population Health",
       x = "Event Types",
       y = "Average Fatalities plus Injuries per Event") +
  geom_text(aes(label = round(LIVES, 1)), hjust = 1.5, vjust = 0.4, colour = "white", size = 4)
We can likewise plot the ten weather events with the greatest economic consequences, ranked by the average property damage plus crop damage per recorded event:
# keep the 10 event types with the highest average COST
g <- ggplot(data = event %>% slice_max(COST, n = 10),
            aes(x = reorder(EVTYPE, COST), y = COST))
g +
  geom_bar(stat = "identity", width = .7, fill = "#00B81F") +
  coord_flip() +
  labs(title = "Economic Consequences of Severe Weather Events",
       x = "Event Types",
       y = "Average Property plus Crop Damage per Event") +
  geom_text(aes(label = round(COST, 1)), hjust = 1.5, vjust = 0.4, colour = "white", size = 4)
The two plots show that two weather conditions caused severe harm to both population health and the economy: Tropical Storm Gordon, and High Wind and Seas.
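The overlap between the two rankings can also be computed instead of read off the plots. The base R sketch below does this on made-up values. The helper `top_types` and the toy data frame are assumptions for illustration, and unlike `slice_max` the helper does not keep ties.

```r
# Hypothetical helper: names of the n event types with the largest value in `column`
top_types <- function(df, column, n = 10) {
  ranked <- df[order(df[[column]], decreasing = TRUE), ]
  head(ranked$EVTYPE, n)
}

# Made-up summary table with the same column names as `event`
toy_event <- data.frame(
  EVTYPE = c("A", "B", "C", "D"),
  LIVES  = c(5, 1, 9, 0),
  COST   = c(100, 300, 250, 10)
)

# Event types that appear in both top-2 lists
both <- intersect(top_types(toy_event, "LIVES", 2),
                  top_types(toy_event, "COST", 2))
both
```

Applied to the `event` table with `n = 10`, the same `intersect` call would list the event types common to both plots.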