Storms and other severe weather events cause both public health and economic problems. Every year they result in hundreds of fatalities, thousands of injuries, and billions of dollars in damage. Weather records covering more than 60 years of observation show that the events most dangerous to public health are tornadoes, excessive heat and floods. The highest property and crop damage comes from floods, hurricanes and tornadoes. To minimize the losses, it is important to have systems and processes that monitor and forecast these events, and to maintain dams and other protective structures.
The data were downloaded from Storm Data into the working directory on January 11, 2024, and loaded into R without any modification.
setwd("~/Data Science materials/Reproducible Research/Rep_Data_Peer_Assessment2")
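## download the file to the working directory; mode = "wb" keeps the binary archive intact on Windows
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2","repdata_data_StormData.csv.bz2", mode = "wb")
## load the downloaded data into R without modification (read.csv reads .bz2 files directly)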
df <- read.csv("repdata_data_StormData.csv.bz2")
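Re-running the report repeats the download each time. A minimal sketch of a guard that skips the download when the file is already in the working directory; the helper name `fetch_once` is hypothetical and not part of the original analysis:

```r
## Hypothetical helper (not in the original analysis): download only if the file is missing.
fetch_once <- function(url, destfile) {
  if (!file.exists(destfile)) {
    ## mode = "wb" writes the archive as binary, which matters on Windows
    download.file(url, destfile, mode = "wb")
  }
  invisible(destfile)
}

storm_url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
## Usage (commented out so the sketch runs without network access):
## fetch_once(storm_url, "repdata_data_StormData.csv.bz2")
## df <- read.csv("repdata_data_StormData.csv.bz2")
```

The guard only checks that the file exists, not that it is complete, so a partially downloaded file would have to be deleted by hand.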
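The year of each event was extracted and stored in a new column, Date.
Total property and crop damage was calculated and stored in a new column, Damage. Only the exponent codes B, M and K (billions, millions and thousands of dollars) are converted.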
library(dplyr)
library(lubridate)
## Data preparation for analysis
## add column Date with the year of event
df <- mutate(df, Date = year(as.Date(BGN_DATE, format = "%m/%d/%Y")))
## add column Damage with calculated property and crop damage
df <- mutate(df, Damage = 0)
df$Damage <- ifelse(df$PROPDMGEXP == "B", df$Damage + df$PROPDMG * 1000000000,df$Damage)
df$Damage <- ifelse(df$PROPDMGEXP == "M", df$Damage + df$PROPDMG * 1000000,df$Damage)
df$Damage <- ifelse(df$PROPDMGEXP == "K", df$Damage + df$PROPDMG * 1000,df$Damage)
df$Damage <- ifelse(df$CROPDMGEXP == "B", df$Damage + df$CROPDMG * 1000000000,df$Damage)
df$Damage <- ifelse(df$CROPDMGEXP == "M", df$Damage + df$CROPDMG * 1000000,df$Damage)
df$Damage <- ifelse(df$CROPDMGEXP == "K", df$Damage + df$CROPDMG * 1000,df$Damage)
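The six `ifelse` steps can also be written as one lookup. The sketch below is an alternative, not the method used for the results in this report: it assumes that the exponent columns may contain lowercase codes and treats them like their uppercase forms, so its totals can differ slightly from the code above. The function name `to_dollars` and the toy data are hypothetical.

```r
## Sketch: convert (value, exponent code) pairs to dollars with a lookup table.
## Assumption: only B/M/K (in either case) are counted; any other code contributes 0.
to_dollars <- function(value, exp_code) {
  multiplier <- c(B = 1e9, M = 1e6, K = 1e3)
  m <- multiplier[toupper(as.character(exp_code))]
  m[is.na(m)] <- 0   # unknown or empty codes add nothing
  unname(value * m)
}

## Toy data for illustration only (not taken from the storm file)
toy <- data.frame(PROPDMG    = c(25, 2.5, 1, 10),
                  PROPDMGEXP = c("K", "M", "b", ""),
                  CROPDMG    = c(0, 1, 0, 5),
                  CROPDMGEXP = c("", "k", "", "?"))
toy$Damage <- to_dollars(toy$PROPDMG, toy$PROPDMGEXP) +
  to_dollars(toy$CROPDMG, toy$CROPDMGEXP)
```

With the storm data the same call would be `to_dollars(df$PROPDMG, df$PROPDMGEXP) + to_dollars(df$CROPDMG, df$CROPDMGEXP)`.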
The plots below show that fatalities, injuries and damage from severe weather events remain substantial throughout the years of observation. They tend to grow as the population and the level of economic development grow.
## Calculate total numbers by year
by_year <- aggregate(FATALITIES ~ Date, df, sum) ## fatalities
by_year2 <- aggregate(INJURIES ~ Date, df, sum) ## injuries
by_year3 <- aggregate(Damage ~ Date, df, sum) ## damage
par(mfcol = c(1,3), mar = c(4,4,2,2)) ## three panels side by side, with compact margins
plot(by_year$Date,by_year$FATALITIES, type="l", ylab = "Fatalities", xlab = "Years", main = "Total number of fatalities")
plot(by_year2$Date,by_year2$INJURIES, type="l", ylab = "Injuries", xlab = "Years", main = "Total number of injuries")
plot(by_year3$Date,by_year3$Damage/1000000000, type="l", ylab = "Damage ($B)", xlab = "Years", main = "Total damage ($B)")
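The three yearly totals can also be computed in a single `aggregate` call by binding the measures with `cbind` on the left-hand side of the formula. A minimal sketch on toy data (the values are made up for illustration; the column names mirror those used above):

```r
## Toy data for illustration only
toy <- data.frame(Date       = c(1950, 1950, 1951, 1951),
                  FATALITIES = c(1, 2, 0, 4),
                  INJURIES   = c(10, 0, 3, 7),
                  Damage     = c(1e6, 5e5, 0, 2e9))

## One call sums all three measures per year
totals <- aggregate(cbind(FATALITIES, INJURIES, Damage) ~ Date, toy, sum)
```

One caveat: the formula interface drops any row with a missing value in any of the listed columns, so the combined call matches the three separate calls only when those columns contain no NA values.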
The plots below show which weather events are most harmful to population health.
The top 10 event types shown account for 80-90% of losses. Tornadoes are by far the leading cause, followed by excessive heat, floods and strong wind.
## Plot fatalities and injuries
par(mfcol = c(1,2), mar = c(10,4,2,2)) ## two panels side by side; large bottom margin for rotated labels
## fatalities by type
by_type <- aggregate(FATALITIES ~ EVTYPE, df, sum)
by_type <- arrange(by_type,desc(FATALITIES))
by_type$EVTYPE <- factor(by_type$EVTYPE,c(by_type$EVTYPE[1:10]))
plot(by_type$EVTYPE[1:10],by_type$FATALITIES[1:10], las = 2, cex.axis = 0.8, xlab="", ylab="", main="")
title(main = "Number of fatalities by type - top 10", ylab = "Number of fatalities")
## injuries by type
by_type2 <- aggregate(INJURIES ~ EVTYPE, df, sum)
by_type2 <- arrange(by_type2,desc(INJURIES))
by_type2$EVTYPE <- factor(by_type2$EVTYPE,c(by_type2$EVTYPE[1:10]))
plot(by_type2$EVTYPE[1:10],by_type2$INJURIES[1:10], las = 2, cex.axis = 0.8, xlab="", ylab="", main="")
title(main = "Number of injuries by type - top 10", ylab = "Number of injuries")
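The share of losses covered by the top 10 event types can be checked directly. A minimal sketch, assuming a hypothetical helper `top_share` that is not part of the original analysis, shown here on toy data:

```r
## Hypothetical helper: share of the grand total contributed by the n largest groups.
top_share <- function(values, groups, n = 10) {
  totals <- sort(tapply(values, groups, sum), decreasing = TRUE)
  sum(head(totals, n)) / sum(totals)
}

## Toy data for illustration only (not taken from the storm file)
toy <- data.frame(EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "HEAT", "HAIL"),
                  FATALITIES = c(50, 30, 10, 8, 2))

## Top 2 types: tornado (80) + flood (10) out of 100 in total
share <- top_share(toy$FATALITIES, toy$EVTYPE, n = 2)
```

With the storm data, `top_share(df$FATALITIES, df$EVTYPE)` and `top_share(df$INJURIES, df$EVTYPE)` would give the corresponding shares for the top 10 event types.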
The plot below shows which types of events have the greatest economic consequences.
The top 10 event types shown account for more than 90% of losses. Floods are by far the leading cause, followed by hurricanes, tornadoes and storm surges.
## damage by type
par(mar = c(10,4,2,2))
by_type3 <- aggregate(Damage ~ EVTYPE, df, sum)
by_type3 <- arrange(by_type3,desc(Damage))
by_type3$EVTYPE <- factor(by_type3$EVTYPE,c(by_type3$EVTYPE[1:10]))
plot(by_type3$EVTYPE[1:10],by_type3$Damage[1:10]/1000000000, las = 2, cex.axis = 0.8, xlab="", ylab="", main="")
title(main = "Damage by type - top 10", ylab = "Damage ($B)")