Synopsis

Storms and other severe weather events cause both public health and economic problems. Every year they result in hundreds of fatalities, thousands of injuries and billions of dollars in damage. Weather records covering more than 60 years of observation show that the events most dangerous to public health are tornadoes, excessive heat and floods. The highest property and crop damage comes from floods, hurricanes and tornadoes. To minimize losses, it is important to have systems and processes that monitor and forecast these events, and to maintain dams and other protective structures.

Data processing

The data were downloaded from Storm Data into the working directory on January 11, 2024, and loaded into R without any modification.

        setwd("~/Data Science materials/Reproducible Research/Rep_Data_Peer_Assessment2")
        ## download the file to the working directory
        download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2","repdata_data_StormData.csv.bz2")
        ## load the downloaded data into R without modification
        df <- read.csv("repdata_data_StormData.csv.bz2")
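
When the report is re-run, the download step above fetches the file again each time. A minimal sketch of a guard that skips the download when the file is already present is shown below. The helper name fetch_if_missing is hypothetical and not part of the original analysis; the URL and file name are the ones used above.

```r
## Hypothetical helper: download `url` to `dest` only when `dest` does not exist yet.
fetch_if_missing <- function(url, dest) {
  if (!file.exists(dest)) {
    ## binary mode keeps the compressed .bz2 archive intact on all platforms
    download.file(url, dest, mode = "wb")
  }
  invisible(dest)
}

storm_url  <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
storm_file <- "repdata_data_StormData.csv.bz2"

## Usage (left commented out because it needs network access):
## fetch_if_missing(storm_url, storm_file)
## df <- read.csv(storm_file)
```

Guarding the download keeps the copy retrieved on the stated date in place rather than silently replacing it with a newer one.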

Data preparation for analysis:

  • The year of each event was extracted from BGN_DATE and stored in a new column, Date

  • Total property and crop damage in dollars was calculated and stored in a new column, Damage

        library(dplyr)
        library(lubridate)
        
        ## Data preparation for analysis
        ## add column Date with the year of the event
        df <- mutate(df, Date = year(as.Date(BGN_DATE, tryFormats = c("%m/%d/%Y"))))
        ## add column Damage with total property and crop damage in dollars;
        ## only the exponent codes "B", "M" and "K" are counted, other codes add 0
        df <- mutate(df, Damage = 0)
        df$Damage <- ifelse(df$PROPDMGEXP == "B", df$Damage + df$PROPDMG * 1000000000,df$Damage)
        df$Damage <- ifelse(df$PROPDMGEXP == "M", df$Damage + df$PROPDMG * 1000000,df$Damage)
        df$Damage <- ifelse(df$PROPDMGEXP == "K", df$Damage + df$PROPDMG * 1000,df$Damage)
        
        df$Damage <- ifelse(df$CROPDMGEXP == "B", df$Damage + df$CROPDMG * 1000000000,df$Damage)
        df$Damage <- ifelse(df$CROPDMGEXP == "M", df$Damage + df$CROPDMG * 1000000,df$Damage)
        df$Damage <- ifelse(df$CROPDMGEXP == "K", df$Damage + df$CROPDMG * 1000,df$Damage)
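
The six ifelse() calls above can also be expressed with a lookup table of multipliers. The sketch below runs on a small made-up data frame, not on the storm data. The names exp_mult and to_dollars are hypothetical. One assumption differs from the code above: lowercase codes such as "k" are treated like their uppercase forms, whereas the original code ignores them, so totals may differ slightly.

```r
## Multipliers for the three exponent codes used in the analysis.
exp_mult <- c(K = 1e3, M = 1e6, B = 1e9)

## Hypothetical helper: convert a value and its exponent code to dollars.
## Assumption: lowercase codes count like uppercase ones; any other code gives 0.
to_dollars <- function(value, exp_code) {
  mult <- unname(exp_mult[toupper(as.character(exp_code))])
  mult[is.na(mult)] <- 0  # unknown or empty codes contribute nothing
  value * mult
}

## Made-up rows for illustration only.
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 10),
  PROPDMGEXP = c("K", "M", "B", ""),
  CROPDMG    = c(0, 5, 0, 3),
  CROPDMGEXP = c("", "k", "", "?"),
  stringsAsFactors = FALSE
)

toy$Damage <- to_dollars(toy$PROPDMG, toy$PROPDMGEXP) +
  to_dollars(toy$CROPDMG, toy$CROPDMGEXP)
```

A single table makes the set of accepted codes visible in one place and easy to extend.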

Results

Figure 1. Fatalities, injuries and damages by year across observation period

The figures below show that fatalities, injuries and damage from severe weather events remain substantial across all years of observation. They tend to grow as the population and the level of economic development grow.

        ## Calculate total numbers by year
        by_year <- aggregate(FATALITIES ~ Date, df, sum) ## fatalities
        by_year2 <- aggregate(INJURIES ~ Date, df, sum) ## injuries
        by_year3 <- aggregate(Damage ~ Date, df, sum) ## damage
        
        par(mfcol = c(1,3), mar = c(4,4,2,2)) ## set the plotting
        plot(by_year$Date,by_year$FATALITIES, type="l", ylab = "Fatalities", xlab = "Years", main = "Total number of fatalities")
        plot(by_year2$Date,by_year2$INJURIES, type="l", ylab = "Injuries", xlab = "Years", main = "Total number of injuries")
        plot(by_year3$Date,by_year3$Damage/1000000000, type="l", ylab = "Damage ($B)", xlab = "Years", main = "Total damage ($B)")
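
The three yearly totals above can also be computed in a single aggregate() call by binding the measures with cbind(). The sketch below uses a small made-up data frame with the same column names. Note that the formula interface drops any row with a missing value in any of the bound columns, so results could differ from three separate calls if the data contain NAs.

```r
## Made-up rows for illustration only.
toy <- data.frame(
  Date       = c(1950, 1950, 1951, 1951, 1951),
  FATALITIES = c(1, 0, 2, 3, 0),
  INJURIES   = c(5, 1, 0, 4, 2),
  Damage     = c(1e6, 2e6, 0, 5e5, 5e5)
)

## One row per year, one column per summed measure.
by_year_all <- aggregate(cbind(FATALITIES, INJURIES, Damage) ~ Date, toy, sum)
```

The result has the columns Date, FATALITIES, INJURIES and Damage, so each panel can be plotted from the same data frame.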

Figure 2. Fatalities and injuries by event type (top 10 for each measure)

The plots below show which weather events are most harmful to population health.

The top 10 event types shown are responsible for 80-90% of losses. Tornadoes are by far the leading cause, followed by excessive heat, floods and strong wind.

        ## Plot fatalities and injuries
        par(mfcol = c(1,2), mar = c(10,4,2,2)) ## set the plotting
        ## fatalities by type
        by_type <- aggregate(FATALITIES ~ EVTYPE, df, sum)
        by_type <- arrange(by_type,desc(FATALITIES))
        by_type$EVTYPE <- factor(by_type$EVTYPE,c(by_type$EVTYPE[1:10]))
        plot(by_type$EVTYPE[1:10],by_type$FATALITIES[1:10], las = 2, cex.axis = 0.8, xlab="", ylab="", main="")
        title(main = "Number of fatalities by type - top 10", ylab = "Number of fatalities")
        
        ## injuries by type
        by_type2 <- aggregate(INJURIES ~ EVTYPE, df, sum)
        by_type2 <- arrange(by_type2,desc(INJURIES))
        by_type2$EVTYPE <- factor(by_type2$EVTYPE,c(by_type2$EVTYPE[1:10]))
        plot(by_type2$EVTYPE[1:10],by_type2$INJURIES[1:10], las =2, cex.axis = 0.8, xlab="", ylab="", main="")
        title(main = "Number of injuries by type - top 10", ylab = "Number of injuries")
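
The share of losses attributed to the top 10 event types can be checked with a short calculation: sort the totals, sum the largest n and divide by the overall sum. The sketch below uses made-up totals. The function name top_n_share is hypothetical and not part of the original analysis.

```r
## Hypothetical helper: fraction of the overall total held by the n largest values.
top_n_share <- function(totals, n = 10) {
  sorted <- sort(totals, decreasing = TRUE)
  sum(head(sorted, n)) / sum(totals)
}

## Made-up totals for illustration only.
toy_totals <- c(TORNADO = 50, HEAT = 20, FLOOD = 15, WIND = 10, HAIL = 5)
share_top2 <- top_n_share(toy_totals, n = 2)  # (50 + 20) / 100
```

Applied to the analysis, the same call on by_type$FATALITIES or by_type2$INJURIES would give the top-10 share for each measure.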

Figure 3. Property and crop damage by event type (top 10 by damage)

The plot below shows which types of events have the greatest economic consequences.

The top 10 event types shown are responsible for more than 90% of losses. Floods are by far the leading cause, followed by hurricanes, tornadoes and storm surges.

        ## damage by type 
        par(mar = c(10,4,2,2))
        by_type3 <- aggregate(Damage ~ EVTYPE, df, sum)
        by_type3 <- arrange(by_type3,desc(Damage))
        by_type3$EVTYPE <- factor(by_type3$EVTYPE,c(by_type3$EVTYPE[1:10]))
        plot(by_type3$EVTYPE[1:10],by_type3$Damage[1:10]/1000000000, las =2, cex.axis = 0.8, xlab="", ylab="", main="")
        title(main = "Damage by type - top 10", ylab = "Damage ($B)")
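
The aggregate, sort and subset steps are repeated for fatalities, injuries and damage. A minimal sketch of a shared helper is shown below, using base R only and a made-up data frame. The name top_events is hypothetical, and the commented barplot() call is one possible way to draw the result, not the plotting method used above.

```r
## Hypothetical helper: total `column` by EVTYPE and return the n largest rows.
top_events <- function(data, column, n = 10) {
  totals <- aggregate(data[[column]], by = list(EVTYPE = data$EVTYPE), FUN = sum)
  names(totals)[2] <- column  # aggregate() names the summed column "x"
  totals <- totals[order(totals[[column]], decreasing = TRUE), ]
  rownames(totals) <- NULL
  head(totals, n)
}

## Made-up rows for illustration only.
toy <- data.frame(
  EVTYPE = c("FLOOD", "TORNADO", "FLOOD", "HAIL", "TORNADO"),
  Damage = c(4e9, 1e9, 2e9, 5e8, 5e8),
  stringsAsFactors = FALSE
)

top2 <- top_events(toy, "Damage", n = 2)

## Possible plot of the result:
## barplot(top2$Damage / 1e9, names.arg = top2$EVTYPE, las = 2, ylab = "Damage ($B)")
```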