Severe weather events harm both public health and the economy. Quantifying the damage caused by past events yields statistics that can guide policy decisions and emergency management plans.

This report will focus on two key areas, public health and the economy, by specifically looking at which types of weather events have the greatest impact in each.

The analysis uses the U.S. National Oceanic and Atmospheric Administration’s Storm database. The NOAA database compiles information on major storms and other weather events across the United States. The location, duration and date of events are recorded, as well as fatalities, injuries, property damage and crop damage. The database covers the period from 1950 to 2011.

The report is structured in two sections. The first, ‘Data Processing’, details the steps taken to process the raw NOAA data and the calculations made to create analytical data. The R code chunks are given in sequence, with text providing explanations and motivations for the methods used. The second section, ‘Results’, analyzes the data and provides a brief discussion of the results. Both sections are further divided in order to look at Public Health and Economic Impact separately.

Data Processing

The NOAA data is loaded and processed in R.

Packages

The majority of the data analysis is executed using base R, but these additional packages are also needed:

1: ‘tidyr’ : used to pivot data tables before plotting the data
2: ‘ggplot2’ : used for creating plots

    library(ggplot2)
    library(tidyr)

Loading raw data

The raw data was made available through the course website and saved to the project directory. It is a CSV file compressed with bzip2 (.csv.bz2). No manual decompression is needed: the read.csv function decompresses the file automatically when reading it.

    dirW <- "C:/Users/annal/Dropbox/03_Education/07_Data Science/Coursera/Course 5 - Reproducible Research/Assignment 2"
    setwd(dirW)

    folderName <- "repdata_data_StormData.csv.bz2"

    dataCSV <- read.csv(folderName)
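
The automatic decompression can be checked on a small scale. The sketch below assumes nothing about the NOAA file itself: it writes a made-up two-row table (`toy`, a hypothetical stand-in) to a temporary bzip2-compressed file and reads it back with `read.csv`.

```r
# Hypothetical toy data standing in for the real storm data
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(2, 0))

# Write the toy data to a temporary bzip2-compressed CSV file
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv detects the compression and decompresses while reading
readBack <- read.csv(tmp)
print(readBack)

# Remove the temporary file again
unlink(tmp)
```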

Calculating basic statistics regarding public health

There are two variables in the data set related to public health, the number of fatalities for each event and the number of injuries. To evaluate the impact of specific types of weather events it is helpful to start with summary statistics (total, median, mean). The first step is to look at the summary statistics for these two variables across all weather events.

    print(summary(dataCSV$FATALITIES))
    ##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
    ##   0.0000   0.0000   0.0000   0.0168   0.0000 583.0000
    print(summary(dataCSV$INJURIES))
    ##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
    ##    0.0000    0.0000    0.0000    0.1557    0.0000 1700.0000

The next step is to calculate the summary statistics for each severe weather event type. The median can be left out this time, since the summaries above show that it is 0 for both variables and therefore not informative. For each event type we calculate the number of events (count), the total (sum) and the average (mean) of fatalities and injuries. The values are calculated in a for loop over the unique weather event types (EVTYPE).

#   first extract all the unique values for events

    events <- unique(dataCSV$EVTYPE)

#   Second create an empty data frame

    dataHealth <- data.frame(matrix(ncol = 6, nrow = length(events)))
    colnames(dataHealth) <- c("Events", "Count", "F_Sum", 
                              "F_Mean", "I_Sum", "I_Mean")

#   lastly calculate the total and mean values for each event using a 'for' loop
    
    for (rn in 1:length(events)) {

        dataHealth[rn, 1] <- events[rn]
        dataHealth[rn, 2] <- length(dataCSV$FATALITIES[dataCSV$EVTYPE==events[rn]])
        dataHealth[rn, 3] <- sum(subset(dataCSV$FATALITIES, dataCSV$EVTYPE == events[rn]), na.rm = TRUE)
        dataHealth[rn, 4] <- mean(subset(dataCSV$FATALITIES, dataCSV$EVTYPE == events[rn]), na.rm = TRUE)   
        dataHealth[rn, 5] <- sum(subset(dataCSV$INJURIES, dataCSV$EVTYPE == events[rn]), na.rm = TRUE)
        dataHealth[rn, 6] <- mean(subset(dataCSV$INJURIES, dataCSV$EVTYPE == events[rn]), na.rm = TRUE) 
    }
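
As a possible alternative to the loop, the same per-event count, sum and mean can be sketched with base R's `tapply`. This is not the method used in this report, only an illustration on a made-up table (`toy` and `healthToy` are hypothetical names); note that `tapply` returns the event types in alphabetical order rather than in order of first appearance.

```r
# Hypothetical toy data with the same column names as the storm data
toy <- data.frame(
    EVTYPE     = c("TORNADO", "TORNADO", "HAIL", "HEAT"),
    FATALITIES = c(3, 1, 0, 5),
    INJURIES   = c(10, 0, 2, 7)
)

# One call per statistic; each result has one entry per event type
count <- tapply(toy$FATALITIES, toy$EVTYPE, length)
fSum  <- tapply(toy$FATALITIES, toy$EVTYPE, sum,  na.rm = TRUE)
fMean <- tapply(toy$FATALITIES, toy$EVTYPE, mean, na.rm = TRUE)
iSum  <- tapply(toy$INJURIES,   toy$EVTYPE, sum,  na.rm = TRUE)
iMean <- tapply(toy$INJURIES,   toy$EVTYPE, mean, na.rm = TRUE)

# Assemble a table with the same column layout as dataHealth
healthToy <- data.frame(
    Events = names(count),
    Count  = as.vector(count),
    F_Sum  = as.vector(fSum),
    F_Mean = as.vector(fMean),
    I_Sum  = as.vector(iSum),
    I_Mean = as.vector(iMean)
)
print(healthToy)
```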

Weather Events with the highest impact on Public Health

There are over 900 unique event types and many of them have no impact on public health. For this report we will focus on the 20 event types with the highest values for each measure of fatalities and injuries (total and mean).

#   Order the total fatalities and extract the top 20 rows

    dataHealth <- dataHealth[order(dataHealth$F_Sum, decreasing = TRUE),]
    resultsFatSum <- dataHealth[1:20, c(1, 3)]
    
#   Order the fatalities means and extract the top 20 rows
    
    dataHealth <- dataHealth[order(dataHealth$F_Mean, decreasing = TRUE),]
    resultsFatMean <- dataHealth[1:20, c(1, 4)]
    
#   Order the injuries sum and extract the top 20 rows
    
    dataHealth <- dataHealth[order(dataHealth$I_Sum, decreasing = TRUE),]
    resultsInjSum <- dataHealth[1:20, c(1, 5)]
    
#   Order the injuries means and extract the top 20 rows
    
    dataHealth <- dataHealth[order(dataHealth$I_Mean, decreasing = TRUE),]
    resultsInjMean <- dataHealth[1:20, c(1, 6)]
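
The four blocks above repeat one pattern: order by a column, then keep the first rows. A minimal sketch of that pattern as a reusable function is shown below; `topN` is a hypothetical helper that does not appear in the analysis, and the data are made up.

```r
# Hypothetical toy summary table
toy <- data.frame(
    Events = c("A", "B", "C", "D"),
    F_Sum  = c(5, 50, 0, 20),
    F_Mean = c(2.5, 0.1, 0, 4)
)

# Hypothetical helper: rank df by one column (descending) and keep
# the first n rows of the Events column and the ranked column
topN <- function(df, column, n) {
    ranked <- df[order(df[[column]], decreasing = TRUE), ]
    head(ranked[, c("Events", column)], n)
}

topSum  <- topN(toy, "F_Sum", 2)
topMean <- topN(toy, "F_Mean", 2)
print(topSum)
print(topMean)
```

Ranking by total and ranking by mean select different events even in this small example, which is why both rankings are kept.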

To display the results, create a data frame that combines the values calculated above.

#   Combine all the events in the 4 individual results data frames into one list

    eventsHealth <- resultsFatSum$Events
    eventsHealth <- append(eventsHealth, resultsFatMean$Events)
    eventsHealth <- append(eventsHealth, resultsInjSum$Events) 
    eventsHealth <- append(eventsHealth, resultsInjMean$Events)
    eventsHealth <- unique(eventsHealth)
    
#   Create dataframe by subsetting dataHealth by only the events that have the
#   highest impact (eventsHealth)
    
    dataHealthHigh <- subset(dataHealth, Events %in% eventsHealth)

The highest mean values for fatalities and injuries can best be visualized as a plot.

#   To plot the events with the highest mean for fatalities and injuries we need
#   to first combine the events in resultsFatMean and resultsInjMean and
#   then remove all duplicate event values

    eventsMean <- resultsFatMean$Events
    eventsMean <- append(eventsMean, resultsInjMean$Events)
    eventsMean <- unique(eventsMean)
    
#   Create dataframe by subsetting dataHealth by only the events that have the
#   highest mean (eventsMean), only take the columns with events and mean values
    
    dataHealthPlot1 <- subset(dataHealth, Events %in% eventsMean)
    dataHealthPlot1 <- dataHealthPlot1[, c(1,4,6)]
    
#   Change the col names to values that will read better in the plot
    
    colnames(dataHealthPlot1) <- c("Events", "Fatalities", "Injuries" )
    
#   To create the plots the data frame needs to pivot in order to combine 3 variables
#   into 2 (Type and Value)

    dataHealthPlot2 <- dataHealthPlot1 %>% pivot_longer(!Events,  names_to = "Type", values_to = "Mean")

    plotHealthMean <-       ggplot(data = dataHealthPlot2, aes(x= Events, y=Mean, fill=Type)) +
                            geom_bar(stat = "identity") +
                            theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
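
To make the pivot step concrete, the sketch below builds the same long layout by hand in base R. It is a stand-in for `pivot_longer`, not a call to tidyr, and the values are made up. The rows come out in a different order than `pivot_longer` would produce (grouped by type rather than by event), but the content is the same.

```r
# Hypothetical wide table: one row per event, one column per measure
wide <- data.frame(
    Events     = c("HEAT", "TORNADO"),
    Fatalities = c(1.2, 0.09),
    Injuries   = c(2.7, 1.5)
)

# Long layout: one row per event and measure, with the measure name
# in Type and its value in Mean
long <- data.frame(
    Events = rep(wide$Events, times = 2),
    Type   = rep(c("Fatalities", "Injuries"), each = nrow(wide)),
    Mean   = c(wide$Fatalities, wide$Injuries)
)
print(long)
```

In the long layout, ggplot2 can map `Type` to the fill colour and `Mean` to the bar height.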

Calculating basic statistics regarding economic impact

Several variables in the raw data relate to the economic damage caused by weather events. PROPDMG records property damage and CROPDMG records damage to crops, both as dollar amounts. Each is accompanied by a second variable (PROPDMGEXP and CROPDMGEXP, referred to as XP below) that gives the magnitude: K for thousands, M for millions, B for billions and T for trillions. Unfortunately, the unique values in PROPDMGEXP and CROPDMGEXP include a number of entries that fall outside these codes. Rows with such entries are removed first, and the XP code is then replaced by the corresponding numeric value.

#   Create data tables for property and crop damage that remove all rows with incorrect
#   XP values

    dataProp <- dataCSV[dataCSV$PROPDMGEXP %in% c("K", "M", "B", "T"),]
    
    dataCrop <- dataCSV[dataCSV$CROPDMGEXP  %in% c("K", "M", "B", "T"),]
    
#   Create a lookup table for the magnitude values
    
    XP <- c("K","M","B","T")
    X <- c(1e3, 1e6, 1e9, 1e12)

    lookup <- data.frame( XP = XP, X = X)

#   Replace the XP code with the numeric value using the lookup table

    dataProp$PROPDMGEXP <- lookup$X[match(dataProp$PROPDMGEXP, lookup$XP)]
    dataCrop$CROPDMGEXP <- lookup$X[match(dataCrop$CROPDMGEXP, lookup$XP)]
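
The filter-then-lookup step can be illustrated on a short vector of codes. The codes below are made up for the example; which invalid codes actually occur in the NOAA data is not shown in this report. The sketch also counts the entries the filter would drop, which is worth checking because the match is case-sensitive.

```r
# Lookup values as defined above
XP <- c("K", "M", "B", "T")
X  <- c(1e3, 1e6, 1e9, 1e12)

# Hypothetical magnitude codes, including three invalid entries
codes <- c("K", "M", "", "B", "k", "5")

# Keep only the valid codes, as the %in% filter does
valid <- codes %in% XP

# Count how often each invalid code would be dropped
dropped <- table(codes[!valid])
print(dropped)

# Translate the remaining codes into numeric multipliers
multiplier <- X[match(codes[valid], XP)]
print(multiplier)
```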

The property and crop damage is calculated by multiplying the recorded value by the numeric value of the magnitude. The result is stored in a new column in each data table.

#   Create a new empty column to contain the multiplied values for property and crop damage

    dataProp[, "PropertyDamage"] = NA
    dataCrop[, "CropDamage"] = NA
    
#   Calculate the new property damage costs by multiplying the Cost with the Magnitude
    
    for (nr in 1:nrow(dataProp)) {

    dataProp$PropertyDamage[nr] <- dataProp$PROPDMG[nr] * dataProp$PROPDMGEXP[nr]
    
    }
    
    #   Calculate the new crop damage costs by multiplying the Cost with the Magnitude
    
    for (nr in 1:nrow(dataCrop)) {

    dataCrop$CropDamage[nr] <- dataCrop$CROPDMG[nr] * dataCrop$CROPDMGEXP[nr]
    
    }
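
Because arithmetic in R is vectorized, the row-by-row loops can also be written as a single multiplication of two columns. The sketch below uses a made-up three-row table (`toy` is hypothetical) and compares the vectorized result with a loop of the same form as above.

```r
# Hypothetical toy data: damage value and numeric magnitude
toy <- data.frame(
    PROPDMG    = c(25, 2.5, 1),
    PROPDMGEXP = c(1e3, 1e6, 1e9)
)

# Vectorized: multiplies the two columns element by element
toy$PropertyDamage <- toy$PROPDMG * toy$PROPDMGEXP

# Loop version, for comparison
loopResult <- numeric(nrow(toy))
for (nr in seq_len(nrow(toy))) {
    loopResult[nr] <- toy$PROPDMG[nr] * toy$PROPDMGEXP[nr]
}

# Both approaches give the same values
print(all.equal(toy$PropertyDamage, loopResult))
```

On a table with hundreds of thousands of rows the vectorized form is typically much faster than the loop.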

The total cost for property and crop damage caused by each type of weather event is calculated and added to a new data table.

#   Now we create a data table where the total for each type of event is given
#   First step is to calculate all the unique values for EVTYPE

    events <- unique(dataCSV$EVTYPE)

#   then a blank data frame is created with the column headings already given

    dataCost <- data.frame(matrix(ncol = 3, nrow = length(events)))
    colnames(dataCost) <- c("Events", "Property", "Crop")

#   then the data frame is populated using a for loop
    
    for (rn in 1:length(events)) {

        dataCost[rn, 1] <- events[rn]
        # Refer to the damage columns by name rather than by position (38)
        dataCost[rn, 2] <- sum(dataProp$PropertyDamage[which(dataProp$EVTYPE == events[rn])])
        dataCost[rn, 3] <- sum(dataCrop$CropDamage[which(dataCrop$EVTYPE == events[rn])])

    }
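
The totals per event type can also be sketched without a loop, using `aggregate` and `merge` from base R. This is an illustrative alternative on made-up data, not the code used for the results below; `propToy`, `cropToy` and `costToy` are hypothetical names. Event types that appear in only one of the two tables get a total of 0 in the other, which matches what `sum` returns for an empty subset in the loop.

```r
# Hypothetical toy tables standing in for dataProp and dataCrop
propToy <- data.frame(
    EVTYPE         = c("FLOOD", "FLOOD", "HAIL"),
    PropertyDamage = c(1e6, 2e6, 5e3)
)
cropToy <- data.frame(
    EVTYPE     = c("FLOOD", "DROUGHT"),
    CropDamage = c(4e5, 9e6)
)

# Total damage per event type in each table
propTotals <- aggregate(PropertyDamage ~ EVTYPE, data = propToy, FUN = sum)
cropTotals <- aggregate(CropDamage ~ EVTYPE, data = cropToy, FUN = sum)

# Full outer join so that no event type is lost
costToy <- merge(propTotals, cropTotals, by = "EVTYPE", all = TRUE)

# Events missing from one table have no recorded damage there
costToy$PropertyDamage[is.na(costToy$PropertyDamage)] <- 0
costToy$CropDamage[is.na(costToy$CropDamage)] <- 0
print(costToy)
```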

Weather events with the highest economic impact

There are over 900 types of weather events, many of which have no property or crop damage recorded. To simplify, we will extract only the 20 events with the highest values for each type of damage.

#   Order the property damage and extract the top 20 values

    dataCost <- dataCost[order(dataCost$Property, decreasing = TRUE),]
    resultsP <- dataCost[1:20, c(1, 2)]
    
#   Order the crop damage and extract the top 20 values
    
    dataCost <- dataCost[order(dataCost$Crop, decreasing = TRUE),]
    resultsC <- dataCost[1:20, c(1, 3)]

The cost related to property and crop damage can best be visualized by a plot.

#   Combine the events from the two results data frames (resultsP and resultsC)
#   into one list and remove all duplicate event values

    eventsHigh <- resultsP$Events
    eventsHigh <- append(eventsHigh, resultsC$Events)
    eventsHigh <- unique(eventsHigh)
    
#   Create a dataframe by subsetting dataCost by only the events that have the
#   highest total costs (eventsHigh)
    
    dataCostHigh <- subset(dataCost, Events %in% eventsHigh)

#   To create the plots the data frame needs to pivot in order to combine 3 variables
#   into 2 (Type and Value)
    
    dataPlot <- dataCostHigh %>% pivot_longer(!Events,  names_to = "Type", values_to = "Total")

#   Create a plot object using ggplot2

    plotCostSum <-  ggplot(data = dataPlot, aes(x= Events, y=Total, fill=Type)) +
                    geom_bar(stat = "identity") +
                    theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))

Results

Impact of Severe Weather events on Public Health

There are over 900 specific types of weather events. To investigate the impact on public health, only the events with the highest mean or total fatalities (F) and Injuries (I) are shown. The number of instances of each type of weather event is also counted.

##                         Events  Count F_Sum    F_Mean I_Sum   I_Mean
## 779                  Heat Wave      1     0  0.000000    70 70.00000
## 210      TROPICAL STORM GORDON      1     8  8.000000    43 43.00000
## 114                 WILD FIRES      4     3  0.750000   150 37.50000
## 540              THUNDERSTORMW      1     0  0.000000    27 27.00000
## 554         HIGH WIND AND SEAS      1     3  3.000000    20 20.00000
## 524            SNOW/HIGH WINDS      2     0  0.000000    36 18.00000
## 486          HEAT WAVE DROUGHT      1     4  4.000000    15 15.00000
## 119    WINTER STORM HIGH WINDS      1     1  1.000000    15 15.00000
## 364            GLAZE/ICE STORM      1     0  0.000000    15 15.00000
## 973          HURRICANE/TYPHOON     88    64  0.727273  1275 14.48864
## 934         WINTER WEATHER MIX      6     0  0.000000    68 11.33333
## 90                EXTREME HEAT     22    96  4.363636   155  7.04545
## 920     NON-SEVERE WIND DAMAGE      1     0  0.000000     7  7.00000
## 179                      GLAZE     32     7  0.218750   216  6.75000
## 979                    TSUNAMI     20    33  1.650000   129  6.45000
## 121              WINTER STORMS      3    10  3.333333    17  5.66667
## 442                 TORNADO F2      3     0  0.000000    16  5.33333
## 200         WATERSPOUT/TORNADO      8     3  0.375000    42  5.25000
## 539         EXCESSIVE RAINFALL      4     2  0.500000    21  5.25000
## 182                  HEAT WAVE     74   172  2.324324   309  4.17568
## 99              EXCESSIVE HEAT   1678  1903  1.134088  6525  3.88856
## 27                        HEAT    767   937  1.221643  2100  2.73794
## 74               MARINE MISHAP      2     7  3.500000     5  2.50000
## 914                 ROUGH SEAS      3     8  2.666667     5  1.66667
## 1                      TORNADO  60652  5633  0.092874 91346  1.50607
## 276                        FOG    538    62  0.115242   734  1.36431
## 92                  DUST STORM    427    22  0.051522   440  1.03044
## 65                   ICE STORM   2006    89  0.044367  1975  0.98455
## 443               RIP CURRENTS    304   204  0.671053   297  0.97697
## 18                 RIP CURRENT    470   368  0.782979   232  0.49362
## 73                   AVALANCHE    386   224  0.580311   170  0.44041
## 227           WILD/FOREST FIRE   1457    12  0.008236   545  0.37406
## 43                EXTREME COLD    655   160  0.244275   231  0.35267
## 15                   LIGHTNING  15754   816  0.051796  5230  0.33198
## 221                   WILDFIRE   2761    75  0.027164   911  0.32995
## 47                    BLIZZARD   2719   101  0.037146   805  0.29606
## 36                       FLOOD  25326   470  0.018558  6789  0.26806
## 111                  HIGH SURF    725   101  0.139310   152  0.20966
## 8                 WINTER STORM  11433   206  0.018018  1321  0.11554
## 137                STRONG WIND   3566   103  0.028884   280  0.07852
## 53                  HEAVY SNOW  15708   127  0.008085  1021  0.06500
## 46                   HIGH WIND  20212   248  0.012270  1137  0.05625
## 10          THUNDERSTORM WINDS  20843    64  0.003071   908  0.04356
## 20                 FLASH FLOOD  54277   978  0.018019  1777  0.03274
## 2                    TSTM WIND 219940   504  0.002292  6957  0.03163
## 967    EXTREME COLD/WIND CHILL   1002   125  0.124750    24  0.02395
## 16           THUNDERSTORM WIND  82563   133  0.001611  1488  0.01802
## 3                         HAIL 288661    15  0.000052  1361  0.00471
## 207 TORNADOES, TSTM WIND, HAIL      1    25 25.000000     0  0.00000
## 786              COLD AND SNOW      1    14 14.000000     0  0.00000
## 409      RECORD/EXCESSIVE HEAT      3    17  5.666667     0  0.00000
## 82              HIGH WIND/SEAS      1     4  4.000000     0  0.00000
## 834        Heavy surf and wind      1     3  3.000000     0  0.00000
## 406    RIP CURRENTS/HEAVY SURF      2     5  2.500000     0  0.00000
## 410                 HEAT WAVES      2     5  2.500000     0  0.00000
## 490  UNSEASONABLY WARM AND DRY     13    29  2.230769     0  0.00000
## 9    HURRICANE OPAL/HIGH WINDS      1     2  2.000000     0  0.00000
## 561                 HEAVY SEAS      2     3  1.500000     0  0.00000
## 813       Hypothermia/Exposure      3     4  1.333333     0  0.00000

There is more than one way to determine which weather event has the highest impact on public health. In terms of the total number of fatalities, the three events with the highest overall impact are TORNADO, EXCESSIVE HEAT and FLASH FLOOD. But when we consider the mean number of fatalities per event, the three event types with the highest values are different: ‘TORNADOES, TSTM WIND, HAIL’, ‘COLD AND SNOW’ and ‘TROPICAL STORM GORDON’, each of which was recorded only once. The reason is that an event type with only a few recorded events but many fatalities per event has a high mean and a small total, whereas a frequently recorded event type can have a low mean but a high total simply because of the number of events.
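
The difference between the two rankings can be checked with the Count and F_Sum values from the table above. The short sketch below recomputes the mean fatalities for TORNADO and for the single ‘TORNADOES, TSTM WIND, HAIL’ event; the variable names are chosen for this illustration only.

```r
# Count and F_Sum values taken from the table above
tornadoCount <- 60652
tornadoSum   <- 5633
singleCount  <- 1
singleSum    <- 25

# Mean fatalities per recorded event
tornadoMean <- tornadoSum / tornadoCount
singleMean  <- singleSum / singleCount

# TORNADO has by far the larger total but the smaller mean
print(c(tornadoSum = tornadoSum, singleSum = singleSum))
print(c(tornadoMean = tornadoMean, singleMean = singleMean))
```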

When looking at the impact of specific types of events in the past, it might be helpful to consider the total. But when planning for future events, it is also important to know how severely a single upcoming event of a given type is likely to affect public health, and the mean value is helpful in this regard. The following graph shows the mean values in more detail by plotting the events with the highest mean for both fatalities and injuries.

Plot 1: Weather events with the highest public health cost


Economic impact of Severe Weather Events

There are two variables which relate to economic impact, property damage and damage to crops. The types of weather events with the highest damage (calculated in dollar amounts) in both are given in the table below.

##                        Events     Property        Crop
## 194                   DROUGHT   1046106000 13972566000
## 36                      FLOOD 144657709800  5661968450
## 52                RIVER FLOOD   5118945500  5029459000
## 65                  ICE STORM   3944927810  5022113500
## 3                        HAIL  15727366720  3025537450
## 226                 HURRICANE  11868319010  2741910000
## 973         HURRICANE/TYPHOON  69305840000  2607872800
## 20                FLASH FLOOD  16140811510  1421317100
## 43               EXTREME COLD     67737400  1292973000
## 960              FROST/FREEZE      9480000  1094086000
## 14                 HEAVY RAIN    694248090   733399800
## 209            TROPICAL STORM   7703890550   678346000
## 46                  HIGH WIND   5270046260   638571300
## 2                   TSTM WIND   4484928440   554007350
## 99             EXCESSIVE HEAT      7753700   492402000
## 54                     FREEZE       205000   446225000
## 1                     TORNADO  56925660480   414953110
## 16          THUNDERSTORM WIND   3483121140   414843050
## 27                       HEAT      1797000   401461500
## 221                  WILDFIRE   4765114000   295472800
## 10         THUNDERSTORM WINDS   1733452850   190650700
## 227          WILD/FOREST FIRE   3001829500   106796830
## 8                WINTER STORM   6688497250    26944000
## 13             HURRICANE OPAL   3152846000     9000000
## 976          STORM SURGE/TIDE   4641188000      850000
## 204               STORM SURGE  43323536000        5000
## 313 HEAVY RAIN/SEVERE WEATHER   2500000000           0

The three weather events which have caused the highest property damage: FLOOD, HURRICANE/TYPHOON, TORNADO

The three weather events which caused the most severe crop damage: DROUGHT, FLOOD, RIVER FLOOD

Events such as Drought have little effect in terms of property damage, but a significant impact on crops. Floods on the other hand cause significant damage to both.

Another way of visualizing the data in the table above is in a stacked bar plot:

Plot 2: Weather events with the highest economic impact
