The Economic and Health Effects of Severe Weather Events across the United States

This project explores the Storm Data database of the National Weather Service to answer two basic questions about the effects of severe weather events. Data from 1996 to 2011 are used, since these years contain the most complete data. The first part of the project identifies the severe weather events that are most harmful to population health, measured by the fatalities and injuries they cause. The second part identifies the severe weather events with the greatest economic consequences, measured by the property and crop damage they cause. The project assumes that the compressed data file “repdata_data_StormData.csv.bz2” is stored in the working directory.

Loaded Libraries

library(dplyr)
## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(lubridate)

Data Processing

Loading the Data

The compressed file was read into the stormData variable with stringsAsFactors set to FALSE.

stormData <- read.csv("repdata_data_StormData.csv.bz2", header=TRUE, sep=",", stringsAsFactors = FALSE)
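
As a hedged sketch, the read could be wrapped in a small helper that fails early when the file is missing from the working directory. The helper name `load_storm_data` is hypothetical and not part of the original analysis; it relies only on the fact, already used above, that `read.csv` reads the bzip2-compressed file directly.

```r
# Hypothetical helper (not in the original analysis): read the compressed
# Storm Data file, stopping with a clear message if it is not present.
load_storm_data <- function(path = "repdata_data_StormData.csv.bz2") {
  if (!file.exists(path)) {
    stop("Storm Data file not found: ", path)
  }
  # read.csv() reads the .bz2 file without a separate decompression step
  read.csv(path, header = TRUE, sep = ",", stringsAsFactors = FALSE)
}
```

Calling `stormData <- load_storm_data()` would then behave like the line above when the file is present.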

Cleaning the Data

The Date variable was created by parsing the BGN_DATE variable as a date. The day on which a severe weather event began is taken as the date of its occurrence.

stormData2 <- mutate(stormData, Date = mdy_hms(BGN_DATE))

Only a subset of the data is used for the analysis. According to the National Climatic Data Center (https://www.ncdc.noaa.gov/stormevents/details.jsp), only the data from 1996 to 2011 is complete. Therefore, only this data is used in this project.

selectData <- filter(stormData2, year(Date)>= 1996 & year(Date)<=2011)
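
To illustrate the parsing and year filtering on a few made-up rows, the sketch below uses base R only (`as.POSIXct` with an explicit format) as a stand-in for `mdy_hms`, so it runs without extra packages. The toy data frame and the names `toy`, `toy_year` and `kept` are assumptions for illustration.

```r
# Toy rows in a "month/day/year hour:minute:second" layout
toy <- data.frame(
  BGN_DATE = c("4/18/1950 0:00:00", "1/6/1996 0:00:00", "11/28/2011 0:00:00"),
  stringsAsFactors = FALSE
)

# Base-R stand-in for mdy_hms(): parse with an explicit format in UTC
toy$Date <- as.POSIXct(toy$BGN_DATE, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# Extract the calendar year and keep 1996 to 2011 inclusive
toy_year <- as.integer(format(toy$Date, "%Y"))
kept <- toy[toy_year >= 1996 & toy_year <= 2011, ]
print(kept$BGN_DATE)
```

Only the 1996 and 2011 rows survive the filter; the 1950 row is dropped.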

It is assumed that PROPDMGEXP and CROPDMGEXP are the multipliers for the PROPDMG and CROPDMG fields respectively. Therefore the data was adjusted as follows:

- K was converted to 1000
- B was converted to 1000000000
- M was converted to 1000000
- empty values, and 0 in PROPDMGEXP, were converted to 1
    selectData$PROPDMGEXP[selectData$PROPDMGEXP == "K"] <- 1000
    selectData$PROPDMGEXP[selectData$PROPDMGEXP == "B"] <- 1000000000
    selectData$PROPDMGEXP[selectData$PROPDMGEXP == "M"] <- 1000000
    selectData$PROPDMGEXP[selectData$PROPDMGEXP == ""] <- 1
    selectData$PROPDMGEXP[selectData$PROPDMGEXP == "0"] <- 1
    selectData$CROPDMGEXP[selectData$CROPDMGEXP == "K"] <- 1000
    selectData$CROPDMGEXP[selectData$CROPDMGEXP == "B"] <- 1000000000
    selectData$CROPDMGEXP[selectData$CROPDMGEXP == "M"] <- 1000000
    selectData$CROPDMGEXP[selectData$CROPDMGEXP == ""] <- 1

The PROPDMGEXP and CROPDMGEXP fields were then converted to numeric:

    selectData$PROPDMGEXP <- as.numeric(selectData$PROPDMGEXP)
    selectData$CROPDMGEXP <- as.numeric(selectData$CROPDMGEXP)
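
The same conversion can be sketched as a single lookup, which avoids repeating one assignment per code and per column. The names `multipliers` and `exp_to_multiplier` are hypothetical. In this sketch any code not in the table (including empty strings and "0") falls back to 1; that fallback is an assumption of the sketch, since the assignments above would turn an unhandled code into NA on numeric conversion.

```r
# Multipliers for the exponent codes K, M and B
multipliers <- c(K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: map exponent codes to numeric multipliers,
# falling back to 1 for "", "0", or any code not in the table.
exp_to_multiplier <- function(code) {
  out <- unname(multipliers[code])  # unmatched names give NA
  out[is.na(out)] <- 1
  out
}

exp_to_multiplier(c("K", "M", "B", "", "0"))
```

Applied to the data, this would read `selectData$PROPDMGEXP <- exp_to_multiplier(selectData$PROPDMGEXP)`, and likewise for CROPDMGEXP.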

Population health data preparation

The following formula is used to represent population health:
fatalities + injuries.

newData <- mutate(selectData, health = FATALITIES + INJURIES)

All records with a health value of 0 are removed:

healthData <- filter(newData, health != 0)

The data is then cleaned by manually reconciling some key event labels with those officially recognised by the data collection agency. After this consolidation, the events with the highest effect on population health can be identified. Note that only labels for events with a high impact on health were reconciled.

healthData$EVENT <- healthData$EVTYPE
healthData$EVENT <- toupper(healthData$EVENT)
healthData$EVENT[grepl("TSTM", healthData$EVENT)] <- "THUNDERSTORM WIND"
healthData$EVENT[grepl("THUNDERSTORM", healthData$EVENT)] <- "THUNDERSTORM WIND"
healthData$EVENT[grepl("RIP CURRENTS", healthData$EVENT)] <- "RIP CURRENT"
healthData$EVENT[grepl("EXTREME COLD", healthData$EVENT)] <- "EXTREME COLD/WIND CHILL"
healthData$EVENT[grepl("EXTREME WINDCHILL", healthData$EVENT)] <- "EXTREME COLD/WIND CHILL"
healthData$EVENT[grepl("COLD/WIND CHILL", healthData$EVENT)] <- "EXTREME COLD/WIND CHILL"
healthData$EVENT[grepl("FOG", healthData$EVENT)] <- "DENSE FOG"
healthData$EVENT[grepl("HURRICANE", healthData$EVENT)] <- "HURRICANE (TYPHOON)"
healthData$EVENT[grepl("WILD/FOREST FIRE", healthData$EVENT)] <- "WILDFIRE"
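
The substitutions above follow one pattern: find labels containing a substring and overwrite them with an official label. A minimal sketch of that pattern as a rule table applied in order is shown below. The names `rules` and `normalise_events` are hypothetical, only a few of the rules are reproduced, and `fixed = TRUE` (literal rather than regular-expression matching) is a choice made for this sketch.

```r
# Pattern -> official label pairs, applied in order, so later rules
# see the results of earlier ones
rules <- c(
  "TSTM"         = "THUNDERSTORM WIND",
  "THUNDERSTORM" = "THUNDERSTORM WIND",
  "RIP CURRENTS" = "RIP CURRENT",
  "FOG"          = "DENSE FOG"
)

# Hypothetical helper: upper-case the labels, then apply each rule with grepl()
normalise_events <- function(labels, rules) {
  labels <- toupper(labels)
  for (pattern in names(rules)) {
    labels[grepl(pattern, labels, fixed = TRUE)] <- rules[[pattern]]
  }
  labels
}

normalise_events(c("Tstm Wind", "THUNDERSTORM WINDS", "Rip Currents", "FOG", "TORNADO"), rules)
```

Labels that match no rule, such as TORNADO here, pass through unchanged apart from upper-casing.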

Economic Consequences data preparation

The following formula is used to represent economic consequences: PROPDMG * PROPDMGEXP + CROPDMG * CROPDMGEXP

newData <- mutate(newData, economic = PROPDMG * PROPDMGEXP + CROPDMG * CROPDMGEXP)
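
As a worked instance of the formula with made-up numbers: a record with property damage of 25 and multiplier 1000, and crop damage of 1.5 and multiplier 1000000, gives 25 * 1000 + 1.5 * 1000000 = 1525000. The data frame `example` is hypothetical.

```r
# Made-up record: multipliers are already numeric, as after the conversion step
example <- data.frame(PROPDMG = 25, PROPDMGEXP = 1e3,
                      CROPDMG = 1.5, CROPDMGEXP = 1e6)

# Property damage plus crop damage, each scaled by its multiplier
example$economic <- example$PROPDMG * example$PROPDMGEXP +
  example$CROPDMG * example$CROPDMGEXP
example$economic
```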

All records with economic = 0 are removed:

economicData <- filter(newData, economic !=0)

The data is then cleaned by manually reconciling some key event labels with those officially recognised by the data collection agency. After this consolidation, the events with the greatest economic consequences can be identified. Note that only labels for events with high economic consequences were reconciled.

economicData$EVENT <- economicData$EVTYPE
economicData$EVENT <- toupper(economicData$EVENT)
economicData$EVENT[grepl("TSTM", economicData$EVENT)] <- "THUNDERSTORM WIND"
economicData$EVENT[grepl("THUNDERSTORM", economicData$EVENT)] <- "THUNDERSTORM WIND"
economicData$EVENT[grepl("RIP CURRENTS", economicData$EVENT)] <- "RIP CURRENT"
economicData$EVENT[grepl("EXTREME COLD", economicData$EVENT)] <- "EXTREME COLD/WIND CHILL"
economicData$EVENT[grepl("EXTREME WINDCHILL", economicData$EVENT)] <- "EXTREME COLD/WIND CHILL"
economicData$EVENT[grepl("COLD/WIND CHILL", economicData$EVENT)] <- "EXTREME COLD/WIND CHILL"
economicData$EVENT[grepl("FOG", economicData$EVENT)] <- "DENSE FOG"
economicData$EVENT[grepl("HURRICANE", economicData$EVENT)] <- "HURRICANE (TYPHOON)"
economicData$EVENT[grepl("TYPHOON", economicData$EVENT)] <- "HURRICANE (TYPHOON)"
economicData$EVENT[grepl("WILD/FOREST FIRE", economicData$EVENT)] <- "WILDFIRE"
economicData$EVENT[grepl("STORM SURGE", economicData$EVENT)] <- "STORM SURGE/TIDE"
economicData$EVENT[grepl("FREEZE", economicData$EVENT)] <- "FROST/FREEZE"

Results

Across the United States, which types of events are most harmful with respect to population health?

First, the data is summarised:

healthData$EVENT <- as.factor(healthData$EVENT)
healthSummary <- aggregate(healthData$health, by=list(Category=healthData$EVENT), FUN=sum, na.rm=TRUE)
print(healthSummary)
##                     Category     x
## 1                  AVALANCHE   379
## 2                  BLACK ICE    25
## 3                   BLIZZARD   455
## 4               BLOWING SNOW     2
## 5                 BRUSH FIRE     2
## 6              COASTAL FLOOD     5
## 7           COASTAL FLOODING     3
## 8   COASTAL FLOODING/EROSION     5
## 9              COASTAL STORM     5
## 10              COASTALSTORM     1
## 11                      COLD    30
## 12             COLD AND SNOW    14
## 13          COLD TEMPERATURE     2
## 14              COLD WEATHER     2
## 15                 DENSE FOG   924
## 16                   DROUGHT     4
## 17                  DROWNING     1
## 18            DRY MICROBURST    28
## 19                DUST DEVIL    41
## 20                DUST STORM   387
## 21            EXCESSIVE HEAT  8188
## 22            EXCESSIVE SNOW     2
## 23             EXTENDED COLD     1
## 24   EXTREME COLD/WIND CHILL   472
## 25          FALLING SNOW/ICE     2
## 26               FLASH FLOOD  2561
## 27                     FLOOD  7172
## 28          FREEZING DRIZZLE    15
## 29             FREEZING RAIN     2
## 30            FREEZING SPRAY     1
## 31                     FROST     4
## 32              FUNNEL CLOUD     1
## 33                     GLAZE   213
## 34                GUSTY WIND     2
## 35               GUSTY WINDS    14
## 36                      HAIL   720
## 37            HAZARDOUS SURF     1
## 38                      HEAT  1459
## 39                 HEAT WAVE    70
## 40                HEAVY RAIN   324
## 41                HEAVY SEAS     1
## 42                HEAVY SNOW   805
## 43         HEAVY SNOW SHOWER     2
## 44                HEAVY SURF    46
## 45       HEAVY SURF AND WIND     3
## 46      HEAVY SURF/HIGH SURF    90
## 47                 HIGH SEAS    10
## 48                 HIGH SURF   240
## 49               HIGH SWELLS     1
## 50                HIGH WATER     3
## 51                 HIGH WIND  1318
## 52       HURRICANE (TYPHOON)  1448
## 53     HYPERTHERMIA/EXPOSURE     1
## 54      HYPOTHERMIA/EXPOSURE     7
## 55               ICE ON ROAD     1
## 56                 ICE ROADS     1
## 57                 ICE STORM   400
## 58                 ICY ROADS    26
## 59                 LANDSLIDE    89
## 60                LANDSLIDES     2
## 61                LIGHT SNOW     3
## 62                 LIGHTNING  4792
## 63           MARINE ACCIDENT     3
## 64          MARINE HIGH WIND     2
## 65        MARINE STRONG WIND    36
## 66              MIXED PRECIP    28
## 67                  MUDSLIDE     6
## 68                 MUDSLIDES     1
## 69    NON-SEVERE WIND DAMAGE     7
## 70                     OTHER     4
## 71                 RAIN/SNOW     6
## 72               RECORD HEAT     2
## 73               RIP CURRENT  1045
## 74               RIVER FLOOD     1
## 75            RIVER FLOODING     2
## 76                ROGUE WAVE     2
## 77                ROUGH SEAS    13
## 78                ROUGH SURF     5
## 79                SMALL HAIL    10
## 80                      SNOW    14
## 81              SNOW AND ICE     1
## 82               SNOW SQUALL    37
## 83              SNOW SQUALLS     1
## 84               STORM SURGE    39
## 85          STORM SURGE/TIDE    16
## 86               STRONG WIND   381
## 87              STRONG WINDS    28
## 88         THUNDERSTORM WIND  5562
## 89            TIDAL FLOODING     1
## 90                   TORNADO 22178
## 91       TORRENTIAL RAINFALL     4
## 92            TROPICAL STORM   395
## 93                   TSUNAMI   162
## 94                   TYPHOON     5
## 95         UNSEASONABLY WARM    17
## 96      URBAN/SML STREAM FLD   107
## 97              WARM WEATHER     2
## 98                WATERSPOUT     4
## 99                 WHIRLWIND     1
## 100                 WILDFIRE  1543
## 101                     WIND   102
## 102                    WINDS     1
## 103             WINTER STORM  1483
## 104           WINTER WEATHER   376
## 105       WINTER WEATHER MIX    68
## 106       WINTER WEATHER/MIX   100
## 107               WINTRY MIX    78

Sort and Choose the top ten occurrences

healthSummarySorted <- arrange(healthSummary, desc(x))
selectedHealthSummary <- healthSummarySorted[1:10,]
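
The summarise, sort and select steps can be traced on a small made-up data set. The sketch below uses base R (`order` and `head`) in place of `arrange(desc(x))` and row indexing. The names `toy`, `toySummary`, `toySorted` and `topTwo` are hypothetical.

```r
# Made-up per-record health counts
toy <- data.frame(
  EVENT  = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
  health = c(10, 3, 5, 1, 4)
)

# Sum the health measure per event
toySummary <- aggregate(toy$health, by = list(Category = toy$EVENT),
                        FUN = sum, na.rm = TRUE)

# Sort by the total, largest first, and keep the top rows
toySorted <- toySummary[order(toySummary$x, decreasing = TRUE), ]
topTwo <- head(toySorted, 2)
print(topTwo)
```

One design difference: `head(df, 10)` returns fewer rows when fewer than ten exist, whereas `df[1:10, ]` pads the result with NA rows.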

Plot the graph

The graph shows the ten events with the highest impact on health.

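# Enlarge the bottom and left margins to fit the rotated event labels.
# (A bare assignment such as mgp=c(2,5,1) has no effect on the plot and is omitted.)
par(mar=c(15,6,4,2))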

barplot(selectedHealthSummary$x, las=2,col="green", main="Top Ten Events that affect health",
        xlab="", ylab="", names.arg=selectedHealthSummary$Category)
mtext("Event", side=1, line = 10)
mtext("Number of Injuries and Fatalities", side=2, line=4)
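
Graphical parameters only take effect when passed to `par()`, and `par()` returns the previous settings so they can be restored afterwards. A minimal sketch with made-up totals is shown below. The vector `totals` is hypothetical, and `pdf(NULL)` is used only so the sketch draws without creating a file.

```r
# Made-up totals for illustration only
totals <- c(TORNADO = 15, FLOOD = 7, HAIL = 1)

pdf(NULL)                          # null device: nothing is written to disk
op <- par(mar = c(15, 6, 4, 2))    # set margins, keep the old settings in op
mids <- barplot(totals, las = 2, col = "green",
                main = "Toy example", xlab = "", ylab = "")
mtext("Event", side = 1, line = 10)
mtext("Number of Injuries and Fatalities", side = 2, line = 4)
par(op)                            # restore the previous margins
invisible(dev.off())
```

`barplot` returns the bar midpoints (`mids` here), which can be used to position extra annotations if needed.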

Across the United States, which types of events are most harmful with respect to the economy?

First, the data is summarised:

economicData$EVENT <- as.factor(economicData$EVENT)
economicSummary <- aggregate(economicData$economic, by=list(Category=economicData$EVENT), FUN=sum, na.rm=TRUE)
print(economicSummary)
##                      Category            x
## 1          HIGH SURF ADVISORY       200000
## 2                 FLASH FLOOD        50000
## 3      ASTRONOMICAL HIGH TIDE      9425000
## 4       ASTRONOMICAL LOW TIDE       320000
## 5                   AVALANCHE      3711800
## 6               BEACH EROSION       100000
## 7                    BLIZZARD    532718950
## 8                BLOWING DUST        20000
## 9                BLOWING SNOW        15000
## 10  COASTAL  FLOODING/EROSION     15000000
## 11            COASTAL EROSION       766000
## 12              COASTAL FLOOD    251400560
## 13           COASTAL FLOODING    103809000
## 14   COASTAL FLOODING/EROSION     20030000
## 15              COASTAL STORM        50000
## 16                       COLD       554000
## 17                  DAM BREAK      1002000
## 18                  DENSE FOG     22646500
## 19                DENSE SMOKE       100000
## 20                  DOWNBURST         2000
## 21                    DROUGHT  14413667000
## 22             DRY MICROBURST      1747600
## 23                 DUST DEVIL       663630
## 24                 DUST STORM      8574000
## 25                EARLY FROST     42000000
## 26         EROSION/CSTL FLOOD     16200000
## 27             EXCESSIVE HEAT    500125700
## 28             EXCESSIVE SNOW      1935000
## 29              EXTENDED COLD       100000
## 30    EXTREME COLD/WIND CHILL   1357776400
## 31                FLASH FLOOD  16557105610
## 32          FLASH FLOOD/FLOOD         5000
## 33                      FLOOD 148919611950
## 34          FLOOD/FLASH/FLOOD        10000
## 35           FREEZING DRIZZLE       105000
## 36              FREEZING RAIN       626000
## 37                      FROST        15000
## 38               FROST/FREEZE   1345441000
## 39               FUNNEL CLOUD       134100
## 40                      GLAZE       150000
## 41              GRADIENT WIND        37000
## 42                 GUSTY WIND       370000
## 43            GUSTY WIND/HAIL        20000
## 44        GUSTY WIND/HVY RAIN         2000
## 45            GUSTY WIND/RAIN         2000
## 46                GUSTY WINDS      1476000
## 47                       HAIL  17071172870
## 48                       HEAT      1696500
## 49                 HEAVY RAIN   1313034240
## 50       HEAVY RAIN/HIGH SURF     15000000
## 51                 HEAVY SNOW    705539640
## 52          HEAVY SNOW SHOWER        10000
## 53                 HEAVY SURF      1390000
## 54       HEAVY SURF/HIGH SURF      9870000
## 55                  HIGH SEAS        15000
## 56                  HIGH SURF     83904500
## 57                HIGH SWELLS         5000
## 58                  HIGH WIND   5881421660
## 59            HIGH WIND (G40)        18000
## 60                 HIGH WINDS       500000
## 61        HURRICANE (TYPHOON)  87068996810
## 62       ICE JAM FLOOD (MINOR         1000
## 63                  ICE ROADS        12000
## 64                  ICE STORM   3657908810
## 65                  ICY ROADS       331200
## 66           LAKE-EFFECT SNOW     40115000
## 67           LAKE EFFECT SNOW        67000
## 68            LAKESHORE FLOOD      7540000
## 69                  LANDSLIDE    344595000
## 70                 LANDSLIDES         5000
## 71                  LANDSLUMP       570000
## 72                  LANDSPOUT         7000
## 73           LATE SEASON SNOW       180000
## 74        LIGHT FREEZING RAIN       451000
## 75                 LIGHT SNOW      2513000
## 76             LIGHT SNOWFALL        85000
## 77                  LIGHTNING    749975520
## 78            MARINE ACCIDENT        50000
## 79                MARINE HAIL         4000
## 80           MARINE HIGH WIND      1297010
## 81         MARINE STRONG WIND       418330
## 82                 MICROBURST        20000
## 83        MIXED PRECIPITATION       790000
## 84                  MUD SLIDE       100100
## 85                   MUDSLIDE      1225000
## 86     NON-SEVERE WIND DAMAGE         5000
## 87                      OTHER      1089900
## 88                       RAIN       550000
## 89                RIP CURRENT       163000
## 90                RIVER FLOOD     22157000
## 91             RIVER FLOODING    134175000
## 92                 ROCK SLIDE       150000
## 93                 ROUGH SURF        10000
## 94                     SEICHE       980000
## 95                 SMALL HAIL     20863000
## 96                       SNOW      2554000
## 97                SNOW SQUALL        30000
## 98               SNOW SQUALLS        70000
## 99           STORM SURGE/TIDE  47835579000
## 100               STRONG WIND    239712950
## 101              STRONG WINDS      2234790
## 102         THUNDERSTORM WIND   8936445880
## 103            TIDAL FLOODING        13000
## 104                   TORNADO  24900370720
## 105       TROPICAL DEPRESSION      1737000
## 106            TROPICAL STORM   8320186550
## 107                   TSUNAMI    144082000
## 108         UNSEASONABLE COLD      5100000
## 109         UNSEASONABLY COLD     25042500
## 110         UNSEASONABLY WARM        10000
## 111           UNSEASONAL RAIN     10000000
## 112      URBAN/SML STREAM FLD     66797750
## 113              VOLCANIC ASH       500000
## 114                WATERSPOUT      5730200
## 115            WET MICROBURST        35000
## 116                 WHIRLWIND        12000
## 117                  WILDFIRE   8162704630
## 118                      WIND      2589500
## 119             WIND AND WAVE      1000000
## 120               WIND DAMAGE        10000
## 121              WINTER STORM   1544687250
## 122            WINTER WEATHER     35866000
## 123        WINTER WEATHER MIX        60000
## 124        WINTER WEATHER/MIX      6372000
## 125                WINTRY MIX        12500

Sort and Choose the top ten occurrences

economicSummarySorted <- arrange(economicSummary, desc(x))
selectedEconomicSummary <- economicSummarySorted[1:10,]
selectedEconomicSummary$y <- selectedEconomicSummary$x/10000000000

Plot the graph

The graph shows the ten events with the highest impact on the economy.

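# Enlarge the bottom and left margins to fit the rotated event labels.
# (A bare assignment such as mgp=c(2,5,1) has no effect on the plot and is omitted.)
par(mar=c(15,10,4,2))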
options(scipen=5)

barplot(selectedEconomicSummary$y, las=2,col="green", main="Top Ten Events that affect the economy",
        xlab="", ylab="", names.arg=selectedEconomicSummary$Category)
mtext("Event", side=1, line = 10)
mtext("Economic Effect (in 10,000,000,000 dollars)", side=2, line=4)