Weather Events Causing the Greatest Impact on Public Health and the Economy From 1996 to 2011

Synopsis

This report describes the severe weather events that had the greatest negative impact on public health and the economy across the United States between 1996 and 2011. The data comes from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. The variables of interest are the estimates of fatalities, injuries, property damage and crop damage. The analysis is restricted to 1996 to 2011 because a wider range of weather event types began to be recorded from 1996 onward. For each variable of interest we identify the top ten weather event types and summarise the result in tables and bar plots.

Loading and Processing the Raw Data

The data comes from the NOAA storm database. It contains the characteristics of major storms and weather events in the United States, as well as estimates of any fatalities, injuries, and property and crop damage. The raw file starts in 1950, as the first rows printed below show, so we later restrict it to events from 1996 to 2011.

Reading in the data

The data comes as a CSV file compressed with the bzip2 algorithm to reduce its size. Download the file into your working directory. It does not need to be unzipped first, because read.csv() reads the compressed file directly.

stormdata <- read.csv("repdata_data_StormData.csv.bz2")
dim(stormdata)
## [1] 902297     37
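
read.csv() can read a bzip2-compressed file directly, as the call above does with the .bz2 file name. The sketch below illustrates this on a tiny made-up table: it writes the table through a bzip2 connection to a temporary file and reads it back. The data frame `toy` and the temporary path are invented for the example and are not part of the storm data.

```r
# Toy data standing in for the storm file (values are made up)
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL", "FLOOD"),
                  FATALITIES = c(1, 0, 2))

# Write the table through a bzip2 connection to a temporary file
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() detects the compression and decompresses on the fly
toy_back <- read.csv(path, stringsAsFactors = FALSE)
dim(toy_back)
```

The same call pattern applies to the real file; only the file name changes.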

The dataset has 902297 records of 37 variables. We check the first few rows to get a feel for it, then convert the variable names to lower case for easier handling.

head(stormdata)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
## 4        2     2.5          K       0                                    
## 5        2     2.5          K       0                                    
## 6        6     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6
names(stormdata) <- tolower(names(stormdata))
names(stormdata)
##  [1] "state__"    "bgn_date"   "bgn_time"   "time_zone"  "county"    
##  [6] "countyname" "state"      "evtype"     "bgn_range"  "bgn_azi"   
## [11] "bgn_locati" "end_date"   "end_time"   "county_end" "countyendn"
## [16] "end_range"  "end_azi"    "end_locati" "length"     "width"     
## [21] "f"          "mag"        "fatalities" "injuries"   "propdmg"   
## [26] "propdmgexp" "cropdmg"    "cropdmgexp" "wfo"        "stateoffic"
## [31] "zonenames"  "latitude"   "longitude"  "latitude_e" "longitude_"
## [36] "remarks"    "refnum"

Data Processing

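We subset the dataset to the eight variables of interest: the event start date (bgn_date), the event type (evtype), fatalities, injuries, and the property and crop damage amounts with their exponent codes (propdmg, propdmgexp, cropdmg, cropdmgexp).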

stormdata1 <- stormdata[, c(2,8,23:28)]
head(stormdata1)
##             bgn_date  evtype fatalities injuries propdmg propdmgexp
## 1  4/18/1950 0:00:00 TORNADO          0       15    25.0          K
## 2  4/18/1950 0:00:00 TORNADO          0        0     2.5          K
## 3  2/20/1951 0:00:00 TORNADO          0        2    25.0          K
## 4   6/8/1951 0:00:00 TORNADO          0        2     2.5          K
## 5 11/15/1951 0:00:00 TORNADO          0        2     2.5          K
## 6 11/15/1951 0:00:00 TORNADO          0        6     2.5          K
##   cropdmg cropdmgexp
## 1       0           
## 2       0           
## 3       0           
## 4       0           
## 5       0           
## 6       0

Our analysis only includes data from 1996 to 2011. We therefore subset the data beginning January 1996, first converting the bgn_date variable from a factor to a date-time class with strptime() and as.POSIXct(), and dropping bgn_date once the filter has been applied.

class(stormdata1$bgn_date)
## [1] "factor"
stormdata1$bgn_date <- as.POSIXct(strptime(stormdata1$bgn_date, "%m/%d/%Y"))
stormdata2 <- subset(stormdata1, stormdata1$bgn_date > "1995-12-31", select = c(2:8))
dim(stormdata2)
## [1] 653530      7
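
The date filter can be checked on a handful of rows. The sketch below uses made-up records in the same "month/day/year time" format as bgn_date. It uses as.Date() rather than as.POSIXct(), which is an alternative to the conversion run in this report, not the one actually used; the data frame `toy` is hypothetical.

```r
# Toy records in the same date format as bgn_date (values are made up)
toy <- data.frame(
  bgn_date = c("4/18/1950 0:00:00", "12/31/1995 0:00:00",
               "1/1/1996 0:00:00", "11/30/2011 0:00:00"),
  evtype = c("TORNADO", "HAIL", "FLOOD", "HEAT"),
  stringsAsFactors = FALSE
)

# Parse month/day/year; the trailing time of day is not part of the format
toy$bgn_date <- as.Date(toy$bgn_date, format = "%m/%d/%Y")

# Keep events on or after 1 January 1996
toy_recent <- subset(toy, bgn_date >= as.Date("1996-01-01"))
toy_recent$evtype
```

Comparing against an explicit Date value avoids relying on the implicit conversion of a character string during the comparison.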

The dataset now has 653530 records of 7 variables that will be used for analysis. We now process and analyse the data in order to obtain the total population health impact and economic impact for each weather event type.

Population health impact data processing

# aggregate data by event type
fatal <- aggregate(fatalities ~ evtype, stormdata2, sum)
injury <- aggregate(injuries ~ evtype, stormdata2, sum)
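
To make the aggregation step concrete, the sketch below applies the same aggregate() formula to five made-up records. The data frame `toy` and its values are hypothetical; only the call pattern mirrors the code above.

```r
# Toy records: two tornado rows, two flood rows, one heat row (made up)
toy <- data.frame(
  evtype = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  fatalities = c(2, 1, 3, 4, 0),
  stringsAsFactors = FALSE
)

# Sum fatalities within each event type; one output row per distinct evtype
toy_fatal <- aggregate(fatalities ~ evtype, data = toy, FUN = sum)
toy_fatal
```

Each distinct event label gets its own row, so labels that differ only in spelling are summed separately.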

Economic impact data processing

# property damage data: convert exponent codes to powers of ten
# (H = hundred, K = thousand, M = million, B = billion; blank and symbols = 0)
stormdata2$propdmgexp <- gsub("\\ |\\-|\\+|\\?", "0", stormdata2$propdmgexp)
stormdata2$propdmgexp <- gsub("[Hh]", "2", stormdata2$propdmgexp)
stormdata2$propdmgexp <- gsub("[Kk]", "3", stormdata2$propdmgexp)
stormdata2$propdmgexp <- gsub("[Mm]", "6", stormdata2$propdmgexp)
stormdata2$propdmgexp <- gsub("[Bb]", "9", stormdata2$propdmgexp)
stormdata2$propdmgexp <- as.numeric(stormdata2$propdmgexp)
head(stormdata2$propdmgexp)
## [1]  3  3  3  3  3 NA
sum(is.na(stormdata2$propdmgexp))
## [1] 276185
stormdata2$propdmgexp[is.na(stormdata2$propdmgexp)] <- 0

# compute total property damage value
stormdata2$total_propdmg <- stormdata2$propdmg * (10 ^ stormdata2$propdmgexp)

# crop damage data: convert exponent codes to powers of ten
# (H = hundred, K = thousand, M = million, B = billion; blank and symbols = 0)
stormdata2$cropdmgexp <- gsub("\\ |\\-|\\+|\\?", "0", stormdata2$cropdmgexp)
stormdata2$cropdmgexp <- gsub("[Hh]", "2", stormdata2$cropdmgexp)
stormdata2$cropdmgexp <- gsub("[Kk]", "3", stormdata2$cropdmgexp)
stormdata2$cropdmgexp <- gsub("[Mm]", "6", stormdata2$cropdmgexp)
stormdata2$cropdmgexp <- gsub("[Bb]", "9", stormdata2$cropdmgexp)
stormdata2$cropdmgexp <- as.numeric(stormdata2$cropdmgexp)
head(stormdata2$cropdmgexp)
## [1]  3 NA NA NA NA NA
sum(is.na(stormdata2$cropdmgexp))
## [1] 373069
stormdata2$cropdmgexp[is.na(stormdata2$cropdmgexp)] <- 0

# compute total crop damage value
stormdata2$total_cropdmg <- stormdata2$cropdmg * (10 ^ stormdata2$cropdmgexp)

# aggregate the economic impact data by event type
propdmg <- aggregate(total_propdmg ~ evtype, stormdata2, sum)
cropdmg <- aggregate(total_cropdmg ~ evtype, stormdata2, sum)
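
The exponent columns hold codes that stand for powers of ten, and each damage amount is multiplied by 10 raised to that power. A named lookup vector is one alternative to a chain of gsub() calls. The sketch below is not the code used in this report: the helper `exp_to_power()` is hypothetical, and it assumes the conventional meanings H = hundred, K = thousand, M = million and B = billion, with digits taken as literal exponents and any other code treated as 10^0.

```r
# Hypothetical helper: map an exponent code to a power of ten.
# Assumed meanings: H = hundred, K = thousand, M = million, B = billion;
# single digits are literal exponents; anything else counts as 10^0.
exp_to_power <- function(code) {
  code <- toupper(trimws(as.character(code)))
  lookup <- c(H = 2, K = 3, M = 6, B = 9)
  power <- unname(lookup[code])              # NA for codes not in the table
  is_digit <- grepl("^[0-9]$", code)
  power[is_digit] <- as.numeric(code[is_digit])
  power[is.na(power)] <- 0
  power
}

# Made-up amounts and codes, including a blank and a symbol
amount <- c(25, 2.5, 1, 3, 7)
code <- c("K", "m", "B", "", "?")
damage <- amount * 10 ^ exp_to_power(code)
damage
```

Keeping the mapping in one place makes it easy to check each code against its intended multiplier.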

Result 1

Across the United States, which types of events are most harmful with respect to population health?

# get top10 event with highest fatalities
fatal10 <- fatal[order(-fatal$fatalities), ][1:10, ]
print(fatal10)
##             evtype fatalities
## 81  EXCESSIVE HEAT       1797
## 426        TORNADO       1511
## 98     FLASH FLOOD        887
## 224      LIGHTNING        651
## 102          FLOOD        414
## 300    RIP CURRENT        340
## 434      TSTM WIND        241
## 147           HEAT        237
## 177      HIGH WIND        235
## 16       AVALANCHE        223
# get top10 event with highest injuries
injury10 <- injury[order(-injury$injuries), ][1:10, ]
print(injury10)
##                evtype injuries
## 426           TORNADO    20667
## 102             FLOOD     6758
## 81     EXCESSIVE HEAT     6391
## 224         LIGHTNING     4141
## 434         TSTM WIND     3629
## 98        FLASH FLOOD     1674
## 421 THUNDERSTORM WIND     1400
## 507      WINTER STORM     1292
## 185 HURRICANE/TYPHOON     1275
## 147              HEAT     1222
# plot a graph showing the results
par(mfrow = c(1, 2), mar = c(12, 4, 3, 2), mgp = c(3, 1, 0), cex = 0.8)
barplot(fatal10$fatalities, las = 3, names.arg = fatal10$evtype, main = "Weather Events With The Top 10 Highest Fatalities", 
    ylab = "number of fatalities", col = "red")
barplot(injury10$injuries, las = 3, names.arg = injury10$evtype, main = "Weather Events With the Top 10 Highest Injuries", 
    ylab = "number of injuries", col = "blue")

From the tables and plots above, excessive heat and tornadoes caused the most fatalities, while tornadoes, followed by floods, caused the most injuries across the U.S. between 1996 and 2011.
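
The order-then-truncate step used for the top-ten tables can be wrapped in a small helper. The sketch below is illustrative only: the function `top_n_rows()` and the data frame `toy` are hypothetical names, not part of the analysis above.

```r
# Hypothetical helper: the n rows with the largest values in `column`
top_n_rows <- function(df, column, n = 10) {
  ranked <- df[order(df[[column]], decreasing = TRUE), , drop = FALSE]
  head(ranked, n)
}

# Made-up aggregated totals
toy <- data.frame(
  evtype = c("FLOOD", "HEAT", "TORNADO", "HAIL"),
  injuries = c(12, 30, 45, 3),
  stringsAsFactors = FALSE
)
toy_top2 <- top_n_rows(toy, "injuries", n = 2)
toy_top2$evtype
```

Using head() instead of indexing with 1:10 avoids rows of NA when a table has fewer than n event types.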

Result 2

Across the United States, which types of events have the greatest economic consequences?

# get top 10 events with highest property damage
propdmg10 <- propdmg[order(-propdmg$total_propdmg), ][1:10, ]
propdmg10
##                evtype total_propdmg
## 434         TSTM WIND  1.327560e+12
## 98        FLASH FLOOD  1.235587e+12
## 426           TORNADO  1.175044e+12
## 102             FLOOD  9.266942e+11
## 421 THUNDERSTORM WIND  8.597370e+11
## 142              HAIL  5.648957e+11
## 224         LIGHTNING  4.883073e+11
## 177         HIGH WIND  3.127640e+11
## 507      WINTER STORM  1.255047e+11
## 161        HEAVY SNOW  8.884809e+10
# get top 10 events with highest crop damage
cropdmg10 <- cropdmg[order(-cropdmg$total_cropdmg), ][1:10, ]
cropdmg10
##                evtype total_cropdmg
## 142              HAIL  496361429670
## 98        FLASH FLOOD  159892875010
## 102             FLOOD  147003227780
## 434         TSTM WIND  108665795250
## 426           TORNADO   89935203490
## 421 THUNDERSTORM WIND   66331332000
## 63            DROUGHT   21958346620
## 177         HIGH WIND   16651916910
## 154        HEAVY RAIN   10260517910
## 122      FROST/FREEZE    5947088140
# plot a graph showing the results
barplot(propdmg10$total_propdmg, las = 3, names.arg = propdmg10$evtype, main = "Events With Highest Property Damage", 
    ylab = "Total property damage (USD)", col = "red")

barplot(cropdmg10$total_cropdmg, las = 3, names.arg = cropdmg10$evtype, main = "Events With Highest Crop Damage", 
    ylab = "Total crop damage (USD)", col = "blue")

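The tables and plots above rank TSTM wind and flash floods highest for property damage, and hail and flash floods highest for crop damage. These totals reflect a run in which the exponent code K was expanded as 10^6 and M as 10^3. Since K conventionally denotes thousands and M millions, the dollar amounts and rankings should be recomputed with K = 3 and M = 6 before being relied on.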