Impact of Weather Phenomena on Population Health and Economy

Synopsis

In this report we determine which weather phenomena have caused the greatest damage to population health and the economy in the United States. We used the National Oceanic and Atmospheric Administration (NOAA) Storm Database, which records weather events in the United States from 1950 to November 2011. To measure the impact on population health, we looked at which event types caused the most injuries and fatalities. To measure the impact on the economy, we looked at which event types caused the most property and crop damage. From the data, we found that in the US, Tornados have done the most damage to population health, and Floods have done the most damage to the economy.

Data Processing

The data for this analysis was obtained from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. The events in the database start in 1950 and end in November 2011. It should be noted that there are fewer records for the earlier years and many more for recent years. This is acceptable, as it weights the data toward recent years, which better reflect current climate conditions. However, keep in mind that because data is missing from the early years, the sums obtained in this analysis should not be taken as total amounts of damage, but as minimum amounts that set benchmarks for event types relative to each other.

Reading the Data

First, we read in the storm data from the compressed csv file (.csv.bz2).

stormData<-read.csv("repdata_data_StormData.csv.bz2")
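No separate decompression step is needed here, because R's file connections detect bzip2 compression when reading. A minimal sketch of that behaviour, using a throwaway file and made-up values (the `toy` data frame and the temporary file are assumptions for illustration, not part of the NOAA data):

```r
# Toy data frame standing in for the storm data (hypothetical values).
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(3, 1))

# Write it through a bzip2 connection to a temporary .csv.bz2 file.
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() decompresses transparently when given the file name.
toyBack <- read.csv(tmp, stringsAsFactors = FALSE)
dim(toyBack)
unlink(tmp)
```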

We check the dimensions of the stormData dataset.

dim(stormData)

## [1] 902297     37

We can see that there are 902297 observations on 37 variables.

For the purpose of this analysis, we will not be using all 37 variables, as many of them serve no purpose here. Therefore, we take a subset of stormData containing only the variables we need: BGN_DATE, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, and CROPDMGEXP.

relevantVars<-c("BGN_DATE", "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
subsetStormData<- stormData[relevantVars]

The event type (EVTYPE) data from the storm data is very sloppy. While the NOAA has made a listing of just 48 event types, this data set has 985 unique event types.

length(unique(subsetStormData$EVTYPE))

## [1] 985

There are many repeats and synonyms in this field. However, most of these event types do not have any injuries, fatalities, property damage, or crop damage associated with them. As these are the factors we are concerned with, it is impractical and computationally expensive to clean up all the EVTYPE data. Instead, the EVTYPEs will be condensed after the data has been reduced to the event types that had any impact on population health or the economy. They will be simplified into just 14 categories, as the NOAA classifications provide more granularity than is needed to see which event types are the most harmful (e.g. Extreme Cold/Wind Chill and Cold/Wind Chill are separate NOAA categories). This is done by assigning every EVTYPE that contains one of 15 string fragments to the corresponding category; two fragments, THUNDERST and TSTM, both map to THUNDERSTORM. Event types that match no fragment keep their original name. After that, we will only be looking at the 10 with the most impact in each field.

The reason to do this is that this way we avoid spending time cleaning data that does not have any impact on our analysis, and we focus on the information that we actually need.

fragment<-c("AVALA","TORNAD", "THUNDERST","TSTM", "WINT", "FLOOD", "COLD", "SNOW","WIND", "ICE", "FOG", "FREEZ", "HURRICANE", "CURRENT", "HEAT")
full<-c("AVALANCHE","TORNADO", "THUNDERSTORM", "THUNDERSTORM", "WINTER WEATHER", "FLOOD", "COLD", "SNOW","STRONG WINDS", "ICE", "FOG", "FREEZING RAIN", "HURRICANE", "CURRENT", "EXCESSIVE HEAT")
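As a quick sanity check on the two vectors above, they can be paired into a lookup table and counted. This is only a sketch; the `lookup` name is an assumption introduced for illustration.

```r
fragment <- c("AVALA", "TORNAD", "THUNDERST", "TSTM", "WINT", "FLOOD", "COLD",
              "SNOW", "WIND", "ICE", "FOG", "FREEZ", "HURRICANE", "CURRENT", "HEAT")
full <- c("AVALANCHE", "TORNADO", "THUNDERSTORM", "THUNDERSTORM", "WINTER WEATHER",
          "FLOOD", "COLD", "SNOW", "STRONG WINDS", "ICE", "FOG", "FREEZING RAIN",
          "HURRICANE", "CURRENT", "EXCESSIVE HEAT")

# Pair each fragment with its category; data.frame() would fail loudly
# if the two vectors had different lengths.
lookup <- data.frame(fragment, full, stringsAsFactors = FALSE)

nFragments  <- nrow(lookup)                # number of search fragments
nCategories <- length(unique(lookup$full)) # number of distinct categories
c(nFragments, nCategories)
```

The two counts differ because THUNDERST and TSTM share one category.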

Unlike fatalities and injuries, property and crop damage are not stored as plain numbers in the dataset. Each is recorded as a numeric dollar value (PROPDMG, CROPDMG) together with a magnitude indicator (PROPDMGEXP, CROPDMGEXP) of K (thousand), M (million), or B (billion).

unique(subsetStormData$PROPDMGEXP)

##  [1] K M   B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels:  - ? + 0 1 2 3 4 5 6 7 8 B h H K m M

unique(subsetStormData$CROPDMGEXP)

## [1]   M K m B ? 0 k 2
## Levels:  ? 0 2 B k K m M

If we look at the unique values in columns PROPDMGEXP and CROPDMGEXP, we can see that apart from K, M, and B, there are invalid “garbage” values present as well. We first eliminate records that are not relevant to our analysis by keeping only nonzero values of property and crop damage. Then we keep only records with a valid magnitude indicator: K, M, or B for property damage, and K, M, m, or B for crop damage. Records with any other indicator are dropped.

propertyStormData<-subset(subsetStormData, subsetStormData$PROPDMG>0)
propertyStormData<-subset(propertyStormData, 
                          propertyStormData$PROPDMGEXP == "B" | propertyStormData$PROPDMGEXP == "M" | propertyStormData$PROPDMGEXP == "K")

cropStormData<-subset(subsetStormData, subsetStormData$CROPDMG>0)
cropStormData<-subset(cropStormData, 
                          cropStormData$CROPDMGEXP == "B" | cropStormData$CROPDMGEXP == "M" | cropStormData$CROPDMGEXP == "m" | cropStormData$CROPDMGEXP == "K")
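The chained `==` comparisons can also be written with `%in%`, which keeps the list of accepted indicators in one place. A minimal sketch on made-up records (the `toy` data frame and the `validExp` name are assumptions for illustration):

```r
# Toy records (hypothetical values) mixing valid and invalid indicators.
toy <- data.frame(
  PROPDMG    = c(25, 0, 2.5, 10, 1),
  PROPDMGEXP = c("K", "K", "M", "?", "B"),
  stringsAsFactors = FALSE
)

# Accepted magnitude indicators for property damage, as in the filter above.
validExp <- c("K", "M", "B")

# Keep rows with nonzero damage and an accepted indicator in a single step.
toyProperty <- subset(toy, PROPDMG > 0 & PROPDMGEXP %in% validExp)
toyProperty
```

The zero-damage row and the row flagged `?` are dropped, leaving three records.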

Next, we loop through both PROPDMG and CROPDMG and multiply each value by either a thousand, a million, or a billion depending on its magnitude indicator.

for(i in 1:length(propertyStormData$PROPDMG)){
        if(propertyStormData$PROPDMGEXP[i] == "B"){
                propertyStormData$PROPDMG[i] <- (propertyStormData$PROPDMG[i]*1000000000)
        }
        if(propertyStormData$PROPDMGEXP[i] == "M"){
                propertyStormData$PROPDMG[i] <- (propertyStormData$PROPDMG[i]*1000000)
        }
        if(propertyStormData$PROPDMGEXP[i] == "K"){
                propertyStormData$PROPDMG[i] <- (propertyStormData$PROPDMG[i]*1000)
        }
}

for(i in 1:length(cropStormData$CROPDMG)){
        if(cropStormData$CROPDMGEXP[i] == "B"){
                cropStormData$CROPDMG[i] <- (cropStormData$CROPDMG[i]*1000000000)
        }
        if(cropStormData$CROPDMGEXP[i] == "M" | cropStormData$CROPDMGEXP[i] == "m"){
                cropStormData$CROPDMG[i] <- (cropStormData$CROPDMG[i]*1000000)
        }
        if(cropStormData$CROPDMGEXP[i] == "K"){
                cropStormData$CROPDMG[i] <- (cropStormData$CROPDMG[i]*1000)
        }
}
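The same scaling can be done without an explicit loop by looking each indicator up in a named vector of multipliers. This is a sketch of an alternative under stated assumptions, not the code used to produce the results in this report; the `toy` data and the `multiplier` name are made up for illustration.

```r
# Toy crop records (hypothetical values), one per accepted indicator.
toy <- data.frame(
  CROPDMG    = c(5, 1.5, 2, 3),
  CROPDMGEXP = c("K", "M", "m", "B"),
  stringsAsFactors = FALSE
)

# Named vector mapping each indicator to its dollar multiplier.
multiplier <- c(K = 1e3, M = 1e6, m = 1e6, B = 1e9)

# as.character() matters if the column is a factor: indexing by a factor
# would use its integer codes rather than its labels.
toy$CROPDMG <- toy$CROPDMG * unname(multiplier[as.character(toy$CROPDMGEXP)])
toy$CROPDMG
```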

Results

Impact of Weather Phenomena on Population Health

Weather Induced Fatalities in the US

To calculate the number of fatalities caused by each event type, we split the FATALITIES column by EVTYPE and sum the values within each group. As we are not concerned with event types that did not cause any fatalities, we remove them from the data.

fatalitiesByEVTYPE<-split(subsetStormData$FATALITIES,subsetStormData$EVTYPE)
fatalitiesByEVTYPE<-sapply(fatalitiesByEVTYPE, sum, na.rm = TRUE)
fatalitiesByEVTYPE<-as.data.frame(fatalitiesByEVTYPE)
fatalitiesByEVTYPE$EVTYPE<-row.names(fatalitiesByEVTYPE)
names(fatalitiesByEVTYPE)<-c("FATALITIES", "EVTYPE")
fatalitiesByEVTYPE<-fatalitiesByEVTYPE[order(-fatalitiesByEVTYPE$FATALITIES),]
fatalitiesByEVTYPE <- subset(fatalitiesByEVTYPE, fatalitiesByEVTYPE$FATALITIES>0)
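For comparison, the split, sum, and reshape steps can be expressed with a single `aggregate()` call. The sketch below runs on made-up records (the `toy` data frame is an assumption for illustration) and is an alternative formulation, not the code behind the results shown here.

```r
# Toy records (hypothetical values).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
  FATALITIES = c(3, 1, 4, 0, 2),
  stringsAsFactors = FALSE
)

# Sum FATALITIES within each EVTYPE; the result is already a data frame.
toyTotals <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)

# Drop event types with no fatalities, then sort from most to least.
toyTotals <- toyTotals[toyTotals$FATALITIES > 0, ]
toyTotals <- toyTotals[order(-toyTotals$FATALITIES), ]
row.names(toyTotals) <- NULL
toyTotals
```

Note that the formula interface of `aggregate()` omits rows with missing values by default, which plays a role similar to `na.rm = TRUE` above, and that the columns come out in the order EVTYPE, FATALITIES.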

Now that we have a list of EVTYPEs and the sum of FATALITIES for each type, we must combine similar/redundant EVTYPEs. We do this here instead of in the preprocessing as it is much less computationally expensive, and only filters the data relevant to our analysis.

for(i in 1:length(fragment)){
        temp<-grepl(fragment[i], fatalitiesByEVTYPE$EVTYPE, ignore.case = TRUE)
        for(j in 1:length(fatalitiesByEVTYPE$EVTYPE)){
                if(temp[j]==TRUE){
                        fatalitiesByEVTYPE$EVTYPE[j]<-full[i] 
                }
        }
}
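Because the fragments are applied one after another, their order decides the outcome when a name contains more than one of them. The sketch below wraps the same sequential logic in a function with the inner loop vectorised; `consolidateEvtype` and the `raw` names are hypothetical and introduced only for illustration.

```r
fragment <- c("AVALA", "TORNAD", "THUNDERST", "TSTM", "WINT", "FLOOD", "COLD",
              "SNOW", "WIND", "ICE", "FOG", "FREEZ", "HURRICANE", "CURRENT", "HEAT")
full <- c("AVALANCHE", "TORNADO", "THUNDERSTORM", "THUNDERSTORM", "WINTER WEATHER",
          "FLOOD", "COLD", "SNOW", "STRONG WINDS", "ICE", "FOG", "FREEZING RAIN",
          "HURRICANE", "CURRENT", "EXCESSIVE HEAT")

# Hypothetical helper: apply each fragment in turn, overwriting every
# name that contains it with the matching category.
consolidateEvtype <- function(evtype, fragment, full) {
  evtype <- as.character(evtype)
  for (i in seq_along(fragment)) {
    hit <- grepl(fragment[i], evtype, ignore.case = TRUE)
    evtype[hit] <- full[i]
  }
  evtype
}

# Made-up raw names chosen to show the effect of ordering.
raw <- c("TSTM WIND", "HIGH WIND", "FLASH FLOOD",
         "EXTREME COLD/WIND CHILL", "Heat Wave", "LIGHTNING")
consolidated <- consolidateEvtype(raw, fragment, full)
consolidated
```

Here "TSTM WIND" ends up as THUNDERSTORM rather than STRONG WINDS because TSTM is tested before WIND, and "LIGHTNING" is left unchanged because it contains none of the fragments.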

Now that we have reduced the variety of EVTYPEs, we must once again sum the data by FATALITIES per each EVTYPE.

fatalitiesByEVTYPE<-split(fatalitiesByEVTYPE$FATALITIES,fatalitiesByEVTYPE$EVTYPE)
fatalitiesByEVTYPE<-sapply(fatalitiesByEVTYPE, sum, na.rm = TRUE)
fatalitiesByEVTYPE<-as.data.frame(fatalitiesByEVTYPE)
fatalitiesByEVTYPE$EVTYPE<-row.names(fatalitiesByEVTYPE)
names(fatalitiesByEVTYPE)<-c("FATALITIES", "EVTYPE")
fatalitiesByEVTYPE<-fatalitiesByEVTYPE[order(-fatalitiesByEVTYPE$FATALITIES),]
row.names(fatalitiesByEVTYPE)<-NULL

We finally have an ordered list of the weather phenomena that have caused the greatest number of fatalities within the US between 1950 and November 2011. As we are primarily concerned with what is causing the most damage, we will view the 10 most damaging event types.

top10Fatalities<-head(fatalitiesByEVTYPE, 10)
top10Fatalities

##    FATALITIES         EVTYPE
## 1        5661        TORNADO
## 2        3138 EXCESSIVE HEAT
## 3        1525          FLOOD
## 4         816      LIGHTNING
## 5         729   THUNDERSTORM
## 6         577        CURRENT
## 7         471   STRONG WINDS
## 8         451           COLD
## 9         279 WINTER WEATHER
## 10        225      AVALANCHE

As we can see, the three event types that have caused the most fatalities are Tornados (5661 fatalities), Excessive Heat (3138 fatalities), and Floods (1525 fatalities). Following these three are Lightning, Thunderstorms, and Currents, which each killed between 500 and 1000 people, and Strong Winds, Cold, Winter Weather, and Avalanches, which each killed fewer than 500 people.

Weather Induced Injuries in the US

To calculate the number of injuries caused by each event type, we use the same process we used to calculate fatalities, except that we use data from the INJURIES column instead. We split the INJURIES column by EVTYPE and sum the values within each group, remove event types that did not cause any injuries, combine similar/redundant EVTYPEs, and sum the INJURIES by EVTYPE again.

injuriesByEVTYPE<-split(subsetStormData$INJURIES,subsetStormData$EVTYPE)
injuriesByEVTYPE<-sapply(injuriesByEVTYPE, sum, na.rm = TRUE)
injuriesByEVTYPE<-as.data.frame(injuriesByEVTYPE)
injuriesByEVTYPE$EVTYPE<-row.names(injuriesByEVTYPE)
names(injuriesByEVTYPE)<-c("INJURIES", "EVTYPE")
injuriesByEVTYPE<-injuriesByEVTYPE[order(-injuriesByEVTYPE$INJURIES),]
injuriesByEVTYPE <- subset(injuriesByEVTYPE, injuriesByEVTYPE$INJURIES>0)

for(i in 1:length(fragment)){
        temp<-grepl(fragment[i], injuriesByEVTYPE$EVTYPE, ignore.case = TRUE)
        for(j in 1:length(injuriesByEVTYPE$EVTYPE)){
                if(temp[j]==TRUE){
                        injuriesByEVTYPE$EVTYPE[j]<-full[i] 
                }
        }
}

injuriesByEVTYPE<-split(injuriesByEVTYPE$INJURIES,injuriesByEVTYPE$EVTYPE)
injuriesByEVTYPE<-sapply(injuriesByEVTYPE, sum, na.rm = TRUE)
injuriesByEVTYPE<-as.data.frame(injuriesByEVTYPE)
injuriesByEVTYPE$EVTYPE<-row.names(injuriesByEVTYPE)
names(injuriesByEVTYPE)<-c("INJURIES", "EVTYPE")
injuriesByEVTYPE<-injuriesByEVTYPE[order(-injuriesByEVTYPE$INJURIES),]
row.names(injuriesByEVTYPE)<-NULL

We once again are mainly concerned with which events are causing the most damage to population health, so we look at the top 10 injury causing events.

top10Injuries<-head(injuriesByEVTYPE, 10)
top10Injuries

##    INJURIES         EVTYPE
## 1     91407        TORNADO
## 2      9544   THUNDERSTORM
## 3      9224 EXCESSIVE HEAT
## 4      8604          FLOOD
## 5      5230      LIGHTNING
## 6      2152            ICE
## 7      1968 WINTER WEATHER
## 8      1896   STRONG WINDS
## 9      1361           HAIL
## 10     1328      HURRICANE

Tornados take the number 1 spot for the most injuries by a large margin, having caused 91407 injuries. The next most injurious events are Thunderstorms (9544), Excessive Heat (9224), Floods (8604), Lightning (5230), and Ice (2152). The remaining events, Winter Weather, Strong Winds, Hail, and Hurricanes, each caused fewer than 2000 injuries.

Overall Weather Induced Population Health Damage in the US

From our previous results, we can see that:

Most Fatalities = Tornados (5661)

Most Injuries = Tornados (91407)

It is clear from this that Tornados are the most harmful event type with respect to population health.

However, if we wish to see which other event types cause the next most health damage, we should combine the fatality and injury amounts of the top 10 fatality and injury causing events. We must keep in mind that a fatality is of course more severe than an injury. However, for the sake of analysis, we will give fatalities and injuries equal weight.

names(top10Fatalities)<-c("DMG", "EVTYPE")
names(top10Injuries)<-c("DMG", "EVTYPE")
allHealth<-rbind(top10Fatalities, top10Injuries)

Just as we have done previously, we can now consolidate EVTYPEs and generate the list of top 10 health damaging event types.

for(i in 1:length(fragment)){
        temp<-grepl(fragment[i], allHealth$EVTYPE, ignore.case = TRUE)
        for(j in 1:length(allHealth$EVTYPE)){
                if(temp[j]==TRUE){
                        allHealth$EVTYPE[j]<-full[i] 
                }
        }
}
allHealth<-split(allHealth$DMG,allHealth$EVTYPE)
allHealth<-sapply(allHealth, sum, na.rm = TRUE)
allHealth<-as.data.frame(allHealth)
allHealth$EVTYPE<-row.names(allHealth)
names(allHealth)<-c("DMG", "EVTYPE")
allHealth<-allHealth[order(-allHealth$DMG),]
row.names(allHealth)<-NULL

top10allHealth<-head(allHealth, 10)
top10allHealth

##      DMG         EVTYPE
## 1  97068        TORNADO
## 2  12362 EXCESSIVE HEAT
## 3  10273   THUNDERSTORM
## 4  10129          FLOOD
## 5   6046      LIGHTNING
## 6   2367   STRONG WINDS
## 7   2247 WINTER WEATHER
## 8   2152            ICE
## 9   1361           HAIL
## 10  1328      HURRICANE

barplot(top10allHealth$DMG, names.arg = top10allHealth$EVTYPE, las=2, ylab = "People Hurt/Killed", main = "Overall Weather Damage to Population Health", cex.names=0.5, cex.axis=0.5)

As can be seen in the plot, Tornados are by far the most damaging event type to population health. The other events in the top 10 are still harmful to people, but do not pose a threat nearly as great as tornados do in the US.
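One thing to keep in mind when reading the combined table: it is built only from the two top 10 lists, so an event type that appears in just one of them contributes only that component. The sketch below re-enters the figures printed above and checks this; the `combined` and `onlyOneList` names are introduced here for illustration.

```r
# Figures copied from the top 10 fatalities table above.
top10Fatalities <- data.frame(
  DMG    = c(5661, 3138, 1525, 816, 729, 577, 471, 451, 279, 225),
  EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "FLOOD", "LIGHTNING", "THUNDERSTORM",
             "CURRENT", "STRONG WINDS", "COLD", "WINTER WEATHER", "AVALANCHE"),
  stringsAsFactors = FALSE
)
# Figures copied from the top 10 injuries table above.
top10Injuries <- data.frame(
  DMG    = c(91407, 9544, 9224, 8604, 5230, 2152, 1968, 1896, 1361, 1328),
  EVTYPE = c("TORNADO", "THUNDERSTORM", "EXCESSIVE HEAT", "FLOOD", "LIGHTNING",
             "ICE", "WINTER WEATHER", "STRONG WINDS", "HAIL", "HURRICANE"),
  stringsAsFactors = FALSE
)

# Stack the two lists and sum per event type, with equal weights.
allHealth <- rbind(top10Fatalities, top10Injuries)
combined  <- sapply(split(allHealth$DMG, allHealth$EVTYPE), sum)

combined[["TORNADO"]]  # fatalities plus injuries
combined[["ICE"]]      # injuries only: ICE is not in the fatalities list

# Event types present in exactly one of the two lists.
onlyOneList <- setdiff(union(top10Fatalities$EVTYPE, top10Injuries$EVTYPE),
                       intersect(top10Fatalities$EVTYPE, top10Injuries$EVTYPE))
onlyOneList
```

For the six event types in `onlyOneList`, the combined figure is a lower bound on their total health impact, since their counts in the other measure fell outside its top 10.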

Impact of Weather Phenomena on Economy

Weather Induced Property Damage in the US

During the preprocessing phase, we eliminated the need for the PROPDMGEXP variable and converted the values in the PROPDMG field to their full numeric value. We use the same method we have used to calculate total injuries and fatalities to calculate the total amount of property damage per event type.

pdmgByEVTYPE<-split(propertyStormData$PROPDMG,propertyStormData$EVTYPE)
pdmgByEVTYPE<-sapply(pdmgByEVTYPE, sum, na.rm = TRUE)
pdmgByEVTYPE<-as.data.frame(pdmgByEVTYPE)
pdmgByEVTYPE$EVTYPE<-row.names(pdmgByEVTYPE)
names(pdmgByEVTYPE)<-c("PROPDMG", "EVTYPE")
pdmgByEVTYPE<-pdmgByEVTYPE[order(-pdmgByEVTYPE$PROPDMG),]
for(i in 1:length(fragment)){
        temp<-grepl(fragment[i], pdmgByEVTYPE$EVTYPE, ignore.case = TRUE)
        for(j in 1:length(pdmgByEVTYPE$EVTYPE)){
                if(temp[j]==TRUE){
                        pdmgByEVTYPE$EVTYPE[j]<-full[i] 
                }
        }
}
pdmgByEVTYPE<-split(pdmgByEVTYPE$PROPDMG,pdmgByEVTYPE$EVTYPE)
pdmgByEVTYPE<-sapply(pdmgByEVTYPE, sum, na.rm = TRUE)
pdmgByEVTYPE<-as.data.frame(pdmgByEVTYPE)
pdmgByEVTYPE$EVTYPE<-row.names(pdmgByEVTYPE)
names(pdmgByEVTYPE)<-c("PROPDMG", "EVTYPE")
pdmgByEVTYPE<-pdmgByEVTYPE[order(-pdmgByEVTYPE$PROPDMG),]
row.names(pdmgByEVTYPE)<-NULL

We once again are mainly concerned with which events are causing the most damage to property, so we look at the top 10 property damaging events.

top10pdmg<-head(pdmgByEVTYPE, 10)
top10pdmg

##         PROPDMG         EVTYPE
## 1  167529215320          FLOOD
## 2   84636180010      HURRICANE
## 3   58581597730        TORNADO
## 4   43323536000    STORM SURGE
## 5   15727366720           HAIL
## 6   10973619030   THUNDERSTORM
## 7    7703890550 TROPICAL STORM
## 8    6777307750 WINTER WEATHER
## 9    6185735990   STRONG WINDS
## 10   4765114000       WILDFIRE

We can see from the results that the event type most damaging to property is Floods at an astounding $167,529,215,320, nearly double the next highest event type, Hurricanes ($84,636,180,010). These are followed by Tornados ($58,581,597,730), Storm Surges ($43,323,536,000), Hail ($15,727,366,720), and Thunderstorms ($10,973,619,030), which did significant damage, though less than either of the two leading event types. The remaining events, Tropical Storms, Winter Weather, Strong Winds, and Wildfire, each did less than $10,000,000,000 in property damage. While these amounts seem very high, keep in mind that this is the amount of damage done over the course of many decades.
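The dollar figures and the "nearly double" comparison can be reproduced from the printed totals. A small sketch, with the two values copied from the table above:

```r
# Property damage totals copied from the table above.
flood     <- 167529215320
hurricane <- 84636180010

# Format a total with thousands separators for reporting.
floodLabel <- paste0("$", format(flood, big.mark = ",", scientific = FALSE))
floodLabel

# Ratio of flood damage to hurricane damage, rounded to two decimals.
ratio <- round(flood / hurricane, 2)
ratio
```

The ratio comes out just under 2, which is what "nearly double" refers to.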

Weather Induced Crop Damage in the US

During the preprocessing phase, we eliminated the need for the CROPDMGEXP variable and converted the values in the CROPDMG field to their full numeric value. We use the same method we used to calculate total injuries and fatalities to calculate the total amount of crop damage per event type.

cdmgByEVTYPE<-split(cropStormData$CROPDMG,cropStormData$EVTYPE)
cdmgByEVTYPE<-sapply(cdmgByEVTYPE, sum, na.rm = TRUE)
cdmgByEVTYPE<-as.data.frame(cdmgByEVTYPE)
cdmgByEVTYPE$EVTYPE<-row.names(cdmgByEVTYPE)
names(cdmgByEVTYPE)<-c("CROPDMG", "EVTYPE")
cdmgByEVTYPE<-cdmgByEVTYPE[order(-cdmgByEVTYPE$CROPDMG),]
for(i in 1:length(fragment)){
        temp<-grepl(fragment[i], cdmgByEVTYPE$EVTYPE, ignore.case = TRUE)
        for(j in 1:length(cdmgByEVTYPE$EVTYPE)){
                if(temp[j]==TRUE){
                        cdmgByEVTYPE$EVTYPE[j]<-full[i] 
                }
        }
}
cdmgByEVTYPE<-split(cdmgByEVTYPE$CROPDMG,cdmgByEVTYPE$EVTYPE)
cdmgByEVTYPE<-sapply(cdmgByEVTYPE, sum, na.rm = TRUE)
cdmgByEVTYPE<-as.data.frame(cdmgByEVTYPE)
cdmgByEVTYPE$EVTYPE<-row.names(cdmgByEVTYPE)
names(cdmgByEVTYPE)<-c("CROPDMG", "EVTYPE")
cdmgByEVTYPE<-cdmgByEVTYPE[order(-cdmgByEVTYPE$CROPDMG),]
row.names(cdmgByEVTYPE)<-NULL

We once again are mainly concerned with which events are causing the most damage to crops, so we look at the top 10 crop damaging events.

top10cdmg<-head(cdmgByEVTYPE, 10)
top10cdmg

##        CROPDMG         EVTYPE
## 1  13972566000        DROUGHT
## 2  12380069100          FLOOD
## 3   5505292800      HURRICANE
## 4   5022114300            ICE
## 5   3025537450           HAIL
## 6   1889061000  FREEZING RAIN
## 7   1416765500           COLD
## 8   1271704900   THUNDERSTORM
## 9    904469280 EXCESSIVE HEAT
## 10   777875550   STRONG WINDS

We can see that Drought ($13,972,566,000) and Floods ($12,380,069,100) cause the most damage to crops. These are followed by Hurricanes ($5,505,292,800), Ice ($5,022,114,300), Hail ($3,025,537,450), Freezing Rain ($1,889,061,000), Cold ($1,416,765,500), and Thunderstorms ($1,271,704,900), which did significant damage, though less than either of the two leading event types. The remaining events, Excessive Heat and Strong Winds, each did less than $1,000,000,000 in crop damage. Once again, we need to remember that while these amounts seem very high, this is the amount of damage done over the course of many decades.

Overall Weather Induced Economy Damage in the US

From our previous results, we can see that:

Highest Property Damage = Floods ($167,529,215,320)

Highest Crop Damage = Drought ($13,972,566,000)

However, if we wish to see which event type causes the most damage and has the greatest impact on the economy, we should combine the dollar amounts of the top 10 Property and Crop damaging events.

names(top10pdmg)<-c("DMG", "EVTYPE")
names(top10cdmg)<-c("DMG", "EVTYPE")
allDmg<-rbind(top10pdmg, top10cdmg)

Just as we have done previously, we can now consolidate EVTYPEs and generate the list of top 10 financially damaging event types.

for(i in 1:length(fragment)){
        temp<-grepl(fragment[i], allDmg$EVTYPE, ignore.case = TRUE)
        for(j in 1:length(allDmg$EVTYPE)){
                if(temp[j]==TRUE){
                        allDmg$EVTYPE[j]<-full[i] 
                }
        }
}
allDmg<-split(allDmg$DMG,allDmg$EVTYPE)
allDmg<-sapply(allDmg, sum, na.rm = TRUE)
allDmg<-as.data.frame(allDmg)
allDmg$EVTYPE<-row.names(allDmg)
names(allDmg)<-c("DMG", "EVTYPE")
allDmg<-allDmg[order(-allDmg$DMG),]
row.names(allDmg)<-NULL

top10alldmg<-head(allDmg, 10)
top10alldmg

##             DMG         EVTYPE
## 1  179909284420          FLOOD
## 2   90141472810      HURRICANE
## 3   58581597730        TORNADO
## 4   43323536000    STORM SURGE
## 5   18752904170           HAIL
## 6   13972566000        DROUGHT
## 7   12245323930   THUNDERSTORM
## 8    7703890550 TROPICAL STORM
## 9    6963611540   STRONG WINDS
## 10   6777307750 WINTER WEATHER

top10alldmg$DMG<-(top10alldmg$DMG/1000000000)
barplot(top10alldmg$DMG, names.arg = top10alldmg$EVTYPE, las=2, ylab = "Damage (Billions of $)", main = "Overall Weather Damage to Economy", cex.names=0.5, cex.axis=0.5)

As can be seen in the plot, Floods are by far the most damaging event type to the economy. Hurricanes and Tornados also cause a lot of damage, although not as much as Floods. The other events listed have caused a lot of damage over the years, but not nearly as much as floods, hurricanes, or tornados.

Reproducible Research Peer Assessment 2
