The platform specification used:

Spec Description
OS Windows 10 Pro - 64 bit
CPU AMD Ryzen 5 - 3400G
RAM 16GB DDR4 3000MHz
Storage 500GB SSD - M.2 NVMe (PCIe)

Severe Weather Events in USA Impact on Health and Economy

Sypnosis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

The U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

The database between 1950 and 2011 can be downloaded at the following link: https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2

An additional database documentation on how some of the variables are constructed/defined can be found here: https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf

This report will use the NOAA Storm Database to answer the following two questions:

  1. Across the United States, which types of events are most harmful with respect to population health?

  2. Across the United States, which types of events have the greatest economic consequences?

The result with graphs and conclusion will be displayed in the last part.


Data Processing

To process the data we will be dividing this section into three seperate processes.

Firstly we will need to download the data directly from the link provided. Once the data is downloaded we will look at the at different variables needed to solve the two questions on the impact on health and economy.

Secondly to solve the question on the health impact there are two variables that we need to analyse namely FATALITIES and INJURIES. This will be elaborated further in the processes below

Thirdly to solve the question on economic impact there are four variables that we need to analyse namely CROPDMGEXP, CROPDMG, PROPDMGEXP and PROPDMG. ‘CROPDMGEXP’ is the exponent values for ‘CROPDMG’ (crop damage). In the same way, ‘PROPDMGEXP’ is the exponent values for ‘PROPDMG’ (property damage). Both are needed to get the total values for crops and property damage. (B or b = Billion, M or m = Million, K or k = Thousand, H or h = Hundred). This will be elaborated further in the processes below

Part 1:

  • Download data from the link provided in the sypnosis.
  • Read csv bz2 file and assign it to variable named data.
  • Use Dim function to check the number of observation and variables.
## Set the URL
url <- 'https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2'  

## Download the file and checked that the bz2 file is 46.8 MB (49,177,144 bytes) according to the info in week 4 forum discussion.
download.file(url, destfile='repdata-data-StormData.csv.bz2', mode='wb')  

## Read the csv bz file and assign to data.
rawData <- read.csv("repdata-data-StormData.csv.bz2", header = TRUE, sep = ",", na.strings = "NA")

## Look at the total number of observations and variables whether it is the same as the Week 4 Forum discussion which is 902297 records and 37 variables.
dim(rawData)

[1] 902297 37

Based on the raw data downloaded there are 902297 observations and 37 variables.


As there are close to a million observations in the raw data, it is important that we do not waste any unnecessary resources to process the data.Therefore we will need to Prepare only the relevant data to answer the questions on health and economic impact.

## Create a Subset of FATALITIES or INJURIES data that contains more than 0.
healthData <- rawData[(rawData$FATALITIES >0) | (rawData$INJURIES > 0),] 
dim(healthData)

[1] 21929 37

## Create a Subset of CROPDMG or PROPDMG data that contains more than 0.
economicData <- rawData[(rawData$CROPDMG > 0) | (rawData$PROPDMG > 0),]
dim(economicData)

[1] 245031 37

The relevant health data contains 21929 observations and 37 variables where as the relevant economic data contains 245031 observations and 37 variables.


Part 2:

  • Use the Aggregate and CBIND function to get the total number of FATALITIES and INJURIES based on the EVTYPE.
  • Sum up the FATALITIES + INJURIES based on the EVTYPE.
  • Use the MERGE function to combine these two data together.
## Using Aggregate and CBIND to sum the total FATALITIES and INJURIES based on EVTYPE and assign it to fatInj
fatInj <- aggregate(cbind(FATALITIES,INJURIES)~EVTYPE,data = healthData, FUN = sum, na.rm = TRUE)

## Assign the column with names for the new table.
names(fatInj) <- c("EVENT_TYPE", "FATALITIES", "INJURIES")

## Rank the top INJURIES based on EVTYPE
injuriesRank <- fatInj[order(-fatInj$INJURIES), ]

## Rank the top FATALITIES based on EVTYPE
fatalitiesRank <- fatInj[order(-fatInj$FATALITIES), ]

## Top 10 INJURIES causing EVTYPE
topInjuries <- kable(injuriesRank[1:10,])
topInjuries
EVENT_TYPE FATALITIES INJURIES
184 TORNADO 5633 91346
191 TSTM WIND 504 6957
47 FLOOD 470 6789
32 EXCESSIVE HEAT 1903 6525
123 LIGHTNING 816 5230
69 HEAT 937 2100
117 ICE STORM 89 1975
42 FLASH FLOOD 978 1777
173 THUNDERSTORM WIND 133 1488
67 HAIL 15 1361
## Top 10 FATALITIES causing EVTYPE
topFatalities <- kable(fatalitiesRank[1:10,])
topFatalities
EVENT_TYPE FATALITIES INJURIES
184 TORNADO 5633 91346
32 EXCESSIVE HEAT 1903 6525
42 FLASH FLOOD 978 1777
69 HEAT 937 2100
123 LIGHTNING 816 5230
191 TSTM WIND 504 6957
47 FLOOD 470 6789
147 RIP CURRENT 368 232
93 HIGH WIND 248 1137
2 AVALANCHE 224 170
## Using Aggregate function to sum the FATALITIES + INJURIES based on EVTYPE and assign it to totalImpact
totalImpact <- aggregate((FATALITIES + INJURIES) ~ EVTYPE, data = healthData, FUN = sum, na.rm = TRUE)

## Assign the column with name the new table of the total FATALITIES and INJURIES combined.
names(totalImpact) <- c("EVENT_TYPE", "Total_FATALITIES_&_INJURIES")

totalImpact = totalImpact[order(-totalImpact$`Total_FATALITIES_&_INJURIES`), ]
## Using the Merge function to combine the two data together to show the health impact
healthImpact <- merge(fatInj, totalImpact)

## Rank the top health impact on the new combined data
healthImpact <- healthImpact[order(-healthImpact$`Total_FATALITIES_&_INJURIES`), ]

## Top 10 Total Health Impact (FATALITIES + INJURIES) causing EVTYPE
topHealthImpact <- kable(healthImpact[1:10,])

## Please go to the Result section to look at the top 10 health impact based on EVTYPE.
healthPlot <- ggplot(totalImpact[1:10, ], aes(x= reorder('EVENT_TYPE', 'Total_FATALITIES_&_INJURIES'), y='Total_FATALITIES_&_INJURIES')) + geom_bar(stat = "identity",fill="blue") + coord_flip() + labs(x = "Event types", y = "Fatalities & Injuries", title = "Top 10 Fatalities & Injuries Weather Events")

healthPlot
## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA

## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA


Part 3:

  • Convert exponent value DMGEXP of both the crop and property for total damage comparison
  • Use the Aggregate and CBIND function to get the total number of the converted damage of both crop and property based on the EVTYPE.
  • Combine both the new crop and property damage to assess the total economic impact
## Retrieving values of exponents 
expData <- economicData[economicData$PROPDMGEXP %in% c("", "K", "M", "B") & economicData$CROPDMGEXP %in% c("", "K", "M", "B"), ]

## Create function to convert exponent values to calculate total damages using DMG * Exponent
convExponent <- function(dmg, exp) {
    if (exp == "K") {
        dmg * 1000
    } else if (exp == "M") {
        dmg * 10^6
    } else if (exp == "B") {
        dmg * 10^9
    } else if (exp == "") {
        dmg
    } else {
        stop("NOT VALID DATA")
    }
}

## Using MAPPLY function to get the new property damage value in million dollars value
expData$PROP_DMG <- mapply(convExponent, expData$PROPDMG, expData$PROPDMGEXP)/10^6

## Using MAPPLY function to get the new crop damage value in million dollar value
expData$CROP_DMG <- mapply(convExponent, expData$CROPDMG, expData$CROPDMGEXP)/10^6

## Using Aggregate and CBIND to sum the total property and crop damage based on EVTYPE
cropProp <- aggregate(cbind(expData$PROP_DMG, expData$CROP_DMG)~EVTYPE,data = expData, FUN = sum, na.rm = TRUE)

## Assign the column with names for the new table.
names(cropProp) <- c("EVENT_TYPE", "Property Damage in $million", "Crop Damage in $million")
## Rank the top crop damage based on EVTYPE
cropRank <- cropProp[order(-cropProp$'Crop Damage'), ]
topCrop <- kable(cropRank[1:10,])
topCrop
EVENT_TYPE Property Damage in $million Crop Damage in $million
38 DROUGHT 1046.1060 13972.566
70 FLOOD 144657.7098 5661.968
257 RIVER FLOOD 5118.9455 5029.459
201 ICE STORM 3944.9278 5022.110
113 HAIL 15727.1658 3000.537
184 HURRICANE 11868.3190 2741.910
192 HURRICANE/TYPHOON 69305.8400 2607.873
58 FLASH FLOOD 16140.8117 1420.727
53 EXTREME COLD 67.7374 1292.973
95 FROST/FREEZE 9.4800 1094.086
## Rank the top property damage based on EVTYPE
propRank <- cropProp[order(-cropProp$'Property Damage'), ]
topProp <- kable(propRank[1:10,])
topProp
EVENT_TYPE Property Damage in $million Crop Damage in $million
70 FLOOD 144657.710 5661.9685
192 HURRICANE/TYPHOON 69305.840 2607.8728
348 TORNADO 56925.485 364.9501
294 STORM SURGE 43323.536 0.0050
58 FLASH FLOOD 16140.812 1420.7271
113 HAIL 15727.166 3000.5375
184 HURRICANE 11868.319 2741.9100
357 TROPICAL STORM 7703.891 678.3460
418 WINTER STORM 6688.497 26.9440
170 HIGH WIND 5270.046 638.5713
## Using Aggregate function to sum the 'CROP Damage' + 'Property Damage' based on EVTYPE and assign it to totalImpact
totalEconomic <- aggregate((expData$PROP_DMG + expData$CROP_DMG) ~ EVTYPE, data = expData, FUN = sum, na.rm = TRUE)

## Assign the column with name the new table of the total FATALITIES and INJURIES combined.
names(totalEconomic) <- c("EVENT_TYPE", "Total_Crop_&_Property_Damage")
## Using the Merge function to combine the two economic data together to show the economic impact
economicImpact <- merge(cropProp, totalEconomic)

## Rank the top economic impact on the new combined data
economicImpactRank <- economicImpact[order(-economicImpact$'Total_Crop_&_Property_Damage'), ]

## Top 10 Total Health Impact (CROP + PROPERTY DAMAGE) causing EVTYPE
topEconomicImpact <- kable(economicImpactRank[1:10, ])

## Please go to the Result section to look at the top 10 economic impact based on EVTYPE.
economyPlot <- ggplot(economicImpact[1:10, ], aes(x= reorder('EVENT_TYPE', 'Total_Crop_&_Property_Damage'), y='Total_Crop_&_Property_Damage')) + geom_bar(stat = "identity",fill="blue") + coord_flip() + labs(x = "Event types", y = "Crop & Property Damage", title = "Top 10 Crop & Property Damaging Weather Events")

economyPlot
## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA

## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA


Results

This is based on the data collected from U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database between 1950 and 2011.


Question 1: Across the United States, which types of events are most harmful with respect to population health?

ANS: Tornado is the most harmful as it caused a total of 96979 fatalities and injuries. there are other eventsin the evidence followed after tornado.

The following table and plot will show you the top 10 event type that are most harmful to the population health.

EVENT_TYPE FATALITIES INJURIES Total_FATALITIES_&_INJURIES
184 TORNADO 5633 91346 96979
32 EXCESSIVE HEAT 1903 6525 8428
191 TSTM WIND 504 6957 7461
47 FLOOD 470 6789 7259
123 LIGHTNING 816 5230 6046
69 HEAT 937 2100 3037
42 FLASH FLOOD 978 1777 2755
117 ICE STORM 89 1975 2064
173 THUNDERSTORM WIND 133 1488 1621
214 WINTER STORM 206 1321 1527

Question 2: Across the United States, which types of events have the greatest economic consequences?

ANS: Flood caused the greatest economic consequences with a total of more than $150 billion worth of damage from property and flood. There are other events in the evidence below.

The following table and plot will show you the top 10 event type that has the greatest economic consequences.

EVENT_TYPE Property Damage in $million Crop Damage in $million Total_Crop_&_Property_Damage
70 FLOOD 144657.710 5661.9685 150319.678
192 HURRICANE/TYPHOON 69305.840 2607.8728 71913.713
348 TORNADO 56925.485 364.9501 57290.436
294 STORM SURGE 43323.536 0.0050 43323.541
113 HAIL 15727.166 3000.5375 18727.703
58 FLASH FLOOD 16140.812 1420.7271 17561.539
38 DROUGHT 1046.106 13972.5660 15018.672
184 HURRICANE 11868.319 2741.9100 14610.229
257 RIVER FLOOD 5118.945 5029.4590 10148.405
201 ICE STORM 3944.928 5022.1100 8967.038