The platform specification used:
| Spec | Description |
|---|---|
| OS | Windows 10 Pro - 64 bit |
| CPU | AMD Ryzen 5 - 3400G |
| RAM | 16GB DDR4 3000MHz |
| Storage | 500GB SSD - M.2 NVMe (PCIe) |
Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
The U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
The database, covering 1950 to 2011, can be downloaded from the following link: https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2
Additional documentation on how some of the variables are constructed and defined can be found here: https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf
This report will use the NOAA Storm Database to answer the following two questions:

1. Across the United States, which types of events are most harmful with respect to population health?
2. Across the United States, which types of events have the greatest economic consequences?
The results, with tables, plots and conclusions, are presented in the final section.
To process the data, we divide this section into three separate steps.
First, we download the data directly from the link provided. Once the data is downloaded, we look at the different variables needed to answer the two questions on the impact on health and on the economy.
Second, to answer the question on health impact, there are two variables that we need to analyse, namely FATALITIES and INJURIES. This is elaborated further in the steps below.
Third, to answer the question on economic impact, there are four variables that we need to analyse, namely CROPDMGEXP, CROPDMG, PROPDMGEXP and PROPDMG. ‘CROPDMGEXP’ holds the exponent (multiplier) codes for ‘CROPDMG’ (crop damage); in the same way, ‘PROPDMGEXP’ holds the exponent codes for ‘PROPDMG’ (property damage). Both are needed to get the total values of crop and property damage (B or b = billion, M or m = million, K or k = thousand, H or h = hundred). This is elaborated further in the steps below.
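To make the exponent codes concrete, here is a minimal sketch of a lookup-table conversion. The helper name `expMultiplier` is hypothetical and not part of this analysis. Unlike the filter used later in this report (which keeps only blank, "K", "M" and "B" codes), it treats upper- and lowercase codes alike and includes H, so using it would keep some records that this report drops.

```r
# Hypothetical helper (not used in this report): map exponent codes to
# multipliers, treating upper- and lowercase codes the same way.
expMultiplier <- function(exp) {
  # Multipliers for the codes listed in the paragraph above
  multipliers <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  code <- toupper(as.character(exp))
  # Exact name lookup; codes not in the table (e.g. "?") become NA
  out <- unname(multipliers[code])
  # A blank code means the damage value is already in dollars
  out[code == ""] <- 1
  out
}

# Usage: one multiplier per code
expMultiplier(c("K", "m", "", "b", "?"))   # 1e3, 1e6, 1, 1e9, NA
# A damage value of 25 with code "K" corresponds to 25,000 dollars
25 * expMultiplier("K")
```

Because the lookup is vectorised, it can be applied to a whole column at once (for example `economicData$PROPDMG * expMultiplier(economicData$PROPDMGEXP)`), rather than calling a scalar function once per row.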
```r
## Set the URL
url <- 'https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2'
## Download the file; according to the Week 4 forum discussion, the bz2 file should be 46.8 MB (49,177,144 bytes).
download.file(url, destfile = 'repdata-data-StormData.csv.bz2', mode = 'wb')
## Read the bz2-compressed CSV directly (read.csv decompresses it) and assign it to rawData.
rawData <- read.csv("repdata-data-StormData.csv.bz2", header = TRUE, sep = ",", na.strings = "NA")
## Check that the numbers of observations and variables match the Week 4 forum discussion: 902297 records and 37 variables.
dim(rawData)
## [1] 902297 37
```
The downloaded raw data contain 902297 observations of 37 variables.
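The comment in the download step mentions checking the file size (49,177,144 bytes) by hand. A minimal sketch of automating that check follows; `downloadIfMissing` is a hypothetical helper, and it also skips the download when the file already exists, which saves time when the report is re-knitted.

```r
# Hypothetical helper: download a file only if it is missing, then
# confirm that its size on disk matches the expected number of bytes.
downloadIfMissing <- function(url, destfile, expectedBytes = NULL) {
  if (!file.exists(destfile)) {
    download.file(url, destfile = destfile, mode = "wb")
  }
  # file.size() returns the size in bytes (NA if the file does not exist)
  actual <- file.size(destfile)
  if (!is.null(expectedBytes) && !isTRUE(actual == expectedBytes)) {
    stop(sprintf("Unexpected size for %s: %s bytes (expected %s)",
                 destfile, actual, expectedBytes))
  }
  invisible(destfile)
}

# Example call (not run here, since it needs network access):
# downloadIfMissing(url, 'repdata-data-StormData.csv.bz2', expectedBytes = 49177144)
```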
As there are close to a million observations in the raw data, it is important not to waste resources processing data we do not need. Therefore, we prepare only the data relevant to the questions on health and economic impact.
```r
## Create a subset of rows where FATALITIES or INJURIES is greater than 0.
healthData <- rawData[(rawData$FATALITIES > 0) | (rawData$INJURIES > 0), ]
dim(healthData)
## [1] 21929 37
## Create a subset of rows where CROPDMG or PROPDMG is greater than 0.
economicData <- rawData[(rawData$CROPDMG > 0) | (rawData$PROPDMG > 0), ]
dim(economicData)
## [1] 245031 37
```
The relevant health data contain 21929 observations of 37 variables, whereas the relevant economic data contain 245031 observations of 37 variables.
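The subset rule keeps a row when either count is positive. A minimal sketch on a made-up four-row data frame (values invented purely for illustration) shows which rows survive; `subset()` is shown as an equivalent that additionally treats an NA condition as FALSE instead of returning a row of NAs.

```r
# Made-up rows for illustration only: only TORNADO and HEAT have casualties
toy <- data.frame(EVTYPE = c("HAIL", "TORNADO", "HEAT", "FOG"),
                  FATALITIES = c(0, 2, 0, 0),
                  INJURIES = c(0, 5, 3, 0),
                  stringsAsFactors = FALSE)

# Same rule as healthData: keep rows where FATALITIES > 0 OR INJURIES > 0
toyHealth <- toy[(toy$FATALITIES > 0) | (toy$INJURIES > 0), ]
toyHealth$EVTYPE   # "TORNADO" "HEAT"

# subset() applies the same condition but drops rows whose condition is NA
toyHealth2 <- subset(toy, FATALITIES > 0 | INJURIES > 0)
toyHealth2$EVTYPE
```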
```r
library(knitr)  # provides kable() for the tables below
## Use aggregate() with cbind() to sum FATALITIES and INJURIES for each EVTYPE and assign the result to fatInj.
fatInj <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = healthData, FUN = sum, na.rm = TRUE)
## Name the columns of the new table.
names(fatInj) <- c("EVENT_TYPE", "FATALITIES", "INJURIES")
## Rank event types by INJURIES (descending).
injuriesRank <- fatInj[order(-fatInj$INJURIES), ]
## Rank event types by FATALITIES (descending).
fatalitiesRank <- fatInj[order(-fatInj$FATALITIES), ]
## Top 10 event types by INJURIES
topInjuries <- kable(injuriesRank[1:10, ])
topInjuries
```
| | EVENT_TYPE | FATALITIES | INJURIES |
|---|---|---|---|
| 184 | TORNADO | 5633 | 91346 |
| 191 | TSTM WIND | 504 | 6957 |
| 47 | FLOOD | 470 | 6789 |
| 32 | EXCESSIVE HEAT | 1903 | 6525 |
| 123 | LIGHTNING | 816 | 5230 |
| 69 | HEAT | 937 | 2100 |
| 117 | ICE STORM | 89 | 1975 |
| 42 | FLASH FLOOD | 978 | 1777 |
| 173 | THUNDERSTORM WIND | 133 | 1488 |
| 67 | HAIL | 15 | 1361 |
```r
## Top 10 event types by FATALITIES
topFatalities <- kable(fatalitiesRank[1:10, ])
topFatalities
```
| | EVENT_TYPE | FATALITIES | INJURIES |
|---|---|---|---|
| 184 | TORNADO | 5633 | 91346 |
| 32 | EXCESSIVE HEAT | 1903 | 6525 |
| 42 | FLASH FLOOD | 978 | 1777 |
| 69 | HEAT | 937 | 2100 |
| 123 | LIGHTNING | 816 | 5230 |
| 191 | TSTM WIND | 504 | 6957 |
| 47 | FLOOD | 470 | 6789 |
| 147 | RIP CURRENT | 368 | 232 |
| 93 | HIGH WIND | 248 | 1137 |
| 2 | AVALANCHE | 224 | 170 |
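The `aggregate(cbind(...) ~ EVTYPE)` call used above sums both columns within each event type in one pass. A small sketch on made-up data (values invented for illustration) shows the shape of the result and the ranking step; note that `order()` leaves tied rows in their existing (alphabetical) order.

```r
# Made-up records for illustration: two TORNADO rows, two HEAT rows, one FLOOD row
toy <- data.frame(EVTYPE = c("TORNADO", "HEAT", "TORNADO", "HEAT", "FLOOD"),
                  FATALITIES = c(1, 0, 2, 3, 0),
                  INJURIES = c(10, 4, 5, 0, 7),
                  stringsAsFactors = FALSE)

# cbind() on the left-hand side sums both columns within each EVTYPE group;
# the result has one row per EVTYPE, sorted alphabetically
toySum <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
toySum

# Rank by fatalities, largest first (same pattern as order(-fatInj$FATALITIES));
# HEAT and TORNADO tie on 3 and keep their alphabetical order
ranked <- toySum[order(-toySum$FATALITIES), ]
ranked$EVTYPE
```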
```r
library(ggplot2)  # for the bar chart below
## Use aggregate() to sum FATALITIES + INJURIES for each EVTYPE and assign the result to totalImpact.
totalImpact <- aggregate((FATALITIES + INJURIES) ~ EVTYPE, data = healthData, FUN = sum, na.rm = TRUE)
## Name the columns of the new table of combined FATALITIES and INJURIES.
names(totalImpact) <- c("EVENT_TYPE", "Total_FATALITIES_&_INJURIES")
totalImpact <- totalImpact[order(-totalImpact$`Total_FATALITIES_&_INJURIES`), ]
## Merge the two tables by EVENT_TYPE to show the full health impact.
healthImpact <- merge(fatInj, totalImpact)
## Rank the combined table by total health impact.
healthImpact <- healthImpact[order(-healthImpact$`Total_FATALITIES_&_INJURIES`), ]
## Top 10 event types by total health impact (FATALITIES + INJURIES)
topHealthImpact <- kable(healthImpact[1:10, ])
## The top 10 health impact table is shown in the Results section.
## Column names are given in backticks (not quotes) so that ggplot2 uses the column values, not literal strings.
healthPlot <- ggplot(totalImpact[1:10, ],
                     aes(x = reorder(EVENT_TYPE, `Total_FATALITIES_&_INJURIES`),
                         y = `Total_FATALITIES_&_INJURIES`)) +
  geom_bar(stat = "identity", fill = "blue") +
  coord_flip() +
  labs(x = "Event types", y = "Fatalities & Injuries", title = "Top 10 Fatalities & Injuries Weather Events")
healthPlot
```
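Because a sum is additive, the per-event total of FATALITIES + INJURIES equals the sum of the two per-event totals already stored in `fatInj`. A minimal sketch of that shortcut follows, on a made-up table shaped like `fatInj` (`fatInjToy` and `TOTAL` are illustrative names); it avoids the second `aggregate()` and the `merge()`, and should give the same totals as that route.

```r
# Made-up table with the same shape as fatInj (values invented for illustration)
fatInjToy <- data.frame(EVENT_TYPE = c("FLOOD", "HEAT", "TORNADO"),
                        FATALITIES = c(0, 3, 3),
                        INJURIES = c(7, 4, 15),
                        stringsAsFactors = FALSE)

# Add the combined total as a new column instead of aggregating again
fatInjToy$TOTAL <- fatInjToy$FATALITIES + fatInjToy$INJURIES

# Rank by the combined total, largest first; the FLOOD/HEAT tie keeps its order
rankedToy <- fatInjToy[order(-fatInjToy$TOTAL), ]
rankedToy$EVENT_TYPE   # "TORNADO" "FLOOD" "HEAT"
```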
```r
## Keep only records whose exponent codes are blank, "K", "M" or "B" (all other codes, including lowercase ones, are dropped).
expData <- economicData[economicData$PROPDMGEXP %in% c("", "K", "M", "B") &
                        economicData$CROPDMGEXP %in% c("", "K", "M", "B"), ]
## Convert a damage value using its exponent code: total damage = DMG * multiplier.
convExponent <- function(dmg, exp) {
  if (exp == "K") {
    dmg * 10^3
  } else if (exp == "M") {
    dmg * 10^6
  } else if (exp == "B") {
    dmg * 10^9
  } else if (exp == "") {
    dmg
  } else {
    stop("NOT VALID DATA")
  }
}
## Use mapply() to compute property damage in millions of dollars.
expData$PROP_DMG <- mapply(convExponent, expData$PROPDMG, expData$PROPDMGEXP) / 10^6
## Use mapply() to compute crop damage in millions of dollars.
expData$CROP_DMG <- mapply(convExponent, expData$CROPDMG, expData$CROPDMGEXP) / 10^6
## Use aggregate() with cbind() to sum property and crop damage for each EVTYPE.
cropProp <- aggregate(cbind(PROP_DMG, CROP_DMG) ~ EVTYPE, data = expData, FUN = sum, na.rm = TRUE)
## Name the columns of the new table.
names(cropProp) <- c("EVENT_TYPE", "Property Damage in $million", "Crop Damage in $million")
## Rank event types by crop damage (descending), using the full column name.
cropRank <- cropProp[order(-cropProp$`Crop Damage in $million`), ]
topCrop <- kable(cropRank[1:10, ])
topCrop
```
| | EVENT_TYPE | Property Damage in $million | Crop Damage in $million |
|---|---|---|---|
| 38 | DROUGHT | 1046.1060 | 13972.566 |
| 70 | FLOOD | 144657.7098 | 5661.968 |
| 257 | RIVER FLOOD | 5118.9455 | 5029.459 |
| 201 | ICE STORM | 3944.9278 | 5022.110 |
| 113 | HAIL | 15727.1658 | 3000.537 |
| 184 | HURRICANE | 11868.3190 | 2741.910 |
| 192 | HURRICANE/TYPHOON | 69305.8400 | 2607.873 |
| 58 | FLASH FLOOD | 16140.8117 | 1420.727 |
| 53 | EXTREME COLD | 67.7374 | 1292.973 |
| 95 | FROST/FREEZE | 9.4800 | 1094.086 |
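Column names such as `Crop Damage in $million` are not syntactic R names, so they must be written in backticks after `$` (or passed as a string to `[[ ]]`). Referring to them by a shortened name such as 'Crop Damage' relies on partial matching, which R warns about for data frames. A small sketch, using a few values copied (rounded) from the table above, shows exact access:

```r
# A few rows copied from the crop damage table, keeping the non-syntactic name
toy <- data.frame(EVENT_TYPE = c("FLOOD", "DROUGHT", "HAIL"),
                  `Crop Damage in $million` = c(5661.968, 13972.566, 3000.537),
                  check.names = FALSE, stringsAsFactors = FALSE)

# Backticks reference the full column name exactly
ranked <- toy[order(-toy$`Crop Damage in $million`), ]
ranked$EVENT_TYPE   # "DROUGHT" "FLOOD" "HAIL"

# [[ ]] with the full name as a string is equivalent
identical(toy[["Crop Damage in $million"]], toy$`Crop Damage in $million`)   # TRUE
```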
```r
## Rank event types by property damage (descending), using the full column name.
propRank <- cropProp[order(-cropProp$`Property Damage in $million`), ]
topProp <- kable(propRank[1:10, ])
topProp
```
| | EVENT_TYPE | Property Damage in $million | Crop Damage in $million |
|---|---|---|---|
| 70 | FLOOD | 144657.710 | 5661.9685 |
| 192 | HURRICANE/TYPHOON | 69305.840 | 2607.8728 |
| 348 | TORNADO | 56925.485 | 364.9501 |
| 294 | STORM SURGE | 43323.536 | 0.0050 |
| 58 | FLASH FLOOD | 16140.812 | 1420.7271 |
| 113 | HAIL | 15727.166 | 3000.5375 |
| 184 | HURRICANE | 11868.319 | 2741.9100 |
| 357 | TROPICAL STORM | 7703.891 | 678.3460 |
| 418 | WINTER STORM | 6688.497 | 26.9440 |
| 170 | HIGH WIND | 5270.046 | 638.5713 |
```r
## Use aggregate() to sum property + crop damage for each EVTYPE and assign the result to totalEconomic.
totalEconomic <- aggregate((PROP_DMG + CROP_DMG) ~ EVTYPE, data = expData, FUN = sum, na.rm = TRUE)
## Name the columns of the new table of combined crop and property damage.
names(totalEconomic) <- c("EVENT_TYPE", "Total_Crop_&_Property_Damage")
## Merge the two economic tables by EVENT_TYPE to show the full economic impact.
economicImpact <- merge(cropProp, totalEconomic)
## Rank the combined table by total economic impact.
economicImpactRank <- economicImpact[order(-economicImpact$`Total_Crop_&_Property_Damage`), ]
## Top 10 event types by total economic impact (CROP + PROPERTY DAMAGE)
topEconomicImpact <- kable(economicImpactRank[1:10, ])
## The top 10 economic impact table is shown in the Results section.
## Plot the ranked table (merge() sorts rows alphabetically), with column names in backticks.
economyPlot <- ggplot(economicImpactRank[1:10, ],
                      aes(x = reorder(EVENT_TYPE, `Total_Crop_&_Property_Damage`),
                          y = `Total_Crop_&_Property_Damage`)) +
  geom_bar(stat = "identity", fill = "blue") +
  coord_flip() +
  labs(x = "Event types", y = "Crop & Property Damage ($ million)", title = "Top 10 Crop & Property Damaging Weather Events")
economyPlot
```
These results are based on data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database between 1950 and 2011.
ANS: Tornadoes are the most harmful event type, causing a total of 96979 fatalities and injuries (5633 fatalities and 91346 injuries). They are followed by excessive heat, TSTM WIND, flood and lightning, as shown in the evidence below.
The following table and plot show the top 10 event types that are most harmful to population health.
| | EVENT_TYPE | FATALITIES | INJURIES | Total_FATALITIES_&_INJURIES |
|---|---|---|---|---|
| 184 | TORNADO | 5633 | 91346 | 96979 |
| 32 | EXCESSIVE HEAT | 1903 | 6525 | 8428 |
| 191 | TSTM WIND | 504 | 6957 | 7461 |
| 47 | FLOOD | 470 | 6789 | 7259 |
| 123 | LIGHTNING | 816 | 5230 | 6046 |
| 69 | HEAT | 937 | 2100 | 3037 |
| 42 | FLASH FLOOD | 978 | 1777 | 2755 |
| 117 | ICE STORM | 89 | 1975 | 2064 |
| 173 | THUNDERSTORM WIND | 133 | 1488 | 1621 |
| 214 | WINTER STORM | 206 | 1321 | 1527 |
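As a quick arithmetic check on the table above, each total should equal FATALITIES + INJURIES. The sketch below re-enters the ten rows by hand and also computes the tornado share of the combined top-10 total (not of all events).

```r
# Values re-entered by hand from the top 10 health impact table
fatalities <- c(5633, 1903, 504, 470, 816, 937, 978, 89, 133, 206)
injuries   <- c(91346, 6525, 6957, 6789, 5230, 2100, 1777, 1975, 1488, 1321)
reported   <- c(96979, 8428, 7461, 7259, 6046, 3037, 2755, 2064, 1621, 1527)

# Every reported total should equal fatalities + injuries
all(fatalities + injuries == reported)   # TRUE

# Tornado share (%) of the combined top-10 total of 137177
round(100 * reported[1] / sum(reported), 1)   # 70.7
```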
ANS: Floods caused the greatest economic consequences, with a total of more than $150 billion worth of property and crop damage. They are followed by hurricanes/typhoons, tornadoes and storm surges, as shown in the evidence below.
The following table and plot show the top 10 event types with the greatest economic consequences.
| | EVENT_TYPE | Property Damage in $million | Crop Damage in $million | Total_Crop_&_Property_Damage |
|---|---|---|---|---|
| 70 | FLOOD | 144657.710 | 5661.9685 | 150319.678 |
| 192 | HURRICANE/TYPHOON | 69305.840 | 2607.8728 | 71913.713 |
| 348 | TORNADO | 56925.485 | 364.9501 | 57290.436 |
| 294 | STORM SURGE | 43323.536 | 0.0050 | 43323.541 |
| 113 | HAIL | 15727.166 | 3000.5375 | 18727.703 |
| 58 | FLASH FLOOD | 16140.812 | 1420.7271 | 17561.539 |
| 38 | DROUGHT | 1046.106 | 13972.5660 | 15018.672 |
| 184 | HURRICANE | 11868.319 | 2741.9100 | 14610.229 |
| 257 | RIVER FLOOD | 5118.945 | 5029.4590 | 10148.405 |
| 201 | ICE STORM | 3944.928 | 5022.1100 | 8967.038 |
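The headline figure can be checked by hand: the flood total is 144657.710 + 5661.9685 ≈ 150319.68 million dollars, i.e. about $150.3 billion. The sketch below re-enters the ten rows of the table above and confirms that each total matches property + crop damage to within the printed precision.

```r
# Values (in $ million) re-entered by hand from the top 10 economic impact table
property <- c(144657.710, 69305.840, 56925.485, 43323.536, 15727.166,
              16140.812, 1046.106, 11868.319, 5118.945, 3944.928)
crop     <- c(5661.9685, 2607.8728, 364.9501, 0.0050, 3000.5375,
              1420.7271, 13972.5660, 2741.9100, 5029.4590, 5022.1100)
reported <- c(150319.678, 71913.713, 57290.436, 43323.541, 18727.703,
              17561.539, 15018.672, 14610.229, 10148.405, 8967.038)

# Totals agree up to the rounding used when the table was printed
all(abs(property + crop - reported) < 0.01)   # TRUE

# Convert the flood total from millions to billions of dollars
round(reported[1] / 1000, 1)   # 150.3
```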