Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This report presents an analysis whose goal was to identify the weather events most hazardous to population health and those with the greatest economic impact in the U.S., based on data from the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database.
The storm database covers weather events from 1950 through 2011 and contains estimates of the number of fatalities and injuries for each weather event, as well as the cost of damage to property and crops.
The estimates for fatalities and injuries were used to determine weather events with the most harmful impact to population health. Property damage and crop damage cost estimates were used to determine weather events with the greatest economic consequences.
library(ggplot2)
library(xtable)
library(knitr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
stormURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
storm <- "data/storm-data.csv.bz2"
if (!file.exists('data')) {
  dir.create('data')
}
if (!file.exists(storm)) {
  download.file(url = stormURL, destfile = storm)
}
stormData <- read.csv(storm, sep = ",", header = TRUE)
names(stormData)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
str(stormData)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr "" "" "" "" ...
## $ BGN_LOCATI: chr "" "" "" "" ...
## $ END_DATE : chr "" "" "" "" ...
## $ END_TIME : chr "" "" "" "" ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr "" "" "" "" ...
## $ END_LOCATI: chr "" "" "" "" ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## $ WFO : chr "" "" "" "" ...
## $ STATEOFFIC: chr "" "" "" "" ...
## $ ZONENAMES : chr "" "" "" "" ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr "" "" "" "" ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
For a large data set like the storm data, it is better to work with a subset that contains only the columns needed for the analysis. The subset should include the following columns:
| Variable | Description |
|---|---|
| EVTYPE | Event type (Flood, Heat, Hurricane, Tornado, …) |
| FATALITIES | Number of fatalities resulting from event |
| INJURIES | Number of injuries resulting from event |
| PROPDMG | Property damage in USD |
| PROPDMGEXP | Unit multiplier for property damage (K, M, or B) |
| CROPDMG | Crop damage in USD |
| CROPDMGEXP | Unit multiplier for crop damage (K, M, or B) |
| BGN_DATE | Begin date of the event |
| END_DATE | End date of the event |
| STATE | State where the event occurred |
stormTidy <- subset(stormData, EVTYPE != '?' & (FATALITIES > 0 | INJURIES > 0 | PROPDMG > 0 | CROPDMG > 0), select = c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP", "BGN_DATE", "END_DATE", "STATE"))
summary(stormTidy)
## EVTYPE FATALITIES INJURIES PROPDMG
## Length:254632 Min. : 0.0000 Min. : 0.0000 Min. : 0.00
## Class :character 1st Qu.: 0.0000 1st Qu.: 0.0000 1st Qu.: 2.00
## Mode :character Median : 0.0000 Median : 0.0000 Median : 5.00
## Mean : 0.0595 Mean : 0.5519 Mean : 42.75
## 3rd Qu.: 0.0000 3rd Qu.: 0.0000 3rd Qu.: 25.00
## Max. :583.0000 Max. :1700.0000 Max. :5000.00
## PROPDMGEXP CROPDMG CROPDMGEXP BGN_DATE
## Length:254632 Min. : 0.000 Length:254632 Length:254632
## Class :character 1st Qu.: 0.000 Class :character Class :character
## Mode :character Median : 0.000 Mode :character Mode :character
## Mean : 5.411
## 3rd Qu.: 0.000
## Max. :990.000
## END_DATE STATE
## Length:254632 Length:254632
## Class :character Class :character
## Mode :character Mode :character
##
##
##
The tidy storm data set contains 254632 observations of 10 variables and no missing values. The following derived variables are then added:
| Variable | Description |
|---|---|
| DATE_START | Begin date of the event stored as a date type |
| DATE_END | End date of the event stored as a date type |
| YEAR | Year the event started |
| DURATION | Duration (in hours) of the event |
stormTidy$DATE_START <- as.Date(stormTidy$BGN_DATE, format = "%m/%d/%Y")
stormTidy$DATE_END <- as.Date(stormTidy$END_DATE, format = "%m/%d/%Y")
stormTidy$YEAR <- as.integer(format(stormTidy$DATE_START, "%Y"))
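# Date differences are measured in days by default, so request hours explicitly
stormTidy$DURATION <- as.numeric(difftime(stormTidy$DATE_END, stormTidy$DATE_START, units = "hours"))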
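Subtracting two `Date` objects in R yields a `difftime` measured in days, so a duration in hours needs an explicit unit. The following is a minimal sketch on two made-up rows; the vectors `bgn`, `end`, and the other names are hypothetical and not part of the analysis. It also shows that an empty end date parses to `NA`, which carries through to the duration.

```r
# Made-up begin/end strings in the same format as BGN_DATE / END_DATE
bgn <- c("4/18/1950 0:00:00", "6/8/1951 0:00:00")
end <- c("4/19/1950 0:00:00", "")

start_date <- as.Date(bgn, format = "%m/%d/%Y")
end_date   <- as.Date(end, format = "%m/%d/%Y")  # "" cannot be parsed and becomes NA

# Plain subtraction of Date objects gives a difftime in days
day_diff <- end_date - start_date
print(units(day_diff))

# Asking difftime() for hours avoids any manual unit conversion
duration_hours <- as.numeric(difftime(end_date, start_date, units = "hours"))
print(duration_hours)
```

Events without a recorded end date therefore have an `NA` duration rather than zero.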
# Map a damage exponent code to its numeric multiplier (NA if unrecognized)
multiplier <- function(exp) {
  exp <- toupper(exp)
  if (exp == "") return(10^0)
  if (exp == "-") return(10^0)
  if (exp == "?") return(10^0)
  if (exp == "+") return(10^0)
  if (exp == "0") return(10^0)
  if (exp == "1") return(10^1)
  if (exp == "2") return(10^2)
  if (exp == "3") return(10^3)
  if (exp == "4") return(10^4)
  if (exp == "5") return(10^5)
  if (exp == "6") return(10^6)
  if (exp == "7") return(10^7)
  if (exp == "8") return(10^8)
  if (exp == "9") return(10^9)
  if (exp == "H") return(10^2)
  if (exp == "K") return(10^3)
  if (exp == "M") return(10^6)
  if (exp == "B") return(10^9)
  return(NA)
}
Create new columns for property cost and crop cost: PR_COST and CR_COST.
# Compute the property damage and crop damage costs (in billions) using sapply
stormTidy$PR_COST <- with(stormTidy, as.numeric(PROPDMG) * sapply(PROPDMGEXP, multiplier))/10^9
stormTidy$CR_COST <- with(stormTidy, as.numeric(CROPDMG) * sapply(CROPDMGEXP, multiplier))/10^9
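The `sapply` calls above invoke `multiplier` once per row. One possible alternative is a vectorized lookup through a named vector. The sketch below assumes the same mapping as `multiplier`; the names `exp_lookup`, `multiplier_vec`, and the toy rows are hypothetical and only illustrate the arithmetic (for example, 25 with exponent "K" is 25 × 10^3 USD, or 2.5e-05 billion).

```r
# Hypothetical lookup table mirroring the rules of multiplier()
exp_lookup <- c("-" = 1, "?" = 1, "+" = 1,
                "0" = 1e0, "1" = 1e1, "2" = 1e2, "3" = 1e3, "4" = 1e4,
                "5" = 1e5, "6" = 1e6, "7" = 1e7, "8" = 1e8, "9" = 1e9,
                "H" = 1e2, "K" = 1e3, "M" = 1e6, "B" = 1e9)

multiplier_vec <- function(exp) {
  exp <- toupper(exp)
  out <- unname(exp_lookup[exp])  # unrecognized codes become NA
  out[exp %in% ""] <- 1           # "" never matches a name, so handle it explicitly
  out
}

# Made-up rows for illustration only
toy <- data.frame(PROPDMG = c(25, 2.5, 1.2, 3),
                  PROPDMGEXP = c("K", "m", "B", ""),
                  stringsAsFactors = FALSE)

# Cost in billions of USD, computed for the whole column at once
toy$PR_COST <- toy$PROPDMG * multiplier_vec(toy$PROPDMGEXP) / 10^9
print(toy)
```

Both approaches should give the same costs as long as the lookup table and the function agree on every code.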
The analysis addresses two questions: which types of weather events are most harmful to population health, and which have the greatest economic consequences?
The raw data has been processed and tidied. All that remains is to create summarized data sets for the desired output.
* Health Impact Summarized Data
healthImpactData <- aggregate(x = list(HEALTH_IMPACT = stormTidy$FATALITIES + stormTidy$INJURIES), by = list(EVENT_TYPE = stormTidy$EVTYPE),FUN = sum, na.rm = TRUE)
healthImpactData <- healthImpactData[order(healthImpactData$HEALTH_IMPACT, decreasing = TRUE),]
damageCostImpactData <- aggregate(x = list(DAMAGE_IMPACT = stormTidy$PR_COST + stormTidy$CR_COST), by = list(EVENT_TYPE = stormTidy$EVTYPE), FUN = sum,na.rm = TRUE)
damageCostImpactData <- damageCostImpactData[order(damageCostImpactData$DAMAGE_IMPACT, decreasing = TRUE),]
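The health-impact aggregation above folds fatalities and injuries into a single total per event type. As a hedged illustration of the same aggregate-then-sort pattern, the sketch below keeps the two counts in separate columns before adding them. The data frame `toy` and its values are made up for illustration and are not taken from the storm data.

```r
# Made-up rows for illustration only
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
                  FATALITIES = c(1, 0, 2, 3),
                  INJURIES = c(15, 2, 0, 1),
                  stringsAsFactors = FALSE)

# Sum fatalities and injuries separately for each event type
by_event <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Combined health impact, then sort from most to least harmful
by_event$HEALTH_IMPACT <- by_event$FATALITIES + by_event$INJURIES
by_event <- by_event[order(by_event$HEALTH_IMPACT, decreasing = TRUE), ]
print(by_event)
```

Keeping the two columns makes it possible to see whether a high total is driven mainly by injuries or by fatalities.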
library(knitr)
data <- head(healthImpactData, 10)
caption <- "Top 10 US Weather Events Most Harmful to Population Health"
kable(data, format = "html", caption = caption, table.attr = 'class="table-bordered"')
| | EVENT_TYPE | HEALTH_IMPACT |
|---|---|---|
| 406 | TORNADO | 96979 |
| 60 | EXCESSIVE HEAT | 8428 |
| 422 | TSTM WIND | 7461 |
| 85 | FLOOD | 7259 |
| 257 | LIGHTNING | 6046 |
| 150 | HEAT | 3037 |
| 72 | FLASH FLOOD | 2755 |
| 237 | ICE STORM | 2064 |
| 363 | THUNDERSTORM WIND | 1621 |
| 480 | WINTER STORM | 1527 |
HIGraph <- ggplot(head(healthImpactData, 10),
aes(x = reorder(EVENT_TYPE, HEALTH_IMPACT), y = HEALTH_IMPACT, fill = EVENT_TYPE)) +
geom_bar(stat = "identity") +
xlab("Event Type") +
ylab("Total Health Impacts [Fatalities + Injuries]") +
theme(plot.title = element_text(size = 14, hjust = 0.5),
axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
ggtitle("Top 10 Weather Events Most Harmful to\nPopulation Health")
print(HIGraph)
data2 <- head(damageCostImpactData, 10)
caption <- "Top 10 Weather Events with Greatest Economic Consequences"
kable(data2, format = "html", caption = caption, table.attr ='class="table-bordered"')
| | EVENT_TYPE | DAMAGE_IMPACT |
|---|---|---|
| 85 | FLOOD | 150.319678 |
| 223 | HURRICANE/TYPHOON | 71.913713 |
| 406 | TORNADO | 57.362334 |
| 349 | STORM SURGE | 43.323541 |
| 133 | HAIL | 18.761222 |
| 72 | FLASH FLOOD | 18.243991 |
| 48 | DROUGHT | 15.018672 |
| 214 | HURRICANE | 14.610229 |
| 309 | RIVER FLOOD | 10.148404 |
| 237 | ICE STORM | 8.967041 |
DCIGraph <- ggplot(head(damageCostImpactData, 10),
aes(x = reorder(EVENT_TYPE, DAMAGE_IMPACT), y = DAMAGE_IMPACT, fill = EVENT_TYPE)) +
geom_bar(stat = "identity") +
xlab("Event Type") +
ylab("Total Property / Crop Damage Cost\n(in Billions)") +
theme(plot.title = element_text(size = 14, hjust = 0.5),
axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + # Set angle to 90 degrees
ggtitle("Top 10 Weather Events with\nGreatest Economic Consequences")
print(DCIGraph)
Based on the outputs of the analysis above, the following conclusions can be drawn:
a. Which types of weather events are most harmful to population health?
Tornadoes cause by far the greatest number of fatalities and injuries combined (96,979), more than ten times the total for the next event type, excessive heat (8,428).
b. Which types of weather events have the greatest economic consequences?
Floods have the greatest economic consequences, with about 150 billion USD in combined property and crop damage, followed by hurricanes/typhoons (about 72 billion USD).