This analysis of U.S. NOAA Storm Data from January 1950 through November 2011 proposes answers to the following two (2) questions:
Across the United States, which types of events are most harmful with respect to population health? Using fatality and injury counts provided in the storm data, tornadoes are identified as most harmful to population health. See detailed chart in the POPULATION HEALTH IMPACT section of the RESULTS section.
Across the United States, which types of events have the greatest economic consequences? Using property and crop damage estimates, floods are identified as having the greatest economic impact. See detailed chart in the ECONOMIC IMPACT section of the RESULTS section.
Data processing consists of data loading, cleansing and analysis. The environment requires several packages (see code) for analysis and reporting.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(scales)
The NOAA Storm Data is downloaded, extracted and read into the analysis program.
stormDataURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
temp <- tempfile() ## Set up temporary file for download
download.file(stormDataURL, destfile=temp)
stormData_raw <- read.csv(temp, stringsAsFactors = FALSE)
unlink(temp) ## Remove temporary file used in download
str(stormData_raw)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr "" "" "" "" ...
## $ BGN_LOCATI: chr "" "" "" "" ...
## $ END_DATE : chr "" "" "" "" ...
## $ END_TIME : chr "" "" "" "" ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr "" "" "" "" ...
## $ END_LOCATI: chr "" "" "" "" ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## $ WFO : chr "" "" "" "" ...
## $ STATEOFFIC: chr "" "" "" "" ...
## $ ZONENAMES : chr "" "" "" "" ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr "" "" "" "" ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
The following steps are taken to clean (“tidy”) the dataset before analysis activities begin.
Variables (columns) not required to answer the questions are removed from the dataset. 37 columns are reduced to 10 columns for analysis.
The beginning date of the storm event (BEGN_DATE) is converted to date format. The field “event_year” is created from the BEGN_DATE field.
stormData <- select(stormData_raw,BGN_DATE, TIME_ZONE,
STATE,
EVTYPE,
FATALITIES, INJURIES,
PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
stormData$BGN_DATE <- as.Date(stormData$BGN_DATE,format="%m/%d/%Y")
stormData$event_year <- year(stormData$BGN_DATE)
The following actions were taken to normalize ‘EVTYPE’ variable including:
NOTE: The original count of 985 ‘EVTYPE’ have been consolidated to 165 values Detailed transformations can be seen in the R-code below:
stormData$event_type <- toupper(stormData$EVTYPE)
stormData <- stormData[!grepl("SUMMARY", stormData$event_type),]
stormData$event_type[grep("THUNDERSTORM*|THUNER*|THUD*|*MICROBURST*|*DOWNBURST*|*MICOBURST*|
*GUSTNADO*|*TSTM*",stormData$event_type)] <- "THUNDERSTORM.WINDS"
stormData$event_type[grep("*TROPICAL STORM*",stormData$event_type)] <- "TROPICAL.STORM"
stormData$event_type[grep("*TORNADO*|*TORNDAO*",stormData$event_type)] <- "TORNADO"
stormData$event_type[grep("*HIGH WINDS*|* WIND*|*WIND AND WAVE*|*WIND DAMAGE*|*WIND GUSTS*|
*WIND STORM*|*WINDS*|^WND*",stormData$event_type)] <- "HIGH.WIND"
stormData$event_type[grep("^HURRICANE*|*REMNANTS OF FLOYD*|*TYPHOON*", stormData$event_type)] <- "HURRICANE(TYPHOON)"
stormData$event_type[grep("*HAIL*",stormData$event_type)] <- "HAIL"
stormData$event_type[grep("HEAVY PRECIPATATION*|*HEAVY PRECIPITATION*|*HEAVY RAIN*|*HEAVY SHOWER*|*HVY RAIN*|
*EXCESSIVE RAIN*|*EXCESSIVE RAINFALL*|EXCESSIVE WETNESS*|EXTREMELY WET*|
*RECORD PRECIPITATION*|*RAINFALL*|*RAINSTORM*|*RAIN*",
stormData$event_type)] <- "HEAVY.RAIN"
## THE TRANSFORMATION OF 'FLOOD' ITEMS MUST BE IN THE SEQUENCE BELOW
stormData$event_type[grep("*FLASH FLOOD*|*FLOOD FLASH*|*FLOOD/FLASH*|*FLASHFLOOD*|
*URBAN/SML*|*URBAN AND SMALL*|*URBAN SMALL*|*URBAN/SMALL*|
*SMALL STREAM*",stormData$event_type)] <- "FLASHFL--D"
stormData$event_type[grep("*COASTAL FLOOD*|*BEACH FLOOD*|*COASTAL FLOODING*|*COASTAL FLOODING*|
*COASTAL/TIDAL FLOOD*|*COASTALFLOOD*|*COASTAL EROSION*|*CSTL FLOODING/EROSION*",
stormData$event_type)] <- "COASTALFL--D"
stormData$event_type[grep("*FLOOD*",stormData$event_type)] <- "FLOOD"
stormData$event_type[grep("*FLASHFL--D*",stormData$event_type)] <- "FLASH.FLOOD"
stormData$event_type[grep("*COASTALFL--D*",stormData$event_type)] <- "COASTAL.FLOOD"
stormData$event_type[grep("*HEAVY SNOW*|*HEAVY WET SNOW*|*RECORD SNOW*|*SNOW*",stormData$event_type)] <- "HEAVY.SNOW"
stormData$event_type[grep("*BLIZZARD*",stormData$event_type)] <- "BLIZZARD"
stormData$event_type[grep("*COLD*|*COOL*|*RECORD LOW*|*LOW TEMPERATURE*|*EXTREME WIND CHILL*|^EXTREME COLD*|^EXTREME/RECORD COLD*|
^EXCESSIVE COLD*|^EXCESSIVE COLD*|^EXTENDED COLD*|^UNSEASONABLE COLD*|^UNSEASONABLY COLD*|
^UNUSUALLY COLD*|^WIND CHILL|*FREEZE*|*UNSEASONAL LOW*|*LOW TEMPERATURE*", stormData$event_type)] <- "COLD/WIND.CHILL"
stormData$event_type[grep("*SLEET*|*FREEZING*",stormData$event_type)] <- "SLEET"
stormData$event_type[grep("*HEAT*|*RECORD HIGH*|*WARM*|*HIGH TEMPERATURE*|*RECORD TEMPERATURE*|*HOT *",stormData$event_type)] <- "HEAT"
stormData$event_type[grep("*LIGHTNING*|*LIGHTING*|*LIGNTNING*",stormData$event_type)] <- "LIGHTNING"
stormData$event_type[grep("*FUNNEL*",stormData$event_type)] <- "FUNNEL.CLOUD"
stormData$event_type[grep("*WATER SPOUT*|*WATERSPOUT*|*WAYTERSPOUT*",stormData$event_type)] <- "WATERSPOUT"
stormData$event_type[grep("*VOLCANIC*",stormData$event_type)] <- "VOLCANIC.ASH"
stormData$event_type[grep("*GLAZE*|*ICE STORM*|*ICY ROADS*|*PATCHY ICE*",stormData$event_type)] <- "ICE.STORM"
stormData$event_type[grep("*DUST DEVIL*|*DUST DEVEL*",stormData$event_type)] <- "DUST.DEVIL"
stormData$event_type[grep("*DUST STORM*|*DUSTSTORM*",stormData$event_type)] <- "DUST.STORM"
Analysis begins with preparation of 2 metrics which can be used to identify the high impact storm events: economic impact and population health impact. A summary table is prepared to generate reports and graphs.
exp_Code <- c("H","h","K","k","M","m","B","b",
"+","-","?","0","1","2","3","4","5","6","7","8",""," ")
exp_Value <- c(100,100,1000,1000,1000000,1000000,1000000000,1000000000,
1,0,0,10,10,10,10,10,10,10,10,10,0,0)
exp_Table <- data.frame(exp_Code,exp_Value)
stormData$prty_damage <- stormData$PROPDMG *
exp_Table[match(stormData$PROPDMGEXP,exp_Table$exp_Code),"exp_Value"]
stormData$crop_damage <- stormData$CROPDMG *
exp_Table[match(stormData$CROPDMGEXP,exp_Table$exp_Code),"exp_Value"]
stormData$economic_impact <- stormData$prty_damage + stormData$crop_damage
stormData$mFWI = (600*stormData$FATALITIES) + (stormData$INJURIES)
A summary table for reporting is prepared:
* Storm data is grouped by the normalized 'event_type'
* Economic impact and modified Fatality/Weighted Injuries index are summed stormSummary <- stormData %>% group_by(event_type) %>%
summarize(economic_impact = sum(economic_impact),
mFWI = sum(mFWI))
## `summarise()` ungrouping output (override with `.groups` argument)
To answer the 2 questions, summary storm data is sorted in descending order by the key metric to identify the “top 10” impacting storm events.
As indicated in the graph below which uses a modified Fatality/Weighted Injury index to assign impact level, tornadoes have the greatest impact on population health due to number of deaths and injuries across the United States.
top10rank_h <- stormSummary[order(-stormSummary$mFWI, stormSummary$mFWI),]
top10rank_h <- top10rank_h[1:10,]
ggplot(data = top10rank_h, aes(x=reorder(event_type,mFWI), mFWI,fill=event_type, main="1950-2011 Top 10 Storm Events Impacting Population Health")) +
geom_bar(stat = "identity", show.legend=FALSE) +
geom_text(aes(label=mFWI, hjust = +0.25)) +
coord_flip() +
scale_y_continuous(name="Fatality/Weighted Injury Calculation", labels=comma) +
scale_x_discrete(name="Storm Event")
As indicated in the graph below which uses the combination of property and crop damage assessment, floods have the greatest economic impact. In the period 1950-2011, floods have caused over $160 billion in damage.
top10rank_e <- stormSummary[order(-stormSummary$economic_impact, stormSummary$economic_impact),]
top10rank_e <- top10rank_e[1:10,]
ggplot(data = top10rank_e, aes(x=reorder(event_type,economic_impact),
economic_impact/1000000,fill=event_type,
main="1950-2011 Economic Impact from 10 Storm Events")) +
geom_bar(stat = "identity", show.legend=FALSE) +
geom_text(aes(label=sprintf("$%.2f", economic_impact/1000000), hjust = +0.25)) +
coord_flip() +
scale_y_continuous(name="Economic Impact in USD (millions)", label=dollar_format()) +
scale_x_discrete(name="Storm Event")