# Coursera: Reproducible Research Assignment 2 - NOAA Storm Database analysis for severe weather events
Synopsis
This project explores the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database. The database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. We cleanse the data and identify the top 5 most harmful event types, where attention should be focused.
Observation(s)
There are 985 unique event types (uniqEvents).
There are 37 variables and 902297 observations.
CROPDMGEXP and PROPDMGEXP require cleansing so that the units are uniform.
The CROPDMG and PROPDMG values need to be scaled accordingly.
Work Details
Loading and preprocessing the data
Assumptions
The working directory is set to the current local clone of the GitHub repository for this assignment.
The dataset, repdata-data-StormData.csv.bz2, required for the reproducible research is already downloaded to the repository.
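Since the assumption above names the compressed `.csv.bz2` file, it may help to note that base R's `read.csv()` can usually read a bzip2-compressed file directly, without extracting it first. A minimal sketch on a tiny stand-in file (the temporary file and its two rows are made up for illustration and are not taken from the NOAA data):

```r
# Write a tiny bzip2-compressed CSV to a temporary file (illustrative data only)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "wt")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,3", "FLOOD,1"), con)
close(con)

# read.csv() opens the path through file(), which detects the compression
toy <- read.csv(tmp, stringsAsFactors = FALSE)
print(toy)
```

Reading the compressed file this way would avoid keeping a separately extracted copy next to the archive.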
setwd("C:/Users/Helga/Documents/gabriel/cursos/")
data.local = "C:/Users/Helga/Documents/gabriel/cursos/repdata_data_StormData.csv"
# load the data
stormData <- read.csv(data.local, na.strings = "NA", stringsAsFactors = FALSE)
# take a quick look at the structure of the data
summary(stormData)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE
## Min. : 1.0 Length:902297 Length:902297 Length:902297
## 1st Qu.:19.0 Class :character Class :character Class :character
## Median :30.0 Mode :character Mode :character Mode :character
## Mean :31.2
## 3rd Qu.:45.0
## Max. :95.0
##
## COUNTY COUNTYNAME STATE EVTYPE
## Min. : 0 Length:902297 Length:902297 Length:902297
## 1st Qu.: 31 Class :character Class :character Class :character
## Median : 75 Mode :character Mode :character Mode :character
## Mean :101
## 3rd Qu.:131
## Max. :873
##
## BGN_RANGE BGN_AZI BGN_LOCATI END_DATE
## Min. : 0 Length:902297 Length:902297 Length:902297
## 1st Qu.: 0 Class :character Class :character Class :character
## Median : 0 Mode :character Mode :character Mode :character
## Mean : 1
## 3rd Qu.: 1
## Max. :3749
##
## END_TIME COUNTY_END COUNTYENDN END_RANGE
## Length:902297 Min. :0 Mode:logical Min. : 0
## Class :character 1st Qu.:0 NA's:902297 1st Qu.: 0
## Mode :character Median :0 Median : 0
## Mean :0 Mean : 1
## 3rd Qu.:0 3rd Qu.: 0
## Max. :0 Max. :925
##
## END_AZI END_LOCATI LENGTH WIDTH
## Length:902297 Length:902297 Min. : 0.0 Min. : 0
## Class :character Class :character 1st Qu.: 0.0 1st Qu.: 0
## Mode :character Mode :character Median : 0.0 Median : 0
## Mean : 0.2 Mean : 8
## 3rd Qu.: 0.0 3rd Qu.: 0
## Max. :2315.0 Max. :4400
##
## F MAG FATALITIES INJURIES
## Min. :0 Min. : 0 Min. : 0 Min. : 0.0
## 1st Qu.:0 1st Qu.: 0 1st Qu.: 0 1st Qu.: 0.0
## Median :1 Median : 50 Median : 0 Median : 0.0
## Mean :1 Mean : 47 Mean : 0 Mean : 0.2
## 3rd Qu.:1 3rd Qu.: 75 3rd Qu.: 0 3rd Qu.: 0.0
## Max. :5 Max. :22000 Max. :583 Max. :1700.0
## NA's :843563
## PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## Min. : 0 Length:902297 Min. : 0.0 Length:902297
## 1st Qu.: 0 Class :character 1st Qu.: 0.0 Class :character
## Median : 0 Mode :character Median : 0.0 Mode :character
## Mean : 12 Mean : 1.5
## 3rd Qu.: 0 3rd Qu.: 0.0
## Max. :5000 Max. :990.0
##
## WFO STATEOFFIC ZONENAMES LATITUDE
## Length:902297 Length:902297 Length:902297 Min. : 0
## Class :character Class :character Class :character 1st Qu.:2802
## Mode :character Mode :character Mode :character Median :3540
## Mean :2875
## 3rd Qu.:4019
## Max. :9706
## NA's :47
## LONGITUDE LATITUDE_E LONGITUDE_ REMARKS
## Min. :-14451 Min. : 0 Min. :-14455 Length:902297
## 1st Qu.: 7247 1st Qu.: 0 1st Qu.: 0 Class :character
## Median : 8707 Median : 0 Median : 0 Mode :character
## Mean : 6940 Mean :1452 Mean : 3509
## 3rd Qu.: 9605 3rd Qu.:3549 3rd Qu.: 8735
## Max. : 17124 Max. :9706 Max. :106220
## NA's :40
## REFNUM
## Min. : 1
## 1st Qu.:225575
## Median :451149
## Mean :451149
## 3rd Qu.:676723
## Max. :902297
##
str(stormData)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr "" "" "" "" ...
## $ BGN_LOCATI: chr "" "" "" "" ...
## $ END_DATE : chr "" "" "" "" ...
## $ END_TIME : chr "" "" "" "" ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr "" "" "" "" ...
## $ END_LOCATI: chr "" "" "" "" ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## $ WFO : chr "" "" "" "" ...
## $ STATEOFFIC: chr "" "" "" "" ...
## $ ZONENAMES : chr "" "" "" "" ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr "" "" "" "" ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
uniqEvents <- unique(stormData$EVTYPE)
numEvents <- length(uniqEvents)
Data cleansing/subsetting for expenses:
Keep only the records that report at least one fatality, injury, or nonzero property or crop damage.
sData <- stormData[stormData$FATALITIES > 0 | stormData$INJURIES > 0 | stormData$PROPDMG >
0 | stormData$CROPDMG > 0, ]
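The filter above keeps a row when any of the four impact columns is positive. A small sketch on made-up rows (the `toy` data frame is hypothetical) shows the same condition written with `rowSums()`, which may be easier to extend if more impact columns are added:

```r
# Hypothetical rows; only TORNADO and FLOOD report any impact
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "FLOOD", "FOG"),
  FATALITIES = c(1, 0, 0, 0),
  INJURIES   = c(0, 0, 0, 0),
  PROPDMG    = c(25, 0, 2.5, 0),
  CROPDMG    = c(0, 0, 1, 0),
  stringsAsFactors = FALSE
)

impact_cols <- c("FATALITIES", "INJURIES", "PROPDMG", "CROPDMG")
# TRUE for rows where at least one impact column is greater than zero
keep <- rowSums(toy[, impact_cols] > 0) > 0
sToy <- toy[keep, ]
print(sToy$EVTYPE)
```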
propDmgEXP <- unique (sData$PROPDMGEXP)
cropDmgEXP <- unique (sData$CROPDMGEXP)
table(sData$CROPDMGEXP)
##
##             ?      0      B      k      K      m      M
## 152664      6     17      7     21  99932      1   1985
table(sData$PROPDMGEXP)
##
##             -      +      0      2      3      4      5      6      7
##  11585      1      5    210      1      1      4     18      3      3
##      B      h      H      K      m      M
##     40      1      6 231428      7  11320
# Approach:
# Make the scale uniform by multiplying each damage value by the factor its EXP code stands for
# B/b --> billion : 10^9
# M/m --> million : 10^6
# K/k --> thousand: 10^3
# H/h --> hundred : 10^2
# "-" --> 10^0 (no scaling)
# "?" --> 10^0 (no scaling)
# "number" --> 10^number
# any other code ("+", blank) --> 10^0 (no scaling)
# function to convert DMGEXP character to multiplication number
# but suppress NA warnings caused by unnecessary 10^character attempts
exp_factor <- function(x){suppressWarnings(
ifelse(x %in% as.character(0:8), 10^as.numeric(x),
ifelse(x %in% c("b","B"), 10^9, # billion
ifelse(x %in% c("m","M"), 10^6, # million/mega
ifelse(x %in% c("k","K"), 10^3, # kilo
ifelse(x %in% c("h","H"), 10^2, # hecto
1))))))
}
sData$PROPDMG <- sData$PROPDMG*exp_factor(sData$PROPDMGEXP)
sData$CROPDMG <- sData$CROPDMG*exp_factor(sData$CROPDMGEXP)
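To make the mapping concrete, here is a small check of `exp_factor()` on a handful of codes. The function is repeated so the snippet runs on its own; the sample codes and the 2.5 value are chosen for illustration.

```r
# Same mapping as above, repeated so this snippet is self-contained
exp_factor <- function(x) {
  suppressWarnings(
    ifelse(x %in% as.character(0:8), 10^as.numeric(x),
    ifelse(x %in% c("b", "B"), 10^9,
    ifelse(x %in% c("m", "M"), 10^6,
    ifelse(x %in% c("k", "K"), 10^3,
    ifelse(x %in% c("h", "H"), 10^2,
           1)))))
  )
}

codes   <- c("K", "m", "B", "h", "2", "", "?", "-")
factors <- exp_factor(codes)
print(setNames(factors, codes))

# A reported value of 2.5 with code "K" is 2.5 * 1000 = 2500 USD
scaled <- 2.5 * exp_factor("K")
print(scaled)
```

Unknown codes such as `"?"`, `"-"`, and the empty string fall through to a multiplier of 1, so those values are left unscaled rather than dropped.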
vPropDmg <-tapply(sData$PROPDMG, sData$EVTYPE, sum);
vPropDmg <-vPropDmg[order(vPropDmg, decreasing=TRUE)]
# Top 5 Property Damage Cost in Millions USD
vPropDmgTop5 <- head (vPropDmg/10^6,5)
vPropDmgTop5
##             FLOOD HURRICANE/TYPHOON           TORNADO       STORM SURGE
##            144658             69306             56947             43324
##       FLASH FLOOD
##             16823
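The aggregate-then-rank step above can be sketched on a few made-up rows (the `toy` values are hypothetical): `tapply()` sums damage per event type, `order()` sorts those sums, and `head()` keeps the largest.

```r
# Hypothetical damage records in USD
toy <- data.frame(
  EVTYPE  = c("FLOOD", "TORNADO", "FLOOD", "HAIL", "TORNADO"),
  PROPDMG = c(5e6, 2e6, 1e6, 5e5, 1e6),
  stringsAsFactors = FALSE
)

totals <- tapply(toy$PROPDMG, toy$EVTYPE, sum)       # one sum per event type
totals <- totals[order(totals, decreasing = TRUE)]   # largest total first
top2   <- head(totals / 10^6, 2)                     # top 2, in millions USD
print(top2)
```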
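vCropDmg <- tapply(sData$CROPDMG, sData$EVTYPE, sum)
# order the crop totals by their own values; indexing vPropDmg here would
# return property-damage totals under unrelated event names
vCropDmg <- vCropDmg[order(vCropDmg, decreasing = TRUE)]
# Top 5 Crop Damage Cost in Millions USD
vCropDmgTop5 <- head(vCropDmg / 10^6, 5)
vCropDmgTop5
## (output not shown: the values previously printed here, led by WINTER STORM HIGH WINDS at 60.000, were property-damage totals selected through vPropDmg and are not a crop-damage ranking)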
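When ranking one vector, the index returned by `order()` has to be applied to that same vector. Applying it to a different vector returns that other vector's values and names, picked by position only. A small illustration on made-up totals (both vectors are hypothetical):

```r
# Hypothetical per-event totals
crop <- c(DROUGHT = 9, FLOOD = 4, HAIL = 6)
prop <- c(TORNADO = 50, FLOOD = 80, HAIL = 10)

idx <- order(crop, decreasing = TRUE)   # positions 1, 3, 2

right <- crop[idx]   # crop totals, largest first
wrong <- prop[idx]   # property totals in an order that means nothing for them
print(right)
print(wrong)
```

In the second result neither the names nor the values describe crop damage, and the values are not even sorted.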
Plot depicting the top 5 events causing the most expense
par(mfrow = c(2, 1))
barplot(vCropDmgTop5, xlab = "Event Type", ylab = "Crops Damage in Millions USD")
barplot(vPropDmgTop5, xlab = "Event Type", ylab = "Properties Damage in Millions USD")
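Long event names tend to overlap on the x axis of a `barplot()`. One option, sketched here, is to rotate the labels with `las = 2` and widen the bottom margin. The three values are rounded from the property-damage output above and converted to billions USD; the margin sizes are assumptions to adjust by eye. The `pdf(NULL)` call only keeps the sketch from opening a window and can be dropped in an interactive session.

```r
# Rounded property-damage totals in billions USD, for illustration
vals <- c(FLOOD = 144.7, "HURRICANE/TYPHOON" = 69.3, TORNADO = 56.9)

pdf(NULL)                          # null graphics device, no file is written
old <- par(mar = c(9, 4, 2, 1))    # extra bottom margin for rotated labels
mids <- barplot(vals, las = 2, cex.names = 0.8,
                ylab = "Property Damage in Billions USD")
par(old)                           # restore the previous margins
dev.off()
```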
[Figure: bar plots of the top 5 event types by crop damage and by property damage, in millions USD]
Top 5 Events causing the most Fatalities/Injuries
vFatal <- tapply(sData$FATALITIES, sData$EVTYPE, sum)
vFatal <- vFatal[order(vFatal, decreasing = TRUE)]
vFatalTop5 <- head(vFatal, 5)
vFatalTop5
##        TORNADO EXCESSIVE HEAT    FLASH FLOOD           HEAT      LIGHTNING
##           5633           1903            978            937            816
vInjury <- tapply(sData$INJURIES, sData$EVTYPE, sum)
vInjury <- vInjury[order(vInjury, decreasing = TRUE)]
vInjuryTop5 <- head(vInjury, 5)
vInjuryTop5
##        TORNADO      TSTM WIND          FLOOD EXCESSIVE HEAT      LIGHTNING
##          91346           6957           6789           6525           5230
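Fatalities and injuries can also be summed in one pass, which makes it easier to compare the two rankings side by side. A sketch using `aggregate()` on made-up rows (the `toy` data frame is hypothetical):

```r
# Hypothetical casualty records
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT"),
  FATALITIES = c(3, 2, 1, 0, 4),
  INJURIES   = c(20, 1, 10, 5, 0),
  stringsAsFactors = FALSE
)

# Sum both columns per event type, then rank by fatalities
health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
health <- health[order(health$FATALITIES, decreasing = TRUE), ]
print(health)
```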
Plot depicting Top 5 events causing Fatalities/Injuries
par(mfrow = c(2, 1))
barplot(vFatalTop5, xlab = "Event Type", ylab = "Number of Fatalities")
barplot(vInjuryTop5, xlab = "Event Type", ylab = "Number of Injuries")
[Figure: bar plots of the top 5 event types by number of fatalities and by number of injuries]
Results
Tornadoes are the single most harmful event type for population health. They caused 5633 deaths and 91346 injuries.
Floods cause the most property damage, about 144.7 billion USD (144658 million).
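The crop-damage ranking cannot be read from the printed output: the WINTER STORM HIGH WINDS figure of 60 million USD was produced by indexing vPropDmg with the crop-damage ordering, so it is a property-damage total rather than a crop-damage one. The top crop-damage event is the first entry of vCropDmg once it is sorted by its own values.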